Disaster message classification model from research to production — Automatic containerization and deployment of the model

The excitement has begun. I hope you enjoyed part 1 of the series on how to turn your NLP model in Jupyter notebooks into a production-ready model, where we successfully automated a series of steps including data acquisition, data processing, model training, model evaluation, and model packaging. That’s all fantastic progress, except we are still far from putting the model to use and actually driving some value. In part 2, we will continue automating our model development process by building a functional backend and frontend in a Docker container, which will be hosted…

Natural Language Processing

Disaster message classification model from research to production — Automatic training, testing and packaging of the model


Emergencies can happen anytime, anywhere. As we navigate through this devastating global pandemic, we are witnessing the absolute brilliance of frontline first responders. In fact, sending rescue units is only part of their already laborious jobs. When a disaster hits, aid request messages come in fast and furious from all different channels, and human operators can quickly be overrun.

Therefore, we need an automated and reliable way of swiftly classifying each rescue request message into the correct channel. A natural choice would be building a message classifier…

Natural Language Processing

Applications of N-grams in Content Spinning for Internet Marketing

Imagine you are starting a new business. You have an amazing product that could potentially disrupt a 100-million-dollar industry, but virtually nobody knows about it. How would you get your first set of early adopters?

The easy answer is to make your content discoverable on the internet: getting into the top results on the first page of the commonly used search engines would do wonders for your website’s organic traffic. However, it has never been harder to achieve such a goal, as simply flooding the web with the same content will not bump your rank on modern search engines…

Natural Language Processing

Applications of Sklearn Pipelines, SHAP and Object-oriented programming in Sentiment Analysis

Let’s continue our sentiment analysis journey.

Remember last time, when we talked about using the bag of words model to detect the sentiment of review texts and already achieved relatively good performance? Today we will build upon what we have already accomplished and upgrade the bag of words model with a smart weighting scheme. In this post, we will utilize the custom pipeline StreamlinedModel object from part 1 and witness the amazing improvements gained from applying a TF-IDF transformer to simple models like logistic regression.
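To make the idea of a "smart weighting scheme" concrete, here is a minimal pure-Python sketch of TF-IDF weighting. It is not the StreamlinedModel pipeline from the series: the `tf_idf` function and the sample reviews are hypothetical, and it assumes the textbook formula tf × log(N / df), whereas libraries such as scikit-learn add smoothing and normalization on top.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (hypothetical helper).

    Assumes whitespace tokenization and the textbook weight
    tf * log(N / df), without smoothing or normalization.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up review snippets, for illustration only.
reviews = [
    "the movie was great",
    "the movie was terrible",
    "the plot was great fun",
]
weights = tf_idf(reviews)
print(weights[1])
```

In this toy corpus, words that occur in every review (such as "the") receive a weight of zero, while a word that occurs in only one review (such as "terrible") receives the largest weight in that review. Plain word counts would treat both the same.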

Term Frequency — Inverse Document Frequency (TF-IDF)

We already know that counting word frequency could help us gauge the sentiment of a…

Natural Language Processing

Applications of Sklearn Pipelines, SHAP and Object-oriented programming in Sentiment Analysis

Did you know that we had already (almost) solved sentiment analysis before neural networks became ubiquitous?

In the current era of natural language processing (NLP), which increasingly relies on deep learning models that deliver amazing performance, we often overlook the importance of the simplest text vectorization techniques: bag of words (BOW) and term frequency-inverse document frequency (TF-IDF). In fact, with these two techniques, we have already been able to predict the sentiment of a given piece of text with more than 80% accuracy. In other words, all we are doing with state-of-the-art deep learning models is simply…

Natural Language Processing

How this project helped me understand those legendary word bigrams

Time to start some new learning adventures in natural language processing.

What? Isn’t this supposed to be kickstarted by a spam detection project? As you might be thinking, cipher decryption is a curious place to start an NLP series. This starting point has strong ties to my desire to understand word embeddings at a deeper level. With the help of open-source NLP packages like spaCy and NLTK, we are able to generate a fully functional word embedding from a giant training text corpus with a single line of code, which seems like magic to me. While it is…

A practical example of how to bring value to your business using data science

Take a guess. What is the world’s most valuable asset right now?

It is not gold, not crude oil… it is data. You must have heard the popular buzzword “big data” and wondered what exactly the term means. Think about your favorite music streaming services, such as Spotify or Pandora. Every second across the world, many different users log in to the service and have their fair share of interactions with it. Since every move corresponds to a collectible data point, you can imagine the challenge of storing such large volumes of data.

Fortunately, these large datasets could bring real value to…

Are you ready for your brand new data science career? Here is what you should know about data scientist careers learned from data science

2018. The future is already here.

Following the gold rush in artificial intelligence, a new career track called “data scientist” has taken the world by storm. Combining business intuition with technical soundness, data science is considered the sexiest job of the 21st century. The data science community has witnessed an explosive 275% growth over a short span of 7 years.

Bowen Chen

Data Scientist @ Fair, Full-Stack Machine Learning Advocate, Basketball Player Training How to Dunk, Life-long Knicks Fan, Living the Dream
