The excitement has begun. I hope you enjoyed part 1 of this series on turning your NLP model in Jupyter notebooks into a production-ready model, where we successfully automated a series of steps: data acquisition, data processing, model training, model evaluation, and model packaging. That is all fantastic progress, except we are still far from putting the model to use and actually driving value. In part 2, we will continue automating our model development process, which will involve building a functional backend and frontend in a Docker container, which will be hosted…
S.O.S.
Emergencies can happen anytime, anywhere. As we navigate this devastating global pandemic, we are witnessing the absolute brilliance of frontline first responders. In fact, sending rescue units is only part of their already laborious job. When a disaster hits, aid-request messages pour in fast and furiously from many different channels, and human responders can quickly be overrun by the sheer volume of rescue requests.
Therefore, we need an automated and reliable way of swiftly routing each rescue-request message to the correct channel. A natural choice would be building a message classifier…
Imagine you are starting a new business. You have an amazing product that could potentially disrupt a 100-million-dollar industry, but virtually nobody knows about it. How would you get your first set of early adopters?
The easy answer is to make your content discoverable on the internet: landing among the top results on the first page of the commonly used search engines would do wonders for your website’s organic traffic. However, it has never been harder to achieve that goal, since simply flooding the web with the same content will not boost your ranking on modern-day search engines…
Let’s continue our sentiment analysis journey.
Remember that last time we talked about using the bag of words model to detect the sentiment of review texts, and it already gave us relatively good performance. Today we will build upon what we have accomplished and upgrade the bag of words model with a smarter weighting scheme. In this post, we will utilize the custom pipeline StreamlinedModel
object from part 1, and witness the improvements gained from applying a TF-IDF transformer to simple models like logistic regression.
We already know that counting word frequency could help us gauge the sentiment of a…
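The upgrade this teaser describes can be sketched in plain scikit-learn. Note this is a minimal illustration on a made-up four-review corpus, not the author's actual StreamlinedModel pipeline or dataset: raw bag-of-words counts are re-weighted by TF-IDF before feeding a logistic regression.

```python
# Minimal sketch (plain scikit-learn, not the author's StreamlinedModel class):
# upgrade raw bag-of-words counts with TF-IDF weighting, then classify sentiment.
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy review corpus, invented purely for illustration
reviews = [
    "great movie, loved it",
    "loved the acting, great plot",
    "terrible movie, boring plot",
    "boring and terrible, hated it",
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

model = Pipeline([
    ("counts", CountVectorizer()),   # bag-of-words: raw term counts
    ("tfidf", TfidfTransformer()),   # down-weight terms common across documents
    ("clf", LogisticRegression()),
])
model.fit(reviews, labels)
print(model.predict(["great plot, loved it"]))  # should lean positive
```

The point of the `TfidfTransformer` step is that frequent-but-uninformative words (here, "movie" or "plot", which appear in both classes) are down-weighted relative to rarer, class-indicative words like "loved" or "terrible".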
Did you know that we had already (almost) solved sentiment analysis before neural networks became ubiquitous?
In the current era, natural language processing (NLP) relies increasingly on deep learning models that deliver amazing performance, and we often overlook the importance of the simplest text vectorization techniques: bag of words (BOW) and term frequency-inverse document frequency (TF-IDF). In fact, with these two techniques alone, we can already predict the sentiment of a given piece of text with more than 80% accuracy. In other words, all that state-of-the-art deep learning models are doing is simply…
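To make the two vectorization techniques concrete, here is a small sketch on a three-sentence toy corpus (invented for illustration) showing what BOW counts and TF-IDF weights actually look like:

```python
# A quick look at the two vectorizations themselves (toy corpus; illustrative only).
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["the food was good", "the food was bad", "good good food"]

# Bag of words: each document becomes a vector of raw word counts
bow = CountVectorizer()
counts = bow.fit_transform(corpus)
print(sorted(bow.vocabulary_))   # ['bad', 'food', 'good', 'the', 'was']
print(counts.toarray()[2])       # [0 1 2 0 0]: "good" counted twice

# TF-IDF: counts re-weighted so rare words matter more than ubiquitous ones
tfidf = TfidfVectorizer()
weights = tfidf.fit_transform(corpus)
# In doc 1, the rare word "bad" outweighs the everywhere-word "food"
print(weights[1, tfidf.vocabulary_["bad"]] > weights[1, tfidf.vocabulary_["food"]])  # True
```

That re-weighting is the whole trick: "food" appears in every document, so it carries little signal, while "bad" appears in only one and is highly discriminative.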
Time to start some new learning adventures in natural language processing.
What? Isn’t this supposed to kick off with a spam detection project? As you might think, cipher decryption is a curious place to start an NLP series. This starting point ties strongly to my desire to understand word embeddings at a deeper level. With the help of open-source NLP packages like SpaCy and NLTK, we can generate a fully functional word embedding from a giant training corpus with a single line of code, which seems like magic to me. While it is…
Take a guess. What is the world’s most valuable asset right now?
It is not gold, and it is not crude oil… it is data. You must have heard the popular buzzword “big data” and wondered what exactly the term means. Think about your favorite music streaming services: Spotify, Pandora, etc. Every second, users across the world log in to the service and interact with it. Since every move corresponds to a collectible data point, you can imagine the challenge of storing such a massive volume of data.
Fortunately, these large datasets could bring real value to…
2018. The future is already here.
Following the gold rush in artificial intelligence, a new career track called “data scientist” has taken the world by storm. Combining business intuition with technical soundness, data science is considered the sexiest job of the 21st century. The data science community has witnessed an explosive 275% growth over a short span of 7 years.
Data Scientist @ Fair, Full-Stack Machine Learning Advocate, Basketball Player Training How to Dunk, Life-long Knicks Fan, Living the Dream