IMDb Movie Review Classifier using Word2Vec
During the winter break of my third year (December 2018), I wanted to venture into the field of Natural Language Processing. I took this project from an introductory Kaggle competition and worked through its sample solution in the following steps:
- I imported the code onto my local machine.
- I didn’t run it at first. Instead, I read it top to bottom until I understood each line, from text pre-processing to the concept of word vectors, bag of words and finally word2vec.
- I didn’t rely on that one piece of code alone; I also read blogs and watched videos on how these basic NLP concepts are conceptualised and implemented.
- Finally, I wrote each line of code myself, ran it, debugged it, played with it and pushed it.
Learnings:
- This project was aimed at understanding the foundations of NLP: tokens, pre-processing steps and the concept of vectors.
- I learned two different approaches to vectorising words and sentences: bag of words and word2vec.
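To make the two approaches above concrete, here is a minimal sketch in plain Python. It is not the project's code: the function names, the two sample reviews and the tiny hand-written 2-d vectors are all made up for illustration. In a real setup the vectors would come from a trained word2vec model, and averaging word vectors is only one common way to turn them into a sentence vector.

```python
import re
from collections import Counter


def tokenize(review):
    """Strip HTML-like tags, lowercase the text and split it into word tokens."""
    text = re.sub(r"<[^>]+>", " ", review)
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def average_vector(tokens, embeddings):
    """Average the vectors of the tokens that have an embedding."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        # No known words: fall back to a zero vector of the right size.
        return [0.0] * len(next(iter(embeddings.values())))
    return [sum(dim) / len(known) for dim in zip(*known)]


# Two made-up reviews (assumption: raw reviews may contain HTML line breaks).
reviews = ["A great movie.<br />Great acting!", "A dull movie with dull acting."]
tokenized = [tokenize(r) for r in reviews]

# Approach 1: bag of words. Each review becomes a vector of word counts.
vocabulary = sorted({t for tokens in tokenized for t in tokens})
bow_vectors = [bag_of_words(tokens, vocabulary) for tokens in tokenized]

# Approach 2: word vectors. These toy embeddings are hypothetical stand-ins
# for what a trained word2vec model would provide.
toy_embeddings = {
    "great": [1.0, 0.0],
    "dull": [-1.0, 0.0],
    "movie": [0.0, 1.0],
    "acting": [0.0, 1.0],
}
sentence_vectors = [average_vector(tokens, toy_embeddings) for tokens in tokenized]

print(vocabulary)
print(bow_vectors)
print(sentence_vectors)
```

The bag-of-words vectors only record which words occur and how often, so their length grows with the vocabulary. The averaged word vectors have a fixed size and place reviews with similar words close to each other. Either kind of vector can then be fed to a classifier.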
These concepts may seem trivial in the long run, but at the time of this project I was proud that, instead of spending months on a course without gaining a practical understanding of the fundamentals or of how a method is implemented, I learned them hands-on in just 14 days.
Project code is available here