IMDb Movie Review Classifier using Word2Vec

This was during the winter break of my third year (December 2018), when I wanted to venture into Natural Language Processing (NLP). I took up this project from a basic Kaggle competition and worked through its sample solution in the following steps:

  1. I downloaded the code to my local machine.
  2. I didn't run it at first; instead, I worked through each line from top to bottom, covering everything from text pre-processing to word vectors, bag of words and, finally, word2vec.
  3. I didn't rely on that code alone; I also read blogs and watched videos on how these basic linguistic concepts are conceptualised and implemented.
  4. Finally, I wrote each line of code myself, ran it, debugged it, experimented with it and pushed it.
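To make the pre-processing stage concrete, here is a minimal sketch of cleaning one raw review into tokens. It is not the project's actual code: the function name `review_to_words` and the tiny `STOPWORDS` set are illustrative assumptions, and libraries such as BeautifulSoup and NLTK are often used for these steps, whereas this sketch uses only Python's standard `re` module.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for illustration;
# real pipelines usually use a fuller list such as NLTK's English stopwords).
STOPWORDS = {"a", "an", "the", "and", "is", "it", "this", "of", "to", "was"}


def review_to_words(raw_review: str, remove_stopwords: bool = True) -> list[str]:
    """Turn a raw review string into a list of lowercase word tokens."""
    # 1. Replace HTML tags such as <br /> (common in raw IMDb reviews) with spaces.
    text = re.sub(r"<[^>]+>", " ", raw_review)
    # 2. Keep letters only; punctuation and digits become spaces.
    text = re.sub(r"[^a-zA-Z]", " ", text)
    # 3. Lowercase and split on whitespace to get tokens.
    tokens = text.lower().split()
    # 4. Optionally drop very common words that carry little sentiment.
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


print(review_to_words("This movie was <br /><br />GREAT, 10/10!"))  # ['movie', 'great']
```

Whether to drop stopwords is a design choice: it shrinks the vocabulary for bag of words, but word2vec training often keeps them so that each word's surrounding context stays intact.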

Learnings:

  • This project was aimed at understanding the foundations of NLP: tokens, pre-processing steps and the concept of vectors.
  • I learned two different approaches to vectorising words and sentences: bag of words and word2vec.
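The difference between the two approaches can be sketched in a few lines. Bag of words turns a review into word counts over a fixed vocabulary, while a word2vec-style representation maps each word to a dense vector, and one common way to get a review vector is to average those word vectors. In this sketch the 2-dimensional `toy_embeddings` are made-up stand-ins for vectors that would really come from a trained word2vec model (for example gensim's `Word2Vec`), and the function names and `max_features` default are assumptions for illustration.

```python
from collections import Counter


def build_vocabulary(tokenized_reviews, max_features=5000):
    """Map the most frequent words to column indices (the bag-of-words vocabulary)."""
    counts = Counter(word for review in tokenized_reviews for word in review)
    most_common = [word for word, _ in counts.most_common(max_features)]
    return {word: i for i, word in enumerate(sorted(most_common))}


def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs; word order is discarded."""
    vector = [0] * len(vocabulary)
    for token in tokens:
        if token in vocabulary:
            vector[vocabulary[token]] += 1
    return vector


def average_word_vector(tokens, embeddings, dim):
    """Represent a review as the mean of the vectors of its known words."""
    total = [0.0] * dim
    known = 0
    for token in tokens:
        if token in embeddings:
            total = [t + v for t, v in zip(total, embeddings[token])]
            known += 1
    # A review with no known words falls back to the zero vector.
    return [t / known for t in total] if known else total


reviews = [["great", "movie", "great"], ["boring", "movie"]]
vocab = build_vocabulary(reviews)                    # {'boring': 0, 'great': 1, 'movie': 2}
print(bag_of_words(["great", "movie", "great"], vocab))  # [0, 2, 1]

# Hypothetical toy embeddings standing in for trained word2vec vectors.
toy_embeddings = {"great": [1.0, 0.0], "movie": [0.0, 1.0], "boring": [-1.0, 0.0]}
print(average_word_vector(["great", "movie"], toy_embeddings, 2))  # [0.5, 0.5]
```

The bag-of-words vector grows with the vocabulary and treats "great" and "excellent" as unrelated columns, whereas averaged word vectors have a fixed, small dimension in which words used in similar contexts end up close together.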

These concepts may seem trivial in the long run, but at the time I was proud that, instead of spending months on a course that gave me no practical grasp of the fundamentals or how to implement them, I learned them hands-on in just 14 days.

Project code is available here

Shreya Gupta
Research Associate

I aim to understand the world behind the three lines of code (import, train, test), challenge conventional approaches and build more efficient and applicable algorithms.
