Final Project 01205489 (Principles of Deep Learning and Applications)
This project analyzes and classifies sarcastic sentences using a dataset from Kaggle (Kaggle Dataset).
The project consists of three parts: data visualization and analysis, prediction with machine learning (ML) models, and prediction with a GloVe pre-trained model.

In the first part, we clean and preprocess the data by removing stop words and punctuation. We then visualize the distribution of word lengths, the number of words per headline, and the average word length per headline to check whether the dataset is biased. Common-word counts, N-gram analysis, and word clouds are used to identify which single words are predictive.

In the second part, the words are tokenized into vectors and used to train ML models; the Decision Tree and Random Forest models show interesting results.

Lastly, the GloVe pre-trained model, which relies on word embeddings, gave excellent results at the cost of a long training time. In the Special Section, we compare the GloVe model across different word embeddings and explain why a combined dataset is used.
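The cleaning and headline statistics from the first part can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual code: the small `STOP_WORDS` set and the helper names `clean_headline` and `headline_stats` are hypothetical, and the real pipeline may use a fuller stop-word list (for example NLTK's).

```python
import string

# Hypothetical, deliberately small stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "for", "on", "with"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_headline(text: str) -> list[str]:
    """Lowercase, strip punctuation, and drop stop words; return remaining tokens."""
    tokens = text.lower().translate(_PUNCT_TABLE).split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


def headline_stats(tokens: list[str]) -> dict:
    """Per-headline features used in the length/bias analysis."""
    n_words = len(tokens)
    # Guard against empty headlines (e.g. a headline made only of stop words).
    avg_len = sum(len(t) for t in tokens) / n_words if n_words else 0.0
    return {"n_words": n_words, "avg_word_len": avg_len}


if __name__ == "__main__":
    tokens = clean_headline("Area man is shocked, SHOCKED, to learn the news!")
    print(tokens)
    print(headline_stats(tokens))
```

Collecting `n_words` and `avg_word_len` for every headline, grouped by label, gives the distributions that are plotted to check for bias between sarcastic and non-sarcastic headlines.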
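The common-word and N-gram analysis from the first part amounts to counting consecutive token tuples across all cleaned headlines. The sketch below assumes headlines are already tokenized into lists of words; the helper names `ngrams` and `top_ngrams` are hypothetical.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return consecutive n-token tuples (empty if the headline is shorter than n)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(corpus: list[list[str]], n: int, k: int = 10) -> list[tuple[tuple[str, ...], int]]:
    """Count n-grams across all headlines and return the k most common with counts."""
    counts: Counter = Counter()
    for tokens in corpus:
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)


if __name__ == "__main__":
    corpus = [["area", "man", "wins"], ["area", "man", "loses"], ["local", "man", "wins"]]
    print(top_ngrams(corpus, 1, 1))  # most common single word
    print(top_ngrams(corpus, 2, 2))  # most common bigrams
```

With `n=1` this reduces to the common-word count; running it separately on each class shows which words or phrases lean toward sarcasm.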
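For the GloVe step, pre-trained vectors are distributed as text files where each line holds a word followed by its vector components separated by spaces. A common approach, sketched here under that assumption, is to load those vectors and build an embedding matrix aligned with the project's own word indices, leaving rows for words missing from GloVe as zeros. The helper names `load_glove` and `embedding_matrix` are hypothetical, and `dim` must match the file being loaded.

```python
def load_glove(lines, dim: int) -> dict[str, list[float]]:
    """Parse GloVe text-format lines ("word v1 ... v_dim") into a word -> vector dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines or lines with an unexpected dimension
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def embedding_matrix(vocab: dict[str, int], vectors: dict[str, list[float]], dim: int) -> list[list[float]]:
    """Build a len(vocab) x dim matrix; words without a GloVe vector keep an all-zero row."""
    matrix = [[0.0] * dim for _ in range(len(vocab))]
    for word, idx in vocab.items():
        if word in vectors:
            matrix[idx] = list(vectors[word])
    return matrix


if __name__ == "__main__":
    # In practice `lines` would be an opened GloVe file, e.g. open(path, encoding="utf8").
    lines = ["man 0.5 -1.0", "wins 0.25 0.75"]
    vocab = {"<pad>": 0, "<unk>": 1, "man": 2, "zzz": 3}
    print(embedding_matrix(vocab, load_glove(lines, 2), 2))
```

The matrix can then initialize an embedding layer (often kept frozen). Swapping in GloVe files with a different dimension only changes `dim`, which is one way to set up the comparison of embeddings in the Special Section.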
