A movie recommendation system built with Flask that combines collaborative filtering and NLP-based sentiment analysis to provide personalized film suggestions.
The project includes both the backend logic (Flask, machine learning models) and a web interface (HTML/CSS/JS).
- Build a recommendation engine that suggests movies similar to the ones a user likes.
- Use text features and similarity metrics to enhance recommendations.
- Provide an easy-to-use web interface for interaction.
Movie search with autocomplete functionality.
Top-10 personalized recommendations using collaborative filtering.
Sentiment analysis of reviews (via pre-trained NLP models).
Interactive web app built with Flask, HTML, CSS, and JavaScript.
- Programming Language: Python
- Framework: Flask
- Machine Learning: scikit-learn, pickle (saved models)
- Data Handling: Pandas, NumPy
- Frontend: HTML, CSS, JavaScript
- Notebooks: Jupyter (for preprocessing, collaborative filtering, sentiment analysis, visualization)
- IMDB 5000 Movie Dataset
- The Movies Dataset
- TMDB Movie Metadata
- List of American Films of 2018
- List of American Films of 2019
- List of American Films of 2020
- List of American Films of 2021
- List of American Films of 2022
- List of American Films of 2023
- List of American Films of 2024
- Data Aggregation: Combined multiple datasets including
movie_metadata,credits, andmovies_metadatafrom Kaggle - Comprehensive Coverage: Integrated diverse data sources to ensure robust and complete movie information
- Cross-Referencing: Established relationships between different data tables for comprehensive analysis
- Wikipedia Integration: Automated extraction of movie titles from Wikipedia pages for years 2016-2024
- Dynamic Data Collection: Implemented scraping scripts to gather up-to-date film information
- Structured Parsing: Converted unstructured web data into structured format for analysis
- Data Augmentation: Used The Movie Database API to fill missing information
- Function Development: Created specialized functions (
get_genre,get_director,get_actor) to extract specific movie attributes - Real-time Data Access: Leveraged API capabilities to access current and comprehensive movie data
- Normalization: Standardized data formats across all datasets
- Missing Value Handling: Implemented strategies to address incomplete or missing data
- Feature Engineering: Created combined features (e.g., 'comb' column) to enhance recommendation quality
- Data Validation: Ensured data consistency and accuracy throughout the pipeline
- Comparative Analysis: Evaluated both collaborative filtering and content-based approaches
- Performance Optimization: Selected content-based method due to superior performance metrics
- Cold Start Solution: Addressed new user and new movie challenges effectively
- Vectorization: Used CountVectorizer to convert movie features into numerical vectors
- Similarity Computation: Implemented cosine similarity to measure movie-to-movie relationships
- Matrix Construction: Built comprehensive similarity matrix covering all movies in the dataset
- Multi-dimensional Features: Combined genre, director, actors, and other metadata
- Feature Weighting: Applied appropriate weights to different feature types based on importance
- Text Processing: Processed combined features to create meaningful movie representations
- Top-N Selection: Returns 10 most similar movies for any given input film
- Error Handling: Includes robust error management for movies not found in database
- Case Insensitivity: Processes user input with case normalization for better matching
- Duplicate Prevention: Excludes the input movie itself from recommendations
- Review Processing: Analyzed user comments and reviews from multiple sources
- Sentiment Categorization: Classified reviews into positive, negative, or neutral sentiments
- Real-time Analysis: Implemented live sentiment analysis for user-generated content
- High Accuracy: Achieved 98.77% accuracy in sentiment classification
- Model Training: Used multinomial Naïve Bayes classifier for text classification
- Performance Validation: Tested model on separate datasets to ensure generalizability
- Text Vectorization: Applied Term Frequency-Inverse Document Frequency for text processing
- Feature Importance: Weighted terms based on their importance in distinguishing sentiments
- Dimensionality Reduction: Focused on most relevant features for efficient classification
- Vocabulary Optimization: Built comprehensive vocabulary from movie review corpus
- Pickle Serialization: Saved trained models and transformers for efficient deployment
- Real-time Processing: Enabled immediate sentiment analysis for user interactions
- Scalable Architecture: Designed system to handle increasing volumes of user reviews
Movie Suggestions (Autocomplete)

Description of the Searched Movie

Have fun trying it out! Feel free to reach out if you have any questions, suggestions, or just want to chat about the project.




