🎬 Movies Analysis | SQL Project

This project performs in-depth SQL-based analysis on the MovieLens dataset — a popular dataset containing millions of movie ratings, tags, and metadata. The analysis spans beginner to advanced levels, covering SQL queries, views, CTEs, stored procedures, and window functions.

⭐ Features

Dataset setup and table creation using PostgreSQL
Designed 30+ custom SQL questions across beginner, intermediate, and advanced levels
Dataset setup and table creation using PostgreSQL
Imported and managed 1.5+ GB of real-world movie rating data from MovieLens
Performed genre classification, tag relevance analysis, and user behavior insights
Implemented views, stored procedures, window functions, and recursive CTEs
Wrote optimized queries with EXPLAIN ANALYZE for performance evaluation
Simulated ACID transactions and used temporary/materialized views for advanced use cases

🛠 Technologies Used

Tool/Service	Purpose
Kaggle	Dataset exploration
PostgreSQL	Database management
pgAdmin 4	GUI for database operations

⚙️ How It Works

📥 Downloaded data from MovieLens
🗃 Created database and tables using PostgreSQL
📌 Imported CSV data (~1.5 GB) via pgAdmin 4
🔎 Executed 30+ SQL queries across 3 difficulty levels
🧠 Created views, functions, materialized views, and CTEs
📈 Extracted insights on movie trends, top genres, user behavior, and tag relevance

🔍 Preview

📄 Click here to view the full project log (PDF)

📁 Folder Structure

MoviesAnalysisSQL/
├── Project_log.pdf
├── Questions.md
└── README.md

💡 Use Cases

📊 Analyzing user behavior through ratings and tags
🎯 Finding most rated or top-rated movies by genre
🧬 Working with tag genome and relevance analysis
🏗 Practicing SQL queries from beginner to advanced levels
🧪 Demonstrating database normalization and function creation
📚 Building a PostgreSQL portfolio project

🧰 Setup Instructions

Install PostgreSQL and pgAdmin 4
Download dataset from: MovieLens Dataset
Run the SQL scripts from the project log to create tables
Import CSV files into respective tables (Format: CSV, Encoding: UTF-8)
Run the SQL queries as per your learning phase (Beginner → Advanced)

📝 Notes

The dataset includes 33,832,162 ratings and 2,328,315 tag applications for 86,537 movies
User and movie IDs are anonymized and consistent across all files
Ratings are made on a 0.5–5.0 star scale; genres are pipe-separated (e.g., Action|Adventure|Sci-Fi)
All files are encoded in UTF-8 — ensure correct encoding during import in pgAdmin
Timestamps are stored in UNIX format (seconds since January 1, 1970 UTC)

🙋‍♀️ Author

Anushka Sharma
🌐 LinkedIn • 🐱 GitHub 🎓 Learning Data Science, Analytics & Machine Learning

⭐ Show Your Support

If you found this project helpful or inspiring:

⭐ Star this repository
🛠️ Fork it to build upon or adapt it for your own use
💬 Share feedback or suggestions via Issues/Discussions

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🎬 Movies Analysis | SQL Project

⭐ Features

🛠 Technologies Used

⚙️ How It Works

🔍 Preview

📁 Folder Structure

💡 Use Cases

🧰 Setup Instructions

📝 Notes

🙋‍♀️ Author

⭐ Show Your Support

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
Project_log.pdf		Project_log.pdf
Questions.md		Questions.md
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

🎬 Movies Analysis | SQL Project

⭐ Features

🛠 Technologies Used

⚙️ How It Works

🔍 Preview

📁 Folder Structure

💡 Use Cases

🧰 Setup Instructions

📝 Notes

🙋‍♀️ Author

⭐ Show Your Support

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages