IMDBScraper

Description

IMDBScraper is a Python project designed to scrape data from IMDB's Top Movies list. It extracts key information about movies, such as title, rating, number of ratings, description, release year, parental rating, duration, genres, directors, creators, actors, and keywords. The tool processes and saves the scraped data in both JSON and CSV formats for further analysis.

Features

🌟 Scrape IMDB Top Movies data.
📝 Extract movie details such as ratings, descriptions, genres, and more.
🔍 Fetch additional information from individual movie pages.
💾 Save data in JSON and CSV formats for convenience.

Prerequisites

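The scraper uses the following third-party packages, which must be installed in your Python environment:

🛠️ requests
🛠️ fake-headers
🛠️ tqdm
🛠️ pandas

It also uses these standard-library modules, which ship with Python and need no installation:

🛠️ json
🛠️ datetime
🛠️ time
🛠️ re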

Install the required packages using pip:

pip install requests fake-headers tqdm pandas
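
To confirm the installation worked, a quick check along these lines may help. It is only a sketch and not part of the project: the helper name missing_modules is hypothetical, and the list assumes the import names requests, fake_headers, tqdm, and pandas (the pip package fake-headers is imported as fake_headers).

```python
from importlib.util import find_spec

# Import names for the third-party packages listed above.
# Assumption: the pip package "fake-headers" is imported as "fake_headers".
REQUIRED_MODULES = ["requests", "fake_headers", "tqdm", "pandas"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```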

Installation

📥 Clone the repository:

git clone https://github.com/yourusername/IMDBScraper.git
cd IMDBScraper

✅ Ensure Python 3.7 or higher is installed.

Usage

1. Initialize the Scraper

from scraper import Scraper
scraper = Scraper(browser="chrome", os="win")

2. Extract Main Page Data

Fetch data from the IMDB Top Movies page:

scraper.extract_mainpage()
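
How extract_mainpage() parses the page is not documented here. As a rough illustration of the idea only, the sketch below pulls IMDb title IDs out of HTML with a regular expression. The function name extract_movie_ids and the sample HTML are hypothetical, and it assumes the page links to movies through /title/<id>/ URLs.

```python
import re

# IMDb title pages live under /title/<id>/, where <id> is "tt" plus digits.
TITLE_ID_PATTERN = re.compile(r"/title/(tt\d+)/")


def extract_movie_ids(html):
    """Return unique IMDb title IDs in the order they first appear."""
    seen = set()
    ordered = []
    for movie_id in TITLE_ID_PATTERN.findall(html):
        if movie_id not in seen:
            seen.add(movie_id)
            ordered.append(movie_id)
    return ordered


# Hypothetical HTML fragment; the same movie is linked twice (title and poster).
sample_html = (
    '<a href="/title/tt0111161/?ref_=chttp_t_1">The Shawshank Redemption</a>'
    '<a href="/title/tt0068646/?ref_=chttp_t_2">The Godfather</a>'
    '<a href="/title/tt0111161/?ref_=chttp_i_1">poster</a>'
)
print(extract_movie_ids(sample_html))  # ['tt0111161', 'tt0068646']
```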

3. Extract Individual Movie Data

Iterate through the list of movies and extract detailed data for each:

scraper.iterating()

4. Save Extracted Data

Save the scraped data to JSON and CSV files:

scraper.save_file()
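
The internals of save_file() are not shown in this README. The sketch below is one possible way to write a dictionary of equal-length columns (the shape shown under Example Output) to both formats using only the standard library. The function name save_columns, the file stem, and the comma delimiter are assumptions made for illustration.

```python
import csv
import json
from pathlib import Path


def save_columns(data, out_dir, stem="movies"):
    """Write a dict of equal-length columns to <stem>.json and <stem>.csv."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    # JSON keeps the column-oriented layout as-is.
    json_file = out_path / f"{stem}.json"
    with json_file.open("w", encoding="utf-8") as fh:
        json.dump(data, fh, ensure_ascii=False, indent=4)

    # CSV needs rows, so zip the columns together: one row per movie.
    csv_file = out_path / f"{stem}.csv"
    with csv_file.open("w", encoding="utf-8", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(data.keys())
        writer.writerows(zip(*data.values()))

    return json_file, csv_file
```

Calling save_columns(data, "ExtractedData") would place both files in the output directory described under Project Structure.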

Example Workflow

from scraper import Scraper

# Initialize the scraper
scraper = Scraper(browser="chrome", os="win")

# Extract main page data
scraper.extract_mainpage()

# Extract detailed movie data
scraper.iterating()

# Save the data
scraper.save_file()

Project Structure

IMDBScraper/
├── caller.py        # Run to start process
├── scraper.py       # Main scraping logic
├── requirements.txt # List of dependencies
├── ExtractedData/   # Directory to store output files
└── README.md        # Project documentation

Output

Extracted data is stored in the ExtractedData/ directory. The output includes:

📄 A JSON file containing all scraped data.
📊 A CSV file containing structured tabular data.

Example Output

JSON

{
    "movie_id": ["tt0111161", "tt0068646"],
    "title": ["The Shawshank Redemption", "The Godfather"],
    "rating": [9.3, 9.2],
    "No.rates": [2600000, 1800000],
    "description": [
        "Two imprisoned men bond over a number of years...",
        "The aging patriarch of an organized crime dynasty..."
    ],
    ...
}
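
The JSON above is column-oriented: each key maps to a list with one entry per movie. For per-movie processing it can be handy to turn it into a list of records. A minimal sketch, using a hypothetical helper columns_to_records and only the fields shown above:

```python
def columns_to_records(columns):
    """Turn {"field": [v1, v2]} into [{"field": v1}, {"field": v2}]."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]


data = {
    "movie_id": ["tt0111161", "tt0068646"],
    "title": ["The Shawshank Redemption", "The Godfather"],
    "rating": [9.3, 9.2],
}

records = columns_to_records(data)
best = max(records, key=lambda movie: movie["rating"])
print(best["title"])  # The Shawshank Redemption
```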

CSV

movie_id	title	rating	No.rates	description	...
tt0111161	The Shawshank Redemption	9.3	2600000	Two imprisoned men bond over a number...	...
tt0068646	The Godfather	9.2	1800000	The aging patriarch of an organized...	...

Notes

⏳ The scraper pauses between requests to reduce the risk of an IP ban.
🌐 Ensure a stable internet connection during scraping.
🔧 To process every movie in a larger dataset, adjust the logic in iterating().
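
The delay mentioned in the notes can be sketched as follows. This is not the project's actual code: the function name fetch_all, the delay bounds, and the injectable fetch and sleep parameters are all assumptions chosen to keep the example self-contained.

```python
import random
import time


def fetch_all(urls, fetch, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a random interval between calls."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results
```

With requests, fetch could be something like lambda url: requests.get(url, headers=headers, timeout=10). A randomized pause is used here so that requests do not arrive at a fixed rhythm; a constant delay would work too.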

Further Work

Next, I plan to analyze the data with different visualization tools. Before that, I plan to tune the scraper and make its methods more modular.

Thanks for your patience 🌛

License

This project is licensed under the MIT License. See the LICENSE file for details.
