
mrinshad/eci-candidate-scraper


🗳️ ECI Candidate Scraper & Processor

This project scrapes candidate data from the Election Commission of India affidavit portal, processes it, and prepares a clean dataset with Malayalam support.


📦 Project Overview

This pipeline consists of three scripts, run in order:

  1. Scraper → Extracts candidate data from the website
  2. Filter Script → Keeps only "Accepted" candidates
  3. Malayalam Script → Adds Malayalam transliteration fields

🛠️ Setup

1. Create virtual environment

python3 -m venv venv
source venv/bin/activate   # Mac/Linux

2. Install dependencies

pip install playwright beautifulsoup4 deep-translator
playwright install

🧾 1. Scraper Script (scraper.py)

🔹 Purpose

Scrapes all candidates from the ECI website and saves them into candidates.csv.

🔹 Key Features

  • Uses Playwright (handles JS-rendered site)

  • Iterates through result pages using the page parameter

  • Extracts:

    • Name
    • Party
    • State
    • Constituency
    • Status
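The features listed above could be sketched roughly as follows. This is not the repository's actual scraper.py: `BASE_URL`, `ROW_SELECTOR`, and the assumption that each candidate is a table row with five cells in a fixed order are all hypothetical placeholders, since the README does not describe the portal's URL or markup.

```python
import csv
from urllib.parse import urlencode

# Hypothetical placeholders: the real portal URL and page markup are not
# given in this README, so adjust these to match the actual site.
BASE_URL = "https://example.org/candidates"
ROW_SELECTOR = "table tbody tr"
FIELDS = ["Name", "Party", "State", "Constituency", "Status"]


def page_url(page_number: int) -> str:
    """Build the URL for one results page using the `page` parameter."""
    return f"{BASE_URL}?{urlencode({'page': page_number})}"


def parse_rows(html: str) -> list:
    """Extract one dict per row; assumes the first five cells follow FIELDS order."""
    from bs4 import BeautifulSoup  # imported lazily; needs beautifulsoup4

    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select(ROW_SELECTOR):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) >= len(FIELDS):
            rows.append(dict(zip(FIELDS, cells)))
    return rows


def scrape(max_pages: int, out_path: str = "candidates.csv") -> int:
    """Visit pages 1..max_pages, stop at the first empty page, write a CSV."""
    from playwright.sync_api import sync_playwright  # needs `playwright install`

    collected = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        for number in range(1, max_pages + 1):
            # Wait for network activity to settle so JS-rendered rows exist.
            page.goto(page_url(number), wait_until="networkidle")
            rows = parse_rows(page.content())
            if not rows:
                break
            collected.extend(rows)
            page.wait_for_timeout(1000)  # brief pause between pages
        browser.close()

    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(collected)
    return len(collected)
```

Calling `scrape(max_pages=5)` would write up to five pages of rows to candidates.csv. Writing with `encoding="utf-8"` keeps the file compatible with the later Malayalam step.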

▶️ Run

python scraper.py

📂 Output

candidates.csv

🧾 2. Filter Script (filter_accepted.py)

🔹 Purpose

Keeps only the candidates whose Status is "Accepted".

🔹 Logic

  • Reads candidates.csv

  • Keeps only rows where:

    Status == "Accepted"
    
  • Writes the matching rows to a new file, accepted_candidates.csv
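A minimal sketch of this logic using only the standard library is shown here. The function name `filter_accepted` and the choice to tolerate surrounding whitespace in the Status value are assumptions, not necessarily what filter_accepted.py does.

```python
import csv


def filter_accepted(src_path: str, dst_path: str) -> int:
    """Copy rows whose Status equals "Accepted"; return how many were kept."""
    kept = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Assumption: outer whitespace is ignored, but the match is case-sensitive.
            if (row.get("Status") or "").strip() == "Accepted":
                writer.writerow(row)
                kept += 1
    return kept
```

For example, `filter_accepted("candidates.csv", "accepted_candidates.csv")` would return the number of accepted rows written.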

▶️ Run

python filter_accepted.py

📂 Output

accepted_candidates.csv

🧾 3. Malayalam Script (add_malayalam.py)

🔹 Purpose

Adds Malayalam fields for:

  • Candidate Name
  • Constituency

🔹 Uses

  • Google Translate, via the deep-translator package

🔹 Output Fields

Name,
Name_in_Mal,
Party,
State,
Constituency,
Constituency_in_Mal,
Status
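One way this step might look is sketched below. The helper names are hypothetical, and the cache is an added assumption: constituency names repeat across many rows, so each distinct string is sent to the translator only once. `GoogleTranslator` is the class deep-translator provides for Google Translate, and `"ml"` is the language code for Malayalam.

```python
from typing import Callable, Iterable

OUTPUT_FIELDS = ["Name", "Name_in_Mal", "Party", "State",
                 "Constituency", "Constituency_in_Mal", "Status"]


def google_to_malayalam() -> Callable[[str], str]:
    """Return a function that sends text to Google Translate (needs network)."""
    from deep_translator import GoogleTranslator  # imported lazily

    translator = GoogleTranslator(source="auto", target="ml")
    return translator.translate


def add_malayalam(rows: Iterable[dict], to_mal: Callable[[str], str]) -> list:
    """Add Name_in_Mal and Constituency_in_Mal, caching repeated strings."""
    cache = {}

    def convert(text: str) -> str:
        if text not in cache:
            # Empty cells are passed through without calling the translator.
            cache[text] = to_mal(text) if text else ""
        return cache[text]

    enriched = []
    for row in rows:
        new_row = dict(row)
        new_row["Name_in_Mal"] = convert(row["Name"])
        new_row["Constituency_in_Mal"] = convert(row["Constituency"])
        # Emit columns in the documented output order.
        enriched.append({field: new_row.get(field, "") for field in OUTPUT_FIELDS})
    return enriched
```

With rows read from accepted_candidates.csv, `add_malayalam(rows, google_to_malayalam())` would produce rows ready to write to accepted_candidates_mal.csv. Passing the translator in as a function also makes it easy to swap in a different service or a stub for testing.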

▶️ Run

python add_malayalam.py

📂 Output

accepted_candidates_mal.csv

⚠️ Important Notes

🔸 Transliteration Accuracy

  • The Malayalam conversion is machine-generated and will not always be accurate
  • Names may need manual correction

🔸 Recommended Approach

  • Use script output as base

  • Manually fix:

    • Candidate names (if needed)
    • Constituencies (recommended to map manually)
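The manual-fix approach above could be applied in code with a hand-maintained lookup table, as in this sketch. The function name and the normalisation of constituency keys to upper case are assumptions; the single map entry is taken from the sample output in this README.

```python
def apply_overrides(rows, constituency_map, name_map=None):
    """Replace machine-generated Malayalam with hand-checked values where available."""
    name_map = name_map or {}
    fixed = []
    for row in rows:
        new_row = dict(row)
        # Assumption: constituency keys are stored upper-case, as in the sample output.
        key = row["Constituency"].strip().upper()
        if key in constituency_map:
            new_row["Constituency_in_Mal"] = constituency_map[key]
        if row["Name"] in name_map:
            new_row["Name_in_Mal"] = name_map[row["Name"]]
        fixed.append(new_row)
    return fixed


# Hand-checked entries; extend this as constituencies are verified.
CONSTITUENCY_MAP = {"KATTAKKADA": "കാട്ടാക്കട"}
```

Rows without a matching entry keep their machine-generated value, so the table can be filled in gradually.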

💡 Suggested Improvements

  • Add duplicate removal
  • Store data in SQL Server / database
  • Build API layer (.NET backend)
  • Create React UI with language toggle (EN ↔ ML)
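As a starting point for the first suggestion, duplicate removal might look like the sketch below. What counts as a duplicate is an assumption here: two rows are treated as the same candidate when Name, Party, State, and Constituency match after ignoring case and outer whitespace.

```python
def drop_duplicates(rows, key_fields=("Name", "Party", "State", "Constituency")):
    """Keep the first row for each key; later rows with the same key are dropped."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.get(field, "").strip().casefold() for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```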

🚀 Pipeline Summary

Scraper → Filter → Malayalam Processing
candidates.csv
   ↓
accepted_candidates.csv
   ↓
accepted_candidates_mal.csv
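The three stages can also be chained from a single entry point. The following is a hedged sketch of a hypothetical runner (not part of the repository) that invokes each script with the current interpreter and stops as soon as one stage fails.

```python
import subprocess
import sys

SCRIPTS = ["scraper.py", "filter_accepted.py", "add_malayalam.py"]


def run_pipeline(scripts=SCRIPTS, runner=subprocess.run):
    """Run each stage in order with the current Python interpreter."""
    for script in scripts:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later stages never run on stale or missing input.
        runner([sys.executable, script], check=True)
```

Using `sys.executable` keeps every stage inside the same virtual environment that launched the runner.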

🎯 Final Output Example

Name,Name_in_Mal,Party,State,Constituency,Constituency_in_Mal,Status
P. K. KRISHNADAS,പി. കെ. കൃഷ്ണദാസ്,BJP,Kerala,KATTAKKADA,കാട്ടാക്കട,Accepted

🤝 Notes

  • Be respectful when scraping public websites (add delays)
  • Avoid sending too many rapid requests
  • This project is intended for educational / analytical use
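The advice about delays could be implemented with a small helper called between page requests. This is only a sketch: the function name and the default values of two seconds plus up to one second of random jitter are illustrative assumptions, not figures recommended by the site.

```python
import random
import time


def polite_pause(base_seconds=2.0, jitter_seconds=1.0, sleep=time.sleep):
    """Sleep for base_seconds plus a random extra of up to jitter_seconds."""
    delay = base_seconds + random.uniform(0, jitter_seconds)
    sleep(delay)
    return delay
```

The random component avoids hitting the server at a perfectly regular interval, and the injectable `sleep` argument lets the helper be tested without actually waiting.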

🎉 Done!

You now have a complete pipeline from raw scraping → clean dataset → Malayalam-ready data.

About

A Python-based pipeline to scrape, filter, and enrich candidate data from the Election Commission of India affidavit portal, including Malayalam transliteration support.
