This project scrapes candidate data from the Election Commission of India affidavit portal, processes it, and prepares a clean dataset with Malayalam support.
This pipeline consists of 3 scripts:
- Scraper → Extracts candidate data from the website
- Filter Script → Keeps only "Accepted" candidates
- Malayalam Script → Adds Malayalam transliteration fields
python3 -m venv venv
source venv/bin/activate # Mac/Linux
pip install playwright beautifulsoup4 deep-translator
playwright install
Scrapes all candidates from the ECI website and saves them into candidates.csv.
- Uses Playwright (handles the JS-rendered site)
- Iterates through pages using the page parameter
- Extracts:
- Name
- Party
- State
- Constituency
- Status
Run: python scraper.py
Output: candidates.csv
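The scraping steps above can be sketched roughly as follows. This is a minimal illustration, not the actual scraper.py: the base URL passed to scrape, the CSS selectors (div.candidate, .name, .party and so on) and the helper names are hypothetical placeholders, and the real portal markup will differ.

```python
import csv
import time
from urllib.parse import urlencode

FIELDS = ["Name", "Party", "State", "Constituency", "Status"]


def page_url(base_url, page):
    """Build the URL for one results page via the `page` query parameter."""
    return f"{base_url}?{urlencode({'page': page})}"


def parse_candidates(html):
    """Extract candidate rows from rendered HTML.

    The selectors are hypothetical; adjust them to the real markup.
    """
    from bs4 import BeautifulSoup  # imported lazily; only needed when parsing

    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.candidate"):
        row = {}
        for field in FIELDS:
            node = card.select_one(f".{field.lower()}")
            row[field] = node.get_text(strip=True) if node else ""
        rows.append(row)
    return rows


def scrape(base_url, max_pages, out_path="candidates.csv", delay_s=1.0):
    """Walk pages 1..max_pages and write every parsed row to a CSV file."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        with open(out_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS)
            writer.writeheader()
            for n in range(1, max_pages + 1):
                # Wait for network activity to settle so JS-rendered rows exist.
                page.goto(page_url(base_url, n), wait_until="networkidle")
                rows = parse_candidates(page.content())
                if not rows:
                    break  # an empty page is treated as the end of the listing
                writer.writerows(rows)
                time.sleep(delay_s)  # pause between requests
        browser.close()
```

Stopping at the first empty page is one possible end condition; a fixed page count or a "next" link check would work as well.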
Filters only candidates with status "Accepted"
- Reads candidates.csv
- Keeps only rows where Status == "Accepted"
- Writes to a new file
Run: python filter_accepted.py
Output: accepted_candidates.csv
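A filter of this kind needs only the standard library. The sketch below is one way it could look, not the contents of filter_accepted.py itself; the function name and the whitespace stripping are assumptions added for illustration.

```python
import csv


def filter_accepted(in_path="candidates.csv", out_path="accepted_candidates.csv"):
    """Copy rows whose Status is "Accepted" to out_path; return how many were kept."""
    kept = 0
    with open(in_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Stripping whitespace is an added assumption; the match is otherwise exact.
            if (row.get("Status") or "").strip() == "Accepted":
                writer.writerow(row)
                kept += 1
    return kept
```

Reading and writing with newline="" and UTF-8 keeps the CSV module in charge of line endings and avoids mangling non-ASCII names.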
Adds Malayalam fields for:
- Candidate Name
- Constituency
Uses Google Translator via deep-translator.
Output columns: Name, Name_in_Mal, Party, State, Constituency, Constituency_in_Mal, Status
Run: python add_malayalam.py
Output: accepted_candidates_mal.csv
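The Malayalam step might be sketched as below, assuming the GoogleTranslator class from deep-translator with the language code "ml". The function names and the cache are illustrative assumptions, not taken from add_malayalam.py. The translator is passed in as a plain function so the row logic can be exercised without network access.

```python
def make_translator():
    """Return a function that translates English text to Malayalam ("ml")."""
    from deep_translator import GoogleTranslator  # needs network access when called

    translator = GoogleTranslator(source="en", target="ml")
    return translator.translate


def add_malayalam(rows, translate):
    """Return new rows with Name_in_Mal and Constituency_in_Mal added."""
    cache = {}

    def cached(text):
        # Constituency names repeat often, so each distinct string is sent once.
        if text not in cache:
            cache[text] = translate(text) if text else ""
        return cache[text]

    out = []
    for row in rows:
        out.append({
            "Name": row["Name"],
            "Name_in_Mal": cached(row["Name"]),
            "Party": row["Party"],
            "State": row["State"],
            "Constituency": row["Constituency"],
            "Constituency_in_Mal": cached(row["Constituency"]),
            "Status": row["Status"],
        })
    return out
```

In real use this would be called as add_malayalam(rows, make_translator()), with rows read from accepted_candidates.csv via csv.DictReader.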
- Malayalam conversion is not 100% perfect
- Names may need manual correction
- Use script output as base
- Manually fix:
- Candidate names (if needed)
- Constituencies (recommended to map manually)
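One way to apply manual constituency corrections is a hand-maintained lookup table that overrides the machine output. The sketch below assumes such a table; the names CONSTITUENCY_OVERRIDES and apply_overrides are hypothetical, and the single entry is taken from the sample row in this README.

```python
# Hand-checked Malayalam spellings, keyed by the upper-case English name.
CONSTITUENCY_OVERRIDES = {
    "KATTAKKADA": "കാട്ടാക്കട",
}


def apply_overrides(rows, overrides=CONSTITUENCY_OVERRIDES):
    """Replace Constituency_in_Mal wherever a hand-checked spelling exists."""
    fixed = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are left untouched
        key = row["Constituency"].strip().upper()
        if key in overrides:
            row["Constituency_in_Mal"] = overrides[key]
        fixed.append(row)
    return fixed
```

Because the set of constituencies is small and fixed, a complete table may be more reliable than correcting machine output row by row.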
Next steps:
- Add duplicate removal
- Store data in SQL Server / database
- Build API layer (.NET backend)
- Create React UI with language toggle (EN ↔ ML)
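Duplicate removal, the first item above, could be sketched as follows. Which columns identify a duplicate is an assumption here (Name, Party, State and Constituency, compared case-insensitively); the function name is hypothetical.

```python
def drop_duplicates(rows, key_fields=("Name", "Party", "State", "Constituency")):
    """Keep the first row for each distinct combination of key_fields."""
    seen = set()
    unique = []
    for row in rows:
        # Normalise case and surrounding whitespace before comparing.
        key = tuple(row[f].strip().casefold() for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```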
Scraper → Filter → Malayalam Processing
candidates.csv
↓
accepted_candidates.csv
↓
accepted_candidates_mal.csv
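The three stages could also be chained from one small runner. This is a sketch that assumes all three scripts sit in the current directory and take no arguments; run_pipeline and STEPS are hypothetical names.

```python
import subprocess
import sys

# Each script and the file it is expected to produce, in pipeline order.
STEPS = [
    ("scraper.py", "candidates.csv"),
    ("filter_accepted.py", "accepted_candidates.csv"),
    ("add_malayalam.py", "accepted_candidates_mal.csv"),
]


def run_pipeline(run=subprocess.run):
    """Run each script with the current interpreter; stop at the first failure."""
    for script, output in STEPS:
        print(f"Running {script} -> {output}")
        run([sys.executable, script], check=True)
```

With check=True, a non-zero exit from any script raises CalledProcessError, so later stages never run on incomplete input.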
Name,Name_in_Mal,Party,State,Constituency,Constituency_in_Mal,Status
P. K. KRISHNADAS,പി. കെ. കൃഷ്ണദാസ്,BJP,Kerala,KATTAKKADA,കാട്ടാക്കട,Accepted
- Be respectful when scraping public websites (add delays)
- Avoid sending too many rapid requests
- This project is intended for educational / analytical use
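A delay between requests can be as simple as the helper below. The default values and the random jitter are illustrative assumptions, not settings from the scraper; polite_pause is a hypothetical name.

```python
import random
import time


def polite_pause(base_s=1.0, jitter_s=0.5, sleep=time.sleep):
    """Sleep for base_s plus a random extra of up to jitter_s seconds; return the delay."""
    delay = base_s + random.uniform(0, jitter_s)
    sleep(delay)
    return delay
```

Calling polite_pause() after each page load spaces requests out, and the jitter avoids hitting the server at a perfectly regular interval.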
You now have a complete pipeline from raw scraping → clean dataset → Malayalam-ready data.