GitHub - epfl-ada/ada-2025-project-penta_data: This is the project for the ADA course (2025 Fall) at EPFL, and we are doing an interesting journey on the Stock Market Dataset. We are diving into the possible relationship between the stock market and the U.S. presidential election.

🤗 Welcome to this repo! This is the project for the ADA course (Fall 2025) at EPFL. We explore the Stock Market Dataset and investigate the possible relationship between the stock market and U.S. presidential elections.

📖 For our interactive data story, please visit: https://sjj1017.github.io/ada_penta_data_story/

Abstract

This project investigates the complex relationship between U.S. presidential elections and stock market performance, exploring whether political factors significantly influence financial markets and how the market may send signals about presidential elections. Our motivation is the belief that political uncertainty is a key driver of market volatility, and this project sets out to test that belief. We first present the data profile to show that our project is based on individual stocks (non-ETFs) starting from 1991. We then analyze year-wise and month-wise price and volatility changes during election years and non-election years. Next, we zoom in to examine the sensitivity and political leaning of each stock using statistical methods such as regression. Finally, we focus on certain dramatic events to analyze stock behavior during these short-lived periods using time series forecasting and counterfactual analysis.

Research Questions

In earlier explorations, we found some interesting historical relationships between the stock market and presidential elections. Based on these, we pose several initial research questions (we may find more during our analysis):

Is the election period's impact on the stock market statistically significant? Do major events during an election have further influence (for example, the attempted assassination of Donald Trump in Pennsylvania)?
Which stocks are the most sensitive to the election results and polling data?
Do stocks have their own political inclination?
Can the different behaviors of stocks be understood at a high level?

Proposed additional datasets:

We expand the original dataset and add 3 additional datasets, including (1) industry & sector metadata, (2) election outcomes, and (3) polling data. The table below shows metadata for these datasets.

Category	Item	Source	Files / Directories	Size
Data Expansion	📈 Stocks	NASDAQ Trader directory + yfinance	`dataset/stocks/`, `dataset/etfs/`, `dataset/symbols_valid_meta.csv` (local only, not on GitHub)	~3GB
Industry / Sector Metadata	📈 Stocks	yfinance	`symbols_valid_meta_with_industry_sector.csv`	~7k–10k rows (U.S. equities + some ETFs)
Election Results	🗳️ Election	Wikipedia	`us_presidential_election_1876_2024.csv`	38 rows
Polling Data	🗳️ Election	538 Data Collection Platform	`pres_primary_avgs_1980-2016.csv`, `presidential_primary_averages_2024.csv`	~460k rows in total

The original dataset is extended to 2025 in order to cover the two most recent election periods. We also set `auto_adjust=True` to obtain adjusted prices. The extended dataset gives wider coverage of the recent events that we want to study in the future (e.g., the attempted assassination of Donald Trump) and more election periods for statistical analysis. However, we acknowledge that the data contains runs of consecutive identical values caused by automatic filling, which also appear in the original data.
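The runs of identical values mentioned above can be located before any analysis. The following is a minimal sketch, assuming a plain list of daily closing prices; the helper name `find_flat_runs` and the `min_run` threshold are hypothetical and not taken from the repository.

```python
from itertools import groupby


def find_flat_runs(prices, min_run=3):
    """Return (start_index, length, value) for each run of identical prices.

    `min_run` is an illustrative threshold, not a value used by the project.
    """
    runs = []
    index = 0
    for value, group in groupby(prices):
        length = sum(1 for _ in group)
        if length >= min_run:
            runs.append((index, length, value))
        index += length
    return runs


# Made-up closing prices with one four-day flat stretch.
closes = [10.0, 10.5, 10.5, 10.5, 10.5, 11.0, 11.0, 12.0]
print(find_flat_runs(closes))  # [(1, 4, 10.5)]
```

Flagged runs could then be excluded or inspected manually, depending on how strict the downstream analysis needs to be.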

The Industry and Sector data provides sector and industry metadata for each symbol, which helps us test the significance of election impact on stock data across sectors. We may also use this dataset to examine the political inclination and heterogeneity of stocks in the context of a specific event.

The Election Results dataset contains not only the election day of each election year but also the party that won. It serves as a time anchor, helping to define the election window or boundary and to analyze political inclinations.
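To illustrate how an election day can act as a time anchor, here is a small sketch that labels dates relative to a window around it. The 30-day window lengths and the function names are assumptions made for illustration; the project may define its windows differently.

```python
from datetime import date, timedelta


def election_window(election_day, days_before=30, days_after=30):
    """Return the (start, end) calendar dates of a window around election day."""
    start = election_day - timedelta(days=days_before)
    end = election_day + timedelta(days=days_after)
    return start, end


def label_dates(trading_dates, election_day, days_before=30, days_after=30):
    """Label each date as 'pre', 'post', or 'outside' relative to election day."""
    start, end = election_window(election_day, days_before, days_after)
    labels = {}
    for d in trading_dates:
        if start <= d < election_day:
            labels[d] = "pre"
        elif election_day <= d <= end:
            labels[d] = "post"
        else:
            labels[d] = "outside"
    return labels


election_2024 = date(2024, 11, 5)  # U.S. general election day in 2024
sample = [date(2024, 9, 3), date(2024, 10, 21), date(2024, 11, 6)]
for day, label in label_dates(sample, election_2024).items():
    print(day, label)
```

In this sketch election day itself is counted as part of the "post" period; whether that is the right choice depends on the event definition used in the analysis.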

The Polling data, though not used in Milestone 2, potentially provides a more detailed timeline of the election period. We expect to use this data in the future to examine events in more detail, not limited to election day. Although obtaining the 2020 data is problematic because FiveThirtyEight was shut down, it is still feasible to zoom in on one or several other elections from 1980 to 2024.

Methods

Descriptive Statistics: we begin by summarizing the key characteristics of the datasets, including sample size, missing values, and variable distributions. Histograms of stock and ETF starting dates illustrate the temporal coverage relative to election periods, while industry distributions show the sectoral composition of listed firms. This descriptive analysis helps assess data completeness, detect anomalies, and ensure the datasets are suitable for subsequent statistical modeling.
Regression: significance analysis is performed using t-tests and linear OLS regressions. T-tests compare returns and other stock metrics across different electoral outcomes. Regressions estimate sector- and stock-level sensitivities to political factors, reporting coefficients, standard errors, t-values, and corresponding p-values, with explanatory variables including Republican win, margin, election proximity, volatility, and momentum.
Separation Metric: a metric based on the difference in mean cumulative abnormal returns (CAR), adjusted by Cohen's d effect size, that measures how differently election outcomes affect an individual stock. It accounts for both statistical significance and the size of the difference, filtering out random noise.
Time Series Forecasting (ARIMA): we assess election-day impacts via an ARIMA counterfactual. For each stock, we fit an ARIMA model on a pre-event window of (log) returns, using excess returns (stock − market, e.g., vs. SPY) when a benchmark is available, then generate a post-event multi-step forecast and compute residuals as (actual − forecast). We evaluate whether the mean residual over the post window differs from zero using a one-sample t-test, interpreting significance as evidence of a short-horizon mean shift relative to the ARIMA baseline.
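The building blocks behind the separation metric and the residual test can be sketched with the standard library alone. This is an illustrative sketch, not the project's implementation: the function names are hypothetical, the sample numbers are made up, and the exact way the mean CAR difference is "adjusted by" Cohen's d is not specified here, so the two quantities are simply computed side by side. The ARIMA fit itself is not shown; `one_sample_t` takes already computed (actual − forecast) residuals as input.

```python
import math
from statistics import mean, stdev


def cumulative_abnormal_return(stock_returns, market_returns):
    """Sum of daily (stock - market) returns over an event window."""
    return sum(s - m for s, m in zip(stock_returns, market_returns))


def cohens_d(group_a, group_b):
    """Cohen's d using a pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def one_sample_t(residuals):
    """t statistic for the null hypothesis that the mean residual is zero."""
    n = len(residuals)
    return mean(residuals) / (stdev(residuals) / math.sqrt(n))


# Made-up CARs of one stock, grouped by the winning party of each election.
car_republican = [0.04, 0.06, 0.05]
car_democrat = [0.01, 0.02, 0.00]

mean_gap = mean(car_republican) - mean(car_democrat)
effect = cohens_d(car_republican, car_democrat)
print(f"mean CAR gap: {mean_gap:.4f}, Cohen's d: {effect:.2f}")

# Made-up post-event residuals (actual - forecast) for the t statistic.
print(f"t statistic: {one_sample_t([0.01, 0.03, 0.02]):.3f}")
```

Turning the t statistic into a p-value requires the t distribution with n − 1 degrees of freedom; a library routine such as `scipy.stats.ttest_1samp` covers that step.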

Quick Start

Installation

Clone the repository

git clone https://github.com/epfl-ada/ada-2025-project-penta_data.git
cd ada-2025-project-penta_data

Install required packages
```
pip install -r requirements.txt
```

Download Dataset

The original dataset is about 3GB, so please download the data manually from: https://www.kaggle.com/datasets/jacksoncrow/stock-market-dataset

Extract and place the files in a dataset/ directory in the project root:

├── dataset/                        # Raw datasets
│   ├── stocks/                     # Individual stock CSV files
│   ├── etfs/                       # Individual ETF CSV files
│   └── symbols_valid_meta.csv      # Stock metadata
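Before running the notebook, it can help to confirm that the layout above is in place. The snippet below is a small optional check, assuming it is run from the project root; the function name `missing_dataset_entries` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Entries expected directly under dataset/, as listed in the layout above.
EXPECTED = ("stocks", "etfs", "symbols_valid_meta.csv")


def missing_dataset_entries(root="dataset"):
    """Return the expected entries that are absent under `root`."""
    root_path = Path(root)
    return [name for name in EXPECTED if not (root_path / name).exists()]


if __name__ == "__main__":
    missing = missing_dataset_entries()
    print("missing:", missing if missing else "none")
```

An empty result means all three entries were found; it does not verify the contents of the CSV files themselves.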

Running the Analysis

Navigate to p3_notebook.ipynb and run all cells to reproduce the analysis.

Team Contributions

Team Member	Email	Contributions
Jiajun Shen	jiajun.shen@epfl.ch	Introduction and Political Leaning Part
Yibo Yin	yibo.yin@epfl.ch	Event & volatility definition and Conclusion Part
Jinghao Zheng	jinghao.zheng@epfl.ch	General Part
Xinxian Ma	xinxian.ma@epfl.ch	Event Analysis and Machine Learning Part
Zhiyan Ke	zhiyan.ke@epfl.ch	Political Sensitivity Part

All team members contributed to discussions, code reviews, website development, and the overall direction of the project.
