epfl-ada/ada-2025-project-penta_data


The Market's Vote

🤗 Welcome to this repo! This is our project for the ADA course (Fall 2025) at EPFL. We explore the Stock Market Dataset to investigate the possible relationship between the stock market and U.S. presidential elections.

📖 For our interactive data story, please visit: https://sjj1017.github.io/ada_penta_data_story/

Abstract

This project investigates the complex relationship between U.S. presidential elections and stock market performance, exploring whether political factors significantly influence financial markets and how the market may send signals about presidential elections. Our motivation is the belief that political uncertainty is a key driver of market volatility, and the project sets out to test this belief. We first present the data profile to show that our project is based on individual stocks (non-ETFs) starting from 1991. We then analyze year-wise and month-wise price and volatility changes during election years and non-election years. Next, we zoom in to examine the sensitivity and political leaning of each stock using statistical methods such as regression. Finally, we focus on certain dramatic events to analyze stock behavior during these short-lived periods using time series forecasting and counterfactual analysis.

Research Questions

In our earlier exploration, we found some interesting relationships between the stock market and presidential elections over the long run. Based on this, we pose several initial research questions (we may find more during our analysis):

  • Is the impact of the election period on the stock market statistically significant? Do major events during an election have further influence (for example, the attempted assassination of Donald Trump in Pennsylvania)?
  • Which stocks are the most sensitive to election results and polling data?
  • Do stocks have their own political inclination?
  • Can the different behavior of stocks be understood at a high level?

Proposed additional datasets:

We expand the original dataset and add 3 additional datasets, including (1) industry & sector metadata, (2) election outcomes, and (3) polling data. The table below shows metadata for these datasets.

| Category | Item | Source | Files / Directories | Size |
|---|---|---|---|---|
| Data Expansion | 📈 Stocks | NASDAQ Trader directory + yfinance | dataset/stocks/, dataset/etfs/, dataset/symbols_valid_meta.csv (local only, not on GitHub) | ~3GB |
| Industry / Sector Metadata | 📈 Stocks | yfinance | symbols_valid_meta_with_industry_sector.csv | ~7k–10k rows (U.S. equities + some ETFs) |
| Election Results | 🗳️ Election | Wikipedia | us_presidential_election_1876_2024.csv | 38 rows |
| Polling Data | 🗳️ Election | 538 Data Collection Platform | pres_primary_avgs_1980-2016.csv, presidential_primary_averages_2024.csv | ~460k rows in total |

The original dataset is extended to 2025 in order to cover the two most recent election periods. We also set auto_adjust = True to obtain adjusted prices. The extended dataset covers the latest events that we want to study in the future (e.g., the attempted assassination of Donald Trump) and provides more election periods for statistical analysis. However, we acknowledge that the data contains some runs of identical consecutive values caused by automatic filling, which also appear in the original data.
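The runs of identical values mentioned above can be located before any analysis. The following is a minimal, dependency-free sketch of one way to do so; the function name `flat_runs` and the `min_len` threshold are illustrative assumptions and are not taken from the project code.

```python
from itertools import groupby


def flat_runs(prices, min_len=3):
    """Return (start_index, length) for each run of identical consecutive values.

    `min_len` is an assumed threshold: shorter runs are treated as ordinary
    unchanged prices rather than as filled data.
    """
    runs = []
    index = 0
    for _, group in groupby(prices):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((index, length))
        index += length
    return runs


closes = [10.0, 10.5, 10.5, 10.5, 10.5, 11.0, 11.2, 11.2]
print(flat_runs(closes))  # [(1, 4)]
```

A suitable threshold probably depends on the stock: thinly traded symbols may legitimately close at the same price for several days.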

The Industry and Sector data provides additional sector and industry metadata for each symbol, which helps us test the significance of the election impact on stock data through the distribution of sectors. We may use this dataset to further examine the political inclination and the heterogeneity of stocks in the context of a given event.

The Election Results dataset contains not only the election day of each year but also the party that won each election. It serves as a time anchor, helping to determine the election window or boundary and to analyze political inclinations.
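To illustrate how an election day can act as a time anchor, here is a small sketch. It computes the date from the general rule (the Tuesday after the first Monday of November) instead of reading it from us_presidential_election_1876_2024.csv, and the window of 30 calendar days on each side is an assumed value; the function names are hypothetical. In the project itself, the dates in the CSV file should take precedence.

```python
from datetime import date, timedelta


def election_day(year):
    """Tuesday after the first Monday of November in `year`."""
    nov1 = date(year, 11, 1)
    # weekday() is 0 for Monday, so this is the offset to the first Monday.
    first_monday = nov1 + timedelta(days=(0 - nov1.weekday()) % 7)
    return first_monday + timedelta(days=1)


def in_election_window(day, year, days_before=30, days_after=30):
    """True if `day` falls inside the assumed calendar-day window."""
    anchor = election_day(year)
    start = anchor - timedelta(days=days_before)
    end = anchor + timedelta(days=days_after)
    return start <= day <= end


print(election_day(2024))                            # 2024-11-05
print(in_election_window(date(2024, 10, 20), 2024))  # True
```

The window here counts calendar days; an analysis based on trading days would instead count rows in the price series around the anchor.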

The Polling data, though not used in Milestone 2, potentially provides a more detailed timeline and more information about the election period. We expect to use this data in the future to examine events in more detail, not limited to election day. Although there are problems obtaining the 2020 data because FiveThirtyEight was shut down, it is still feasible to zoom in on one or several other elections from 1980-2024.

Methods

  • Descriptive Statistics: we begin by summarizing the key characteristics of the datasets, including sample size, missing values, and variable distributions. Histograms of stock and ETF starting dates illustrate the temporal coverage relative to election periods, while industry distributions show the sectoral composition of listed firms. This descriptive analysis helps assess data completeness, detect anomalies, and ensure the datasets are suitable for subsequent statistical modeling.
  • Regression: significance analysis was performed using t-tests and linear OLS regressions. T-tests compared returns and other stock metrics across different electoral outcomes. Regressions estimated sector- and stock-level sensitivities to political factors, reporting coefficients, standard errors, t-values, and corresponding p-values, with explanatory variables including Republican win, margin, election proximity, volatility, and momentum.
  • Separation Metric: we use a separation metric based on the difference in mean cumulative abnormal returns (CAR), adjusted by Cohen's d effect size, to measure how differently election outcomes affect an individual stock. It accounts for both statistical significance and the size of the numerical difference, filtering out random noise.
  • Time Series Forecasting (ARIMA): we assess election-day impacts via an ARIMA counterfactual. For each stock, we fit an ARIMA model on a pre-event window of (log) returns, using excess returns (stock − market, e.g., vs. SPY) when a benchmark is available. We then generate a post-event multi-step forecast and compute residuals as (actual − forecast). We evaluate whether the mean residual over the post window differs from zero using a one-sample t-test, interpreting significance as evidence of a short-horizon mean shift relative to the ARIMA baseline.
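Two of the methods above reduce to short formulas, sketched here with the standard library only. This is an illustration under stated assumptions, not the project's implementation: how the mean CAR difference and Cohen's d are combined into a single score is not specified above, so `separation` simply returns both; and the counterfactual uses the simplest ARIMA baseline, ARIMA(0,0,0) with a constant, whose multi-step forecast is just the pre-window mean. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (needs >= 2 values per group)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def separation(car_rep, car_dem):
    """Raw mean-CAR gap and its effect size for one stock.

    `car_rep` / `car_dem` hold the stock's CARs for elections won by each party.
    """
    return mean(car_rep) - mean(car_dem), cohens_d(car_rep, car_dem)


def residual_t_stat(pre_returns, post_returns):
    """One-sample t-statistic of (actual - forecast) against zero.

    Assumption: the forecast is the pre-window mean, i.e. ARIMA(0,0,0)
    with a constant, standing in for a properly selected ARIMA model.
    """
    forecast = mean(pre_returns)
    residuals = [r - forecast for r in post_returns]
    return mean(residuals) / (stdev(residuals) / sqrt(len(residuals)))


gap, d = separation([0.02, 0.04, 0.03], [-0.01, 0.01, 0.00])
t = residual_t_stat([0.0, 0.01, -0.01, 0.0], [0.01, 0.03, 0.02])
print(round(gap, 4), round(d, 2), round(t, 2))
```

In practice a fitted model such as `statsmodels.tsa.arima.model.ARIMA` would replace the mean forecast, and `scipy.stats.ttest_1samp` would supply the p-value that this sketch leaves out.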

Quick Start

Installation

  1. Clone the repository

    git clone https://github.com/epfl-ada/ada-2025-project-penta_data.git
    cd ada-2025-project-penta_data
  2. Install required packages

    pip install -r pip_requirements.txt
  3. Download Dataset

    The original dataset is about 3GB, so please download the data manually from: https://www.kaggle.com/datasets/jacksoncrow/stock-market-dataset

    Extract and place the files in a dataset/ directory in the project root:

    ├── dataset/                        # Raw datasets
    │   ├── stocks/                     # Individual stock CSV files
    │   ├── etfs/                       # Individual ETF CSV files
    │   └── symbols_valid_meta.csv      # Stock metadata
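Before running the notebook, it may help to confirm that the layout above is in place. The check below is a minimal sketch; the helper name `missing_dataset_entries` is hypothetical, and it assumes the script is run from the project root.

```python
from pathlib import Path

# Entries expected under dataset/, taken from the directory tree above.
EXPECTED = ("stocks", "etfs", "symbols_valid_meta.csv")


def missing_dataset_entries(root="dataset"):
    """Return the expected entries that are absent under `root`."""
    base = Path(root)
    return [name for name in EXPECTED if not (base / name).exists()]


missing = missing_dataset_entries()
if missing:
    print("Missing from dataset/:", ", ".join(missing))
else:
    print("Dataset layout looks complete.")
```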
    

Running the Analysis

Navigate to p3_notebook.ipynb and run all cells to reproduce the analysis.


Team Contributions

| Team Member | Email | Contributions |
|---|---|---|
| Jiajun Shen | jiajun.shen@epfl.ch | Introduction and Political Leaning Part |
| Yibo Yin | yibo.yin@epfl.ch | Event & volatility definition and Conclusion Part |
| Jinghao Zheng | jinghao.zheng@epfl.ch | General Part |
| Xinxian Ma | xinxian.ma@epfl.ch | Event Analysis and Machine Learning Part |
| Zhiyan Ke | zhiyan.ke@epfl.ch | Political Sensitivity Part |

All team members contributed to discussions, code reviews, website development, and the overall direction of the project.
