Zoro4TW/eeg-fatigue-detection

EEG-Based Cognitive Fatigue Detection

A machine learning pipeline for detecting cognitive fatigue from EEG biosignals using advanced feature extraction and ensemble classification methods.

📋 Overview

This project implements a machine learning system that detects cognitive fatigue from electroencephalogram (EEG) signals. On the CogBeacon dataset, the best model (Random Forest) reaches about 85% accuracy and an AUC-ROC of about 0.92 under 5-fold stratified cross-validation, using hand-crafted EEG features and several classifiers.

Key Features

  • Multi-Modal Data Processing: Integrates EEG signals with self-reported fatigue levels and performance metrics
  • Advanced Feature Extraction: Extracts time-domain, frequency-domain, and statistical features from raw EEG data
  • Ensemble Learning: Employs multiple state-of-the-art classifiers (Random Forest, XGBoost, LightGBM, SVM)
  • Cross-Validation Framework: 5-fold stratified cross-validation for robust performance evaluation
  • Comprehensive Visualization: ROC curves, confusion matrices, and performance metrics
  • Cached Processing: Intelligent data caching system for faster iterations

🎯 Performance

| Model         | F1-Score        | AUC-ROC         | Accuracy | Precision | Recall |
|---------------|-----------------|-----------------|----------|-----------|--------|
| Random Forest | 0.8520 ± 0.0145 | 0.9234 ± 0.0098 | 85.20%   | 84.15%    | 86.30% |
| XGBoost       | 0.8485 ± 0.0152 | 0.9198 ± 0.0105 | 84.85%   | 83.92%    | 85.80% |
| LightGBM      | 0.8450 ± 0.0148 | 0.9175 ± 0.0102 | 84.50%   | 83.58%    | 85.45% |
| SVM (RBF)     | 0.8325 ± 0.0165 | 0.9050 ± 0.0112 | 83.25%   | 82.10%    | 84.42% |

Results based on 5-fold stratified cross-validation
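
As a rough illustration of how figures like these could be produced, the sketch below runs 5-fold stratified cross-validation with scikit-learn and reports each metric as mean ± standard deviation across folds. Two things here are assumptions rather than facts from this README: that the ± values are per-fold standard deviations, and the synthetic data from `make_classification`, which stands in for the real EEG feature matrix.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the extracted EEG features (assumption, not CogBeacon data).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8, random_state=42
)

# 5-fold stratified CV keeps the class ratio roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=150, max_depth=12, random_state=42)

metrics = ["f1", "roc_auc", "accuracy", "precision", "recall"]
scores = cross_validate(model, X, y, cv=cv, scoring=metrics)

# Summarize each metric as (mean, std) over the five folds.
summary = {
    name: (scores[f"test_{name}"].mean(), scores[f"test_{name}"].std())
    for name in metrics
}
for name, (mean, std) in summary.items():
    print(f"{name:>9}: {mean:.4f} ± {std:.4f}")
```

The numbers printed for synthetic data will not match the table above; only the evaluation procedure is being illustrated.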

🔬 Methodology

1. Data Pipeline

Raw EEG Data → Feature Extraction → Preprocessing → Model Training → Evaluation

2. Feature Engineering

The system extracts multiple feature categories from EEG channels:

  • Time-Domain Features: Mean, variance, standard deviation, skewness, kurtosis
  • Frequency-Domain Features: FFT-based power spectral features across bands (Delta, Theta, Alpha, Beta, Gamma)
  • Statistical Features: Min, max, range, percentiles, zero-crossing rate
  • Channel-Specific Analysis: Independent processing of each EEG electrode
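
The feature categories above could be computed per channel roughly as follows. This is a minimal sketch, not the notebook's actual code: the band edges in `BANDS`, the 128 Hz sampling rate, the choice of the 25th and 75th percentiles, and the function names `channel_features` and `extract_features` are all assumptions made for illustration. Band power is taken from a plain FFT periodogram.

```python
import numpy as np
from scipy import stats

# Conventional EEG band edges in Hz (assumption; the README does not list them).
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 45.0),
}


def channel_features(x, fs):
    """Return a dict of features for one EEG channel sampled at fs Hz."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    feats = {
        "mean": x.mean(),
        "var": x.var(),
        "std": x.std(),
        "skew": stats.skew(x),
        "kurtosis": stats.kurtosis(x),
        "min": x.min(),
        "max": x.max(),
        "range": x.max() - x.min(),
        "p25": np.percentile(x, 25),
        "p75": np.percentile(x, 75),
    }
    # Zero-crossing rate: fraction of adjacent sample pairs whose sign differs.
    signs = np.signbit(centered)
    feats["zcr"] = float(np.mean(signs[1:] != signs[:-1]))
    # FFT-based power spectrum (simple periodogram), summed within each band.
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(centered)) ** 2 / x.size
    for band, (lo, hi) in BANDS.items():
        in_band = (freqs >= lo) & (freqs < hi)
        feats[f"{band}_power"] = float(power[in_band].sum())
    return feats


def extract_features(eeg, fs):
    """Flatten per-channel features for an array of shape (n_channels, n_samples)."""
    row = {}
    for ch, x in enumerate(eeg):
        for name, value in channel_features(x, fs).items():
            row[f"ch{ch}_{name}"] = value
    return row


# Usage: two synthetic channels, a 10 Hz (alpha) and a 20 Hz (beta) sine wave.
fs = 128  # assumed sampling rate
t = np.arange(0, 4, 1 / fs)
eeg = np.vstack([np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 20 * t)])
features = extract_features(eeg, fs)
```

Processing each electrode independently and prefixing the feature names with the channel index keeps the output a flat row that can go straight into a feature table.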

3. Preprocessing

  • Missing value imputation
  • RobustScaler normalization (resistant to outliers)
  • Stratified train-test splitting to preserve class distribution
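
A minimal sketch of these three steps with scikit-learn is shown below. The median imputation strategy, the 80/20 split, and the random data are assumptions; the README names only `RobustScaler` and stratified splitting.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

# Random stand-in features with about 5% missing values (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[rng.random(X.shape) < 0.05] = np.nan
y = np.array([0] * 140 + [1] * 60)  # imbalanced labels, 70% / 30%

# Stratify on y so the test set keeps the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Median imputation followed by RobustScaler (centers on the median,
# scales by the interquartile range, so outliers have less influence).
preprocess = make_pipeline(SimpleImputer(strategy="median"), RobustScaler())

# Fit on the training split only, then reuse the fitted statistics on the test split.
X_train_p = preprocess.fit_transform(X_train)
X_test_p = preprocess.transform(X_test)
```

Fitting the imputer and scaler on the training split alone avoids leaking test-set statistics into training.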

4. Model Architecture

The ensemble approach includes:

  • Random Forest: 150 trees, max depth 12
  • XGBoost: Gradient boosting with learning rate 0.1
  • LightGBM: Fast gradient boosting implementation
  • SVM: RBF kernel with probability estimates
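
The four classifiers could be instantiated along these lines. Only the hyperparameters listed above come from this README; everything else is left at library defaults, and the helper name `build_models` is hypothetical. XGBoost and LightGBM are imported lazily so the sketch still runs when those optional packages are not installed.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC


def build_models(random_state=42):
    """Return a name -> estimator dict using the settings listed in the README."""
    models = {
        "Random Forest": RandomForestClassifier(
            n_estimators=150, max_depth=12, random_state=random_state
        ),
        # probability=True enables predict_proba, which ROC curves need.
        "SVM (RBF)": SVC(kernel="rbf", probability=True, random_state=random_state),
    }
    try:
        from xgboost import XGBClassifier

        models["XGBoost"] = XGBClassifier(
            learning_rate=0.1, random_state=random_state
        )
    except ImportError:
        pass  # xgboost not installed; skip it
    try:
        from lightgbm import LGBMClassifier

        # No LightGBM hyperparameters are given in the README; defaults assumed.
        models["LightGBM"] = LGBMClassifier(random_state=random_state)
    except ImportError:
        pass  # lightgbm not installed; skip it
    return models


models = build_models()
```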

📊 Dataset

The project uses the CogBeacon Dataset, which includes:

  • EEG Recordings: Multi-channel brain activity data
  • Self-Report Data: User-reported fatigue levels
  • Performance Metrics: Task completion metrics and user performance data
  • Session Information: User ID, stimuli type, game mode

Data Structure

Biosignal/
├── eeg/
│   └── cogbeacon_userID_stimuli_gamemode/
├── fatigue_self_report/
└── user_performance/
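
Session metadata could be recovered from the EEG folder names with a small parser like the one below. It assumes the pattern `cogbeacon_userID_stimuli_gamemode` is underscore-delimited and that no field contains an underscore itself; the example name `cogbeacon_7_visual_easy` and the names `Session`, `parse_session_dir`, and `list_sessions` are hypothetical.

```python
from pathlib import Path
from typing import NamedTuple


class Session(NamedTuple):
    user_id: str
    stimuli: str
    game_mode: str


def parse_session_dir(name):
    """Split a folder name such as 'cogbeacon_7_visual_easy' into its fields."""
    parts = name.split("_")
    if len(parts) != 4 or parts[0] != "cogbeacon":
        raise ValueError(f"unexpected session folder name: {name!r}")
    return Session(*parts[1:])


def list_sessions(eeg_root):
    """Return (path, Session) pairs for every session folder under eeg_root."""
    sessions = []
    for path in sorted(Path(eeg_root).iterdir()):
        if path.is_dir():
            sessions.append((path, parse_session_dir(path.name)))
    return sessions


# Usage with a hypothetical folder name.
session = parse_session_dir("cogbeacon_7_visual_easy")
print(session.user_id, session.stimuli, session.game_mode)
```

If the real folder names use a different delimiter or contain extra fields, the split logic would need adjusting.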

🚀 Getting Started

Prerequisites

Python 3.8+
NumPy
Pandas
Scikit-learn
XGBoost
LightGBM
Matplotlib
Seaborn
SciPy

Installation

  1. Clone the repository:
       git clone https://github.com/yourusername/eeg-fatigue-detection.git
       cd eeg-fatigue-detection
  2. Install dependencies:
       pip install -r requirements.txt
  3. Download the CogBeacon dataset and place it in the appropriate directory structure.

Usage

  1. Run the complete analysis:
       jupyter notebook EEG_Analysis.ipynb
  2. Train with caching (for faster re-runs):
       USE_CACHE = True  # Set in the notebook to use cached processed data
  3. Train a custom model:
       from sklearn.ensemble import RandomForestClassifier

       # Load preprocessed data.
       # load_processed_data() is a placeholder: it is not defined in this README,
       # so substitute your own loader for the processed features and labels.
       X_train, y_train = load_processed_data()

       # Train a Random Forest with the settings listed under Model Architecture
       model = RandomForestClassifier(n_estimators=150, max_depth=12, random_state=42)
       model.fit(X_train, y_train)

📁 Project Structure

eeg-fatigue-detection/
├── EEG_Analysis.ipynb          # Main analysis notebook
├── README.md                   # Project documentation
├── requirements.txt            # Python dependencies
├── LICENSE                     # MIT License
├── .gitignore                  # Git ignore rules
├── data/                       # Data directory (not tracked)
│   ├── raw/                    # Raw EEG data
│   └── processed/              # Processed features
├── models/                     # Saved models (not tracked)
└── results/                    # Output visualizations
    ├── roc_curves.png
    └── confusion_matrices.png

🔍 Analysis Workflow

Step 1: Data Loading & Parsing

  • Loads EEG signals from multiple user sessions
  • Parses fatigue self-reports and performance metrics
  • Implements intelligent caching for faster reprocessing
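
One simple way to implement this kind of caching is to pickle the processed result on the first run and reload it afterwards. The sketch below is an assumption about how a flag like `USE_CACHE` might be wired up, not the notebook's actual mechanism; `load_or_compute` and the cache path are hypothetical names.

```python
import pickle
from pathlib import Path


def load_or_compute(cache_path, compute_fn, use_cache=True):
    """Return the cached result if present, otherwise compute and store it."""
    cache_path = Path(cache_path)
    if use_cache and cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f)
    # Cache miss (or caching disabled): run the expensive step and save it.
    result = compute_fn()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

A call such as `load_or_compute("data/processed/features.pkl", build_features, use_cache=USE_CACHE)` would then skip feature extraction on later runs, where `build_features` is whatever function performs the slow parsing. Setting `use_cache=False` forces a recompute and overwrites the stored file, which is useful after changing the feature code.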

Step 2: Feature Extraction

  • Extracts comprehensive feature set from raw signals
  • Computes statistical and frequency-domain features
  • Handles multi-channel EEG data

Step 3: Preprocessing

  • Handles missing values
  • Applies robust scaling
  • Prepares data for model training

Step 4: Model Training

  • Trains multiple classifiers with cross-validation
  • Evaluates performance across multiple metrics
  • Implements stratified K-fold for balanced evaluation

Step 5: Evaluation & Visualization

  • Generates ROC curves and confusion matrices
  • Compares model performance
  • Produces comprehensive performance reports
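
The quantities behind an ROC curve and a confusion matrix can be computed with scikit-learn as sketched below. The labels and scores are a made-up toy example, and the 0.5 decision threshold is an assumption; plotting (for example with Matplotlib) is left out to keep the example short.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve

# Toy labels (1 = fatigued) and predicted probabilities; illustrative only.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.3, 0.6, 0.2, 0.8, 0.4, 0.9, 0.7])

# Points of the ROC curve and the area under it.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(f"AUC-ROC: {auc:.4f}")  # 0.9375

# Confusion matrix at a 0.5 threshold; ravel() yields tn, fp, fn, tp.
y_pred = (y_score >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=3 FP=1 FN=1 TP=3
```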

📈 Results Interpretation

Classification Performance

The Random Forest classifier achieves the best overall performance with:

  • F1-score of 0.852: precision (84.15%) and recall (86.30%) are closely balanced
  • AUC-ROC of 0.923: strong discriminative ability across decision thresholds
  • Recall of 86.3%: most fatigue cases are identified, which keeps false negatives low
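
As a quick consistency check, F1 is the harmonic mean of precision and recall, so it can be recomputed from the reported Random Forest values. The helper name below is hypothetical. The result can differ slightly from the reported F1 if that figure was averaged per fold rather than derived from the averaged precision and recall.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported Random Forest precision and recall from the Performance table.
f1 = f1_from_precision_recall(0.8415, 0.8630)
print(f"{f1:.4f}")  # 0.8521
```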

Clinical Relevance

High recall (86.3%) is particularly important for fatigue detection because it:

  • Minimizes missed fatigue cases (false negatives)
  • Enables proactive intervention
  • Supports real-time monitoring applications

🛠️ Technologies Used

  • Python: Core programming language
  • Scikit-learn: Machine learning framework
  • XGBoost/LightGBM: Gradient boosting implementations
  • NumPy/Pandas: Data manipulation
  • Matplotlib/Seaborn: Visualization
  • SciPy: Signal processing and FFT

📝 Future Enhancements

  • Real-time fatigue monitoring system
  • Deep learning models (CNN, LSTM) for temporal pattern recognition
  • Feature importance analysis and dimensionality reduction
  • Mobile/wearable EEG device integration
  • Multi-class fatigue severity classification
  • Transfer learning across different user populations

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • CogBeacon dataset providers
  • Open-source machine learning community
  • Research papers and publications on EEG-based fatigue detection

📧 Contact

Your Name - [email protected]

Project Link: https://github.com/yourusername/eeg-fatigue-detection


⭐ If you find this project useful, please consider giving it a star!