A machine learning pipeline for detecting cognitive fatigue from EEG biosignals using advanced feature extraction and ensemble classification methods.
This project implements a machine learning system that detects cognitive fatigue from electroencephalogram (EEG) signals. Using data from the CogBeacon dataset, it classifies fatigue states by combining feature engineering with ensemble learning. The best model reaches an F1-score of about 0.85 and an AUC-ROC of about 0.92 under 5-fold stratified cross-validation.
- Multi-Modal Data Processing: Integrates EEG signals with self-reported fatigue levels and performance metrics
- Advanced Feature Extraction: Extracts time-domain, frequency-domain, and statistical features from raw EEG data
- Ensemble Learning: Trains and compares tree-ensemble classifiers (Random Forest, XGBoost, LightGBM) alongside an RBF-kernel SVM
- Cross-Validation Framework: 5-fold stratified cross-validation for robust performance evaluation
- Comprehensive Visualization: ROC curves, confusion matrices, and performance metrics
- Cached Processing: Reuses cached processed data so repeated runs skip reprocessing
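The README does not say how the cache is stored. One minimal way to get this behaviour is sketched below, assuming results are pickled to a directory and keyed by a string (for example a feature-extraction version tag). The `cached` helper and the key are hypothetical, not the notebook's implementation. Pickle files should only be loaded from trusted sources.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable


def cached(cache_dir: Path, key: str, compute: Callable[[], Any]) -> Any:
    """Return the cached result for `key`; on a miss, compute it and store it.

    Hypothetical helper: the project's real caching mechanism is not documented.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the key so arbitrary strings map to safe, fixed-length file names.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    path = cache_dir / f"{digest}.pkl"
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result


# Demo: the second call is served from the cache, so the computation runs once.
calls = []


def expensive_features():
    calls.append(1)
    return {"n_windows": 3}


with tempfile.TemporaryDirectory() as tmp:
    first = cached(Path(tmp), "features-v1", expensive_features)
    second = cached(Path(tmp), "features-v1", expensive_features)
```

Changing the key (e.g. bumping `features-v1` to `features-v2`) is a simple way to invalidate old results when the feature code changes.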
| Model | F1-Score | AUC-ROC | Accuracy | Precision | Recall |
|---|---|---|---|---|---|
| Random Forest | 0.8520 ± 0.0145 | 0.9234 ± 0.0098 | 85.20% | 84.15% | 86.30% |
| XGBoost | 0.8485 ± 0.0152 | 0.9198 ± 0.0105 | 84.85% | 83.92% | 85.80% |
| LightGBM | 0.8450 ± 0.0148 | 0.9175 ± 0.0102 | 84.50% | 83.58% | 85.45% |
| SVM (RBF) | 0.8325 ± 0.0165 | 0.9050 ± 0.0112 | 83.25% | 82.10% | 84.42% |
Results based on 5-fold stratified cross-validation
Raw EEG Data → Feature Extraction → Preprocessing → Model Training → Evaluation
The system extracts multiple feature categories from EEG channels:
- Time-Domain Features: Mean, variance, standard deviation, skewness, kurtosis
- Frequency-Domain Features: FFT-based power spectral features across bands (Delta, Theta, Alpha, Beta, Gamma)
- Statistical Features: Min, max, range, percentiles, zero-crossing rate
- Channel-Specific Analysis: Independent processing of each EEG electrode
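A minimal NumPy sketch of these feature families for one channel, concatenated across channels. The README does not give the sampling rate, window length, band edges, or exact feature list, so all of those are assumptions here: the band edges are common conventions that vary between studies, and `BANDS`, `channel_features`, and `extract_features` are hypothetical names. Skewness and kurtosis are population (biased) estimates, matching SciPy's defaults.

```python
import numpy as np

# Assumed EEG band edges in Hz; conventions differ between studies.
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 45.0),
}


def channel_features(x: np.ndarray, fs: float) -> dict:
    """Time-domain, statistical, and FFT band-power features for one channel."""
    x = np.asarray(x, dtype=float)
    mean, std = x.mean(), x.std()
    z = (x - mean) / std if std > 0 else np.zeros_like(x)
    feats = {
        "mean": mean,
        "var": x.var(),
        "std": std,
        "skew": np.mean(z**3),
        "kurtosis": np.mean(z**4) - 3.0,  # excess kurtosis
        "min": x.min(),
        "max": x.max(),
        "range": np.ptp(x),
        "p25": np.percentile(x, 25),
        "p75": np.percentile(x, 75),
        # Fraction of consecutive samples where the mean-removed signal changes sign.
        "zcr": np.mean(np.diff(np.signbit(x - mean).astype(int)) != 0),
    }
    # One-sided power spectrum from the real FFT of the mean-removed signal.
    power = np.abs(np.fft.rfft(x - mean)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        feats[f"{name}_power"] = power[mask].sum()
    return feats


def extract_features(eeg: np.ndarray, fs: float) -> np.ndarray:
    """`eeg` has shape (n_channels, n_samples); each channel is processed independently."""
    return np.concatenate([list(channel_features(ch, fs).values()) for ch in eeg])
```

For example, a pure 10 Hz sine sampled at an assumed 128 Hz puts nearly all of its power in the alpha band, and two channels yield 2 × 16 = 32 features.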
- Missing value imputation
- RobustScaler normalization (resistant to outliers)
- Stratified train-test splitting to preserve class distribution
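The README names these preprocessing steps but not their settings. A minimal scikit-learn sketch, assuming median imputation (the actual imputation strategy is not stated), an 80/20 split, and synthetic data in place of the real features; `make_preprocessor` is a hypothetical name. Fitting the imputer and scaler on the training split only keeps test-set statistics from leaking into training.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler


def make_preprocessor():
    # Median imputation (assumed) pairs naturally with RobustScaler,
    # which centers on the median and scales by the interquartile range.
    return make_pipeline(SimpleImputer(strategy="median"), RobustScaler())


# Synthetic stand-in data with roughly 5% missing values.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[rng.random(X.shape) < 0.05] = np.nan
y = rng.integers(0, 2, size=200)

# stratify=y keeps the class proportions equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
prep = make_preprocessor()
X_train_t = prep.fit_transform(X_train)  # statistics learned on training data only
X_test_t = prep.transform(X_test)
```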
The ensemble approach includes:
- Random Forest: 150 trees, max depth 12
- XGBoost: Gradient boosting with learning rate 0.1
- LightGBM: Fast gradient boosting implementation
- SVM: RBF kernel with probability estimates
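A sketch of how these four classifiers could be instantiated. Only the settings listed above come from this README (150 trees, max depth 12, learning rate 0.1, RBF kernel with probability estimates); `random_state`, `n_jobs`, `verbose`, and the `build_models` helper are assumptions. XGBoost and LightGBM are imported optionally so the sketch still runs where they are not installed.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC


def build_models(random_state: int = 42) -> dict:
    """Hypothetical helper returning the classifiers described in the README."""
    models = {
        "Random Forest": RandomForestClassifier(
            n_estimators=150, max_depth=12, random_state=random_state, n_jobs=-1
        ),
        # probability=True enables predict_proba, which ROC-AUC needs.
        "SVM (RBF)": SVC(kernel="rbf", probability=True, random_state=random_state),
    }
    # Optional dependencies: skipped silently if the package is missing.
    try:
        from xgboost import XGBClassifier

        models["XGBoost"] = XGBClassifier(learning_rate=0.1, random_state=random_state)
    except ImportError:
        pass
    try:
        from lightgbm import LGBMClassifier

        models["LightGBM"] = LGBMClassifier(random_state=random_state, verbose=-1)
    except ImportError:
        pass
    return models
```

The tree ensembles are insensitive to feature scale, but the RBF SVM is not, so it should always see the scaled features.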
The project uses the CogBeacon Dataset, which includes:
- EEG Recordings: Multi-channel brain activity data
- Self-Report Data: User-reported fatigue levels
- Performance Metrics: Task completion metrics and user performance data
- Session Information: User ID, stimuli type, game mode
```
Biosignal/
├── eeg/
│   └── cogbeacon_userID_stimuli_gamemode/
├── fatigue_self_report/
└── user_performance/
```
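The README does not describe how session folders are read. A minimal sketch, assuming folder names follow the `cogbeacon_userID_stimuli_gamemode` pattern literally, with underscore-separated fields that contain no underscores themselves; `Session`, `parse_session_dir`, and the example folder name are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Session:
    user_id: str
    stimuli: str
    game_mode: str


def parse_session_dir(path) -> Session:
    """Parse a folder named cogbeacon_<userID>_<stimuli>_<gamemode> (assumed layout)."""
    parts = Path(path).name.split("_")
    if len(parts) != 4 or parts[0] != "cogbeacon":
        raise ValueError(f"unexpected session folder name: {str(path)!r}")
    _, user_id, stimuli, game_mode = parts
    return Session(user_id, stimuli, game_mode)


# Hypothetical folder name, for illustration only.
example = parse_session_dir("Biosignal/eeg/cogbeacon_user07_stimA_mode2")
```

To enumerate every session, the same function could be mapped over `sorted(Path("Biosignal/eeg").glob("cogbeacon_*"))`.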
- Python 3.8+
- NumPy
- Pandas
- Scikit-learn
- XGBoost
- LightGBM
- Matplotlib
- Seaborn
- SciPy

- Clone the repository:
```bash
git clone https://github.com/yourusername/eeg-fatigue-detection.git
cd eeg-fatigue-detection
```
- Install dependencies:
```bash
pip install -r requirements.txt
```
- Download the CogBeacon dataset and place it so that it matches the `Biosignal/` directory layout shown above.

- Run the complete analysis:
```bash
jupyter notebook EEG_Analysis.ipynb
```
- Training with caching (for faster re-runs):
```python
USE_CACHE = True  # Set in the notebook to use cached processed data
```
- Custom model training:
```python
from sklearn.ensemble import RandomForestClassifier

# Load preprocessed data. `load_processed_data` is a project helper that is not
# shown in this README; it is expected to return a feature matrix and labels.
X_train, y_train = load_processed_data()

# Train the model with the hyperparameters used for the results above
model = RandomForestClassifier(n_estimators=150, max_depth=12, random_state=42)
model.fit(X_train, y_train)
```

```
eeg-fatigue-detection/
├── EEG_Analysis.ipynb      # Main analysis notebook
├── README.md               # Project documentation
├── requirements.txt        # Python dependencies
├── LICENSE                 # MIT License
├── .gitignore              # Git ignore rules
├── data/                   # Data directory (not tracked)
│   ├── raw/                # Raw EEG data
│   └── processed/          # Processed features
├── models/                 # Saved models (not tracked)
└── results/                # Output visualizations
    ├── roc_curves.png
    └── confusion_matrices.png
```
`EEG_Analysis.ipynb` is organized into the following stages:
- Data Loading: Loads EEG signals from multiple user sessions, parses fatigue self-reports and performance metrics, and caches processed data for faster reprocessing
- Feature Extraction: Extracts a comprehensive feature set, including statistical and frequency-domain features, from the raw multi-channel EEG signals
- Preprocessing: Handles missing values, applies robust scaling, and prepares the data for model training
- Model Training and Evaluation: Trains multiple classifiers with stratified K-fold cross-validation and evaluates them across multiple metrics
- Visualization: Generates ROC curves and confusion matrices, compares model performance, and produces performance reports
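The cross-validated evaluation can be sketched with scikit-learn's `cross_validate`, assuming preprocessing sits inside the pipeline so it is refit on each training fold (the README does not say whether the notebook does this). The data here is synthetic (`make_classification`), so the printed scores say nothing about the CogBeacon results. `evaluate` is a hypothetical helper, and median imputation is an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

SCORING = ["f1", "roc_auc", "accuracy", "precision", "recall"]


def evaluate(model, X, y, n_splits=5, seed=42) -> dict:
    """Mean and standard deviation of each metric over stratified K-fold CV."""
    # Imputer and scaler are refit on every training fold, avoiding leakage.
    pipe = make_pipeline(SimpleImputer(strategy="median"), RobustScaler(), model)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_validate(pipe, X, y, cv=cv, scoring=SCORING)
    return {
        m: (float(np.mean(scores[f"test_{m}"])), float(np.std(scores[f"test_{m}"])))
        for m in SCORING
    }


# Synthetic stand-in data; NOT the CogBeacon features.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
results = evaluate(
    RandomForestClassifier(n_estimators=150, max_depth=12, random_state=42), X, y
)
for metric, (mean, std) in results.items():
    print(f"{metric:>9}: {mean:.4f} ± {std:.4f}")
```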
The Random Forest classifier achieves the best overall performance, although its margin over XGBoost and LightGBM is smaller than the reported ± spread across folds:
- 0.852 F1-score: a good balance between precision (84.15%) and recall (86.30%)
- 0.923 AUC-ROC: strong discriminative ability
- 86.3% recall: identifies most fatigue cases, keeping false negatives low

High recall is particularly important for fatigue detection because it:
- Minimizes missed fatigue cases (false negatives)
- Enables proactive intervention
- Supports real-time monitoring applications
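Recall is TP / (TP + FN), so every missed fatigue case lowers it. One common lever, not described in this README, is lowering the decision threshold on predicted probabilities: more cases are flagged as fatigued, recall rises, and false alarms may rise too. The labels and scores below are invented purely for illustration; `confusion_counts` and `precision_recall_f1` are hypothetical helpers.

```python
def confusion_counts(y_true, scores, threshold=0.5):
    """Count (TP, FP, FN, TN) when scores >= threshold are predicted as fatigued."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels (1 = fatigued) and predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.45, 0.3, 0.6, 0.35, 0.2, 0.1]

for thr in (0.5, 0.3):
    tp, fp, fn, tn = confusion_counts(y_true, scores, thr)
    p, r, f = precision_recall_f1(tp, fp, fn)
    print(f"threshold={thr}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

With these made-up scores, moving the threshold from 0.5 to 0.3 raises recall from 0.50 to 1.00 while precision stays at 0.67.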
- Python: Core programming language
- Scikit-learn: Machine learning framework
- XGBoost/LightGBM: Gradient boosting implementations
- NumPy/Pandas: Data manipulation
- Matplotlib/Seaborn: Visualization
- SciPy: Signal processing and FFT
- Real-time fatigue monitoring system
- Deep learning models (CNN, LSTM) for temporal pattern recognition
- Feature importance analysis and dimensionality reduction
- Mobile/wearable EEG device integration
- Multi-class fatigue severity classification
- Transfer learning across different user populations
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- CogBeacon dataset providers
- Open-source machine learning community
- Research papers and publications on EEG-based fatigue detection
Your Name - [email protected]
Project Link: https://github.com/yourusername/eeg-fatigue-detection
⭐ If you find this project useful, please consider giving it a star!