A production-ready web application implementing a two-stage hierarchical machine learning system for automated motor fault detection and classification using vibration signal analysis. It achieves 100% binary classification accuracy and 99.48% multi-class classification accuracy on the MAFAULDA dataset.
- Overview
- Key Features
- Architecture
- Performance
- Installation
- Usage
- Technical Details
- Dataset
- Model Explainability
- Project Structure
This project implements an intelligent predictive maintenance system for rotating machinery, specifically designed for industrial motor fault diagnosis. By leveraging wavelet-based feature extraction and hierarchical classification, the system provides:
- ✅ Real-time fault detection - Binary classification (Normal vs Fault)
- ✅ Precise fault diagnosis - Multi-class identification of 9 specific fault types
- ✅ Explainable AI - SHAP analysis for feature importance
- ✅ Production-ready deployment - Web-based Streamlit interface
| Fault Category | Types | Description |
|---|---|---|
| Rotor Faults | Imbalance | Unbalanced mass distribution |
| Rotor Faults | Horizontal Misalignment | Shaft misalignment in horizontal plane |
| Rotor Faults | Vertical Misalignment | Shaft misalignment in vertical plane |
| Bearing Faults (Overhang) | Ball Fault | Rolling element defect |
| Bearing Faults (Overhang) | Cage Fault | Bearing cage damage |
| Bearing Faults (Overhang) | Outer Race Fault | Outer raceway defect |
| Bearing Faults (Underhang) | Ball Fault | Rolling element defect |
| Bearing Faults (Underhang) | Cage Fault | Bearing cage damage |
| Bearing Faults (Underhang) | Outer Race Fault | Outer raceway defect |
- 100% accuracy in fault vs normal detection
- 99.48% accuracy in specific fault classification
- Zero false alarms in binary classification
- Biorthogonal 3.1 wavelet decomposition (level 4)
- 273 statistical features from time-frequency domain:
  - Time-domain: Mean, std dev, variance, RMS, percentiles
  - Shape: Kurtosis, skewness
  - Signal characteristics: Zero/mean crossing rates, entropy
  - Transform features: Hilbert magnitude
- Two-stage classification for improved robustness
- Stage 1: Binary classifier with SMOTE for class balancing
- Stage 2: Multi-class classifier for fault identification
- Computational efficiency: Stage 2 only invoked when fault detected
- SHAP analysis for feature importance visualization
- Interpretable predictions for maintenance decision-making
- Trust and transparency for industrial deployment
- Streamlit web interface - No coding required
- CSV file upload - Simple data input
- Real-time predictions - Instant results
- Visualization - Confusion matrices and confidence scores
┌─────────────────────────────────────────────────────────────┐
│ Input: Vibration Signal │
│ (250,000 samples, 50kHz) │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Wavelet Decomposition (Bior3.1, L4) │
│ Approximation + 4 Detail Coefficients × 8 Sensor Channels │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Feature Extraction (273 features) │
│ Statistical + Shape + Signal + Transform Features │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Stage 1: Binary Classification (LightGBM) │
│ Normal vs Fault │
│ Accuracy: 100% │
└────────────────────────┬────────────────────────────────────┘
│
Is Fault? ──────No──────> Normal
│
Yes
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Stage 2: Multi-class Classification (LightGBM) │
│ Identify Specific Fault Type │
│ Accuracy: 99.48% │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Fault Type Prediction │
│ B/C/D/E/F/G/H/I/J with Confidence Score │
└─────────────────────────────────────────────────────────────┘
Metrics (Stage 1, binary classification):
- ✅ Accuracy: 100%
- ✅ Precision: 1.00
- ✅ Recall: 1.00
- ✅ F1-Score: 1.00
Metrics (Stage 2, multi-class classification):
- ✅ Overall Accuracy: 99.48%
- ✅ Macro-avg Precision: 0.995
- ✅ Macro-avg Recall: 0.995
- ✅ Macro-avg F1-Score: 0.995
Per-Class Performance:
| Fault Type | Samples | Accuracy | Notes |
|---|---|---|---|
| Imbalance (B) | 67 | 100% | ✅ Perfect |
| Horizontal Misalignment (C) | 39 | 100% | ✅ Perfect |
| Vertical Misalignment (D) | 60 | 96.7% | |
| Overhang Ball (E) | 27 | 100% | ✅ Perfect |
| Overhang Cage (F) | 38 | 100% | ✅ Perfect |
| Overhang Outer Race (G) | 38 | 100% | ✅ Perfect |
| Underhang Ball (H) | 37 | 100% | ✅ Perfect |
| Underhang Cage (I) | 38 | 100% | ✅ Perfect |
| Underhang Outer Race (J) | 37 | 100% | ✅ Perfect |
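
As a quick consistency check, the overall accuracy can be recomputed from the per-class table. The sketch below copies the sample counts and accuracies from the table and assumes each class accuracy corresponds to a whole number of correctly classified test samples.

```python
# Per-class test counts and accuracies copied from the table above.
per_class = {
    "B": (67, 1.000),
    "C": (39, 1.000),
    "D": (60, 0.967),
    "E": (27, 1.000),
    "F": (38, 1.000),
    "G": (38, 1.000),
    "H": (37, 1.000),
    "I": (38, 1.000),
    "J": (37, 1.000),
}

total = sum(n for n, _ in per_class.values())
# Assumption: round each class to a whole number of correct samples.
correct = sum(round(n * acc) for n, acc in per_class.values())
overall = 100 * correct / total

print(f"{correct}/{total} correct -> {overall:.2f}%")  # 379/381 correct -> 99.48%
```

The two misclassified Vertical Misalignment samples account for the entire gap between 100% and 99.48%.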
- Python 3.8 or higher
- pip package manager
- Virtual environment (recommended)
- Clone the repository
git clone https://github.com/ayushraj09/vibration-analysis.git
cd vibration-analysis
- Create virtual environment
# Using venv
python -m venv venv
# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
- Install dependencies
pip install -r requirements.txt

Contents of requirements.txt:
streamlit>=1.28.0
numpy>=1.24.0
pandas>=2.0.0
pywavelets>=1.4.1
lightgbm>=4.0.0
scikit-learn>=1.3.0
shap>=0.42.0
imbalanced-learn>=0.11.0
joblib>=1.3.0
matplotlib>=3.7.0
seaborn>=0.12.0
streamlit run src/app.py

The application will open in your default web browser at http://localhost:8501
- Click "Browse files" or drag-and-drop a CSV file
- Supported format: CSV with 8 columns (sensor data)
- Expected data: 250,000 samples per measurement at 50kHz
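
Before uploading, a measurement file can be checked against the format described above. The following is a minimal sketch using only the standard library; the function name `check_measurement` is hypothetical, and it assumes a headerless, comma-separated file, which the text above does not state explicitly.

```python
import csv
import io

N_CHANNELS = 8            # sensor columns per row (from the text above)
SAMPLE_RATE_HZ = 50_000   # 50 kHz
DURATION_S = 5            # 5 s per measurement
EXPECTED_ROWS = SAMPLE_RATE_HZ * DURATION_S  # 250,000 samples


def check_measurement(handle, expected_rows=EXPECTED_ROWS):
    """Return a list of problems found in a headerless CSV measurement."""
    problems = []
    n_rows = 0
    for line_no, row in enumerate(csv.reader(handle), start=1):
        if not row:
            continue  # skip blank lines
        n_rows += 1
        if len(row) != N_CHANNELS:
            problems.append(
                f"line {line_no}: {len(row)} columns, expected {N_CHANNELS}"
            )
            break
        try:
            [float(v) for v in row]
        except ValueError:
            problems.append(f"line {line_no}: non-numeric value")
            break
    if n_rows != expected_rows:
        problems.append(f"{n_rows} rows, expected {expected_rows}")
    return problems


# Tiny in-memory examples (3 rows instead of 250,000) to show the behaviour.
good = io.StringIO("\n".join(",".join(["0.1"] * 8) for _ in range(3)))
bad = io.StringIO("0.1,0.2,0.3\n")
print(check_measurement(good, expected_rows=3))  # []
print(check_measurement(bad, expected_rows=1))   # ['line 1: 3 columns, expected 8']
```

For a real file, pass an open file handle instead of the `io.StringIO` objects and leave `expected_rows` at its default.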
The system will:
- Extract features using wavelet decomposition
- Classify as Normal or Fault (Stage 1)
- If Fault, identify specific fault type (Stage 2)
- Display prediction with confidence score
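
The control flow of these steps can be sketched as follows. This is an illustration only, not the code in `src/app.py`: the function `diagnose` and the two toy stages are hypothetical stand-ins for the trained LightGBM models, and feature extraction is assumed to have happened already.

```python
from typing import Callable, Sequence, Tuple

Features = Sequence[float]
# Each stage returns (label, confidence in [0, 1]).
Stage = Callable[[Features], Tuple[str, float]]


def diagnose(features: Features, stage1: Stage, stage2: Stage) -> Tuple[str, float]:
    """Run Stage 2 only when Stage 1 flags a fault."""
    label, confidence = stage1(features)
    if label == "Normal":
        return label, confidence
    return stage2(features)


# Hypothetical stand-ins for the trained binary and multi-class models.
def toy_stage1(x):
    return ("Fault", 0.97) if max(x) > 1.0 else ("Normal", 0.99)


def toy_stage2(x):
    return ("B", 0.91)


print(diagnose([0.2, 0.4], toy_stage1, toy_stage2))  # ('Normal', 0.99)
print(diagnose([0.2, 3.0], toy_stage1, toy_stage2))  # ('B', 0.91)
```

Because Stage 2 is skipped for normal signals, the multi-class model is evaluated only on inputs that resemble its training data, which consists of fault samples.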
- Normal: No maintenance required
- Fault Type B-J: Review fault description and schedule appropriate maintenance
The system uses the Biorthogonal 3.1 (Bior3.1) mother wavelet, selected on the basis of the analysis by Das & Das (2023), which reported superior performance for rotating machinery fault detection.
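Mathematical Formulation:

$$W(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\, \psi^*\!\left(\frac{t - b}{a}\right) dt$$

Where:
- $x(t)$ = Input vibration signal
- $\psi(t)$ = Mother wavelet (Bior3.1)
- $a$ = Scaling parameter
- $b$ = Translation parameter
- $\psi^*$ = Complex conjugate of $\psi$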
Decomposition Parameters:
- Level: 4
- Output: 1 approximation + 4 detail coefficient sets
- Sensor channels: 8 (underhang accelerometer 3-axis, overhang accelerometer 3-axis, tachometer, microphone)
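
A level-4 decomposition of a single channel can be sketched with PyWavelets, which is listed in the project requirements. The random signal below is a stand-in for real sensor data, and the short names (`approx`, `d4`, ...) are assumed from the feature names used elsewhere in this README rather than taken from `src/utils.py`.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
signal = rng.standard_normal(250_000)  # stand-in for one sensor channel

# For level=4, wavedec returns [cA4, cD4, cD3, cD2, cD1]:
# one approximation array followed by four detail arrays.
coeffs = pywt.wavedec(signal, "bior3.1", level=4)

names = ["approx", "d4", "d3", "d2", "d1"]  # assumed naming
for name, c in zip(names, coeffs):
    print(name, len(c))
```

Each successive level roughly halves the number of coefficients, so the approximation and `d4` arrays are the shortest and `d1` is the longest.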
Statistical Features (per coefficient set):
- Central tendency: Mean, Median
- Dispersion: Standard deviation, Variance, RMS
- Distribution shape: Kurtosis, Skewness
- Percentiles: 5th, 25th, 75th, 95th
- Signal characteristics: Zero crossing rate, Mean crossing rate
- Information theory: Shannon entropy
- Transform domain: Hilbert transform magnitude
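Total Features: 273. Note that the full grid of 15 statistics × 5 coefficient sets × 8 channels would give 600 values, so the 273 features do not correspond to that full grid.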
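
The per-coefficient statistics listed above can be sketched with NumPy alone. The definitions here are assumptions, not the project's implementation in `src/utils.py`: kurtosis is the non-excess (Pearson) form, entropy is computed on a 32-bin histogram, and the Hilbert magnitude is left out because it would need `scipy.signal.hilbert`. The function name `coefficient_features` is hypothetical.

```python
import numpy as np


def coefficient_features(c, bins=32):
    """Summary statistics for one 1-D coefficient array (assumed definitions)."""
    c = np.asarray(c, dtype=float)
    mean, std = c.mean(), c.std()
    centered = c - mean
    # Standardized values for the shape moments; guard against a constant array.
    z = centered / std if std > 0 else np.zeros_like(c)
    # Shannon entropy of a histogram of the values (bin count is an assumption).
    counts, _ = np.histogram(c, bins=bins)
    p = counts[counts > 0] / counts.sum()

    def crossing_rate(x):
        # Fraction of consecutive sample pairs whose sign differs.
        return float(np.mean(np.signbit(x[1:]) != np.signbit(x[:-1])))

    p5, p25, p75, p95 = np.percentile(c, [5, 25, 75, 95])
    return {
        "mean": float(mean), "median": float(np.median(c)),
        "std": float(std), "var": float(c.var()),
        "rms": float(np.sqrt(np.mean(c ** 2))),
        "kurtosis": float(np.mean(z ** 4)), "skewness": float(np.mean(z ** 3)),
        "p5": float(p5), "p25": float(p25), "p75": float(p75), "p95": float(p95),
        "zero_crossing_rate": crossing_rate(c),
        "mean_crossing_rate": crossing_rate(centered),
        "entropy": float(-np.sum(p * np.log2(p))),
    }


# Example on ten periods of a sine wave.
feats = coefficient_features(np.sin(np.linspace(0, 20 * np.pi, 10_000)))
print(len(feats), "features; rms =", round(feats["rms"], 3))
```

Applying such a function to every coefficient set of every channel and concatenating the results gives one feature vector per measurement.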
from lightgbm import LGBMClassifier

# Stage 1: binary classifier (Normal vs Fault)
LGBMClassifier(
    boosting_type='gbdt',
    objective='binary',
    learning_rate=0.05,
    n_estimators=100,
    num_leaves=31,
    max_depth=-1,
    min_child_samples=20,
    random_state=42
)

Class Balancing: SMOTE (Synthetic Minority Over-sampling Technique)
- Original: 49 normal, 1,902 faulty
- Balanced: Equal representation
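
The core idea behind SMOTE is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a simplified NumPy illustration of that idea, not the `imbalanced-learn` implementation; the function name `smote_like`, the toy 4-dimensional features, and the choice of `k=5` are all assumptions. The class counts are the ones quoted above and are applied to the full dataset purely for illustration.

```python
import numpy as np


def smote_like(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by neighbour interpolation."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))           # position along the connecting segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])


n_normal, n_faulty = 49, 1902
X_normal = np.random.default_rng(1).normal(size=(n_normal, 4))  # toy features
X_new = smote_like(X_normal, n_new=n_faulty - n_normal)
print(len(X_normal) + len(X_new), "normal samples after balancing")  # 1902 ...
```

In practice, `imblearn.over_sampling.SMOTE` with `fit_resample` does this work, and oversampling is normally applied to the training split only so that synthetic samples do not leak into the test set.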
from lightgbm import LGBMClassifier

# Stage 2: multi-class classifier (nine fault types, B to J)
LGBMClassifier(
    boosting_type='gbdt',
    objective='multiclass',
    num_class=9,
    learning_rate=0.05,
    n_estimators=100,
    num_leaves=31,
    max_depth=-1,
    min_child_samples=20,
    random_state=42
)

Training Strategy: 80-20 train-test split on fault samples only
| Feature | Advantage |
|---|---|
| Speed | Faster training than Random Forest |
| Memory | Lower memory footprint |
| Accuracy | Competitive with other ensemble methods |
| Scalability | Handles large datasets efficiently |
| Industry-ready | Proven in production environments |
Source: Federal University of Rio de Janeiro (UFRJ)
Link: https://www02.smt.ufrj.br/~offshore/mfs/page_01.html
Experimental Setup:
- Machine: SpectraQuest Machinery Fault Simulator (ABVT)
- Motor: 1/4 HP DC, 700-3600 RPM
- Sensors:
  - IMI 601A01 accelerometers (underhang: 3-axis)
  - IMI 604B31 accelerometer (overhang: 3-axis)
  - Monarch MT-190 tachometer
  - Shure SM81 microphone
- Sampling: 50 kHz, 5 seconds (250,000 samples/measurement)
- Bearing specs: 8 rolling elements, 0.7145 cm ball diameter
Dataset Composition:
| Fault Type | Measurements | Variations |
|---|---|---|
| Normal | 49 | 737-3686 RPM |
| Imbalance | 333 | 6-35g weights |
| Horizontal Misalignment | 197 | 0.5-2.0mm shifts |
| Vertical Misalignment | 301 | 0.51-1.90mm shifts |
| Bearing Faults (Underhang) | 558 | Ball/Cage/Outer race × (0,6,20,35g) |
| Bearing Faults (Overhang) | 513 | Ball/Cage/Outer race × (0,6,20,35g) |
| Total | 1,951 | - |
Key Characteristic: Bearing faults were recorded together with added imbalance masses (0, 6, 20, and 35 g), because bearing faults are practically imperceptible when no imbalance is present.
Top Contributing Features:
- col6_mean_d4 - Mean of detail coefficient 4 (column 6)
- col3_mean_approx - Mean of approximation coefficient (column 3)
- col6_mean_d3 - Mean of detail coefficient 3 (column 6)
- col3_std_d4 - Standard deviation of detail coefficient 4 (column 3)
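
The feature names above appear to follow a `col<column>_<statistic>_<coefficient set>` pattern. The sketch below parses names under that assumption; the pattern is inferred from these four names only, so other feature names in the model may not match it, and the helper `parse_feature_name` is hypothetical.

```python
import re

# Pattern inferred from the four names listed above; other names may differ.
FEATURE_NAME = re.compile(
    r"^col(?P<column>\d+)_(?P<stat>.+)_(?P<coef>approx|d\d+)$"
)


def parse_feature_name(name):
    """Split a name such as 'col6_mean_d4' into (column, statistic, coefficient set)."""
    m = FEATURE_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognised feature name: {name!r}")
    return int(m["column"]), m["stat"], m["coef"]


for name in ["col6_mean_d4", "col3_mean_approx", "col6_mean_d3", "col3_std_d4"]:
    print(name, "->", parse_feature_name(name))
```

Grouping SHAP importances by the parsed column or coefficient set makes it easier to see which sensor channels and decomposition levels matter most.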
Key Insights:
- ✅ Higher decomposition levels (d3, d4) capture critical fault signatures
- ✅ Mean and standard deviation are most discriminative
- ✅ Multiple sensor channels contribute synergistically
- ✅ Both approximation and detail coefficients are important
- High values (red/pink): Push predictions toward specific fault classes
- Low values (blue): Indicate different fault characteristics
- Bidirectional impact: Demonstrates complex, non-linear decision boundaries
vibration-analysis/
│
├── src/
│ ├── app.py # Streamlit web application
│ ├── utils.py # Feature extraction utilities
│ ├── lgbm_binary_model.joblib # Trained binary classifier
│ ├── lgbm_multi_model.joblib # Trained multi-class classifier
│ └── label_encoder.joblib # Label encoder for fault classes
│
├── screenshots/
│ ├── binary_cm.png # Binary confusion matrix
│ ├── multi_cm.png # Multi-class confusion matrix
│ └── shap.png # SHAP feature importance
│
├── notebooks/
│ └── training.ipynb # Model training notebook
│
├── docs/
│ └── technical_report.pdf # Detailed technical report
│
├── requirements.txt # Python dependencies
├── README.md # This file
├── LICENSE # MIT License
└── .gitignore # Git ignore rules
This work is based on the methodology from:
@article{das2023smart,
title={Smart machine fault diagnostics based on fault specified discrete wavelet transform},
author={Das, Oguzhan and Bagci Das, Duygu},
journal={Journal of the Brazilian Society of Mechanical Sciences and Engineering},
volume={45},
number={55},
year={2023},
publisher={Springer},
doi={10.1007/s40430-022-03975-0}
}

This project is licensed under the MIT License - see the LICENSE file for details.
- Das & Das (2023) - Wavelet-based fault detection methodology
- UFRJ Team - MAFAULDA dataset creation and maintenance
- LightGBM - Microsoft Research
- Streamlit - Streamlit Inc.
- PyWavelets - PyWavelets Development Team
- scikit-learn - scikit-learn developers
- SHAP - Scott Lundberg & Su-In Lee
- MAFAULDA Database - Federal University of Rio de Janeiro
- Ribeiro et al. (2018) - Dataset curation and documentation


