Cardiopulmonary Diagnostics is a Deep Learning project focused on the automated analysis of 1D (audio) and 2D (image) medical signals. It implements three critical healthcare tasks:
- Heart Sound Classification: Detecting abnormalities in PCG (Phonocardiogram) audio.
- Pneumonia Classification: Diagnosing pneumonia from Chest X-Ray images (Binary & Multi-class).
- Pneumonia Image Retrieval: A Content-Based Image Retrieval (CBIR) system to find visually similar historical cases for diagnosis support.
The project emphasizes clinical safety by prioritizing Recall/Sensitivity in evaluation metrics, ensuring that critical cases are not missed.
However, by one widely cited estimate, 87% of ML models never reach production. This work therefore also demonstrates the full lifecycle of an ML system: from Deep Learning research experimentation on medical tasks to a structured, production-oriented MLOps pipeline. The project is divided into two main phases:
- Research (Level 0 MLOps): Exploration and experimentation with Jupyter Notebooks on 1D and 2D medical signals.
- Production MLOps Pipeline (Level 1/2): A fully modular, decoupled, containerized, CI-enabled ML system applied to the Chest X-Ray Pneumonia Classification task.
The goal is not only to build accurate models, but to transform them into a reproducible, deployable, and maintainable ML system. This includes:
- Modular architecture
- Experiment tracking
- External model registry
- CI validation pipeline
- Containerized deployment
- API and UI serving layer
For a detailed explanation of the three tasks, including methodology, results, and conclusions, please refer to the Project Presentation. For the MLOps pipeline details, refer instead to the MLOps Presentation.
Goal: Classify heartbeats as Normal or Abnormal using Phonocardiograms (PCG).
- Dataset: PhysioNet Challenge 2016 (3,240 recordings).
- Techniques:
- Preprocessing: Audio segmentation/padding to a fixed 5 s length and conversion to log-scaled Mel Spectrograms (64 mels, N_FFT=256); see the sketch after the results below.
- Models: Comparison of Classical ML (SVC, Random Forest, LightGBM) using hand-crafted features (RMSE, ZCR, MFCCs, etc.) versus Deep Learning architectures (ResNet18, EfficientNet-B0, Custom CNN).
- Optimization: Hyperparameter tuning via Optuna for the ML models, and full re-training with the Adam optimizer, ReduceLROnPlateau scheduler, and Mixed Precision Training (FP16) for the CNNs (all implemented in PyTorch).
- Results:
- Best Model(s): LightGBM (with statistics aggregation) achieved the highest ROC-AUC of 0.9658, while EfficientNet-B0 demonstrated superior sensitivity with a Recall of 0.99 on the test set.
- Safety: Decision thresholds were tuned to prioritize Recall, successfully minimizing False Negatives (e.g., EfficientNet-B0 missed only 1 abnormal case out of 100) for clinical screening safety.
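For illustration, the preprocessing step could look roughly like this. A minimal sketch only: the 5 s length, 64 mels, and N_FFT=256 come from the description above, while the use of librosa and the 2,000 Hz sample rate are assumptions, not the project's actual code:

```python
# Minimal sketch of the PCG preprocessing described above.
# Assumed: librosa and a 2,000 Hz sample rate; 5 s / 64 mels / N_FFT=256 are from the text.
import librosa
import numpy as np

def pcg_to_logmel(path: str, sr: int = 2000, duration: float = 5.0,
                  n_mels: int = 64, n_fft: int = 256) -> np.ndarray:
    """Load a PCG recording, pad/trim to a fixed length, return a log-scaled Mel spectrogram."""
    y, sr = librosa.load(path, sr=sr)
    y = librosa.util.fix_length(y, size=int(sr * duration))  # segment or zero-pad to 5 s
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)  # log scaling
```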
Click to watch (and listen to!) the full video:
Goal: Detect Pneumonia from Chest X-Rays and distinguish between Bacterial and Viral subtypes.
- Dataset: Chest X-Ray Pneumonia (5,856 images).
- Techniques:
- Pipeline: Full Fine-Tuning of ImageNet pre-trained models (sketched after the results below).
- Data Augmentation.
- Models: DenseNet121 vs EfficientNet-B0.
- Results:
- Best Models: DenseNet121 v1 (without Dropout) for the binary task and EfficientNet-B0 v1 (without Dropout) for multi-class classification; the two showed very similar overall performance.
- Performance:
- DenseNet121 achieved 99.7% Recall (binary, threshold tuned for a target recall ≥ 98%) with robust multi-class discrimination (79% Macro Recall).
- EfficientNet-B0 achieved 99.5% Recall (binary) and 81% Macro Recall (multi-class), showing strong performance on multi-class tasks but slightly lower than DenseNet121 for binary classification.
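As a rough illustration of the full fine-tuning pipeline; the torchvision model and head replacement are standard, but the optimizer settings are illustrative assumptions, not the project's actual values:

```python
# Minimal full fine-tuning sketch: ImageNet weights, new classifier head, no frozen layers.
# Optimizer settings are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

def build_densenet121(num_classes: int = 2) -> nn.Module:
    model = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
    model.classifier = nn.Linear(model.classifier.in_features, num_classes)
    return model  # every layer stays trainable (full fine-tuning)

model = build_densenet121(num_classes=2)           # binary: NORMAL vs PNEUMONIA
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```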
Click to watch full video:
Goal: Retrieve the top-k most visually similar X-ray images from the database for a given query image to assist in comparative diagnosis.
- Approach: Feature Extraction using the backbones of fine-tuned CNNs (removing the classifier head).
- Embeddings: Compared MobileNetV3 (Pre-trained) vs Fine-tuned DenseNet121/EfficientNet-B0.
- Retrieval Techniques: Cosine Similarity Search and K-Means Clustering (see the sketch below).
- Performance: DenseNet121 achieved an 82% Mean Precision@5, successfully retrieving images of the same pneumonia subtype.
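A minimal sketch of the retrieval flow; the cosine similarity search is from the text, while the pooling and L2 normalization choices are illustrative assumptions:

```python
# Minimal CBIR sketch: backbone features (classifier head removed) + cosine similarity.
# Global-average pooling and L2 normalization are illustrative assumptions.
import torch
import torch.nn.functional as F
from torchvision import models

backbone = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1).features
backbone.eval()

@torch.no_grad()
def embed(images: torch.Tensor) -> torch.Tensor:
    """(N, 3, 224, 224) images -> L2-normalized (N, 1024) embeddings."""
    feats = backbone(images)                            # (N, 1024, 7, 7) feature maps
    feats = F.adaptive_avg_pool2d(feats, 1).flatten(1)  # global average pooling
    return F.normalize(feats, dim=1)

def top_k(query: torch.Tensor, database: torch.Tensor, k: int = 5):
    """On normalized embeddings, cosine similarity is just a dot product."""
    scores = database @ query.squeeze(0)
    return torch.topk(scores, k)                        # top-k most similar cases
```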
Click to watch full video:
The Pneumonia Classification task has been re-engineered into a structured MLOps system reaching Level 1 and partially Level 2 maturity, featuring:
- Modularity: Clear separation of data processing, model training, evaluation, and inference components.
- Containerization: Dockerized components for consistent environments and easy deployment.
- Continuous Integration (CI):
  - Static code analysis (Flake8)
  - Unit Testing
  - Docker build verification
  - Triggered on every push via GitHub Actions
- Continuous Deployment (Naive Implementation):
  - Model automatically uploaded to Hugging Face if Recall ≥ 98%
  - Does not include automated model comparison or drift monitoring
- Deep Learning Framework: PyTorch
- Experiment Tracking: MLflow (local)
- Model Registry: Hugging Face Hub
- Containerization: Docker + Docker Compose
- CI: GitHub Actions
- Serving: FastAPI + Gradio
- Testing: Pytest
- Data Handling: Hugging Face Datasets
- Configuration Management: dotenv
src/
├── data/                       # Data Layer
│   ├── __init__.py
│   ├── dataset.py
│   ├── download_data.py
│   └── image_transforms.py
├── models/                     # Model Layer
│   ├── __init__.py
│   ├── download_model.py
│   ├── evaluate.py
│   ├── image_cnn.py
│   ├── train.py
│   └── utils.py
├── serve/                      # Serving Layer
│   ├── app.py
│   └── ui.py
└── tests/                      # CI Layer
    ├── test_api_endpoints.py
    └── test_model_structure.py
.github/workflows/main.yml      # CI pipeline
Dockerfile
docker-compose.yml
- Data Layer:
- Data downloading (from Hugging Face)
- Deterministic preprocessing/augmentation
- Reproducible dataset splits (see the sketch below)
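A sketch of what deterministic preprocessing and a reproducible split can look like; the actual transforms and seed live in src/data/, and these values (resize, normalization statistics, seed, split ratio) are illustrative:

```python
# Illustrative sketch of deterministic preprocessing and a seeded, reproducible split.
import torch
from torchvision import transforms

eval_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.Grayscale(num_output_channels=3),      # X-rays are single-channel
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

dataset = torch.utils.data.TensorDataset(torch.zeros(100, 3, 224, 224))  # stand-in dataset
generator = torch.Generator().manual_seed(42)  # fixed seed -> identical split every run
train_set, val_set = torch.utils.data.random_split(dataset, [0.8, 0.2], generator=generator)
```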
- Model Layer:
- Model definitions
- Modular training logic
- Experiment Tracking (MLflow)
- Parameter logging
- Metrics tracking
- Artifact storage
- Local experiment lineage
- Model Registry (Hugging Face Hub)
- Versioned model artifacts
- External centralized repository
- Separation of training and serving
- Reproducible deployment
- Evaluation with threshold tuning and atomic release: the model is uploaded only if Recall ≥ 98%. This is a naive form of Continuous Deployment (CD); full production CD would also require automated model comparison and drift detection. This gated flow is sketched just below.
- Utilities for retrieving champion model weights from Hugging Face
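A hedged sketch of the tracking-plus-gated-release flow described above; the experiment name, repo id, file names, logged parameters, and the evaluation stub are all illustrative, not the project's actual values:

```python
# Illustrative sketch of MLflow tracking plus the Recall-gated "atomic release" to the Hub.
import mlflow
from huggingface_hub import HfApi

RECALL_GATE = 0.98  # upload only if the threshold-tuned test recall meets the gate

def evaluate_model() -> float:
    """Stand-in for the real threshold-tuned evaluation on the test set."""
    return 0.997

mlflow.set_experiment("pneumonia-classification")
with mlflow.start_run():
    mlflow.log_params({"model": "densenet121", "lr": 1e-4})
    test_recall = evaluate_model()
    mlflow.log_metric("test_recall", test_recall)
    if test_recall >= RECALL_GATE:  # atomic release: all-or-nothing upload
        HfApi().upload_file(
            path_or_fileobj="model.pt",
            path_in_repo="model.pt",
            repo_id="your-user/pneumonia-densenet121",  # placeholder repo id
        )
```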
- Serving Layer:
- FastAPI backend application providing the REST API, input validation, model loading, and the JSON prediction response (sketched below)
- Gradio frontend UI for uploading X-ray images and displaying predictions and confidence scores.
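A minimal sketch of the /predict endpoint; the route matches the Quick Start below, but the response schema, labels, and model loading are illustrative assumptions rather than the contents of src/serve/app.py:

```python
# Illustrative /predict endpoint: input validation, model inference, JSON response.
import io

import torch
import torch.nn.functional as F
from fastapi import FastAPI, File, HTTPException, UploadFile
from PIL import Image
from torchvision import models, transforms

app = FastAPI(title="Pneumonia Classification API")
model = models.densenet121(num_classes=2)  # in the project, champion weights come from the Hub
model.eval()

preprocess = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
LABELS = ["NORMAL", "PNEUMONIA"]

@app.post("/predict")
async def predict(file: UploadFile = File(...)):
    # Input validation: reject uploads that Pillow cannot decode as an image.
    try:
        image = Image.open(io.BytesIO(await file.read())).convert("RGB")
    except Exception:
        raise HTTPException(status_code=400, detail="File is not a valid image")
    with torch.no_grad():
        probs = F.softmax(model(preprocess(image).unsqueeze(0)), dim=1).squeeze(0)
    confidence, idx = torch.max(probs, dim=0)
    return {"label": LABELS[idx], "confidence": round(confidence.item(), 4)}
```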
- Containerization:
- In MLOps, the environment is part of the model; therefore, the entire system is containerized using Docker.
- The `Dockerfile` sets up the necessary dependencies and configurations for both the API and the UI, while `docker-compose.yml` orchestrates the API and UI services together.
- Continuous Integration (CI) Layer: Automated tests that validate API endpoints and model integrity, ensuring reliability before deployment (an example test is sketched below)
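In that spirit, an endpoint test might look like the following. A sketch only: the import path is assumed from the repo layout above, and the expected status code matches the endpoint sketch earlier, not necessarily the real tests:

```python
# Illustrative API test; the real checks live in src/tests/test_api_endpoints.py.
from fastapi.testclient import TestClient

from src.serve.app import app  # assumed import path, based on the layout above

client = TestClient(app)

def test_predict_rejects_invalid_file():
    """A non-image upload should be rejected rather than crash the service."""
    response = client.post("/predict", files={"file": ("note.txt", b"not an image")})
    assert response.status_code == 400
```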
The current state of the project can be classified as Level 1 MLOps with elements of Level 2, as it does not yet have a fully automated deployment pipeline. The remaining steps toward full Level 2 maturity include:
- Automated data drift detection
- Fully automated retraining triggers
- Cloud-hosted MLflow registry
- Kubernetes orchestration and autoscaling
- Continuous Monitoring
- Clone the repository.
- Create environment variables: create a `.env` file in the root directory with your Hugging Face read access token: `HF_READ_TOKEN=your_token`
- Build and run containers: use Docker Compose to start the containers:
  docker compose up --build
- Access the application:
  - API: `http://localhost:8000/docs` with interactive Swagger UI for testing endpoints, and `http://localhost:8000/predict` for direct POST requests.
  - Gradio UI: `http://localhost:7860`
If you want to run the notebooks instead, you can use Google Colab for an easy setup, or run them locally with the provided requirements.txt (note that this file reflects the MLOps pipeline dependencies, so it includes more packages than the notebooks strictly need). The notebooks are configured to download the trained model weights directly from Google Drive.
Furthermore, the notebooks/demos/ folder contains lightweight versions of the main tasks, designed for quick execution and demonstration purposes. These notebooks use the test subset of the data and pre-trained model weights to present the core functionalities for inference, without the need for training.