Weinsz/cardiopulmonary-diagnostics

🩺🩻 Cardiopulmonary Diagnostics

From Deep Learning Research to MLOps Production Pipeline

Python PyTorch Optuna MLOps CI Docker FastAPI Gradio MLflow HuggingFace

Cardiopulmonary Diagnostics is a Deep Learning project focused on the automated analysis of 1D (audio) and 2D (image) medical signals. It implements three critical healthcare tasks:

  1. Heart Sound Classification: Detecting abnormalities in PCG (Phonocardiogram) audio.
  2. Pneumonia Classification: Diagnosing pneumonia from Chest X-Ray images (Binary & Multi-class).
  3. Pneumonia Image Retrieval: A Content-Based Image Retrieval (CBIR) system to find visually similar historical cases for diagnosis support.

The project emphasizes clinical safety by prioritizing Recall/Sensitivity in evaluation metrics, ensuring that critical cases are not missed.

However, an often-cited industry estimate holds that 87% of ML models never reach production. This work therefore also demonstrates the full lifecycle of an ML system: from Deep Learning research experimentation for medical tasks to a structured, production-oriented MLOps pipeline. The project is divided into two main phases:

  1. Research (Level 0 MLOps): Exploration and experimentation with Jupyter Notebooks on 1D and 2D medical signals.
  2. Production MLOps Pipeline (Level 1/2): A fully modular, decoupled, containerized, CI-enabled ML system applied to the Chest X-Ray Pneumonia Classification task.

The goal is not only to build accurate models, but to transform them into a reproducible, deployable, and maintainable ML system. This includes:

  • Modular architecture
  • Experiment tracking
  • External model registry
  • CI validation pipeline
  • Containerized deployment
  • API and UI serving layer

For a detailed explanation of the three tasks, including methodology, results, and conclusions, please refer to the Project Presentation; for the MLOps pipeline details, refer to the MLOps Presentation.


Research Phase (Level 0 MLOps)

Heart Sound Classification (1D Signal)

Goal: Classify heartbeats as Normal or Abnormal using Phonocardiograms (PCG).

  • Dataset: PhysioNet Challenge 2016 (3,240 recordings).
  • Techniques:
    • Preprocessing: Audio segmentation/padding to fixed 5s length and conversion to log-scaled Mel Spectrograms (64 mels, N_FFT=256)
    • Models: Comparison of Classical ML (SVC, Random Forest, LightGBM) using hand-crafted features (RMSE, ZCR, MFCCs, etc.) versus Deep Learning architectures (ResNet18, EfficientNet-B0, Custom CNN).
    • Optimization: Hyperparameter tuning via Optuna for ML models and full re-training with Adam optimizer, ReduceLROnPlateau scheduler, and Mixed Precision Training (FP16) for CNNs (always using PyTorch).
  • Results:
    • Best Models: LightGBM (with statistics aggregation) achieved the highest ROC-AUC (0.9658), while EfficientNet-B0 demonstrated superior sensitivity with a Recall of 0.99 on the test set.
    • Safety: Decision thresholds were tuned to prioritize Recall, successfully minimizing False Negatives (e.g., EfficientNet-B0 missed only 1 abnormal case out of 100) for clinical screening safety.
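
As a sketch of the preprocessing described above, the 5-second segmentation/padding step can be written in plain NumPy. The 2 kHz sample rate is an assumption based on the PhysioNet 2016 recordings, and the function name is illustrative, not the project's actual code:

```python
import numpy as np

SAMPLE_RATE = 2000          # assumption: PhysioNet 2016 PCG recordings at 2 kHz
CLIP_SECONDS = 5            # fixed clip length stated above
N_FFT, N_MELS = 256, 64     # spectrogram settings stated above

def pad_or_truncate(waveform: np.ndarray, sr: int = SAMPLE_RATE,
                    seconds: int = CLIP_SECONDS) -> np.ndarray:
    """Force a mono PCG recording to an exact fixed length.

    Longer clips are truncated; shorter ones are zero-padded at the end,
    so every example maps to a spectrogram of identical shape.
    """
    target = sr * seconds
    if len(waveform) >= target:
        return waveform[:target]
    return np.pad(waveform, (0, target - len(waveform)))

# The fixed-length clip would then be converted to a log-scaled Mel
# spectrogram, e.g. with librosa.feature.melspectrogram(y=clip, sr=sr,
# n_fft=N_FFT, n_mels=N_MELS) followed by librosa.power_to_db(...).
```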

Demo

Click to watch (and listen to!) the full video:

Heart Sound Demo


Pneumonia Classification (2D Signal)

Goal: Detect Pneumonia from Chest X-Rays and distinguish between Bacterial and Viral subtypes.

  • Dataset: Chest X-Ray Pneumonia (5,856 images).
  • Techniques:
    • Pipeline: Full Fine-Tuning of ImageNet pre-trained models.
    • Data Augmentation.
    • Models: DenseNet121 vs EfficientNet-B0.
  • Results:
    • Best Model: DenseNet121 v1 (without Dropout) for the binary task, EfficientNet-B0 v1 (without Dropout) for the multi-class task; both models showed very similar performance.
    • Performance:
      • DenseNet121 achieved 99.7% Recall (binary, threshold tuned for target recall ≥ 98%) with robust multi-class discrimination (79% Macro Recall).
      • EfficientNet-B0 achieved 99.5% Recall (binary) and 81% Macro Recall (multi-class), showing strong performance on multi-class tasks but slightly lower than DenseNet121 for binary classification.

Demo

Click to watch the full video:

Pneumonia Classification Demo


Pneumonia Content-Based Image Retrieval (CBIR)

Goal: Retrieve the top-k most visually similar X-ray images from the database for a given query image to assist in comparative diagnosis.

  • Approach: Feature Extraction using the backbones of fine-tuned CNNs (removing the classifier head).
  • Embeddings: Compared MobileNetV3 (Pre-trained) vs Fine-tuned DenseNet121/EfficientNet-B0.
  • Retrieval Techniques: Cosine Similarity Search and K-Means Clustering.
  • Performance: DenseNet121 achieved an 82% Mean Precision@5, successfully retrieving images of the same pneumonia subtype.
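
The cosine-similarity search can be sketched with plain NumPy, assuming embeddings have already been extracted by the fine-tuned CNN backbone with the classifier head removed; names are illustrative:

```python
import numpy as np

def top_k_similar(query: np.ndarray, gallery: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k gallery embeddings most similar to the query.

    Embeddings are L2-normalized so the dot product equals cosine
    similarity; in the project they would come from the headless CNN.
    """
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q                       # cosine similarity per gallery image
    return np.argsort(-sims)[:k]       # highest similarity first
```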

Demo

Click to watch the full video:

Pneumonia Image Retrieval Demo


MLOps Production Pipeline (Applied to Pneumonia Classification)

The Pneumonia Classification task has been re-engineered into a structured MLOps system reaching Level 1 and partially Level 2 maturity, featuring:

  • Modularity: Clear separation of data processing, model training, evaluation, and inference components.

  • Containerization: Dockerized components for consistent environments and easy deployment.

  • Continuous Integration (CI):

    • Static code analysis (Flake8)
    • Unit Testing
    • Docker build verification
    • Triggered on every push via GitHub Actions
  • Continuous Deployment (Naive Implementation):

    • Model automatically uploaded to Hugging Face if Recall ≥ 98%
    • Does not include automated model comparison or drift monitoring
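
The recall-gated upload above can be sketched as a simple decision function. The function name is illustrative; the upload itself would go through huggingface_hub (e.g. `HfApi().upload_folder(...)`, with a project-specific repo ID):

```python
def should_release(recall: float, threshold: float = 0.98) -> bool:
    """Naive CD gate: release the model only if Recall >= 98%.

    There is no champion/challenger comparison or drift check here;
    the single metric alone decides, which is why the README calls
    this a naive implementation of Continuous Deployment.
    """
    return recall >= threshold

# Illustrative use in an evaluation script:
# if should_release(test_recall):
#     HfApi().upload_folder(folder_path="model/", repo_id=...)  # atomic release
```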

Tech Stack

  • Deep Learning Framework: PyTorch
  • Experiment Tracking: MLflow (local)
  • Model Registry: Hugging Face Hub
  • Containerization: Docker + Docker Compose
  • CI: GitHub Actions
  • Serving: FastAPI + Gradio
  • Testing: Pytest
  • Data Handling: Hugging Face Datasets
  • Configuration Management: dotenv
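
Configuration loading via dotenv can be sketched as follows. The variable name HF_READ_TOKEN comes from the README's setup instructions; the helper itself is illustrative, not the project's actual code:

```python
import os

def get_hf_token() -> str:
    """Read the Hugging Face token configured via the project's .env file.

    python-dotenv's load_dotenv() populates os.environ from .env before
    the lookup; if the package is absent, plain environment variables
    still work.
    """
    try:
        from dotenv import load_dotenv  # optional dependency
        load_dotenv()
    except ImportError:
        pass
    token = os.environ.get("HF_READ_TOKEN", "")
    if not token:
        raise RuntimeError("HF_READ_TOKEN is not set; see the .env setup step")
    return token
```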

System Architecture

src/
├── data/                       # Data Layer
│   ├── __init__.py
│   ├── dataset.py
│   ├── download_data.py
│   └── image_transforms.py
├── models/                     # Model Layer
│   ├── __init__.py
│   ├── download_model.py
│   ├── evaluate.py
│   ├── image_cnn.py
│   ├── train.py
│   └── utils.py
├── serve/                      # Serving Layer
│   ├── app.py
│   └── ui.py
└── tests/                      # CI Layer
    ├── test_api_endpoints.py
    └── test_model_structure.py
.github/workflows/main.yml      # CI pipeline
Dockerfile
docker-compose.yml

MLOps Components

  • Data Layer:
    • Data downloading (from Hugging Face)
    • Deterministic preprocessing/augmentation
    • Reproducible dataset splits.
  • Model Layer:
    • Model definitions
    • Modular training logic
    • Experiment Tracking (MLflow)
      • Parameter logging
      • Metrics tracking
      • Artifact storage
      • Local experiment lineage
    • Model Registry (Hugging Face Hub)
      • Versioned model artifacts
      • External centralized repository
      • Separation of training and serving
      • Reproducible deployment
    • Evaluation with threshold tuning and atomic release: the model is uploaded only if Recall ≥ 98%. This is a naive form of Continuous Deployment (CD); a full production CD would also require automated model comparison and drift detection.
    • Utilities for retrieving champion model weights from Hugging Face
  • Serving Layer:
    • FastAPI backend application for REST API, input validation, model loading and JSON prediction response
    • Gradio frontend UI for uploading X-ray images and displaying predictions and confidence scores.
  • Containerization:
    • In MLOps, the environment is part of the model; therefore, the entire system is containerized using Docker
    • Dockerfile sets up the necessary dependencies and configurations for both the API and the UI, while docker-compose.yml orchestrates the API and UI services together
  • Continuous Integration (CI) Layer: Automated tests to validate API endpoints and model integrity, ensuring reliability before deployment
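
The threshold-tuning step mentioned in the Model Layer can be sketched in NumPy: pick the highest decision threshold whose recall on a validation set still meets the target, which keeps precision as high as possible while guaranteeing the safety constraint. This is an illustrative sketch, not the project's actual code:

```python
import numpy as np

def tune_threshold(y_true: np.ndarray, probs: np.ndarray,
                   target_recall: float = 0.98) -> float:
    """Highest threshold whose recall still meets the target.

    A higher threshold produces fewer positive predictions; sweeping
    from high to low, the first threshold reaching the target recall
    minimizes false positives subject to Recall >= target_recall.
    """
    positives = probs[y_true == 1]
    for t in np.sort(np.unique(positives))[::-1]:
        recall = np.mean(positives >= t)
        if recall >= target_recall:
            return float(t)
    return 0.0  # even the lowest threshold cannot reach the target
```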

The current state of the project can be classified as Level 1 MLOps with elements of Level 2, as it does not yet have a fully automated deployment pipeline.

What's Missing for Full Production-Grade MLOps

  • Automated data drift detection
  • Fully automated retraining triggers
  • Cloud-hosted MLflow registry
  • Kubernetes orchestration and autoscaling
  • Continuous Monitoring

Running the MLOps System

  1. Clone the repository

  2. Create environment variables: Create a .env file in the root directory with your Hugging Face read access token.

    HF_READ_TOKEN=your_token
    
  3. Build and run containers: Use Docker Compose to start the containers

    docker compose up --build
  4. Access the application:

    • API: http://localhost:8000/docs with interactive Swagger UI for testing endpoints, and http://localhost:8000/predict for direct POST requests.
    • Gradio UI: http://localhost:7860
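
A minimal client for the /predict endpoint might look like the sketch below, assuming the requests library and a multipart field named "file"; the authoritative request schema is the Swagger UI at http://localhost:8000/docs:

```python
import requests  # assumption: available in the client environment

API_URL = "http://localhost:8000/predict"  # endpoint listed above

def predict_xray(image_path: str) -> dict:
    """Send a chest X-ray to the running API container, return its JSON.

    The field name "file" is an assumption about the upload schema;
    the response is expected to carry a label and confidence score.
    """
    with open(image_path, "rb") as f:
        response = requests.post(API_URL, files={"file": f}, timeout=30)
    response.raise_for_status()
    return response.json()
```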

If you want to run the notebooks instead, you can use Google Colab for an easy setup, or run them locally with the provided requirements.txt (note that this file reflects the MLOps pipeline dependencies, so it includes more packages than necessary for the notebooks). The notebooks are configured to download the trained model weights directly from Google Drive.

Furthermore, the notebooks/demos/ folder contains lightweight versions of the main tasks, designed for quick execution and demonstration purposes. These notebooks use the test subset of the data and pre-trained model weights to present the core functionalities for inference, without the need for training.

Demo example of the inference with the MLOps system:

Pneumonia Classification Demo
