Cardiopulmonary Diagnostics is a Deep Learning project focused on the automated analysis of 1D (audio) and 2D (image) medical signals. It implements three critical healthcare tasks:
- Heart Sound Classification: Detecting abnormalities in PCG (Phonocardiogram) audio.
- Pneumonia Classification: Diagnosing pneumonia from Chest X-Ray images (Binary & Multi-class).
- Pneumonia Image Retrieval: A Content-Based Image Retrieval (CBIR) system to find visually similar historical cases for diagnosis support.
The project emphasizes clinical safety by prioritizing Recall/Sensitivity in evaluation metrics, ensuring that critical cases are not missed.
However, by one widely cited estimate, 87% of ML models never reach production. This work therefore also demonstrates the full lifecycle of an ML system: from Deep Learning research experimentation on medical tasks to a structured, production-oriented MLOps pipeline. The project is divided into two main phases:
- Research (Level 0 MLOps): Exploration and experimentation with Jupyter Notebooks on 1D and 2D medical signals.
- Production MLOps Pipeline (Level 1/2): A fully modular, decoupled, containerized, CI-enabled ML system applied to the Chest X-Ray Pneumonia Classification task.
The goal is not only to build accurate models, but to transform them into a reproducible, deployable, and maintainable ML system. This includes:
- Modular architecture
- Experiment tracking
- External model registry
- CI validation pipeline
- Containerized deployment
- API and UI serving layer
For a detailed explanation of the three tasks, including methodology, results, and conclusions, please refer to the Project Presentation. For the MLOps pipeline details, refer instead to the MLOps Presentation.
Goal: Classify heartbeats as Normal or Abnormal using Phonocardiograms (PCG).
- Dataset: PhysioNet Challenge 2016 (3,240 recordings).
- Techniques:
- Preprocessing: Audio segmentation/padding to a fixed 5 s length and conversion to log-scaled Mel Spectrograms (64 mels, N_FFT=256); see the sketch after the results below.
- Models: Comparison of Classical ML (SVC, Random Forest, LightGBM) using hand-crafted features (RMSE, ZCR, MFCCs, etc.) versus Deep Learning architectures (ResNet18, EfficientNet-B0, Custom CNN).
- Optimization: Hyperparameter tuning via Optuna for the ML models, and full re-training with the Adam optimizer, ReduceLROnPlateau scheduler, and Mixed Precision Training (FP16) for the CNNs (all implemented in PyTorch).
- Results:
- Best Model(s): LightGBM (with statistics aggregation) achieved the highest ROC-AUC of 0.9658, while EfficientNet-B0 demonstrated superior sensitivity with a Recall of 0.99 on the test set.
- Safety: Decision thresholds were tuned to prioritize Recall, successfully minimizing False Negatives (e.g., EfficientNet-B0 missed only 1 abnormal case out of 100) for clinical screening safety.
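For illustration, the preprocessing step could look roughly like this. A minimal sketch only: the 5 s length, 64 mels, and N_FFT=256 come from the description above, while the use of librosa and the 2,000 Hz sample rate are assumptions, not the project's actual code:

```python
# Minimal sketch of the PCG preprocessing described above.
# Assumed: librosa and a 2,000 Hz sample rate; 5 s / 64 mels / N_FFT=256 are from the text.
import librosa
import numpy as np

def pcg_to_logmel(path: str, sr: int = 2000, duration: float = 5.0,
                  n_mels: int = 64, n_fft: int = 256) -> np.ndarray:
    """Load a PCG recording, pad/trim to a fixed length, return a log-scaled Mel spectrogram."""
    y, sr = librosa.load(path, sr=sr)
    y = librosa.util.fix_length(y, size=int(sr * duration))  # segment or zero-pad to 5 s
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)  # log scaling
```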
Click to watch (and listen to!) the full video:
Goal: Detect Pneumonia from Chest X-Rays and distinguish between Bacterial and Viral subtypes.
- Dataset: Chest X-Ray Pneumonia (5,856 images).
- Techniques:
- Pipeline: Full Fine-Tuning of ImageNet pre-trained models (sketched after the results below).
- Data Augmentation.
- Models: DenseNet121 vs EfficientNet-B0.
- Results:
- Best Models: DenseNet121 v1 (without Dropout) for the binary task and EfficientNet-B0 v1 (without Dropout) for multi-class classification; the two showed very similar overall performance.
- Performance:
- DenseNet121 achieved 99.7% Recall (binary, threshold tuned for a target recall ≥ 98%) with robust multi-class discrimination (79% Macro Recall).
- EfficientNet-B0 achieved 99.5% Recall (binary) and 81% Macro Recall (multi-class), showing strong performance on multi-class tasks but slightly lower than DenseNet121 for binary classification.
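As a rough illustration of the full fine-tuning pipeline; the torchvision model and head replacement are standard, but the optimizer settings are illustrative assumptions, not the project's actual values:

```python
# Minimal full fine-tuning sketch: ImageNet weights, new classifier head, no frozen layers.
# Optimizer settings are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

def build_densenet121(num_classes: int = 2) -> nn.Module:
    model = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
    model.classifier = nn.Linear(model.classifier.in_features, num_classes)
    return model  # every layer stays trainable (full fine-tuning)

model = build_densenet121(num_classes=2)           # binary: NORMAL vs PNEUMONIA
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```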
Click to watch full video:
Goal: Retrieve the top-k most visually similar X-ray images from the database for a given query image to assist in comparative diagnosis.
- Approach: Feature Extraction using the backbones of fine-tuned CNNs (removing the classifier head).
- Embeddings: Compared MobileNetV3 (Pre-trained) vs Fine-tuned DenseNet121/EfficientNet-B0.
- Retrieval Techniques: Cosine Similarity Search and K-Means Clustering (see the sketch below).
- Performance: DenseNet121 achieved an 82% Mean Precision@5, successfully retrieving images of the same pneumonia subtype.
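A minimal sketch of the retrieval flow; the cosine similarity search is from the text, while the pooling and L2 normalization choices are illustrative assumptions:

```python
# Minimal CBIR sketch: backbone features (classifier head removed) + cosine similarity.
# Global-average pooling and L2 normalization are illustrative assumptions.
import torch
import torch.nn.functional as F
from torchvision import models

backbone = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1).features
backbone.eval()

@torch.no_grad()
def embed(images: torch.Tensor) -> torch.Tensor:
    """(N, 3, 224, 224) images -> L2-normalized (N, 1024) embeddings."""
    feats = backbone(images)                            # (N, 1024, 7, 7) feature maps
    feats = F.adaptive_avg_pool2d(feats, 1).flatten(1)  # global average pooling
    return F.normalize(feats, dim=1)

def top_k(query: torch.Tensor, database: torch.Tensor, k: int = 5):
    """On normalized embeddings, cosine similarity is just a dot product."""
    scores = database @ query.squeeze(0)
    return torch.topk(scores, k)                        # top-k most similar cases
```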
Click to watch full video:
The Pneumonia Classification task has been re-engineered into a structured MLOps system reaching Level 1 and partially Level 2 maturity, featuring:
- Modularity: Clear separation of data processing, model training, evaluation, and inference components.
- Containerization: Dockerized components for consistent environments and easy deployment.
- Continuous Integration (CI):
  - Static code analysis (Flake8)
  - Unit Testing
  - Docker build verification
  - Triggered on every push via GitHub Actions
- Continuous Deployment (Naive Implementation):
  - Model automatically uploaded to Hugging Face if Recall ≥ 98%
  - Does not include automated model comparison or drift monitoring
- Deep Learning Framework: PyTorch
- Experiment Tracking: MLflow (local)
- Model Registry: Hugging Face Hub
- Containerization: Docker + Docker Compose
- CI: GitHub Actions
- Serving: FastAPI + Gradio
- Testing: Pytest
- Data Handling: Hugging Face Datasets
- Configuration Management: dotenv
src/
├── data/                       # Data Layer
│   ├── __init__.py
│   ├── dataset.py
│   ├── download_data.py
│   └── image_transforms.py
├── models/                     # Model Layer
│   ├── __init__.py
│   ├── download_model.py
│   ├── evaluate.py
│   ├── image_cnn.py
│   ├── train.py
│   └── utils.py
├── serve/                      # Serving Layer
│   ├── app.py
│   └── ui.py
└── tests/                      # CI Layer
    ├── test_api_endpoints.py
    └── test_model_structure.py
.github/workflows/main.yml      # CI pipeline
Dockerfile
docker-compose.yml
- Data Layer:
- Data downloading (from Hugging Face)
- Deterministic preprocessing/augmentation
- Reproducible dataset splits (see the sketch below)
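A sketch of what deterministic preprocessing and a reproducible split can look like; the actual transforms and seed live in src/data/, and these values (resize, normalization statistics, seed, split ratio) are illustrative:

```python
# Illustrative sketch of deterministic preprocessing and a seeded, reproducible split.
import torch
from torchvision import transforms

eval_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.Grayscale(num_output_channels=3),      # X-rays are single-channel
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

dataset = torch.utils.data.TensorDataset(torch.zeros(100, 3, 224, 224))  # stand-in dataset
generator = torch.Generator().manual_seed(42)  # fixed seed -> identical split every run
train_set, val_set = torch.utils.data.random_split(dataset, [0.8, 0.2], generator=generator)
```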
- Model Layer:
- Model definitions
- Modular training logic
- Experiment Tracking (MLflow)
- Parameter logging
- Metrics tracking
- Artifact storage
- Local experiment lineage
- Model Registry (Hugging Face Hub)
- Versioned model artifacts
- External centralized repository
- Separation of training and serving
- Reproducible deployment
- Evaluation with threshold tuning and atomic release: the model is uploaded only if Recall ≥ 98%. This is a naive form of Continuous Deployment (CD); full production CD would also require automated model comparison and drift detection. This gated flow is sketched just below.
- Utilities for retrieving champion model weights from Hugging Face
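A hedged sketch of the tracking-plus-gated-release flow described above; the experiment name, repo id, file names, logged parameters, and the evaluation stub are all illustrative, not the project's actual values:

```python
# Illustrative sketch of MLflow tracking plus the Recall-gated "atomic release" to the Hub.
import mlflow
from huggingface_hub import HfApi

RECALL_GATE = 0.98  # upload only if the threshold-tuned test recall meets the gate

def evaluate_model() -> float:
    """Stand-in for the real threshold-tuned evaluation on the test set."""
    return 0.997

mlflow.set_experiment("pneumonia-classification")
with mlflow.start_run():
    mlflow.log_params({"model": "densenet121", "lr": 1e-4})
    test_recall = evaluate_model()
    mlflow.log_metric("test_recall", test_recall)
    if test_recall >= RECALL_GATE:  # atomic release: all-or-nothing upload
        HfApi().upload_file(
            path_or_fileobj="model.pt",
            path_in_repo="model.pt",
            repo_id="your-user/pneumonia-densenet121",  # placeholder repo id
        )
```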
- Serving Layer:
- FastAPI backend application providing the REST API, input validation, model loading, and the JSON prediction response (sketched below)
- Gradio frontend UI for uploading X-ray images and displaying predictions and confidence scores.
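A minimal sketch of the /predict endpoint; the route matches the Quick Start below, but the response schema, labels, and model loading are illustrative assumptions rather than the contents of src/serve/app.py:

```python
# Illustrative /predict endpoint: input validation, model inference, JSON response.
import io

import torch
import torch.nn.functional as F
from fastapi import FastAPI, File, HTTPException, UploadFile
from PIL import Image
from torchvision import models, transforms

app = FastAPI(title="Pneumonia Classification API")
model = models.densenet121(num_classes=2)  # in the project, champion weights come from the Hub
model.eval()

preprocess = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
LABELS = ["NORMAL", "PNEUMONIA"]

@app.post("/predict")
async def predict(file: UploadFile = File(...)):
    # Input validation: reject uploads that Pillow cannot decode as an image.
    try:
        image = Image.open(io.BytesIO(await file.read())).convert("RGB")
    except Exception:
        raise HTTPException(status_code=400, detail="File is not a valid image")
    with torch.no_grad():
        probs = F.softmax(model(preprocess(image).unsqueeze(0)), dim=1).squeeze(0)
    confidence, idx = torch.max(probs, dim=0)
    return {"label": LABELS[idx], "confidence": round(confidence.item(), 4)}
```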
- Containerization:
- In MLOps, the environment is part of the model; therefore, the entire system is containerized using Docker.
- The `Dockerfile` sets up the necessary dependencies and configurations for both the API and the UI, while `docker-compose.yml` orchestrates the API and UI services together.
- Continuous Integration (CI) Layer: Automated tests that validate API endpoints and model integrity, ensuring reliability before deployment (an example test is sketched below)
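In that spirit, an endpoint test might look like the following. A sketch only: the import path is assumed from the repo layout above, and the expected status code matches the endpoint sketch earlier, not necessarily the real tests:

```python
# Illustrative API test; the real checks live in src/tests/test_api_endpoints.py.
from fastapi.testclient import TestClient

from src.serve.app import app  # assumed import path, based on the layout above

client = TestClient(app)

def test_predict_rejects_invalid_file():
    """A non-image upload should be rejected rather than crash the service."""
    response = client.post("/predict", files={"file": ("note.txt", b"not an image")})
    assert response.status_code == 400
```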
The current state of the project can be classified as Level 1 MLOps with elements of Level 2, as it does not yet have a fully automated deployment pipeline. The remaining steps toward full Level 2 maturity include:
- Automated data drift detection
- Fully automated retraining triggers
- Cloud-hosted MLflow registry
- Kubernetes orchestration and autoscaling
- Continuous Monitoring
- Clone the repository.
- Create environment variables: create a `.env` file in the root directory with your Hugging Face read access token: `HF_READ_TOKEN=your_token`
- Build and run containers: use Docker Compose to start the containers:
  docker compose up --build
- Access the application:
  - API: `http://localhost:8000/docs` with interactive Swagger UI for testing endpoints, and `http://localhost:8000/predict` for direct POST requests.
  - Gradio UI: `http://localhost:7860`
If you want to run the notebooks instead, you can use Google Colab for an easy setup, or run them locally with the provided requirements.txt (note that this file reflects the MLOps pipeline dependencies, so it includes more packages than the notebooks strictly need). The notebooks are configured to download the trained model weights directly from Google Drive.
Furthermore, the notebooks/demos/ folder contains lightweight versions of the main tasks, designed for quick execution and demonstration purposes. These notebooks use the test subset of the data and pre-trained model weights to present the core functionalities for inference, without the need for training.