AI-Powered Clinical Decision Support System

An AI-powered clinical decision support system that analyzes patient medical reports (PDFs) and provides preliminary diagnostic insights using a Retrieval-Augmented Generation (RAG) architecture.

Disclaimer: This tool is designed to augment the expertise of medical professionals, not to replace it. All AI-generated analysis should be reviewed by a qualified doctor.

What It Does

Given a patient's blood test report (PDF), the system:

1. Extracts clinical data from the PDF using Gemini
2. Searches a vector database of similar patient cases (Qdrant)
3. Augments findings with real-time web search results
4. Generates a comprehensive report via Llama 3.1, including:
   - Potential diseases or health concerns
   - Recommended precautions and lifestyle changes
   - FDA-approved medication suggestions
   - Actionable insights for doctors
   - Data points that informed the conclusion
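
The four steps above can be outlined as a small pipeline. This is only a sketch of the control flow, not the code in get_summary.py: the names `analyze_report`, `build_prompt`, `extract`, `retrieve`, `search_web`, and `generate` are hypothetical, the prompt layout is an assumption, and each stage is passed in as a plain callable so that no real Gemini, Qdrant, or Llama call is shown.

```python
from typing import Callable, List


def build_prompt(clinical_text: str, cases: List[str], snippets: List[str]) -> str:
    """Combine extracted data and retrieved context into one prompt (assumed layout)."""
    sections = [
        "Patient data:\n" + clinical_text,
        "Similar cases:\n" + "\n".join(f"- {c}" for c in cases),
        "Web results:\n" + "\n".join(f"- {s}" for s in snippets),
    ]
    return "\n\n".join(sections)


def analyze_report(
    pdf_path: str,
    extract: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    search_web: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Run the four RAG stages in order and return the generated report."""
    clinical_text = extract(pdf_path)         # 1. PDF -> clinical data
    similar_cases = retrieve(clinical_text)   # 2. vector search for similar cases
    web_snippets = search_web(clinical_text)  # 3. web search augmentation
    prompt = build_prompt(clinical_text, similar_cases, web_snippets)
    return generate(prompt)                   # 4. report generation
```

Passing the stages in as callables keeps the outline testable with stubs; a real run would substitute the actual model and database calls.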

Supported Conditions

Diabetes, kidney stones, heart disease, and anemia.

Project Structure

Healthcare-Assistant/
├── .env                          # API keys and Qdrant config
├── requirements.txt
├── src/
│   ├── config/
│   │   └── settings.yaml         # Qdrant and embedding model settings
│   ├── pre-processing/
│   │   └── preprocess.py         # CSV -> JSONL document conversion
│   └── embeddings/
│       ├── create_embeddings.py  # Embed documents and store in Qdrant
│       └── get_summary.py        # Query Qdrant + generate analysis
└── data/
    ├── raw/                      # Input CSVs and patient report PDFs
    └── processed/                # Generated JSONL files

Getting Started

Prerequisites

- Python 3.8+
- A Hugging Face API token
- A Google AI API key (for Gemini)
- Qdrant, either:
  - Cloud: a Qdrant Cloud cluster (URL + API key)
  - Docker: a local Qdrant instance running on port 6333

Installation

git clone https://github.com/Sreejit-Sengupto/Healthcare-Assistant.git
cd Healthcare-Assistant
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Environment Variables

Create a .env file in the root directory:

# Qdrant - set QDRANT_MODE to "cloud" or "docker"
QDRANT_MODE=cloud
QDRANT_API_KEY=your_qdrant_api_key
QDRANT_URL=https://your-cluster-id.aws.cloud.qdrant.io

# LLM / Embedding APIs
GOOGLE_API_KEY=your_google_api_key
HUGGINGFACEHUB_API_TOKEN=your_huggingface_token

| Variable | Description |
| --- | --- |
| `QDRANT_MODE` | `cloud` to use Qdrant Cloud, `docker` for local Docker (default: `docker`) |
| `QDRANT_URL` | Your Qdrant Cloud cluster URL (only needed for cloud mode) |
| `QDRANT_API_KEY` | Your Qdrant Cloud API key (only needed for cloud mode) |
| `GOOGLE_API_KEY` | Google AI API key for Gemini (text extraction) |
| `HUGGINGFACEHUB_API_TOKEN` | Hugging Face token for Llama 3.1 (report generation) |

In Docker mode, the Qdrant host and port are read from src/config/settings.yaml (defaults to localhost:6333).
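
The mode switch described above could be resolved along these lines. This is a minimal sketch, not the project's actual code: the function name `qdrant_connection_kwargs` is hypothetical, and the `host` and `port` keys assumed for the settings mapping are a guess at how settings.yaml is laid out.

```python
from typing import Mapping, Optional


def qdrant_connection_kwargs(
    env: Mapping[str, str],
    settings: Optional[Mapping[str, object]] = None,
) -> dict:
    """Return connection keyword arguments chosen by QDRANT_MODE."""
    settings = settings or {}
    # QDRANT_MODE defaults to "docker" when it is not set.
    mode = env.get("QDRANT_MODE", "docker").strip().lower()

    if mode == "cloud":
        # Cloud mode needs both the cluster URL and the API key.
        missing = [k for k in ("QDRANT_URL", "QDRANT_API_KEY") if not env.get(k)]
        if missing:
            raise ValueError(f"cloud mode requires: {', '.join(missing)}")
        return {"url": env["QDRANT_URL"], "api_key": env["QDRANT_API_KEY"]}

    if mode == "docker":
        # Assumed settings keys; fall back to localhost:6333.
        return {
            "host": settings.get("host", "localhost"),
            "port": int(settings.get("port", 6333)),
        }

    raise ValueError(f"unknown QDRANT_MODE: {mode!r}")
```

Called with `os.environ` as `env`, the returned dictionary holds either `url` and `api_key` or `host` and `port`, which are the keyword arguments the `qdrant-client` package's `QdrantClient` accepts for the two setups.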

Data Setup

The data/ directory is git-ignored. Create it and add the raw CSV datasets:

mkdir -p data/raw

Place these files in data/raw/:

| File | Dataset |
| --- | --- |
| `diabetes_classification.csv` | Diabetes Classification |
| `kidney_stone_dataset.csv` | Kidney Stone Prediction |
| `heart_disease.csv` | Heart Disease Prediction |
| `anemia.csv` | Anemia Prediction |
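
Before preprocessing, it can help to confirm that all four files are in place. The check below is a small sketch that is not part of the repository; `missing_datasets` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List

# File names taken from the table above.
REQUIRED_FILES = [
    "diabetes_classification.csv",
    "kidney_stone_dataset.csv",
    "heart_disease.csv",
    "anemia.csv",
]


def missing_datasets(raw_dir: str = "data/raw") -> List[str]:
    """Return the required CSV names that are not present in raw_dir."""
    root = Path(raw_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means every dataset was found; otherwise the returned names are the files still to be added.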

Usage

Run these three steps in order:

1. Preprocess the data

Converts raw CSVs into a combined JSONL document file.

python src/pre-processing/preprocess.py

Note: This appends to data/processed/combined_health_documents.jsonl. Delete the file first if you want a clean run.
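
To illustrate the CSV to JSONL conversion, here is a minimal sketch using only the standard library. It is not the code in preprocess.py: the function name `rows_to_jsonl` is hypothetical, and the document schema (a `text` field plus a `metadata.condition` label) is an assumption.

```python
import csv
import json
from typing import TextIO


def rows_to_jsonl(csv_file: TextIO, condition: str, out: TextIO) -> int:
    """Write one JSON document per CSV row to out; return the number written."""
    count = 0
    for row in csv.DictReader(csv_file):
        # Flatten the row into a short text that an embedding model can read.
        text = ", ".join(f"{key}: {value}" for key, value in row.items())
        doc = {"text": text, "metadata": {"condition": condition}}
        out.write(json.dumps(doc) + "\n")
        count += 1
    return count
```

With this shape, opening the output file in mode `"w"` rather than `"a"` would start from an empty file on every run, which avoids the duplicate documents that repeated appends produce.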

2. Create embeddings

Embeds the documents and stores them in Qdrant.

python src/embeddings/create_embeddings.py

- Uses sentence-transformers/all-MiniLM-L6-v2 (384-dim, cosine similarity)
- Batch size: 100 (cloud) / 1000 (docker)
- Collection name: med_embeddings (configurable in settings.yaml)
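
The batching and the similarity metric mentioned above can be shown in plain Python. This is an illustrative sketch only: `batched` and `cosine_similarity` are hypothetical helpers, and the real script presumably relies on the embedding model and Qdrant for both.

```python
import math
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

# Batch sizes per Qdrant mode, as listed above.
BATCH_SIZES = {"cloud": 100, "docker": 1000}


def batched(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm
```

Smaller batches in cloud mode keep each upload request small, while a local instance can take larger ones; the metric compares vector direction only, so document length does not affect the score.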

3. Run the analysis

Place a patient report PDF in data/raw/, update the filename in get_summary.py if needed, then run:

python src/embeddings/get_summary.py

The system will output the full analysis to the console.
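
As an alternative to editing the filename inside the script, the report path could be taken from the command line. The sketch below is a suggestion rather than what get_summary.py currently does; `parse_report_path` and the default path `data/raw/report.pdf` are hypothetical.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def parse_report_path(argv: Optional[List[str]] = None) -> Path:
    """Read the report PDF path from the command line, with a default."""
    parser = argparse.ArgumentParser(description="Analyze a patient report PDF.")
    parser.add_argument(
        "report",
        nargs="?",
        default="data/raw/report.pdf",  # hypothetical default location
        help="path to the patient report PDF",
    )
    args = parser.parse_args(argv)
    path = Path(args.report)
    if path.suffix.lower() != ".pdf":
        # parser.error prints a usage message and exits with status 2.
        parser.error(f"expected a .pdf file, got: {path.name}")
    return path
```

With such a helper wired in, a run might look like `python src/embeddings/get_summary.py data/raw/patient_report.pdf`, where the file name is only an example.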

Tech Stack

| Component | Technology |
| --- | --- |
| Vector Database | Qdrant (Cloud or Docker) |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| Text Extraction | Gemini 2.5 Flash |
| Report Generation | Llama 3.1 8B Instruct |
| Orchestration | LangChain |
| Web Search | DuckDuckGo |
