An AI-powered clinical decision support system that analyzes patient medical reports (PDFs) and provides preliminary diagnostic insights using a Retrieval-Augmented Generation (RAG) architecture.
Disclaimer: This tool is designed to augment the expertise of medical professionals, not to replace it. All AI-generated analysis should be reviewed by a qualified doctor.
Given a patient's blood test report (PDF), the system:
- Extracts clinical data from the PDF using Gemini
- Searches a vector database of similar patient cases (Qdrant)
- Augments findings with real-time web search results
- Generates a comprehensive report via Llama 3.1, including:
  - Potential diseases or health concerns
  - Recommended precautions and lifestyle changes
  - FDA-approved medication suggestions
  - Actionable insights for doctors
  - Data points that informed the conclusion
The system currently covers four conditions: diabetes, kidney stones, heart disease, and anemia.
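The pipeline above can be sketched end to end with placeholder stages. All function names and return values below are illustrative stand-ins, not the repository's actual API; each stub marks where the real Gemini, Qdrant, DuckDuckGo, or Llama 3.1 call would go.

```python
# Illustrative sketch of the RAG flow. Every function here is a hypothetical
# stub standing in for the real service call named in its docstring.

def extract_clinical_data(pdf_path):
    """Stand-in for Gemini PDF extraction of lab values."""
    return {"glucose_mg_dl": 180, "hemoglobin_g_dl": 11.2}

def search_similar_cases(findings, top_k=5):
    """Stand-in for a Qdrant similarity search over embedded patient cases."""
    return [{"condition": "diabetes", "score": 0.87}]

def web_search(query):
    """Stand-in for the DuckDuckGo augmentation step."""
    return ["result for: " + query]

def generate_report(findings, cases, web_results):
    """Stand-in for Llama 3.1 report generation."""
    return {
        "potential_conditions": [c["condition"] for c in cases],
        "precautions": ["monitor blood glucose"],
        "supporting_data": findings,
    }

def analyze_report(pdf_path):
    """Orchestrates the four stages: extract -> retrieve -> augment -> generate."""
    findings = extract_clinical_data(pdf_path)
    cases = search_similar_cases(findings)
    web = web_search("elevated fasting glucose guidelines")
    return generate_report(findings, cases, web)
```

The stubs make the data flow explicit: each stage consumes the previous stage's output, which is what lets the final report cite the data points that informed it.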
```
Healthcare-Assistant/
├── .env                        # API keys and Qdrant config
├── requirements.txt
├── src/
│   ├── config/
│   │   └── settings.yaml       # Qdrant and embedding model settings
│   ├── pre-processing/
│   │   └── preprocess.py       # CSV -> JSONL document conversion
│   └── embeddings/
│       ├── create_embeddings.py  # Embed documents and store in Qdrant
│       └── get_summary.py        # Query Qdrant + generate analysis
└── data/
    ├── raw/                    # Input CSVs and patient report PDFs
    └── processed/              # Generated JSONL files
```
- Python 3.8+
- A Hugging Face API token
- A Google AI API key (for Gemini)
- Qdrant, either:
  - Cloud: a Qdrant Cloud cluster (URL + API key)
  - Docker: a local Qdrant instance on port 6333 (e.g. `docker run -p 6333:6333 qdrant/qdrant`)
```
git clone https://github.com/Sreejit-Sengupto/Healthcare-Assistant.git
cd Healthcare-Assistant
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Create a `.env` file in the root directory:
```
# Qdrant - set QDRANT_MODE to "cloud" or "docker"
QDRANT_MODE=cloud
QDRANT_API_KEY=your_qdrant_api_key
QDRANT_URL=https://your-cluster-id.aws.cloud.qdrant.io

# LLM / Embedding APIs
GOOGLE_API_KEY=your_google_api_key
HUGGINGFACEHUB_API_TOKEN=your_huggingface_token
```

| Variable | Description |
|---|---|
| `QDRANT_MODE` | `cloud` to use Qdrant Cloud, `docker` for local Docker (default: `docker`) |
| `QDRANT_URL` | Your Qdrant Cloud cluster URL (only needed for cloud mode) |
| `QDRANT_API_KEY` | Your Qdrant Cloud API key (only needed for cloud mode) |
| `GOOGLE_API_KEY` | Google AI API key for Gemini (text extraction) |
| `HUGGINGFACEHUB_API_TOKEN` | Hugging Face token for Llama 3.1 (report generation) |
When using Docker mode, Qdrant reads its host and port from `src/config/settings.yaml` (default: `localhost:6333`).
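The mode switch described above can be sketched as a small resolver: cloud mode takes the URL and API key from the environment, Docker mode falls back to the YAML defaults. This is a hypothetical sketch of the selection logic, not the project's actual code, and `resolve_qdrant_target` is an invented name.

```python
# Hypothetical sketch of QDRANT_MODE resolution. The `env` argument stands in
# for os.environ; `settings` stands in for the parsed settings.yaml, with the
# defaults the README documents (localhost:6333).

def resolve_qdrant_target(env, settings=None):
    settings = settings or {"host": "localhost", "port": 6333}
    if env.get("QDRANT_MODE", "docker") == "cloud":
        # Cloud mode requires both QDRANT_URL and QDRANT_API_KEY.
        return {"url": env["QDRANT_URL"], "api_key": env["QDRANT_API_KEY"]}
    # Docker mode: connect to the local instance from settings.yaml.
    return {"host": settings["host"], "port": settings["port"]}
```

Defaulting to `docker` when the variable is unset matches the behavior documented in the table above.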
The data/ directory is git-ignored. Create it and add the raw CSV datasets:
```
mkdir -p data/raw
```

Place these files in `data/raw/`:
| File | Dataset |
|---|---|
| `diabetes_classification.csv` | Diabetes Classification |
| `kidney_stone_dataset.csv` | Kidney Stone Prediction |
| `heart_disease.csv` | Heart Disease Prediction |
| `anemia.csv` | Anemia Prediction |
Run these three steps in order:
Converts raw CSVs into a combined JSONL document file.
```
python src/pre-processing/preprocess.py
```

Note: this appends to `data/processed/combined_health_documents.jsonl`. Delete the file first if you want a clean run.
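A minimal stdlib sketch of this conversion step, assuming one JSONL document per CSV row (the real `preprocess.py` may structure documents differently). Opening the output in append mode (`"a"`) is also what makes repeated runs accumulate duplicates, hence the note about deleting the file first.

```python
import csv
import json

# Minimal sketch of CSV -> JSONL conversion. The document schema here
# ({"condition": ..., "record": ...}) is an assumption for illustration,
# not the repository's actual format.
def csv_to_jsonl(csv_path, jsonl_path, condition):
    # "a" = append mode: rerunning adds rows to the existing file.
    with open(csv_path, newline="") as src, open(jsonl_path, "a") as dst:
        for row in csv.DictReader(src):
            doc = {"condition": condition, "record": row}
            dst.write(json.dumps(doc) + "\n")
```

Switching `"a"` to `"w"` would make each run start clean, at the cost of overwriting documents produced from the other three CSVs.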
Embeds the documents and stores them in Qdrant.
```
python src/embeddings/create_embeddings.py
```

- Uses `sentence-transformers/all-MiniLM-L6-v2` (384-dim, cosine similarity)
- Batch size: 100 (cloud) / 1000 (docker)
- Collection name: `med_embeddings` (configurable in `settings.yaml`)
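Cosine similarity, the metric configured for the `med_embeddings` collection, can be illustrated in a few lines. Qdrant computes this server-side over the 384-dim MiniLM vectors; the pure-Python version below is only for intuition.

```python
import math

# Cosine similarity: 1.0 for vectors pointing the same direction,
# 0.0 for orthogonal vectors, regardless of magnitude. This is why
# scaling a vector does not change which cases it retrieves.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

Because the metric ignores magnitude, two reports with the same clinical profile embed to nearby directions and score close to 1.0 even if their raw embedding norms differ.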
Place a patient report PDF in `data/raw/`, update the filename in `get_summary.py` if needed, then run:

```
python src/embeddings/get_summary.py
```

The system will output the full analysis to the console.
| Component | Technology |
|---|---|
| Vector Database | Qdrant (Cloud or Docker) |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| Text Extraction | Gemini 2.5 Flash |
| Report Generation | Llama 3.1 8B Instruct |
| Orchestration | LangChain |
| Web Search | DuckDuckGo |