Sreejit-Sengupto/BioInsight

AI-Powered Clinical Decision Support System

An AI-powered clinical decision support system that analyzes patient medical reports (PDFs) and provides preliminary diagnostic insights using a Retrieval-Augmented Generation (RAG) architecture.

Disclaimer: This tool is designed to augment the expertise of medical professionals, not to replace it. All AI-generated analysis should be reviewed by a qualified doctor.

What It Does

Given a patient's blood test report (PDF), the system:

  1. Extracts clinical data from the PDF using Gemini
  2. Searches a vector database of similar patient cases (Qdrant)
  3. Augments findings with real-time web search results
  4. Generates a comprehensive report via Llama 3.1, including:
    • Potential diseases or health concerns
    • Recommended precautions and lifestyle changes
    • FDA-approved medication suggestions
    • Actionable insights for doctors
    • Data points that informed the conclusion
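The four steps above can be pictured as a small prompt-assembly routine. The sketch below is an illustration only, not the code in get_summary.py: the helper name `build_prompt`, its parameters, and the prompt wording are assumptions. The calls to Gemini, Qdrant, the web search, and Llama 3.1 are left out so that only the "augment" step is shown.

```python
from typing import List

# Section titles mirror the report contents listed above.
REPORT_SECTIONS = [
    "Potential diseases or health concerns",
    "Recommended precautions and lifestyle changes",
    "FDA-approved medication suggestions",
    "Actionable insights for doctors",
    "Data points that informed the conclusion",
]


def build_prompt(extracted: str, similar_cases: List[str], web_results: List[str]) -> str:
    """Combine extracted report data, retrieved cases, and web snippets.

    Hypothetical helper: the project may structure its prompt differently.
    """
    # Fall back to a placeholder bullet when a source returned nothing.
    cases = "\n".join(f"- {c}" for c in similar_cases) or "- (none found)"
    web = "\n".join(f"- {w}" for w in web_results) or "- (none found)"
    sections = "\n".join(
        f"{i}. {title}" for i, title in enumerate(REPORT_SECTIONS, start=1)
    )
    return (
        "Patient report data:\n" + extracted + "\n\n"
        "Similar patient cases:\n" + cases + "\n\n"
        "Web search results:\n" + web + "\n\n"
        "Write a report with these sections:\n" + sections
    )
```

The returned string would then be sent to the report-generation model; keeping the assembly in one pure function makes it easy to inspect exactly what context the model receives.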

Supported Conditions

Diabetes, kidney stones, heart disease, and anemia.

Project Structure

Healthcare-Assistant/
├── .env                          # API keys and Qdrant config
├── requirements.txt
├── src/
│   ├── config/
│   │   └── settings.yaml         # Qdrant and embedding model settings
│   ├── pre-processing/
│   │   └── preprocess.py         # CSV -> JSONL document conversion
│   └── embeddings/
│       ├── create_embeddings.py  # Embed documents and store in Qdrant
│       └── get_summary.py        # Query Qdrant + generate analysis
└── data/
    ├── raw/                      # Input CSVs and patient report PDFs
    └── processed/                # Generated JSONL files

Getting Started

Prerequisites

  • Python 3.8+
  • A Hugging Face API token
  • A Google AI API key (for Gemini)
  • Qdrant — either:
    • Cloud: A Qdrant Cloud cluster (URL + API key)
    • Docker: A local Qdrant instance running on port 6333

Installation

git clone https://github.com/Sreejit-Sengupto/Healthcare-Assistant.git
cd Healthcare-Assistant
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Environment Variables

Create a .env file in the root directory:

# Qdrant - set QDRANT_MODE to "cloud" or "docker"
QDRANT_MODE=cloud
QDRANT_API_KEY=your_qdrant_api_key
QDRANT_URL=https://your-cluster-id.aws.cloud.qdrant.io

# LLM / Embedding APIs
GOOGLE_API_KEY=your_google_api_key
HUGGINGFACEHUB_API_TOKEN=your_huggingface_token

Variable                   Description
-------------------------  --------------------------------------------------------------------
QDRANT_MODE                cloud to use Qdrant Cloud, docker for local Docker (default: docker)
QDRANT_URL                 Your Qdrant Cloud cluster URL (only needed for cloud mode)
QDRANT_API_KEY             Your Qdrant Cloud API key (only needed for cloud mode)
GOOGLE_API_KEY             Google AI API key for Gemini (text extraction)
HUGGINGFACEHUB_API_TOKEN   Hugging Face token for Llama 3.1 (report generation)

When using Docker mode, Qdrant reads host and port from src/config/settings.yaml (defaults to localhost:6333).
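The mode switch described above could be implemented along the following lines. This is a minimal sketch, not the project's actual code: the function name `qdrant_client_kwargs` is hypothetical, and the `host`/`port` defaults stand in for whatever src/config/settings.yaml provides.

```python
import os
from typing import Dict, Mapping, Optional


def qdrant_client_kwargs(
    env: Optional[Mapping[str, str]] = None,
    host: str = "localhost",  # assumed to come from settings.yaml in the real project
    port: int = 6333,
) -> Dict[str, object]:
    """Return connection arguments for the selected QDRANT_MODE."""
    env = os.environ if env is None else env
    mode = env.get("QDRANT_MODE", "docker").strip().lower()

    if mode == "cloud":
        # Cloud mode needs both the cluster URL and the API key.
        missing = [k for k in ("QDRANT_URL", "QDRANT_API_KEY") if not env.get(k)]
        if missing:
            raise ValueError("Missing variables for cloud mode: " + ", ".join(missing))
        return {"url": env["QDRANT_URL"], "api_key": env["QDRANT_API_KEY"]}

    if mode == "docker":
        # Docker mode ignores the cloud variables and uses host and port.
        return {"host": host, "port": port}

    raise ValueError(f"Unsupported QDRANT_MODE: {mode!r}")
```

With the `qdrant-client` package installed, the result can be passed straight to the client, for example `QdrantClient(**qdrant_client_kwargs())`. Failing early on a missing URL or key avoids a less obvious connection error later.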

Data Setup

The data/ directory is git-ignored. Create it and add the raw CSV datasets:

mkdir -p data/raw

Place these files in data/raw/:

File                          Dataset
----------------------------  ------------------------
diabetes_classification.csv   Diabetes Classification
kidney_stone_dataset.csv      Kidney Stone Prediction
heart_disease.csv             Heart Disease Prediction
anemia.csv                    Anemia Prediction
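Before preprocessing, it can help to confirm that all four files are in place. The check below is an optional sketch; `missing_datasets` is a hypothetical helper and is not part of the repository.

```python
from pathlib import Path
from typing import List

# File names taken from the table above.
EXPECTED_FILES = [
    "diabetes_classification.csv",
    "kidney_stone_dataset.csv",
    "heart_disease.csv",
    "anemia.csv",
]


def missing_datasets(raw_dir: str = "data/raw") -> List[str]:
    """Return the expected CSV files that are not present in raw_dir."""
    raw = Path(raw_dir)
    return [name for name in EXPECTED_FILES if not (raw / name).is_file()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing from data/raw/: " + ", ".join(missing))
    else:
        print("All datasets found.")
```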

Usage

Run these three steps in order:

1. Preprocess the data

Converts raw CSVs into a combined JSONL document file.

python src/pre-processing/preprocess.py

Note: This appends to data/processed/combined_health_documents.jsonl. Delete the file first if you want a clean run.
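To make the conversion and the append behaviour concrete, here is a minimal sketch of a CSV-to-JSONL step. It is not the contents of preprocess.py: the function names, the `text`/`metadata` document layout, and the `overwrite` flag are all assumptions chosen for illustration.

```python
import csv
import io
import json
from typing import List


def csv_to_jsonl_lines(csv_text: str, condition: str) -> List[str]:
    """Turn each CSV row into one JSON document (hypothetical schema)."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Flatten the row into a "column: value" sentence suitable for embedding.
        text = ", ".join(f"{key}: {value}" for key, value in row.items())
        lines.append(json.dumps({"text": text, "metadata": {"condition": condition}}))
    return lines


def write_jsonl(lines: List[str], path: str, overwrite: bool = False) -> None:
    """Append by default, as the note above describes; overwrite on request."""
    mode = "w" if overwrite else "a"
    with open(path, mode, encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")
```

Because append mode is the default, running the step twice duplicates every document. That is why a clean run needs the output file deleted first (or, in this sketch, `overwrite=True`).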

2. Create embeddings

Embeds the documents and stores them in Qdrant.

python src/embeddings/create_embeddings.py

  • Uses sentence-transformers/all-MiniLM-L6-v2 (384-dim, cosine similarity)
  • Batch size: 100 (cloud) / 1000 (docker)
  • Collection name: med_embeddings (configurable in settings.yaml)
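The mode-dependent batch sizes can be illustrated with a small chunking helper. This is a sketch under assumptions: `batches` and `BATCH_SIZES` are hypothetical names, and the embedding and upload calls are omitted. The README does not say why the cloud batch is smaller.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Batch sizes from the list above, keyed by QDRANT_MODE.
BATCH_SIZES = {"cloud": 100, "docker": 1000}


def batches(items: Sequence[T], mode: str = "docker") -> Iterator[List[T]]:
    """Yield consecutive slices of items sized for the given mode."""
    size = BATCH_SIZES[mode]
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

Each yielded batch would be embedded and then written to the collection in a single request. For example, 2,500 documents produce three batches in docker mode and 25 in cloud mode.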

3. Run the analysis

Place a patient report PDF in data/raw/, update the filename in get_summary.py if needed, then run:

python src/embeddings/get_summary.py

The full analysis is printed to the console.

Tech Stack

Component           Technology
------------------  --------------------------------------
Vector Database     Qdrant (Cloud or Docker)
Embeddings          sentence-transformers/all-MiniLM-L6-v2
Text Extraction     Gemini 2.5 Flash
Report Generation   Llama 3.1 8B Instruct
Orchestration       LangChain
Web Search          DuckDuckGo
