Chart Search AI Module

An OpenMRS module that lets clinicians ask natural language questions about a patient's chart and get answers with source citations.

For project background, community discussion, and roadmap, see the wiki project page.

Requirements

Java 11+
OpenMRS Platform 2.8.0+
Webservices REST module 2.44.0+
10GB+ RAM recommended (for LLM inference with the default 8B model)
Elasticsearch 8.14+ (optional, for the hybrid retrieval pipeline; the default embedding and Lucene pipelines require no external services)

Setup

1. Build

mvn package

The .omod file is in omod/target/.

2. Download the LLM model

Download Llama 3.3 8B (Q4_K_M quantization) in GGUF format (~5GB) from Hugging Face.

Place the .gguf file inside the OpenMRS application data directory (e.g., <openmrs-application-data-directory>/chartsearchai/). Model paths are resolved relative to this directory for security.

Available models:

Model	RAM Needed	Chat Template	Download
Llama 3.2 3B	~6GB total	`llama3`	GGUF
MedGemma 4B	~5GB total	`gemma`	GGUF
Llama 3.3 8B (default)	~10GB total	`llama3`	GGUF
Mistral Nemo 12B	~12GB total	`mistral`	GGUF

MedGemma is a medical-domain fine-tune of Gemma 3 by Google, trained on clinical text comprehension — it may produce more accurate clinical answers than general-purpose models of similar size. Larger models produce more accurate answers with better instruction following. Smaller models use less RAM but may produce lower quality responses. To switch models, update chartsearchai.llm.modelFilePath and chartsearchai.llm.chatTemplate — no rebuild needed.

3. Download the embedding model

If embedding pre-filtering is enabled (default), download the all-MiniLM-L6-v2 ONNX model (~90MB) from Hugging Face. You need both model.onnx and vocab.txt from the repository.

Place them alongside the LLM model (e.g., <openmrs-application-data-directory>/chartsearchai/).

4. Install

Copy the .omod file into the modules folder of the OpenMRS application data directory (e.g., <openmrs-application-data-directory>/modules/). The module will be loaded on the next OpenMRS startup.

5. Configure

Set these global properties in Admin > Settings:

Required

Property	Description
`chartsearchai.llm.modelFilePath`	Relative path (within the OpenMRS application data directory) to the `.gguf` model file, e.g. `chartsearchai/Llama-3.3-8B-Instruct-Q4_K_M.gguf`

Retrieval pipeline

Property	Default	Description
`chartsearchai.embedding.preFilter`	`true`	When `true`, uses the selected retrieval pipeline to narrow patient records to the most relevant ones before sending to the LLM. Set to `false` to send the full chart instead
`chartsearchai.retrieval.pipeline`	`embedding`	Selects the retrieval pipeline: `embedding` (default) uses vector similarity via an ONNX model with custom scoring; `lucene` uses Apache Lucene BM25 text search; `elasticsearch` uses Elasticsearch hybrid search combining BM25 text and kNN vector search via Reciprocal Rank Fusion (requires Elasticsearch 8.14+ configured in OpenMRS). All require `preFilter` to be `true`. Records are indexed automatically on first access. Changing this setting takes effect on the next query

Embedding pipeline tuning

These settings only apply when chartsearchai.retrieval.pipeline is embedding (the default). They have no effect on the Lucene or Elasticsearch pipelines.

Property	Default	Description
`chartsearchai.embedding.topK`	`10`	Maximum number of records sent to the LLM per query. When the query mentions a specific clinical type (e.g., "medications", "allergies", "lab results"), all records of that type are included regardless of topK, and remaining slots are filled with contextual records from other types. For queries without a detected type, topK is the hard cap. Type detection uses keyword matching — for example, "medications" and "drugs" both match drug orders, while "blood pressure" and "bp" both match observations
`chartsearchai.embedding.similarityRatio`	`0.80`	Minimum similarity score as a fraction of the top result's score. Records scoring below this ratio are excluded even if within the topK limit. Must be between 0 and 1
`chartsearchai.embedding.scoreGapMultiplier`	`2.5`	Controls adaptive topK by detecting natural cluster boundaries in similarity scores. Higher values include more records; lower values cut more aggressively. Set to a very large value (e.g. 999) to disable gap detection
`chartsearchai.embedding.minScoreGap`	`0.10`	Minimum absolute gap between consecutive similarity scores required for the adaptive cutoff detector to trigger. Prevents premature cutting when a relatively large gap (compared to a tight cluster's running average) is still small in absolute terms. Only applies when gap detection is active
`chartsearchai.embedding.keywordWeight`	`0.3`	Additive keyword bonus weight in the hybrid retrieval formula: `finalScore = semanticScore + weight × keywordScore`. Keyword overlap can only increase the score, never decrease it. Set to `0` to disable keyword matching
`chartsearchai.embedding.typeBoostFactor`	`1.0`	Score multiplier applied to records whose resource type matches the query intent (e.g., drug orders when the query is about medications). Set to `1.0` to disable type boosting (default). Values like `1.2`–`1.5` provide moderate boosting. Must be between 1.0 and 3.0
`chartsearchai.embedding.queryPrefix`	(empty)	Prefix prepended to the user query before embedding. Leave empty for models like all-MiniLM-L6-v2 that were not trained with instruction prefixes. Set to `search_query:` or `Represent this sentence for searching relevant passages:` for models that support instruction-aware queries (e.g., BGE)
`chartsearchai.embedding.maxSequenceLength`	`256`	Maximum WordPiece token sequence length for embedding input. Increase when using models that support longer contexts (e.g., 512 for BGE models). Must be between 32 and 8192
`chartsearchai.embedding.modelFilePath`	—	Required when using the embedding or elasticsearch pipeline. Relative path to the ONNX model file (all-MiniLM-L6-v2), e.g. `chartsearchai/all-MiniLM-L6-v2.onnx`. Not needed for the Lucene pipeline
`chartsearchai.embedding.vocabFilePath`	—	Required when using the embedding or elasticsearch pipeline. Relative path to the WordPiece `vocab.txt` file, e.g. `chartsearchai/vocab.txt`. Not needed for the Lucene pipeline

LLM tuning

Property	Default	Description
`chartsearchai.llm.chatTemplate`	`llama3`	Chat template for formatting prompts. Presets: `llama3`, `mistral`, `phi3`, `chatml`, `gemma`. Set to `auto` to use the model's built-in GGUF chat template. Or a custom template string with `{system}` and `{user}` placeholders
`chartsearchai.llm.systemPrompt`	(built-in clinical prompt)	System prompt that guides how the LLM responds — e.g. answering only the question asked, using only the provided patient records, citing records by number, naming what is missing when records lack relevant information (e.g. "There are no records about diabetes in this patient's chart"), keeping answers concise, and returning structured JSON
`chartsearchai.llm.timeoutSeconds`	`120`	Maximum seconds to wait for LLM inference before timing out

Rate limiting and caching

Property	Default	Description
`chartsearchai.rateLimitPerMinute`	`10`	Maximum queries per user per minute. Set to `0` to disable
`chartsearchai.cacheTtlMinutes`	`0`	Minutes to cache identical (patient, question) answers. Set to `0` to disable (default)

Audit

Property	Default	Description
`chartsearchai.auditLogRetentionDays`	`90`	Audit log entries older than this are purged daily. Set to `0` to retain all

6. Grant privileges

Privilege	Purpose
AI Query Patient Data	Execute chart search queries
View AI Audit Logs	Access the audit log endpoint

7. Indexing

When chartsearchai.embedding.preFilter is true (default), patient records are automatically indexed on first chart access for whichever retrieval pipeline is active. Subsequent data changes trigger automatic re-indexing via AOP hooks on encounter, obs, condition, diagnosis, allergy, order, program enrollment, medication dispense, and patient merge operations.

Embedding pipeline (default): Uses an ONNX embedding model for vector similarity search. A bulk backfill task ("Chart Search AI - Embedding Backfill") is available in Admin > Scheduler > Manage Scheduler to pre-index all patients. The default model is all-MiniLM-L6-v2 (general-purpose, 384 dimensions). Any BERT-based ONNX embedding model can be used as a drop-in replacement by updating chartsearchai.embedding.modelFilePath and chartsearchai.embedding.vocabFilePath. Embedding dimensions are auto-detected from the model output, so models with any dimension size work without code changes. After switching models, existing embeddings are incompatible — run the backfill task to re-index all patients with the new model.

Lucene pipeline (chartsearchai.retrieval.pipeline=lucene): Uses Apache Lucene BM25 text search with English stemming. No ONNX model files are required. The Lucene index is stored at <openmrs-application-data-directory>/chartsearchai/lucene-index/ and is built automatically on first patient access. This pipeline is simpler to set up (no model download needed) and may be preferred for environments where the ONNX model is unavailable.

Elasticsearch pipeline (chartsearchai.retrieval.pipeline=elasticsearch): Uses Elasticsearch hybrid search combining BM25 text search with kNN dense vector search via Reciprocal Rank Fusion (RRF). Requires Elasticsearch 8.14+ configured in OpenMRS (set hibernate.search.backend.uris in runtime properties). Also requires the ONNX embedding model (same as the embedding pipeline) to compute vectors for the kNN side of the hybrid search. Patient records are indexed into a shared chartsearchai-patient-records Elasticsearch index with both text and embedding vector fields. The RRF algorithm fuses rankings from both signals — this means queries like "any cancer?" can find semantic matches (e.g. Kaposi sarcoma) via kNN even when the literal term is absent from the records, while also benefiting from BM25's lexical matching. If Elasticsearch is not available at query time, the pipeline automatically falls back to the embedding pipeline. After switching embedding models, delete the chartsearchai-patient-records index from Elasticsearch — it will be recreated with the new model's dimensions on the next patient access.

Choosing a pipeline:

Consideration	Embedding (default)	Lucene	Elasticsearch
External dependencies	ONNX model files only	None	Elasticsearch 8.14+ cluster + ONNX model files
Semantic matching (e.g., "cancer" finds "Kaposi sarcoma")	Yes	No	Yes (via kNN)
Absent-data detection (returns "no records about X" instead of false positives)	Yes (z-score gate)	No	No
Type-aware auto-expand (e.g., "any conditions?" returns all conditions)	Yes	No	No
Adaptive result filtering (gap detection, similarity ratio)	Yes	No	No
Keyword matching	Yes (hybrid scoring)	Yes (BM25 with stemming)	Yes (BM25 + kNN via RRF)
Tunable parameters	Many (topK, similarityRatio, scoreGapMultiplier, keywordWeight, etc.)	Few (topK only)	Few (topK only; scoring delegated to Elasticsearch)
Compute location	In-process (JVM)	In-process (JVM)	Elasticsearch cluster
Graceful fallback	N/A (default)	Falls back to full chart on error	Falls back to embedding pipeline

The embedding pipeline is recommended for most deployments — it runs entirely in-process, has the most sophisticated filtering (z-score gate for absent-data detection, gap detection for adaptive result cutoff, type-aware expansion), and requires no external services. The Lucene pipeline is the simplest option when the ONNX model is unavailable, but lacks semantic understanding. The Elasticsearch pipeline is best when you already have an ES cluster in your infrastructure and want to offload retrieval compute, but it lacks the embedding pipeline's absent-data detection and adaptive filtering — RRF always returns results from at least the kNN side, even when the patient has no relevant records.

Testing the Elasticsearch pipeline locally

To test the Elasticsearch pipeline with the OpenMRS SDK:

1. Start Elasticsearch 8.14+ with Docker:

docker run -d --name elasticsearch \
  -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  elasticsearch:8.17.2

Verify it's running: curl http://localhost:9200/_cluster/health

2. Configure OpenMRS to use Elasticsearch:

Add to your OpenMRS runtime properties file (e.g., ~/openmrs/openmrs-runtime.properties):

hibernate.search.backend.type=elasticsearch
hibernate.search.backend.uris=http://localhost:9200

Or if using the SDK with Docker, pass the environment variable when running the server:

OMRS_SEARCH=elasticsearch mvn openmrs-sdk:run

3. Set the retrieval pipeline:

In Admin > Settings, set:

Property	Value
`chartsearchai.retrieval.pipeline`	`elasticsearch`

Also ensure the ONNX embedding model and vocab files are configured (same as the default embedding pipeline).

4. Query a patient — records are indexed automatically on first access. To verify indexing, check the ES index:

curl http://localhost:9200/chartsearchai-patient-records/_count

5. To reset and re-index, delete the ES index:

curl -X DELETE http://localhost:9200/chartsearchai-patient-records

Records will be re-indexed on the next patient access.

Query behavior

Absent-data detection

When the embedding pipeline is active and a query has no keyword matches in the patient's records (e.g., asking "any cancer?" for a patient with no cancer-related records), the system uses a z-score gate to detect whether the top semantic match is a genuine result or just noise. If the patient has 30+ records and the best semantic score is not a statistical outlier (z-score < 2.0), the query returns "There are no records about [topic] in this patient's chart" instead of false positives. This prevents the system from returning unrelated records that happen to have slightly elevated similarity scores.

Recency cap

Questions with numeric recency constraints are automatically detected and honored. For example, "last 3 blood pressure readings" or "most recent 5 lab results" will cap the results per concept group to the specified number, keeping only the most recent measurements. This applies across all retrieval pipelines.

Input validation

Questions are checked against common prompt injection patterns (e.g., "ignore previous instructions", "you are now", "system prompt:") and rejected with HTTP 400 if matched. This is a defense-in-depth measure — the primary protection is the GBNF grammar that constrains LLM output to a fixed JSON structure regardless of prompt content. Normal clinical questions containing words like "ignore" or "instructions" in non-adversarial contexts (e.g., "What instructions were given at discharge?") are not affected.

API

Search

POST /ws/rest/v1/chartsearchai/search
Content-Type: application/json

{
  "patient": "patient-uuid-here",
  "question": "What medications is this patient on?"
}

Response:

{
  "answer": "The patient is currently on Metformin [1] and Lisinopril [3]...",
  "disclaimer": "This response is AI-generated and may not be accurate...",
  "references": [
    { "index": 3, "resourceType": "order", "resourceId": 789, "date": "2025-03-15" },
    { "index": 1, "resourceType": "order", "resourceId": 456, "date": "2025-01-10" }
  ]
}

Streaming search (SSE)

For real-time token-by-token streaming:

POST /ws/rest/v1/chartsearchai/search/stream
Content-Type: application/json
Accept: text/event-stream

{
  "patient": "patient-uuid-here",
  "question": "What medications is this patient on?"
}

SSE events:

Event	Description
`token`	A chunk of the answer text as it is generated
`done`	Final JSON with the complete answer, references (sorted most recent first, with `index`, `resourceType`, `resourceId`, `date`), and disclaimer
`error`	Error message if something goes wrong

Audit log

Requires the "View AI Audit Logs" privilege.

GET /ws/rest/v1/chartsearchai/auditlog?patient=...&user=...&fromDate=...&toDate=...&startIndex=0&limit=50

All query parameters are optional. fromDate and toDate are epoch milliseconds. Returns paginated results ordered by most recent first, with a totalCount for pagination.

Patient access control

By default, any user with the "AI Query Patient Data" privilege can query any patient. To add patient-level restrictions (e.g., location-based or care-team-based), provide a custom Spring bean that implements the PatientAccessCheck interface:

<bean id="chartSearchAi.patientAccessCheck"
      class="com.example.LocationBasedPatientAccessCheck"/>

This overrides the default permissive implementation.

Evals

The project includes an eval framework that tests retrieval quality, citation accuracy, absent-data detection, and prompt injection resistance without requiring a running LLM or external services.

Running evals

mvn test -pl api -Dtest="*EvalTest"

Or run a specific suite:

mvn test -pl api -Dtest="RetrievalQualityEvalTest"
mvn test -pl api -Dtest="CitationEvalTest"
mvn test -pl api -Dtest="AbsentDataEvalTest"
mvn test -pl api -Dtest="PromptInjectionEvalTest"

Adding cases

Each suite is driven by a JSON dataset in api/src/test/resources/eval/. To add a case, append an entry to the relevant file:

File	What it tests
`retrieval-eval-dataset.json`	Query → expected record indices (recall@30)
`citation-eval-dataset.json`	Simulated LLM JSON → expected citation indices (F1)
`absent-data-eval-dataset.json`	Query → expected keywords in "no records" answer
`prompt-injection-eval-dataset.json`	Adversarial payload → special tokens stripped

Metrics report

Each run appends per-case and summary metrics to api/target/eval-results.csv for tracking regressions over time.

Architecture

See docs/adr.md for architectural decisions and design rationale.

License

This project is licensed under the MPL 2.0.

Name		Name	Last commit message	Last commit date
Latest commit History 310 Commits
.github/workflows		.github/workflows
api		api
docs		docs
omod		omod
.gitignore		.gitignore
README.md		README.md
pom.xml		pom.xml

Folders and files

Latest commit

History

Repository files navigation

Chart Search AI Module

Requirements

Setup

1. Build

2. Download the LLM model

3. Download the embedding model

4. Install

5. Configure

Required

Retrieval pipeline

Embedding pipeline tuning

LLM tuning

Rate limiting and caching

Audit

6. Grant privileges

7. Indexing

Testing the Elasticsearch pipeline locally

Query behavior

Absent-data detection

Recency cap

Input validation

API

Search

Streaming search (SSE)

Audit log

Patient access control

Evals

Running evals

Adding cases

Metrics report

Architecture

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages