AI Imaging Agent — RAG + VLM Tool Picker

A tiny “AI-assisted search” that helps users find the right imaging software for their image and task.
Users drop an image and describe what they want (e.g., “segment the lungs”). The app:

Retrieves candidate tools from a local catalog (text-only query + format token).
Selects the best tool with a single VLM call (text + image + candidates + original image metadata).
Returns a link to the tool’s public runnable demo (Hugging Face Space, notebook viewer, etc.).
(We don’t run the tool or upload user data to third-party endpoints.)

What’s in here

Retrieval: FAISS/BGE-M3 + Cross-Encoder reranker.
Single-shot selection: OpenAI VLM (gpt-4o/gpt-4o-mini by default).
Image metadata awareness: The original file extension & shape are passed to the VLM (even if a .tif/.nii.gz is rasterized to PNG for the preview), so IO compatibility matters in the choice.
Gradio UI: One textbox + one file input → result = selected software + demo link.
Logging: Console + rotating file logs; optional prompt snapshots to logs/.

⚠️ Medical disclaimer: This app is a software recommender, not a diagnostic tool.

Quickstart

1) Requirements

Python 3.10–3.12
A working internet connection (for model calls)
An OpenAI API key

git clone <your-repo>
cd ai-agent
python -m venv env
# Windows
env\Scripts\activate
# macOS/Linux
source env/bin/activate

# Install with pip using pyproject.toml
pip install --upgrade pip
pip install .

# For development (includes test dependencies)
pip install -e ".[dev]"

2) Configure `.env`

Create a .env file at repo root:

OPENAI_API_KEY=sk-xxxx
# Optional model overrides (defaults work):
OPENAI_MODEL=gpt-4o

# Software catalog
SOFTWARE_CATALOG=path/to/your/catalog.jsonl

# Pipeline configuration
TOP_K=8                # Number of candidates to retrieve
NUM_CHOICES=3          # Number of tools to recommend

# Logging configuration
LOGLEVEL_CONSOLE=WARNING
LOGLEVEL_FILE=INFO
FILE_LOG=1
LOG_DIR=logs
LOG_PROMPTS=0         # write selector prompt snapshots

3) Run the app

ai_agent ui

Open http://127.0.0.1:7860 and try:

“I want to segment the lungs from this CT scan image” + a .tif lung volume slice (or any image).

Catalog format

The catalog is JSON or JSONL. Each line/object is a SoftwareDoc. Minimal fields:

{
  "name": "3d-lungs-segmentation",
  "description": "3D lung segmentation from CT; returns a mask/overlay.",

  "applicationCategory": [],
  "featureList": ["segmentation"],
  "imagingModality": ["CT"],
  "dims": [3],
  "anatomy": ["lung"],
  "keywords": ["mask", "overlay", "lung segmentation", "CT"],

  "programmingLanguage": "Python",
  "requiresGPU": false,
  "isAccessibleForFree": true,
  "license": "Apache-2.0",

  "supportingData": [
    {
      "datasetFormat": "TIFF",
      "bodySite": "lung",
      "imagingModality": "CT",
      "hasDimensionality": 3
    },
    {
      "datasetFormat": "TIF",
      "bodySite": "lung",
      "imagingModality": "CT",
      "hasDimensionality": 3
    }
  ],

  "runnableExample": [
    {
      "hostType": "gradio",
      "url": "https://huggingface.co/spaces/qchapp/3d-lungs-segmentation",
      "name": "HF Space"
    }
  ]
}

You can add multiple runnable types (e.g., "type": "notebook", "type": "webapp", "type": "jvm"); the pipeline just picks the best base URL to show to the user.

How the pipeline works

Retrieval (fast, no LLM)

Build a text query from the user prompt
If user uploaded a file, add a format token (e.g., format:TIF or format:NII.GZ)
Embed with BGE-M3, rerank with Cross-Encoder
Return top-K candidates (configurable via TOP_K)

Selection (one VLM call)

Call the VLM with:
- Text: user request + compact table of top-K candidates
- Image: a PNG preview (safe for the API)
- Metadata: original file info (name, extension, shape, etc.)

VLM responds with strict JSON:

{
  "choices": [
    {
      "name": "tool-name",
      "rank": 1,
      "accuracy": 95.5,
      "why": "Best match because..."
    },
    {
      "name": "alternative-tool",
      "rank": 2,
      "accuracy": 82.3,
      "why": "Good alternative..."
    }
  ]
}

Returns up to NUM_CHOICES ranked tools with accuracy scores
UI displays choices with explanation and demo links

Logging & Debugging

Console log level via LOGLEVEL_CONSOLE (default INFO).
File logs in logs/app_YYYYMMDD.log (enable with FILE_LOG=1).
Prompt snapshots (when LOG_PROMPTS=1):
- logs/vlm_selector_YYYYMMDD_HHMMSS.txt — the system/user text the model saw
- logs/vlm_selector_YYYYMMDD_HHMMSS.png — the exact PNG sent to the VLM

Security & Privacy

The app does not upload your image to third-party demos.
It only shows a link to a public demo page.
The only external API call on your content is the single VLM request to OpenAI for tool selection (preview image + brief metadata + text).
Turn off prompt snapshots if you don’t want local copies of previews: LOG_PROMPTS=0.

Project layout

ai_agent/
  api/
    pipeline.py       # RAG pipeline implementation
  generator/
    generator.py      # VLMToolSelector implementation
    prompts.py       # System prompts and templates
    schema.py        # Pydantic models for validation
  retriever/
    embedders.py     # Vector search components
  ui/
    app.py           # Gradio interface
  utils/
    file_validator.py  # File format validation
    image_meta.py     # Metadata extraction
    image_io.py       # Image loading/conversion
    image_analyzer.py # VLM image analysis
    tags.py           # Tags passed to the VLM
    previews.py       # Building previews for user
tests/               # Unit tests
pyproject.toml       # Project configuration and dependencies

Docker deployment

You can find the docker image in tools/image/Dockerfile

Build and run - app starts automatically

docker build -t ai-agent:latest -f tools/image/Dockerfile .
docker run -d --rm -p 7860:7860 ai-agent:latest

With environment variables

docker run -d --rm -p 7860:7860 \
  -e OPENAI_API_KEY="your-key" \
  ai-agent:latest

Name		Name	Last commit message	Last commit date
Latest commit History 62 Commits
.devcontainer		.devcontainer
.github		.github
data		data
scripts		scripts
src/ai_agent		src/ai_agent
tests		tests
tools		tools
.env.dist		.env.dist
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
README.md		README.md
justfile		justfile
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AI Imaging Agent — RAG + VLM Tool Picker

What’s in here

Quickstart

1) Requirements

2) Configure `.env`

3) Run the app

Catalog format

How the pipeline works

Retrieval (fast, no LLM)

Selection (one VLM call)

Logging & Debugging

Security & Privacy

Project layout

Docker deployment

Build and run - app starts automatically

With environment variables

Development tips

Future improvements

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AI Imaging Agent — RAG + VLM Tool Picker

What’s in here

Quickstart

1) Requirements

2) Configure .env

3) Run the app

Catalog format

How the pipeline works

Retrieval (fast, no LLM)

Selection (one VLM call)

Logging & Debugging

Security & Privacy

Project layout

Docker deployment

Build and run - app starts automatically

With environment variables

Development tips

Future improvements

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

2) Configure `.env`

Packages