MediTalk - Medical AI with Voice

Python 3.10+ | Research

Medical conversational AI system combining MultiMeditron LLM with speech capabilities for voice-based medical interactions.

Overview

MediTalk integrates multiple AI services to enable natural voice conversations with medical language models:

  • Medical LLM: MultiMeditron for medical question answering
  • Speech Recognition: Whisper for audio transcription
  • Speech Synthesis: Multiple TTS models (Orpheus, Bark, CSM, Qwen3-Omni)
  • Web Interface: Streamlit-based conversation UI
  • Benchmarking: Comprehensive evaluation suite for TTS and ASR models
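
A typical round trip is audio in, text through the LLM, and audio back out. The sketch below illustrates that flow over HTTP; the endpoint paths and JSON fields are hypothetical placeholders, not the actual service APIs (see each service's README for those).

# Hypothetical end-to-end flow: ASR -> medical LLM -> TTS.
# Endpoint paths and payload fields are illustrative placeholders only;
# the real APIs are documented in each service's README.
import requests

WHISPER_URL = "http://localhost:8000"    # speech-to-text
MEDITRON_URL = "http://localhost:5003"   # medical LLM
ORPHEUS_URL = "http://localhost:5005"    # neural TTS

def ask_by_voice(audio_path: str, out_path: str = "outputs/answer.wav") -> str:
    # 1. Transcribe the spoken question (hypothetical /transcribe endpoint).
    with open(audio_path, "rb") as f:
        transcript = requests.post(f"{WHISPER_URL}/transcribe",
                                   files={"audio": f}).json()["text"]

    # 2. Ask MultiMeditron for an answer (hypothetical /generate endpoint).
    answer = requests.post(f"{MEDITRON_URL}/generate",
                           json={"prompt": transcript}).json()["response"]

    # 3. Synthesize the answer to speech (hypothetical /synthesize endpoint).
    wav = requests.post(f"{ORPHEUS_URL}/synthesize", json={"text": answer}).content
    with open(out_path, "wb") as f:
        f.write(wav)
    return answer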

Prerequisites

  • Python 3.10+
  • ffmpeg (see Troubleshooting if it is missing)
  • A HuggingFace access token for downloading the models
  • Sufficient disk space for model downloads (models are large)
  • One or more GPUs are recommended for model inference

Quick Start

1. Configure environment:

Create .env file:

HUGGINGFACE_TOKEN=your_token
MULTIMEDITRON_HF_TOKEN=your_token
MULTIMEDITRON_MODEL=ClosedMeditron/Mulimeditron-End2End-CLIP-medical

2. Setup (first time only):

./scripts/setup-local.sh

3. Start services:

./scripts/start-local.sh

4. Access interface:

Open http://localhost:8503 in your browser.

5. Stop services:

./scripts/stop-local.sh

Services

MediTalk consists of multiple microservices. Each service has its own README (under services/) with detailed setup instructions and per-service API usage.

Service         Port   Description
Controller      8000   Orchestrates all services
WebUI           8501   Streamlit interface
MultiMeditron   5003   Medical LLM
Whisper         8000   Speech-to-text
Orpheus         5005   Neural TTS
Bark            5008   Multilingual TTS
CSM             5004   Conversational TTS
Qwen3-Omni      5006   Multimodal TTS
NISQA           8006   Speech quality assessment

Benchmarking

MediTalk includes comprehensive benchmarking suites for evaluating model performance.

TTS Benchmark

Evaluate text-to-speech models on intelligibility, quality, and speed.

cd benchmark/tts
./run_benchmark.sh

See benchmark/tts/README.md for details.
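
As an illustration of the speed metric, a real-time factor (synthesis time divided by the duration of the produced audio) can be measured as sketched below; the synthesize callable is a stand-in for whichever TTS backend is under test, and the soundfile package is an assumption, not necessarily what the benchmark uses.

# Illustrative real-time factor (RTF) measurement for a TTS backend.
# `synthesize` is a placeholder for the model under test.
import time
import soundfile as sf

def real_time_factor(synthesize, text: str, wav_path: str) -> float:
    start = time.perf_counter()
    synthesize(text, wav_path)            # writes a WAV file to wav_path
    elapsed = time.perf_counter() - start
    audio_seconds = sf.info(wav_path).duration
    return elapsed / audio_seconds        # < 1.0 means faster than real time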

Whisper Benchmark

Evaluate speech recognition accuracy across different Whisper model sizes.

cd benchmark/whisper
./run_benchmark.sh

See benchmark/whisper/README.md for details.
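
The accuracy side of an ASR benchmark usually comes down to word error rate (WER). A minimal example using the jiwer package (an assumption for illustration; the benchmark script may use a different toolchain):

# Minimal WER computation with jiwer (illustrative only).
import jiwer

reference = "the patient reports chest pain and shortness of breath"
hypothesis = "the patient report chest pain and shortness of breathe"

wer = jiwer.wer(reference, hypothesis)
print(f"WER: {wer:.2%}")  # fraction of words substituted, inserted, or deleted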

Project Structure

MediTalk/
│
├── services/                 # Microservices
│   ├── controller/           # Service orchestration
│   ├── webui/                # Web interface
│   ├── modelMultiMeditron/   # Medical LLM
│   ├── modelWhisper/         # ASR
│   ├── modelOrpheus/         # TTS
│   ├── modelBark/            # TTS
│   ├── modelCSM/             # TTS (conversational)
│   ├── modelQwen3Omni/       # TTS (conversational)
│   └── modelNisqa/           # Quality assessment (MOS)
│
├── benchmark/                # Evaluation suites
│   ├── tts/                  # TTS benchmark
│   └── whisper/              # ASR benchmark
│
├── scripts/                  # Management scripts
│
├── data/                     # Datasets (Download, Storage, Preprocessing) 
│
├── inputs/                   # Input files
│
├── outputs/                  # Generated files
│
└── logs/                     # Service logs

Monitoring

Check service health:

./scripts/health-check.sh
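
As a rough illustration of what a health check does, the sketch below polls the service ports listed above; the /health path is a hypothetical placeholder, and scripts/health-check.sh remains the reference.

# Hypothetical health poll over the service ports from the Services table.
# The /health path is a placeholder; scripts/health-check.sh is authoritative.
import requests

SERVICES = {
    "controller": 8000,
    "multimeditron": 5003,
    "csm": 5004,
    "orpheus": 5005,
    "qwen3-omni": 5006,
    "bark": 5008,
    "nisqa": 8006,
}

for name, port in SERVICES.items():
    try:
        r = requests.get(f"http://localhost:{port}/health", timeout=2)
        status = "up" if r.ok else f"error {r.status_code}"
    except requests.RequestException:
        status = "down"
    print(f"{name:15s} {status}")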

View logs:

tail -f logs/controller.log
tail -f logs/modelOrpheus.log

Monitor GPU usage:

./scripts/monitor-gpus.sh
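
If the helper script is not available, the same information can be pulled directly from nvidia-smi; the snippet below is a simple alternative, assuming the nvidia-smi CLI is on the PATH.

# Query GPU utilization and memory via nvidia-smi (alternative to the script).
import subprocess

out = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=index,name,utilization.gpu,memory.used,memory.total",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
print(out.stdout)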

Troubleshooting

Service won't start:

tail -f logs/<service>.log

Check for errors, missing dependencies, or invalid tokens.

Missing ffmpeg:

sudo apt-get update && sudo apt-get install -y ffmpeg
./scripts/restart.sh

Model loading fails:

  • Verify HuggingFace token in .env (a quick check is sketched below)
  • Check disk space (models are large)
  • Review service logs in logs/ directory
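
A quick way to run that token check, assuming python-dotenv and huggingface_hub are installed (both assumptions, not stated project requirements):

# Verify the HuggingFace token from .env (illustrative check).
import os
from dotenv import load_dotenv
from huggingface_hub import HfApi

load_dotenv()
token = os.getenv("HUGGINGFACE_TOKEN")
user = HfApi().whoami(token=token)   # raises if the token is invalid
print(f"Token OK, authenticated as: {user['name']}")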

Note: First run may take several minutes as models are downloaded and cached.

EPFL RCP Cluster Deployment

For deployment on EPFL RCP cluster, refer to LiGHT RCP Documentation.

Acknowledgments


Semester Project | Nicolas Teissier | LiGHT Laboratory | EPFL
