Medical conversational AI system combining MultiMeditron LLM with speech capabilities for voice-based medical interactions.
MediTalk integrates multiple AI services to enable natural voice conversations with medical language models:
- Medical LLM: MultiMeditron for medical question answering
- Speech Recognition: Whisper for audio transcription
- Speech Synthesis: Multiple TTS models (Orpheus, Bark, CSM, Qwen3-Omni)
- Web Interface: Streamlit-based conversation UI
- Benchmarking: Comprehensive evaluation suite for TTS and ASR models
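Conceptually, a voice turn flows audio → Whisper (ASR) → MultiMeditron (LLM) → a TTS backend → audio playback, orchestrated by the Controller. The sketch below illustrates that flow against the local ports listed in the service table further down; the endpoint paths (`/transcribe`, `/generate`, `/synthesize`) are hypothetical placeholders, not the services' documented APIs — see each service's README for the real routes.

```python
import requests

# Conceptual round trip. The endpoint paths below are hypothetical
# placeholders -- see each service's README for its actual API.

# 1. Transcribe the user's audio (Whisper service).
with open("inputs/question.wav", "rb") as f:
    text = requests.post(
        "http://localhost:8000/transcribe", files={"audio": f}
    ).json()["text"]

# 2. Ask the medical LLM (MultiMeditron service).
answer = requests.post(
    "http://localhost:5003/generate", json={"prompt": text}
).json()["answer"]

# 3. Synthesize the answer (Orpheus service) and save it.
speech = requests.post(
    "http://localhost:5005/synthesize", json={"text": answer}
).content
with open("outputs/answer.wav", "wb") as f:
    f.write(speech)
```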
Prerequisites:
- HuggingFace token (create one at https://huggingface.co/settings/tokens)
- Model access:
- meta-llama/Meta-Llama-3.1-8B-Instruct
- ClosedMeditron/Mulimeditron-End2End-CLIP-medical (request access from the EPFL LiGHT lab)
- canopylabs/orpheus-3b-0.1-ft
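To confirm your token works and that you have been granted access to the gated repos before launching anything, a quick check with `huggingface_hub` (`pip install huggingface_hub`) looks like this; the repo IDs are the ones listed above:

```python
import os

from huggingface_hub import HfApi

api = HfApi(token=os.environ.get("HUGGINGFACE_TOKEN"))
print("logged in as:", api.whoami()["name"])

# model_info() raises (e.g. GatedRepoError) if the token
# lacks access to a gated or private repo.
for repo in [
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "ClosedMeditron/Mulimeditron-End2End-CLIP-medical",
    "canopylabs/orpheus-3b-0.1-ft",
]:
    api.model_info(repo)
    print("access OK:", repo)
```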
1. Configure environment:

   Create a `.env` file:

   ```
   HUGGINGFACE_TOKEN=your_token
   MULTIMEDITRON_HF_TOKEN=your_token
   MULTIMEDITRON_MODEL=ClosedMeditron/Mulimeditron-End2End-CLIP-medical
   ```

2. Setup (first time only):

   ```bash
   ./scripts/setup-local.sh
   ```

3. Start services:

   ```bash
   ./scripts/start-local.sh
   ```

4. Access the interface:

   Open http://localhost:8503 in your browser.

5. Stop services:

   ```bash
   ./scripts/stop-local.sh
   ```

MediTalk consists of multiple microservices. Each service has its own README with detailed setup instructions for individual API usage.
| Service | Port | Description | README |
|---|---|---|---|
| Controller | 8000 | Orchestrates all services | Link |
| WebUI | 8501 | Streamlit interface | Link |
| MultiMeditron | 5003 | Medical LLM | Link |
| Whisper | 8000 | Speech-to-text | Link |
| Orpheus | 5005 | Neural TTS | Link |
| Bark | 5008 | Multilingual TTS | Link |
| CSM | 5004 | Conversational TTS | Link |
| Qwen3-Omni | 5006 | Multimodal TTS | Link |
| NISQA | 8006 | Speech quality assessment | Link |
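A quick way to see which of these services are actually listening (independent of the health-check script described below) is to probe the ports from the table; a minimal sketch using only the standard library:

```python
import socket

# Ports taken from the service table above.
# (Whisper is listed on port 8000, the same as the controller.)
SERVICES = {
    "controller": 8000,
    "webui": 8501,
    "multimeditron": 5003,
    "csm": 5004,
    "orpheus": 5005,
    "qwen3-omni": 5006,
    "bark": 5008,
    "nisqa": 8006,
}

for name, port in SERVICES.items():
    with socket.socket() as s:
        s.settimeout(1.0)
        status = "up" if s.connect_ex(("localhost", port)) == 0 else "down"
    print(f"{name:15s} :{port}  {status}")
```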
MediTalk includes comprehensive benchmarking suites for evaluating model performance.
Evaluate text-to-speech models on intelligibility, quality, and speed.
```bash
cd benchmark/tts
./run_benchmark.sh
```

See benchmark/tts/README.md for details.
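For context on what such a benchmark typically measures: intelligibility is commonly scored by transcribing the synthesized audio with an ASR model and computing word error rate against the input text. A minimal sketch of that idea using `openai-whisper` and `jiwer` (not necessarily what `run_benchmark.sh` does internally; the file paths are hypothetical):

```python
import whisper  # pip install openai-whisper
from jiwer import wer  # pip install jiwer

# Text that was fed to the TTS model, and the audio it produced.
reference = "the patient reports chest pain and shortness of breath"
tts_audio = "outputs/sample.wav"  # hypothetical path

# Transcribe the synthesized speech, then score it against the input text.
asr = whisper.load_model("base")
hypothesis = asr.transcribe(tts_audio)["text"].lower().strip()

print("WER:", wer(reference, hypothesis))  # lower is more intelligible
```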
Evaluate speech recognition accuracy across different Whisper model sizes.
```bash
cd benchmark/whisper
./run_benchmark.sh
```

See benchmark/whisper/README.md for details.
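The core of such a comparison is transcribing the same audio with each model size and measuring both accuracy (WER) and wall-clock time; a rough sketch under those assumptions (see benchmark/whisper/README.md for what the actual suite runs; paths and reference text are hypothetical):

```python
import time

import whisper
from jiwer import wer

audio = "inputs/sample.wav"            # hypothetical test clip
reference = "ground truth transcript"  # hypothetical reference text

for size in ["tiny", "base", "small", "medium"]:
    model = whisper.load_model(size)
    start = time.perf_counter()
    hypothesis = model.transcribe(audio)["text"].lower().strip()
    elapsed = time.perf_counter() - start
    print(f"{size:7s} WER={wer(reference, hypothesis):.3f} time={elapsed:.1f}s")
```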
Project structure:

```
MediTalk/
│
├── services/               # Microservices
│   ├── controller/         # Service orchestration
│   ├── webui/              # Web interface
│   ├── modelMultiMeditron/ # Medical LLM
│   ├── modelWhisper/       # ASR
│   ├── modelOrpheus/       # TTS
│   ├── modelBark/          # TTS
│   ├── modelCSM/           # TTS (conversational)
│   ├── modelQwen3Omni/     # TTS (conversational)
│   └── modelNisqa/         # Quality assessment (MOS)
│
├── benchmark/              # Evaluation suites
│   ├── tts/                # TTS benchmark
│   └── whisper/            # ASR benchmark
│
├── scripts/                # Management scripts
│
├── data/                   # Datasets (download, storage, preprocessing)
│
├── inputs/                 # Input files
│
├── outputs/                # Generated files
│
└── logs/                   # Service logs
```
Check service health:

```bash
./scripts/health-check.sh
```

View logs:

```bash
tail -f logs/controller.log
tail -f logs/modelOrpheus.log
```

Monitor GPU usage:

```bash
./scripts/monitor-gpus.sh
```

Service won't start:

```bash
tail -f logs/<service>.log
```

Check for errors, missing dependencies, or invalid tokens.

Missing ffmpeg:

```bash
sudo apt-get update && sudo apt-get install -y ffmpeg
./scripts/restart.sh
```

Model loading fails:
- Verify the HuggingFace token in `.env`
- Check disk space (models are large)
- Review service logs in the `logs/` directory
Note: First run may take several minutes as models are downloaded and cached.
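To warm the cache ahead of time, or to debug a failing download in isolation, you can pre-fetch a model with `huggingface_hub` — a minimal sketch, assuming the token configured in `.env`:

```python
import os

from huggingface_hub import snapshot_download

# Pre-download a model into the local HF cache so service startup is faster.
path = snapshot_download(
    repo_id="canopylabs/orpheus-3b-0.1-ft",
    token=os.environ.get("HUGGINGFACE_TOKEN"),
)
print("cached at:", path)
```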
For deployment on the EPFL RCP cluster, refer to the LiGHT RCP Documentation.
Acknowledgments:
- MultiMeditron - EPFL LiGHT Lab
- Orpheus - Canopy Labs
- Bark - Suno AI
- Whisper - OpenAI
- Qwen3-Omni - Alibaba Cloud
- NISQA - TU Berlin
Semester Project | Nicolas Teissier | LiGHT Laboratory | EPFL