A FastAPI-based web API for asynchronous audio and video transcription using WhisperX.
This project is licensed under the MIT license.
See the LICENSE file for details.
This project provides an API to upload media files and receive transcriptions, including alignment and speaker diarization. It leverages Celery task queues and RabbitMQ to handle transcription jobs asynchronously, allowing the API to remain responsive while processing resource-intensive tasks in the background.
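The submit-then-poll pattern described above can be illustrated without a broker. The sketch below is not the project's code. It swaps Celery and RabbitMQ for a standard-library thread pool so that it runs anywhere. The `JobQueue` class, its method names, and the `pending`/`done`/`failed` status values are all hypothetical.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class JobQueue:
    """In-process stand-in for a Celery/RabbitMQ task queue (illustrative only)."""

    def __init__(self, worker: Callable[[str], str], max_workers: int = 1) -> None:
        self._worker = worker
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: Dict[str, Future] = {}

    def submit(self, media_path: str) -> str:
        """Queue a job and return its ID at once, without waiting for the result."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(self._worker, media_path)
        return job_id

    def status(self, job_id: str) -> dict:
        """Report the job state, plus the result or error once it has finished."""
        future = self._jobs[job_id]
        if not future.done():
            return {"status": "pending"}
        error = future.exception()
        if error is not None:
            return {"status": "failed", "error": str(error)}
        return {"status": "done", "result": future.result()}

    def shutdown(self) -> None:
        """Block until all queued jobs have finished, then stop the workers."""
        self._pool.shutdown(wait=True)
```

The caller gets a job ID back immediately and asks for the status later, which is what keeps a request handler responsive while the slow work runs elsewhere. A real deployment would hand the job to a separate worker process through the broker rather than a thread in the same process.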
Features:

- Asynchronous transcription processing with Celery
- RabbitMQ message broker integration
- Support for multiple audio and video formats
- Speaker diarization support
- Customizable language and model settings
- Built-in logging
- Job status tracking via API

Requirements:

- Python 3.8+
- WhisperX
- FastAPI
- ffmpeg
- SQLite (for internal use, not user management)
- python-dotenv
- Celery
- RabbitMQ server
Follow the WhisperX installation instructions in the WhisperX repo (https://github.com/m-bain/whisperX).
Then install Python dependencies:

    pip install -r requirements.txt

RabbitMQ is required as the message broker for Celery. On Ubuntu, install it via:

    sudo apt-get update
    sudo apt-get install rabbitmq-server -y
    sudo systemctl enable --now rabbitmq-server

Ensure RabbitMQ is running before starting the application.
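One way to confirm the broker is up before launching the API is a plain TCP connection test. The sketch below is a minimal example using only the standard library. The function names are hypothetical, and the check only shows that something is listening on the broker port. It does not perform an AMQP handshake or validate credentials.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def broker_address(uri: str, default_port: int = 5672) -> Tuple[str, int]:
    """Extract host and port from an AMQP URI, falling back to port 5672."""
    parts = urlsplit(uri)
    return parts.hostname or "localhost", parts.port or default_port


def broker_is_reachable(uri: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the broker's host and port succeeds."""
    host, port = broker_address(uri)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    uri = "amqp://guest:guest@localhost:5672//"
    print("RabbitMQ reachable:", broker_is_reachable(uri))
```

On a systemd-based host, `systemctl status rabbitmq-server` gives the same answer from the service manager's side.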
Create a .env file in your project root with:
| Variable | Default | Description |
|---|---|---|
| `HUGGING_FACE_TOKEN` | (empty) | Your Hugging Face (https://huggingface.co/) token, used for diarization. See https://github.com/m-bain/whisperX?tab=readme-ov-file#speaker-diarization |
| `API_PORT` | `11300` | Server port |
| `API_HOST` | `0.0.0.0` | Server bind address |
| `RABBIT_MQ_URI` | `amqp://guest:guest@localhost:5672//` | URL of your message broker |
| `FFMPEG_BIN` | `ffmpeg` | Path to the ffmpeg binary |
| `FFPROBE_BIN` | `ffprobe` | Path to the ffprobe binary |
| `WHISPERX_API_DATA_PATH` | `./data` | Path where whisperx-api stores its data |
| `WHISPERX_API_TEMP_PATH` | `./temp` | Path where whisperx-api stores temporary data |
| `WHISPERX_CPU_ONLY` | `False` | If `True`, use the CPU-only version of WhisperX |
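The variables and defaults in the table could be read along the following lines. This is a sketch, not the project's actual loader. The `Settings` class, `load_settings`, and `parse_bool` are hypothetical names, and the set of strings accepted as true is an assumption. The sketch reads from `os.environ` directly. Calling python-dotenv's `load_dotenv()` beforehand would populate the environment from the `.env` file.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    hugging_face_token: str
    api_port: int
    api_host: str
    rabbit_mq_uri: str
    ffmpeg_bin: str
    ffprobe_bin: str
    data_path: str
    temp_path: str
    cpu_only: bool


def parse_bool(value: str) -> bool:
    # Assumption: these spellings count as true; the project may parse differently.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (os.environ by default), using the documented defaults."""
    source = os.environ if env is None else env
    return Settings(
        hugging_face_token=source.get("HUGGING_FACE_TOKEN", ""),
        api_port=int(source.get("API_PORT", "11300")),
        api_host=source.get("API_HOST", "0.0.0.0"),
        rabbit_mq_uri=source.get("RABBIT_MQ_URI", "amqp://guest:guest@localhost:5672//"),
        ffmpeg_bin=source.get("FFMPEG_BIN", "ffmpeg"),
        ffprobe_bin=source.get("FFPROBE_BIN", "ffprobe"),
        data_path=source.get("WHISPERX_API_DATA_PATH", "./data"),
        temp_path=source.get("WHISPERX_API_TEMP_PATH", "./temp"),
        cpu_only=parse_bool(source.get("WHISPERX_CPU_ONLY", "False")),
    )
```

Passing a plain dictionary instead of relying on the process environment makes the loader easy to exercise in tests, for example `load_settings({"API_PORT": "8080"})`.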
Start the server with `python start.py`. This launches the API server (default on port 11300).
Submit a new transcription job with an uploaded media file.
List all submitted transcription jobs.
Get the status and result of a specific transcription job.
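Because jobs run in the background, a client typically submits a file and then polls the status endpoint until the job finishes. The helper below sketches that polling loop with the standard library. The endpoint URL is left to the caller, since the routes are not listed here. The `status` field and the `done`/`failed` values are assumptions about the response shape, not the documented API.

```python
import json
import time
from typing import Callable, Sequence
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def wait_for_job(
    status_url: str,
    fetch: Callable[[str], dict] = fetch_json,
    interval: float = 2.0,
    max_attempts: int = 30,
    terminal_states: Sequence[str] = ("done", "failed"),  # hypothetical values
) -> dict:
    """Poll status_url until the job reports a terminal state or attempts run out."""
    for attempt in range(max_attempts):
        job = fetch(status_url)
        if job.get("status") in terminal_states:
            return job
        if attempt < max_attempts - 1:
            time.sleep(interval)  # wait before asking again
    raise TimeoutError(f"job did not finish after {max_attempts} attempts")
```

The `fetch` parameter exists so the loop can be tested with canned responses. In normal use, the default `fetch_json` performs the HTTP request.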
The application logs key events and errors during API requests and background task processing.
This project provides a scalable, asynchronous API for audio/video transcription using WhisperX, with support for speaker diarization and job status tracking.