elan-ev/whisperx-api

 
 

WhisperX API with Asynchronous Transcription

A FastAPI-based web API for asynchronous audio and video transcription using WhisperX.

License

This project is licensed under the MIT license.

See the LICENSE file for details.

Overview

This project provides an API to upload media files and receive transcriptions, including alignment and speaker diarization. It leverages Celery task queues and RabbitMQ to handle transcription jobs asynchronously, allowing the API to remain responsive while processing resource-intensive tasks in the background.

Features

  • Asynchronous transcription processing with Celery
  • RabbitMQ message broker integration
  • Support for multiple audio and video formats
  • Speaker diarization support
  • Customizable language and model settings
  • Built-in logging
  • Job status tracking via API

Requirements

  • Python 3.8+
  • WhisperX
  • FastAPI
  • ffmpeg
  • SQLite (for internal use, not user management)
  • python-dotenv
  • Celery
  • RabbitMQ server

Installing dependencies

Follow the WhisperX installation instructions in the WhisperX repo (https://github.com/m-bain/whisperX).

Then install Python dependencies:

pip install -r requirements.txt

RabbitMQ installation

RabbitMQ is required as the message broker for Celery. On Ubuntu, install it via:

sudo apt-get update
sudo apt-get install rabbitmq-server -y
sudo systemctl enable --now rabbitmq-server

Ensure RabbitMQ is running before starting the application.
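One way to check this from Python is to test whether the broker port accepts TCP connections. The sketch below uses only the standard library and assumes the default broker URI `amqp://guest:guest@localhost:5672//`; the helper names are illustrative and not part of the project. It confirms reachability only, not that the AMQP handshake or the credentials are valid.

```python
import socket
from urllib.parse import urlsplit

# Assumed default; adjust to match your broker.
DEFAULT_URI = "amqp://guest:guest@localhost:5672//"


def broker_address(uri=DEFAULT_URI):
    """Extract (host, port) from an AMQP URI; 5672 is the standard AMQP port."""
    parts = urlsplit(uri)
    return parts.hostname or "localhost", parts.port or 5672


def broker_is_reachable(uri=DEFAULT_URI, timeout=2.0):
    """Return True if a TCP connection to the broker port succeeds."""
    host, port = broker_address(uri)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("RabbitMQ reachable:", broker_is_reachable())
```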

Environment Variables

Create a .env file in your project root with:

| Variable | Default | Description |
|---|---|---|
| HUGGING_FACE_TOKEN | (empty) | Your Hugging Face (https://huggingface.co/) token, used for diarization. See https://github.com/m-bain/whisperX?tab=readme-ov-file#speaker-diarization |
| API_PORT | 11300 | Server port |
| API_HOST | 0.0.0.0 | Server bind address |
| RABBIT_MQ_URI | amqp://guest:guest@localhost:5672// | URI of your message broker |
| FFMPEG_BIN | ffmpeg | Path to the ffmpeg binary |
| FFPROBE_BIN | ffprobe | Path to the ffprobe binary |
| WHISPERX_API_DATA_PATH | ./data | Path where whisperx-api stores its data |
| WHISPERX_API_TEMP_PATH | ./temp | Path where whisperx-api stores temporary data |
| WHISPERX_CPU_ONLY | False | If True, use the CPU-only version of WhisperX |
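As a rough illustration of how these variables and their defaults might be read, the sketch below falls back to the documented defaults when a variable is unset. The function `load_settings` and the way the boolean and the port are parsed are assumptions made for this example; the project's own parsing may differ. If python-dotenv is used, calling its `load_dotenv()` beforehand would populate `os.environ` from the .env file.

```python
import os

# Defaults copied from the variables listed above.
DEFAULTS = {
    "HUGGING_FACE_TOKEN": "",
    "API_PORT": "11300",
    "API_HOST": "0.0.0.0",
    "RABBIT_MQ_URI": "amqp://guest:guest@localhost:5672//",
    "FFMPEG_BIN": "ffmpeg",
    "FFPROBE_BIN": "ffprobe",
    "WHISPERX_API_DATA_PATH": "./data",
    "WHISPERX_API_TEMP_PATH": "./temp",
    "WHISPERX_CPU_ONLY": "False",
}


def load_settings(env=None):
    """Return settings from `env` (default: os.environ) with documented defaults."""
    env = os.environ if env is None else env
    settings = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    # Assumption: the port is an integer and the flag accepts common truthy strings.
    settings["API_PORT"] = int(settings["API_PORT"])
    flag = settings["WHISPERX_CPU_ONLY"].strip().lower()
    settings["WHISPERX_CPU_ONLY"] = flag in ("1", "true", "yes")
    return settings


if __name__ == "__main__":
    for name, value in load_settings().items():
        # Avoid printing the token itself.
        shown = "(set)" if name == "HUGGING_FACE_TOKEN" and value else value
        print(f"{name}={shown}")
```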

Running the Application

1. Start the FastAPI server

python start.py

This launches the API server, which listens on port 11300 by default (see API_PORT).

API Endpoints

POST /jobs

Submit a new transcription job with an uploaded media file.
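A minimal sketch of such an upload, using only the standard library, might look like the following. The multipart field name `file`, the base URL, and the helper names are assumptions for illustration; check the API's generated documentation for the actual field name and any extra parameters such as language or model.

```python
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, data,
                    content_type="application/octet-stream"):
    """Encode one file as a multipart/form-data body; returns (body, header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def build_upload_request(path, base_url="http://localhost:11300",
                         field_name="file"):
    """Prepare (but do not send) a POST /jobs request for the given media file."""
    with open(path, "rb") as handle:
        data = handle.read()
    body, header = build_multipart(field_name, os.path.basename(path), data)
    return urllib.request.Request(
        base_url.rstrip("/") + "/jobs",
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` and reading the response, whose exact shape is not specified here.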

GET /jobs

List all submitted transcription jobs.

GET /jobs/{task_id}

Get the status and result of a specific transcription job.
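Because jobs run in the background, a client typically polls this endpoint until the job finishes. The sketch below shows one way to do that with the standard library. It assumes the response is a JSON object with a `status` field and that finished jobs report Celery's `SUCCESS` or `FAILURE` state names; both are assumptions about the response format, and the helper names are illustrative.

```python
import json
import time
import urllib.request


def job_url(base_url, task_id):
    """Build the GET /jobs/{task_id} URL."""
    return f"{base_url.rstrip('/')}/jobs/{task_id}"


def fetch_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def wait_for_job(task_id, base_url="http://localhost:11300", fetch=fetch_json,
                 done_states=("SUCCESS", "FAILURE"), interval=5.0,
                 max_polls=120):
    """Poll the job until its status is in done_states, then return the payload."""
    for _ in range(max_polls):
        payload = fetch(job_url(base_url, task_id))
        # Assumption: the payload carries a "status" field.
        if payload.get("status") in done_states:
            return payload
        time.sleep(interval)
    raise TimeoutError(f"job {task_id} did not finish after {max_polls} polls")
```

The `fetch` parameter exists so the polling logic can be exercised without a running server, for example by passing a function that returns canned payloads.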

Logging

The application logs key events and errors during API requests and background task processing.

Summary

This project provides a scalable, asynchronous API for audio/video transcription using WhisperX, with support for speaker diarization and job status tracking.
