Use this guide if you are setting up speech features on:
- Linux
- Windows
- macOS on Intel
If you are on Apple Silicon or an NVIDIA GPU box, use First-Time Audio Setup: GPU / Accelerated Systems instead.
This guide supports three base setup paths:
- `make`-driven local setup
- manual/local Python setup
- Docker + WebUI setup
For a local-first CPU setup in the current repo:
| Goal | STT | TTS | Why |
|---|---|---|---|
| Recommended first local stack | `parakeet-onnx` | `supertonic` | Keeps the stack local-first and avoids mandatory voice-cloning input on every TTS request |
| If you need local voice cloning immediately | `parakeet-onnx` | `pocket_tts` | Python/ONNX runtime; still local-first, but every request needs reference audio |
| If you want the native compiled runtime | `parakeet-onnx` | `pocket_tts_cpp` | Separate installer and runtime layout; streaming only works when the local CLI probe proves incremental output |
| Better but more demanding | `parakeet-onnx` or `faster-whisper` | `qwen3_tts` | Strong upgrade path once the basic stack already works |
Important current-repo realities:
- The shipped explicit STT defaults are currently `parakeet-onnx` for both batch and streaming.
- The current `/setup` audio bundle docs still describe a different first-run path in some places.
- The stock Docker profile does not bind-mount `Config_Files` or `models/`. Host-side audio config and model changes are therefore not visible inside the container until you rebuild the image or customize the container configuration.
If your only goal is to "make sound come out as fast as possible", the current `/setup` bundle path may still need fewer manual steps than the `supertonic` path in this guide. This guide is the better fit when you want a local-first stack that you understand and can control.
You need:
- Git
- Python 3.10+ if you are using the `make` or manual/local Python path
- `ffmpeg`
- `git-lfs` if you want the recommended `supertonic` path
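Before installing anything, you can check what is already on `PATH`. The sketch below uses only the standard library. `missing_tools` and `python_is_supported` are hypothetical helpers, not part of tldw_server. The tool list mirrors the requirements above (`git-lfs` is the executable Git LFS installs), and the 3.10 floor is the Python requirement stated here.

```python
import shutil
import sys

# Hypothetical helper (not part of tldw_server): executables this guide relies on.
# git-lfs is only strictly needed for the supertonic path.
REQUIRED_TOOLS = ["git", "ffmpeg", "git-lfs"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_is_supported(version_info=sys.version_info, minimum=(3, 10)):
    """True if the interpreter meets the guide's Python 3.10+ requirement."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
    print("Python 3.10+:", python_is_supported())
```

On Windows, `shutil.which` also honors `PATHEXT`, so `ffmpeg.exe` is found as `ffmpeg`.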
Recommended host prerequisites by OS:
Linux:
- `ffmpeg`
- `git`
- `git-lfs`
- Python 3.10+

Typical packages (Debian/Ubuntu):
```bash
sudo apt-get update
sudo apt-get install -y ffmpeg git git-lfs python3 python3-venv
git lfs install
```

macOS (Intel):
- `ffmpeg`
- `git`
- `git-lfs`
- Python 3.10+

Typical packages:
```bash
brew install ffmpeg git git-lfs python@3.12
git lfs install
```

Windows:

Install:
- Python 3.10+
- FFmpeg
- Git
- Git LFS

Use winget or the official installers, then run:
```powershell
git lfs install
```

If your server is already running, skip to Step 2.
`make`-driven local setup: use this when you want a local Python install but do not want to do the venv/bootstrap steps by hand.
```bash
git clone https://github.com/rmusser01/tldw_server.git
cd tldw_server
make quickstart-install
make quickstart-local
```

Manual/local Python setup: use this when you want full control over the virtual environment and installed extras.
Linux/macOS:
```bash
git clone https://github.com/rmusser01/tldw_server.git
cd tldw_server
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -e .
python -m uvicorn tldw_Server_API.app.main:app --reload
```

Windows PowerShell:
```powershell
git clone https://github.com/rmusser01/tldw_server.git
cd tldw_server
py -3.12 -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
pip install -e .
python -m uvicorn tldw_Server_API.app.main:app --reload
```

Docker + WebUI setup: use this when you want the containerized first-run path.
```bash
git clone https://github.com/rmusser01/tldw_server.git
cd tldw_server
cp tldw_Server_API/Config_Files/.env.example tldw_Server_API/Config_Files/.env
```

Set `AUTH_MODE=single_user` and `SINGLE_USER_API_KEY=...` in `tldw_Server_API/Config_Files/.env`, then:
```bash
docker compose --env-file tldw_Server_API/Config_Files/.env \
  -f Dockerfiles/docker-compose.yml \
  -f Dockerfiles/docker-compose.webui.yml \
  up -d --build
```

Or, if you prefer the Makefile wrapper:
```bash
make quickstart
```

Important Docker note:
- The stock container image does not bind-mount `Config_Files` or `models/`.
- Host-side edits to `tldw_Server_API/Config_Files/config.txt` or local model assets do not affect the running container until you rebuild the image.
- If you change audio configuration on the host, rebuild with `docker compose ... up -d --build`.
- If you use `/setup` inside the running container, those changes are container-local unless you also update the host files.
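A missing API key is easy to catch before `docker compose` starts by parsing the `.env` file you edited. This is a minimal sketch. `parse_env` and `single_user_problems` are hypothetical helpers, and the parser is deliberately simpler than docker compose's own `.env` handling: it does not support `export`, multi-line values, or variable expansion.

```python
from pathlib import Path

# Path from the Docker steps above, relative to the repo root.
ENV_PATH = Path("tldw_Server_API/Config_Files/.env")


def parse_env(text):
    """Hypothetical, simplified KEY=VALUE reader; skips blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, which compose also tolerates.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def single_user_problems(values):
    """List problems with the two settings this guide asks you to set."""
    problems = []
    if values.get("AUTH_MODE") != "single_user":
        problems.append("AUTH_MODE is not single_user")
    if not values.get("SINGLE_USER_API_KEY"):
        problems.append("SINGLE_USER_API_KEY is empty or missing")
    return problems


if __name__ == "__main__" and ENV_PATH.exists():
    print(single_user_problems(parse_env(ENV_PATH.read_text())) or "ok")
```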
Edit `config.txt` and make the STT defaults explicit:
```ini
[STT-Settings]
default_batch_transcription_model = parakeet-onnx
default_streaming_transcription_model = parakeet-onnx
default_transcriber = parakeet
nemo_model_variant = onnx
```

Why set all four?
- `default_batch_transcription_model` and `default_streaming_transcription_model` remove ambiguity about which model serves batch and streaming transcription.
- `default_transcriber` and `nemo_model_variant` keep older compatibility paths aligned with the intended backend.
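To confirm all four values landed, you can read the file back with `configparser`. This is a sketch under two assumptions. First, `config.txt` parses cleanly as INI: it starts with a section header and contains no lines `configparser` rejects. Second, `stt_mismatches` is a hypothetical helper, not something the repo ships.

```python
import configparser

# Values this guide asks you to set under [STT-Settings].
EXPECTED_STT = {
    "default_batch_transcription_model": "parakeet-onnx",
    "default_streaming_transcription_model": "parakeet-onnx",
    "default_transcriber": "parakeet",
    "nemo_model_variant": "onnx",
}


def stt_mismatches(config_text, expected=EXPECTED_STT):
    """Hypothetical helper: return {key: (found, expected)} for each wrong or missing value."""
    # interpolation=None so a literal '%' elsewhere in config.txt is not interpreted;
    # strict=False tolerates duplicate keys instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(config_text)
    section = parser["STT-Settings"] if parser.has_section("STT-Settings") else {}
    mismatches = {}
    for key, want in expected.items():
        found = section.get(key)
        if found is None or found.strip() != want:
            mismatches[key] = (found, want)
    return mismatches
```

For example, `stt_mismatches(Path("tldw_Server_API/Config_Files/config.txt").read_text())` should return `{}` once the edit above is in place.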
If you are on the stock Docker path, rebuild the app image after editing the file on the host:
```bash
docker compose --env-file tldw_Server_API/Config_Files/.env \
  -f Dockerfiles/docker-compose.yml \
  -f Dockerfiles/docker-compose.webui.yml \
  up -d --build
```

This guide recommends `supertonic` as the main local-first CPU TTS path because:
- it stays local
- it does not require reference audio on every request
- it already has an installer helper and provider support in the repo
Run from the repo root:
```bash
python Helper_Scripts/TTS_Installers/install_tts_supertonic.py
```

What this does:
- clones the upstream model repo
- copies ONNX assets into `models/supertonic/onnx`
- copies voice-style JSON files into `models/supertonic/voice_styles`
This path currently assumes:
- `git` is available
- `git-lfs` is installed and initialized
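If `git-lfs` was not initialized, the clone can succeed but leave small text pointer files where the ONNX models should be. The sketch below checks the layout described above, and for pointer files, from the repo root. It assumes the model files use the `.onnx` extension and that the style files are `M1.json` and `F1.json`, as referenced by `voice_files` in the YAML. `supertonic_asset_problems` is a hypothetical helper.

```python
from pathlib import Path

# Paths from the installer description above, relative to the repo root.
ONNX_DIR = Path("models/supertonic/onnx")
STYLES_DIR = Path("models/supertonic/voice_styles")
EXPECTED_STYLES = ("M1.json", "F1.json")  # referenced by voice_files in the YAML
# A Git LFS pointer file starts with this text when the real binary was never fetched.
LFS_POINTER_PREFIX = b"version https://git-lfs"


def looks_like_lfs_pointer(path):
    """True if `path` holds a Git LFS pointer instead of real file content."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def supertonic_asset_problems(repo_root="."):
    """Hypothetical helper: list human-readable problems with the supertonic assets."""
    root = Path(repo_root)
    onnx_dir, styles_dir = root / ONNX_DIR, root / STYLES_DIR
    problems = []

    if not onnx_dir.is_dir():
        problems.append(f"missing directory: {onnx_dir}")
    else:
        onnx_files = sorted(onnx_dir.glob("*.onnx"))
        if not onnx_files:
            problems.append(f"no .onnx files in {onnx_dir}")
        for model in onnx_files:
            if looks_like_lfs_pointer(model):
                problems.append(f"Git LFS pointer, not a model: {model}")

    if not styles_dir.is_dir():
        problems.append(f"missing directory: {styles_dir}")
    else:
        for name in EXPECTED_STYLES:
            if not (styles_dir / name).is_file():
                problems.append(f"missing voice style: {styles_dir / name}")
    return problems
```

If this reports pointer files, run `git lfs install` and then re-run the installer.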
Edit `tts_providers_config.yaml`:
```yaml
providers:
  supertonic:
    enabled: true
    model_path: "models/supertonic/onnx"
    sample_rate: 24000
    device: "cpu"
    extra_params:
      voice_styles_dir: "models/supertonic/voice_styles"
      default_voice: "supertonic_m1"
      voice_files:
        supertonic_m1: "M1.json"
        supertonic_f1: "F1.json"
      default_total_step: 5
      default_speed: 1.05
      n_test: 1
```

Edit `config.txt`:
```ini
[TTS-Settings]
default_provider = supertonic
default_voice = supertonic_m1
local_device = cpu
```

You do not have to reorder `provider_priority` if you set `default_provider` explicitly, but it is still a good idea to make the YAML reflect your preferred path long term.
Local / make paths:
```bash
# stop the server, then start it again
make quickstart-local
```

or

```bash
python -m uvicorn tldw_Server_API.app.main:app --reload
```

Docker paths:
```bash
docker compose --env-file tldw_Server_API/Config_Files/.env \
  -f Dockerfiles/docker-compose.yml \
  -f Dockerfiles/docker-compose.webui.yml \
  up -d --build
```

Do not stop at `/health`. Verify one real TTS request and one real STT request.
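After a restart, and especially after a Docker rebuild, the server may take a while to answer again. A small polling loop avoids racing the checks that follow. This is a sketch. `wait_for_server` is a hypothetical helper, and the 120-second default is an arbitrary choice. It accepts either a URL or a `urllib.request.Request`, so you can pass one that already carries the `X-API-KEY` header.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url, timeout=120.0, interval=2.0, opener=urllib.request.urlopen):
    """Hypothetical helper: poll `url` until it returns HTTP 2xx or `timeout` passes.

    Returns True on success, False on timeout. `opener` is injectable for testing.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(url, timeout=5) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass  # server not up yet (HTTPError, e.g. 401, is a URLError and also retries)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A persistent 401 or 403 also ends in a timeout, so if this never succeeds, check the API key before suspecting the server.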
```bash
curl -sS http://127.0.0.1:8000/api/v1/audio/health \
  -H "X-API-KEY: $SINGLE_USER_API_KEY"
```

What you want to see:
- overall health is not `unhealthy`
- `supertonic` appears under the provider details
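The curl calls in this section share one pattern: the base URL `http://127.0.0.1:8000` plus an `X-API-KEY` header. The sketch below reproduces that pattern with the standard library. It assumes the key is exported as `SINGLE_USER_API_KEY`, as the curl examples expect. `api_request` and `fetch_json` are hypothetical helpers.

```python
import json
import os
import urllib.request

BASE_URL = "http://127.0.0.1:8000"


def api_request(path, api_key, payload=None, base_url=BASE_URL):
    """Hypothetical helper: build a request with the X-API-KEY header.

    GET when `payload` is None, otherwise a JSON-encoded POST.
    """
    headers = {"X-API-KEY": api_key}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url + path,
        data=data,
        headers=headers,
        method="POST" if data is not None else "GET",
    )


def fetch_json(path, payload=None):
    """Send the request and decode a JSON response (requires a running server)."""
    req = api_request(path, os.environ["SINGLE_USER_API_KEY"], payload)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

`fetch_json("/api/v1/audio/health")` mirrors the health call above, and `fetch_json("/api/v1/audio/voices/catalog")["supertonic"]` mirrors the `jq '.supertonic'` filter below. The speech endpoint returns audio rather than JSON, so for that request read the bytes with `resp.read()` instead of `json.load`.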
```bash
curl -sS http://127.0.0.1:8000/api/v1/audio/voices/catalog \
  -H "X-API-KEY: $SINGLE_USER_API_KEY" | jq '.supertonic'
```

You should see voices such as `supertonic_m1` and `supertonic_f1`.
```bash
curl -sS -X POST http://127.0.0.1:8000/api/v1/audio/speech \
  -H "X-API-KEY: $SINGLE_USER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-supertonic-1",
    "voice": "supertonic_m1",
    "input": "This is the CPU audio setup smoke test.",
    "response_format": "wav",
    "stream": false
  }' \
  --output cpu_audio_smoke.wav
```

Then check the STT backend:
```bash
curl -sS "http://127.0.0.1:8000/api/v1/audio/transcriptions/health?model=parakeet-onnx" \
  -H "X-API-KEY: $SINGLE_USER_API_KEY"
```

What you want to see:
- `"provider": "parakeet"`
- `"alias": "parakeet-onnx"`
- `"usable": true` or `"available": true`
```bash
curl -sS -X POST http://127.0.0.1:8000/api/v1/audio/transcriptions \
  -H "X-API-KEY: $SINGLE_USER_API_KEY" \
  -F "file=@cpu_audio_smoke.wav" \
  -F "model=parakeet-onnx"
```

Success means:
- the request returns JSON
- the `text` field is close to `This is the CPU audio setup smoke test`
- the server does not silently switch to the wrong provider/model
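"Close to" is a judgment call, so a scripted check needs a tolerance. The sketch covers three of the STT checks above. It checks the health fields, assuming they are top-level keys in the JSON response. It confirms the smoke-test WAV has a sensible length, and it compares the transcript to the spoken sentence with `difflib`. All function names are hypothetical. The 0.8 similarity threshold is an arbitrary starting point, not a project setting. The stdlib `wave` module only reads integer PCM, so if the server writes float WAV files, `wav_duration_seconds` raises `wave.Error` even though the file may be fine.

```python
import difflib
import re
import wave

EXPECTED_TEXT = "This is the CPU audio setup smoke test."


def stt_health_ok(payload):
    """Hypothetical helper: check the transcription health fields listed above."""
    return (
        payload.get("provider") == "parakeet"
        and payload.get("alias") == "parakeet-onnx"
        and (payload.get("usable") is True or payload.get("available") is True)
    )


def wav_duration_seconds(path):
    """Length of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def normalize(text):
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def transcript_close_enough(text, expected=EXPECTED_TEXT, threshold=0.8):
    """True if `text` is similar enough to the sentence that was synthesized."""
    ratio = difflib.SequenceMatcher(None, normalize(text), normalize(expected)).ratio()
    return ratio >= threshold
```

A ratio is more forgiving than exact matching of small ASR differences such as "setup" versus "set up". It still rejects a transcript from the wrong clip or an empty one.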
Choose a PocketTTS runtime instead of supertonic if you specifically need local voice cloning on day one.
Use:
- PocketTTS Voice Cloning Guide for `pocket_tts` (Python/ONNX)
- `python Helper_Scripts/TTS_Installers/install_tts_pocket_tts_cpp.py` for `pocket_tts_cpp` (compiled native runtime)
Important tradeoffs:
- `pocket_tts` is the Python/ONNX runtime and keeps the model packaging straightforward.
- `pocket_tts_cpp` is a separate compiled runtime with its own installer and runtime layout.
- Both are local-first, but every request still needs either a direct `voice_reference` clip or a stored `custom:<voice_id>` voice.
- `pocket_tts_cpp` streaming is only available when the local CLI probe proves incremental output on this install; otherwise streaming requests fail closed.
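Because both PocketTTS runtimes reject requests without cloning input, a client-side check can catch the mistake before the call. This is a sketch. It assumes `voice_reference` travels as a field of the same JSON body as `voice`, as in the `supertonic` speech example; this guide does not spell out that field's encoding, so the helper only checks that it is present. `has_cloning_input` is a hypothetical helper.

```python
def has_cloning_input(payload):
    """Hypothetical helper: True if a speech request body carries what pocket_tts /
    pocket_tts_cpp need per this guide, namely a voice_reference clip or a stored
    custom:<voice_id> voice."""
    if payload.get("voice_reference"):
        return True
    voice = payload.get("voice") or ""
    # "custom:" alone has no voice id after the prefix, so it does not count.
    return voice.startswith("custom:") and len(voice) > len("custom:")
```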
Use qwen3_tts after the basic CPU stack already works.
Use:
Treat it as a second-step upgrade, not the first-run baseline.
If FFmpeg is not found:
- Run `ffmpeg -version`
- Install FFmpeg on the host
- Restart the server after fixing PATH issues on Windows
If `supertonic` is missing or unavailable:
- confirm `providers.supertonic.enabled: true` in `tts_providers_config.yaml`
- confirm the asset directories exist:
  - `models/supertonic/onnx`
  - `models/supertonic/voice_styles`
- restart the server after changing config
If `supertonic` still fails:
- re-run the installer
- verify `voice_files` still point to `M1.json` and `F1.json`
- check server logs for missing ONNX or style files
If STT is not using `parakeet-onnx`:
- re-open `config.txt`
- make sure both `default_batch_transcription_model` and `default_streaming_transcription_model` are set to `parakeet-onnx`
- make sure `default_transcriber = parakeet`
- restart the server
If host-side changes do not show up in Docker:
- the stock Docker image bakes in `Config_Files` at build time
- rebuild the app image after host edits:

```bash
docker compose --env-file tldw_Server_API/Config_Files/.env \
  -f Dockerfiles/docker-compose.yml \
  -f Dockerfiles/docker-compose.webui.yml \
  up -d --build
```

If you only want working speech as fast as possible, use `/setup`, accept the current recommended audio bundle, and verify speech first.

Then come back to this guide if you want to move from the bundle defaults to:
- `parakeet-onnx`
- `supertonic`
- `pocket_tts`
- `pocket_tts_cpp`
- `qwen3_tts`