# 🔮 GENIE: GPT-SoVITS Lightweight Inference Engine

Experience near-instantaneous speech synthesis on your CPU.
GENIE is a lightweight inference engine built on the open-source TTS project GPT-SoVITS. It integrates TTS inference, ONNX model conversion, an API server, and other core features, and is designed for high performance and ease of use.
- ✅ Supported Model Versions: GPT-SoVITS V2, V2ProPlus
- ✅ Supported Languages: Japanese, English, Chinese, Korean
- ✅ Supported Python Versions: >= 3.9
GENIE optimizes the original model for outstanding CPU performance.
| Feature | 🔮 GENIE | Official PyTorch Model | Official ONNX Model |
|---|---|---|---|
| First Inference Latency | 1.13s | 1.35s | 3.57s |
| Runtime Size | ~200MB | Several GB | Similar to GENIE |
| Model Size | ~230MB | Similar to GENIE | ~750MB |
📝 Latency Test Info: All latency figures are averages over a test set of 100 Japanese sentences (~20 characters each), measured on an Intel Core i7-13620H CPU.
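If you want to sanity-check these numbers on your own hardware, the sketch below times a few syntheses with the predefined `mika` character shown later in this README. The `summarize_latency` helper and the sentence list are illustrative, not part of the library:

```python
import time
import statistics

def summarize_latency(samples):
    """Return (mean, worst) latency in seconds for a list of timings."""
    return statistics.fmean(samples), max(samples)

def benchmark(character="mika", sentences=None):
    # Imported here so the helper above stays usable without genie-tts installed.
    import genie_tts as genie

    genie.load_predefined_character(character)  # downloads files on first run
    sentences = sentences or [
        "こんにちは。",
        "今日はいい天気ですね。",
        "それでは、始めましょう。",
    ]
    timings = []
    for text in sentences:
        start = time.perf_counter()
        genie.tts(character_name=character, text=text, save_path="bench.wav")
        timings.append(time.perf_counter() - start)
    mean, worst = summarize_latency(timings)
    print(f"mean: {mean:.2f}s  worst: {worst:.2f}s")

# benchmark()  # uncomment after `pip install genie-tts`
```

Note that the very first call includes one-time model downloads, so discard it (or run a warm-up sentence) before averaging.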
⚠️ Important: It is recommended to run GENIE in Administrator mode to avoid potential performance degradation.
Install via pip:

```shell
pip install genie-tts
```

When running GENIE for the first time, it needs to download resource files (~391MB). You can follow the library's prompts to download them automatically.
Alternatively, you can manually download the files from HuggingFace, place them in a local folder, and set the `GENIE_DATA_DIR` environment variable before importing the library:
```python
import os

# Set the path to your manually downloaded resource files.
# Note: do this BEFORE importing genie_tts.
os.environ["GENIE_DATA_DIR"] = r"C:\path\to\your\GenieData"

import genie_tts as genie
# The library will now load resources from the specified directory.
```

If you want the optional Chinese RoBERTa text features, which are used only for Chinese inference to improve Chinese prosody, you can also download them with:
```python
import genie_tts as genie

# Download only the optional Chinese RoBERTa assets
genie.download_roberta_data()

# Or use the built-in full resource download flow,
# which now also downloads the optional Chinese RoBERTa assets
genie.download_genie_data()
```

These RoBERTa features apply only to the Chinese path, where they improve prosody; they are not used for Japanese, English, or Korean inference.
No GPT-SoVITS model yet? No problem! GENIE includes several predefined speaker characters you can use immediately — for example:
- Mika (聖園ミカ) — Blue Archive (Japanese)
- ThirtySeven (37) — Reverse: 1999 (English)
- Feibi (菲比) — Wuthering Waves (Chinese)
You can browse all available characters here: https://huggingface.co/High-Logic/Genie/tree/main/CharacterModels
Try it out with the example below:
```python
import genie_tts as genie

# Automatically downloads required files on first run
genie.load_predefined_character('mika')

genie.tts(
    character_name='mika',
    text='どうしようかな……やっぱりやりたいかも……!',
    play=True,  # Play the generated audio directly
)
genie.wait_for_playback_done()  # Ensure audio playback completes
```

A simple TTS inference example:
```python
import genie_tts as genie

# Step 1: Load the character voice model
genie.load_character(
    character_name='<CHARACTER_NAME>',                     # Replace with your character name
    onnx_model_dir=r"<PATH_TO_CHARACTER_ONNX_MODEL_DIR>",  # Folder containing the ONNX model
    language='<LANGUAGE_CODE>',                            # Language code, e.g., 'en', 'zh', 'jp'
)

# Step 2: Set the reference audio (for emotion and intonation cloning)
genie.set_reference_audio(
    character_name='<CHARACTER_NAME>',        # Must match the loaded character name
    audio_path=r"<PATH_TO_REFERENCE_AUDIO>",  # Path to the reference audio
    audio_text="<REFERENCE_AUDIO_TEXT>",      # Corresponding transcript
)

# Step 3: Run TTS inference and generate audio
genie.tts(
    character_name='<CHARACTER_NAME>',  # Must match the loaded character
    text="<TEXT_TO_SYNTHESIZE>",        # Text to synthesize
    play=True,                          # Play audio directly
    save_path="<OUTPUT_AUDIO_PATH>",    # Output audio file path
)
genie.wait_for_playback_done()  # Ensure audio playback completes
print("🎉 Audio generation complete!")
```

To convert original GPT-SoVITS models for GENIE, make sure torch is installed:
```shell
pip install torch
```

Then use the built-in conversion tool:
Tip: `convert_to_onnx` currently supports V2 and V2ProPlus models.
```python
import genie_tts as genie

genie.convert_to_onnx(
    torch_pth_path=r"<YOUR .PTH MODEL FILE>",         # Replace with your .pth file
    torch_ckpt_path=r"<YOUR .CKPT CHECKPOINT FILE>",  # Replace with your .ckpt file
    output_dir=r"<ONNX MODEL OUTPUT DIRECTORY>",      # Directory to save the ONNX model
)
```

GENIE includes a lightweight FastAPI server:
```python
import genie_tts as genie

# Start the server
genie.start_server(
    host="0.0.0.0",  # Host address
    port=8000,       # Port
    workers=1,       # Number of worker processes
)
```

For request formats and API details, see our API Server Tutorial.
- 🌐 Language Expansion
  - Add support for Chinese and English.
- 🚀 Model Compatibility
  - Support for `V2ProPlus`.
  - Support for `V3`, `V4`, and more.
- 📦 Easy Deployment
  - Release official Docker images.
  - Provide out-of-the-box Windows bundles.