Commit a866775

docs: add transcriber fallback configuration guide

Introduced a new documentation file detailing the configuration of fallback transcribers for speech-to-text services. The guide covers the benefits, setup instructions via both the dashboard and API, provider-specific settings, best practices, and FAQs to ensure call continuity during provider outages.

1 parent 92c56a5

1 file changed: +174 −0 lines changed
---
title: Transcriber fallback configuration
subtitle: Configure fallback transcribers that activate automatically if your primary transcriber fails.
slug: customization/transcriber-fallback-plan
---

## Overview

Transcriber fallback configuration ensures your calls continue even if your primary speech-to-text provider experiences issues. Your assistant will sequentially fall back to the transcribers you configure, in the exact order you specify.

**Key benefits:**

- **Call continuity** during provider outages
- **Automatic failover** with no user intervention required
- **Provider diversity** to protect against single points of failure

<Note>
Without a fallback plan configured, your call will end with an error if your chosen transcription provider fails.
</Note>

## How it works

When a transcriber failure occurs, Vapi will:

1. Detect the failure of the primary transcriber
2. Switch to the first fallback transcriber in your plan
3. Continue through your specified list if subsequent failures occur
4. Terminate the call only if all transcribers in your plan have failed
## Configure via Dashboard

<Steps>
  <Step title="Open Transcriber tab">
    Navigate to your assistant and select the **Transcriber** tab.
  </Step>
  <Step title="Expand Fallback Transcribers section">
    Scroll down to find the **Fallback Transcribers** collapsible section. A warning indicator appears if no fallback transcribers are configured.
  </Step>
  <Step title="Add a fallback transcriber">
    Click **Add Fallback Transcriber** to configure your first fallback:
    - Select a **provider** from the dropdown
    - Choose a **model** (if the provider offers multiple models)
    - Select a **language** for transcription
  </Step>
  <Step title="Configure provider-specific settings (optional)">
    Expand **Additional Configuration** to access provider-specific settings like numerals formatting, VAD settings, and confidence thresholds.
  </Step>
  <Step title="Add more fallbacks">
    Repeat to add additional fallback transcribers. Order matters: the first fallback in your list is tried first.
  </Step>
</Steps>

<Note>
If HIPAA or PCI compliance is enabled on your account or assistant, only **Deepgram** and **Azure** transcribers will be available as fallback options.
</Note>
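For instance, a compliance-ready setup could pair a Deepgram primary with an Azure-only fallback plan. This is a sketch; the model and language values are illustrative, not prescriptive:

```json
{
  "transcriber": {
    "provider": "deepgram",
    "model": "nova-3-medical",
    "language": "en",
    "fallbackPlan": {
      "transcribers": [
        { "provider": "azure", "language": "en-US" }
      ]
    }
  }
}
```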
## Configure via API

Add the `fallbackPlan` property to your assistant's transcriber configuration, and specify the fallback transcribers within the `transcribers` property.

```json
{
  "transcriber": {
    "provider": "deepgram",
    "model": "nova-3",
    "language": "en",
    "fallbackPlan": {
      "transcribers": [
        {
          "provider": "assembly-ai",
          "speechModel": "universal-streaming-multilingual",
          "language": "en"
        },
        {
          "provider": "azure",
          "language": "en-US"
        }
      ]
    }
  }
}
```
## Provider-specific settings

Each transcriber provider supports different configuration options. Expand the accordion below to see available settings for each provider.

<AccordionGroup>
  <Accordion title="Deepgram">
    - **model**: Model selection (`nova-3`, `nova-3-general`, `nova-3-medical`, `nova-2`, `flux-general-en`, etc.).
    - **language**: Language code for transcription.
    - **keywords**: Keywords with optional boost values for improved recognition (e.g., `["companyname", "productname:2"]`).
    - **keyterm**: Keyterm prompting for up to 90% keyword recall rate improvement.
    - **smartFormat** (boolean): Enable smart formatting for numbers and dates.
    - **eotThreshold** (0.5-0.9): End-of-turn confidence threshold. Only available with Flux models.
    - **eotTimeoutMs** (500-10000): Maximum time to wait after speech before finalizing a turn. Only available with Flux models. Default is 5000 ms.
  </Accordion>
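As a sketch of how these Deepgram options combine in a transcriber (or fallback) entry, with illustrative values; check the API reference for the exact schema:

```json
{
  "provider": "deepgram",
  "model": "nova-3",
  "language": "en",
  "smartFormat": true,
  "keywords": ["companyname", "productname:2"]
}
```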
  <Accordion title="AssemblyAI">
    - **language**: Language code (`multi` for multilingual, `en` for English).
    - **speechModel**: Streaming speech model (`universal-streaming-english` or `universal-streaming-multilingual`).
    - **wordBoost**: Custom vocabulary array (up to 2500 characters total).
    - **keytermsPrompt**: Array of keyterms for improved recognition (up to 100 terms, 50 characters each). Costs an additional $0.04/hour.
    - **endUtteranceSilenceThreshold**: Duration of silence in milliseconds to detect the end of an utterance.
    - **disablePartialTranscripts** (boolean): Set to `true` to disable partial transcripts.
    - **confidenceThreshold** (0-1): Minimum confidence threshold for accepting transcriptions. Default is 0.4.
    - **vadAssistedEndpointingEnabled** (boolean): Enable VAD-based endpoint detection.
  </Accordion>
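A sketch of an AssemblyAI entry using several of the settings above; the values are illustrative, not recommended defaults:

```json
{
  "provider": "assembly-ai",
  "speechModel": "universal-streaming-english",
  "language": "en",
  "confidenceThreshold": 0.4,
  "endUtteranceSilenceThreshold": 1000
}
```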
  <Accordion title="Azure">
    - **language**: Language code in BCP-47 format (e.g., `en-US`, `es-MX`, `fr-FR`).
    - **segmentationSilenceTimeoutMs** (100-5000): Duration of silence after which a phrase is finalized. Configure to adjust sensitivity to pauses.
    - **segmentationMaximumTimeMs** (20000-70000): Maximum duration a segment can reach before being cut off.
    - **segmentationStrategy**: Controls phrase boundary detection. Options: `Default`, `Time`, or `Semantic`.
  </Accordion>
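For example, an Azure entry tuned for shorter pauses might look like this sketch (values chosen within the documented ranges, purely for illustration):

```json
{
  "provider": "azure",
  "language": "en-US",
  "segmentationStrategy": "Semantic",
  "segmentationSilenceTimeoutMs": 800,
  "segmentationMaximumTimeMs": 30000
}
```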
  <Accordion title="Gladia">
    - **model**: Model selection (`fast`, `accurate`, or `solaria-1`).
    - **language**: Language code.
    - **confidenceThreshold** (0-1): Minimum confidence for transcription acceptance. Default is 0.4.
    - **endpointing** (0.01-10): Time in seconds to wait before considering speech ended.
    - **speechThreshold** (0-1): Speech detection sensitivity.
    - **prosody** (boolean): Enable prosody detection (laugh, giggle, music, etc.).
    - **audioEnhancer** (boolean): Pre-process audio for improved accuracy (increases latency).
    - **transcriptionHint**: Hint text to guide transcription.
    - **customVocabularyEnabled** (boolean): Enable custom vocabulary.
    - **customVocabularyConfig**: Custom vocabulary configuration with a vocabulary array and default intensity.
    - **region**: Processing region (`us-west` or `eu-west`).
    - **receivePartialTranscripts** (boolean): Enable partial transcript delivery.
  </Accordion>
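A Gladia entry might combine these options as in the sketch below. Note the `"gladia"` provider string and the exact field shapes are assumptions here; verify against the API reference:

```json
{
  "provider": "gladia",
  "model": "solaria-1",
  "language": "en",
  "confidenceThreshold": 0.4,
  "audioEnhancer": false,
  "region": "us-west"
}
```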
  <Accordion title="Speechmatics">
    - **model**: Model selection (currently only `default`).
    - **language**: Language code.
    - **operatingPoint**: Accuracy level. `standard` for faster turnaround, `enhanced` for highest accuracy. Default is `enhanced`.
    - **region**: Processing region (`eu` for Europe, `us` for United States). Default is `eu`.
    - **enableDiarization** (boolean): Enable speaker identification for multi-speaker conversations.
    - **maxDelayMs**: Maximum delay in milliseconds for partial transcripts. Balances latency and accuracy.
  </Accordion>
  <Accordion title="Google">
    - **model**: Gemini model selection.
    - **language**: Language selection (e.g., `Multilingual`, `English`, `Spanish`, `French`).
  </Accordion>
  <Accordion title="OpenAI">
    - **model**: OpenAI Realtime STT model selection (required).
    - **language**: Language code for transcription.
  </Accordion>
  <Accordion title="ElevenLabs">
    - **model**: Model selection (currently only `scribe_v1`).
    - **language**: ISO 639-1 language code.
  </Accordion>
  <Accordion title="Cartesia">
    - **model**: Model selection (currently only `ink-whisper`).
    - **language**: ISO 639-1 language code.
  </Accordion>
</AccordionGroup>
## Best practices

- Use **different providers** for fallbacks to protect against provider-wide outages.
- Consider **language compatibility** when selecting fallbacks: ensure all fallback transcribers support your required languages.
- Test your fallback configuration to ensure smooth transitions between transcribers.
- For **HIPAA/PCI compliance**, ensure all fallbacks are compliant providers (Deepgram or Azure).

## FAQ

<AccordionGroup>
  <Accordion title="Which providers support fallback?">
    All major transcriber providers are supported: Deepgram, AssemblyAI, Azure, Gladia, Google, Speechmatics, Cartesia, ElevenLabs, and OpenAI.
  </Accordion>
  <Accordion title="Does fallback affect pricing?">
    There are no additional fees for using fallback transcribers. You are only billed for the transcriber that processes the audio.
  </Accordion>
  <Accordion title="How fast is the failover?">
    Failover typically occurs within milliseconds of detecting a failure, ensuring minimal disruption to the call.
  </Accordion>
  <Accordion title="Can I use different languages for fallbacks?">
    Yes, each fallback transcriber can have its own language configuration. However, for the best user experience, we recommend using the same or similar languages across all fallbacks.
  </Accordion>
</AccordionGroup>
