feat(assemblyai): add u3-rt-pro model plus mid-stream updates, SpeechStarted, and ForceEndpoint support #4965

Merged
davidzhao merged 10 commits into livekit:main from gsharp-aai:assemblyai-u3-pro-streaming-new
Mar 2, 2026

Contributor

@gsharp-aai gsharp-aai commented Feb 27, 2026

Summary

Adds Universal-3-Pro (u3-rt-pro) model support to the AssemblyAI streaming plugin with several improvements to the existing streaming implementation.

New model

  • Add u3-rt-pro to the supported model literals
  • Accept deprecated u3-pro name with a warning, remapped to u3-rt-pro
  • Add prompt parameter (u3-rt-pro only) for custom transcription instructions, validated at init
  • Default language_detection to True for u3-rt-pro model (previously only defaulted for multilingual models)
  • Default min_end_of_turn_silence_when_confident and max_turn_silence to 100ms for u3-rt-pro for optimal out-of-the-box performance/latency across most LiveKit configurations. This provides finals quickly for third-party turn detection models while still working well with built-in turn detection.
    • If a user sets min without setting max, max defaults to match min rather than its API default of 1000ms. Both parameters are fully overridable. For reference, the AssemblyAI API defaults are min=100ms and max=1000ms. We will clearly document the plugin's defaults and how to override them.
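The defaulting rules above can be sketched as follows. This is a minimal illustration, not the plugin's code: the function names `resolve_model` and `resolve_turn_silence` are hypothetical, and `None` stands in for "not provided by the user".

```python
import logging
from typing import Optional, Tuple

logger = logging.getLogger("assemblyai-sketch")  # hypothetical logger name

U3_RT_PRO = "u3-rt-pro"
DEPRECATED_MODEL_ALIASES = {"u3-pro": U3_RT_PRO}


def resolve_model(model: str) -> str:
    """Remap a deprecated model name to its replacement and log a warning."""
    if model in DEPRECATED_MODEL_ALIASES:
        new_name = DEPRECATED_MODEL_ALIASES[model]
        logger.warning("model %r is deprecated, use %r instead", model, new_name)
        return new_name
    return model


def resolve_turn_silence(
    model: str,
    min_ms: Optional[int] = None,
    max_ms: Optional[int] = None,
) -> Tuple[Optional[int], Optional[int]]:
    """Return (min, max) in ms; None means 'leave it to the API default'."""
    if model != U3_RT_PRO:
        return min_ms, max_ms
    # u3-rt-pro: min defaults to 100 ms, max follows min unless set explicitly.
    resolved_min = 100 if min_ms is None else min_ms
    resolved_max = resolved_min if max_ms is None else max_ms
    return resolved_min, resolved_max
```

With these rules, setting only `min_ms=300` on u3-rt-pro yields `(300, 300)`, while setting only `max_ms=800` yields `(100, 800)`.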

Speaker diarization support

  • Add speaker_labels parameter (bool) to enable speaker diarization on the streaming connection
  • Add max_speakers parameter (int, 1-10) to set the maximum number of speakers when diarization is enabled
  • Both are connection-level parameters sent as WebSocket query params at connect time (not updatable mid-stream via UpdateConfiguration)
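As a rough sketch of how the two diarization options might be validated and turned into query parameters: the helper name `diarization_params`, the `ValueError` on out-of-range values, and the lowercase `"true"`/`"false"` serialization are all assumptions made for illustration, not confirmed plugin behavior.

```python
from typing import Dict, Optional


def diarization_params(
    speaker_labels: Optional[bool] = None,
    max_speakers: Optional[int] = None,
) -> Dict[str, str]:
    """Validate diarization options and return them as query-string pairs."""
    if max_speakers is not None and not 1 <= max_speakers <= 10:
        raise ValueError(f"max_speakers must be between 1 and 10, got {max_speakers}")
    params: Dict[str, str] = {}
    if speaker_labels is not None:
        # Assumption: booleans are serialized as lowercase strings.
        params["speaker_labels"] = "true" if speaker_labels else "false"
    if max_speakers is not None:
        params["max_speakers"] = str(max_speakers)
    return params
```

Because these are connection-level parameters, changing them would require a new connection rather than an UpdateConfiguration message.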

Mid-stream configuration updates

  • Replace reconnect-based update_options() with in-place UpdateConfiguration websocket messages (previously, updating options would tear down and restart the entire websocket connection)
  • Queue-based approach (asyncio.Queue) for thread-safe sync-to-async communication, with a dedicated coroutine that sends config messages immediately and independently of audio flow
  • Supported fields: prompt, keyterms_prompt, max_turn_silence, min_end_of_turn_silence_when_confident, end_of_turn_confidence_threshold, vad_threshold
  • Add keyterms_prompt to update_options() (previously only available at connection time)
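The queue-based approach can be illustrated with a small self-contained sketch. `FakeWebSocket` and `ConfigSender` are hypothetical stand-ins, and the flat layout of the UpdateConfiguration payload is an assumption. Note that `asyncio.Queue` is not itself thread-safe, so the sketch assumes `update_options()` is called on the event loop's own thread.

```python
import asyncio
import json


class FakeWebSocket:
    """Stand-in for a websocket connection; records every string sent."""

    def __init__(self):
        self.sent = []

    async def send_str(self, data: str) -> None:
        self.sent.append(data)


class ConfigSender:
    def __init__(self, ws):
        self._ws = ws
        self._queue = asyncio.Queue()

    def update_options(self, **fields) -> None:
        """Synchronous entry point: enqueue a config message without awaiting."""
        # Assumption: updated fields sit at the top level of the message.
        self._queue.put_nowait({"type": "UpdateConfiguration", **fields})

    async def run(self) -> None:
        """Dedicated coroutine: send queued messages independently of audio."""
        while True:
            msg = await self._queue.get()
            await self._ws.send_str(json.dumps(msg))


async def demo():
    ws = FakeWebSocket()
    sender = ConfigSender(ws)
    task = asyncio.create_task(sender.run())
    sender.update_options(max_turn_silence=500)
    await asyncio.sleep(0.01)  # give the sender task a chance to run
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return ws.sent
```

No audio needs to flow for the message to go out, and the connection is never torn down.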

Rename min_end_of_turn_silence_when_confident → min_turn_silence

The AssemblyAI API accepts both names via AliasChoices. This promotes min_turn_silence as the primary parameter — shorter and consistent with max_turn_silence. The old name is still accepted but logs a deprecation warning. If both are provided, min_turn_silence takes precedence.
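A minimal sketch of the precedence rule described here, using `None` for "not provided"; the helper name `resolve_min_turn_silence` and the logger name are hypothetical.

```python
import logging
from typing import Optional

logger = logging.getLogger("assemblyai-sketch")  # hypothetical logger name


def resolve_min_turn_silence(
    min_turn_silence: Optional[int] = None,
    min_end_of_turn_silence_when_confident: Optional[int] = None,
) -> Optional[int]:
    """Pick the effective value, preferring the new parameter name."""
    if min_end_of_turn_silence_when_confident is not None:
        logger.warning(
            "min_end_of_turn_silence_when_confident is deprecated, "
            "use min_turn_silence instead"
        )
    if min_turn_silence is not None:
        return min_turn_silence  # the new name wins when both are given
    return min_end_of_turn_silence_when_confident
```

Existing callers that pass only the old name keep working and see a warning, while callers that pass both get the value from `min_turn_silence`.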

New websocket message support

  • SpeechStarted: Handle new server event, mapped to SpeechEventType.START_OF_SPEECH for barge-in detection
  • ForceEndpoint: Add force_endpoint() method to immediately finalize the current turn via {"type": "ForceEndpoint"}
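The two message types can be sketched as below. The `SpeechEventType` enum here is a local stand-in rather than the LiveKit class, and the assumption that the server event carries `"type": "SpeechStarted"` mirrors the ForceEndpoint payload shape given above; neither helper name comes from the plugin.

```python
import json
from enum import Enum
from typing import Optional


class SpeechEventType(str, Enum):
    """Local stand-in for the framework's speech event enum."""

    START_OF_SPEECH = "start_of_speech"


def force_endpoint_message() -> str:
    """Serialize the client message that finalizes the current turn."""
    return json.dumps({"type": "ForceEndpoint"})


def map_server_message(raw: str) -> Optional[SpeechEventType]:
    """Map a raw server message to a speech event, or None if unrelated."""
    data = json.loads(raw)
    # Assumption: the event is identified by a top-level "type" field.
    if data.get("type") == "SpeechStarted":
        return SpeechEventType.START_OF_SPEECH
    return None
```

Emitting START_OF_SPEECH as soon as the server reports speech is what lets an agent stop its own output when the user barges in.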

Fixes

  • Set interim_results=True in STTCapabilities (was incorrectly False despite emitting INTERIM_TRANSCRIPT events)
  • Fix send_config_task shutdown hang by separating it from asyncio.gather so it is cancelled in finally instead of blocking graceful shutdown
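The shutdown fix can be illustrated with a simplified sketch: a task that waits on a queue forever never completes on its own, so including it in `asyncio.gather` would block a graceful exit. The function and variable names below are hypothetical, and the audio sender is reduced to a loop over in-memory chunks.

```python
import asyncio


async def run_stream(audio_chunks):
    config_queue = asyncio.Queue()
    sent = []

    async def send_audio():
        # Finishes on its own once the audio input is exhausted.
        for chunk in audio_chunks:
            sent.append(chunk)
            await asyncio.sleep(0)

    async def send_config():
        # Waits on the queue forever; it only ends when cancelled.
        while True:
            msg = await config_queue.get()
            sent.append(msg)

    config_task = asyncio.create_task(send_config())
    try:
        # Only tasks that terminate naturally are gathered.
        await asyncio.gather(send_audio())
    finally:
        config_task.cancel()
        await asyncio.gather(config_task, return_exceptions=True)
    return sent, config_task.cancelled()
```

Had `send_config()` been passed to the first `gather`, the call would never return after the audio ended.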

…eEndpoint

- Rename model from u3-pro to u3-rt-pro
- Replace reconnect-based update_options with UpdateConfiguration websocket messages
- Add SpeechStarted event handler (maps to START_OF_SPEECH)
- Add force_endpoint() to immediately finalize turns
- Add keyterms_prompt to update_options/UpdateConfiguration
- Fix interim_results capability (True, not False)
@gsharp-aai gsharp-aai marked this pull request as draft February 27, 2026 01:20
devin-ai-integration[bot]

This comment was marked as resolved.

Move queue drain into a separate send_config_task coroutine so
ForceEndpoint and UpdateConfiguration messages are sent immediately,
even when no audio frames are flowing.
… u3-rt-pro

Separate send_config_task from gather so it is cancelled in finally
instead of blocking shutdown. Default language_detection to True for
u3-rt-pro model.

Accept 'u3-pro' as a deprecated model name that remaps to 'u3-rt-pro'
with a warning. For u3-rt-pro, default min_end_of_turn_silence_when_confident
and max_turn_silence to 100ms (max follows min if only min is set) to
minimize latency for external turn detectors.
@gsharp-aai gsharp-aai marked this pull request as ready for review February 27, 2026 20:29
Adds support for the new streaming diarization params (speaker_labels bool,
max_speakers 1-10) as connection-level query params on the WebSocket URL.

The API accepts both min_end_of_turn_silence_when_confident and
min_turn_silence (via AliasChoices). This promotes
min_turn_silence as the primary parameter while keeping backward
compatibility: the old name is still accepted but logs a deprecation
warning, and min_turn_silence takes precedence when both are provided.
Member

@davidzhao davidzhao left a comment

lg. a couple of clarifying questions

        await ws.close()

    async def _connect_ws(self) -> aiohttp.ClientWebSocketResponse:
        # u3-rt-pro defaults: min=100, max=min (so both 100 unless overridden)
Member

why does u3-rt-pro work differently here?

    )
else:
    min_silence = (
        self._opts.min_turn_silence if is_given(self._opts.min_turn_silence) else None
Member

Setting it to None would lead to min_turn_silence: null being sent to your end. Is that desired/supported?

if not, I would recommend only setting these fields when given

Contributor Author

None values are filtered out (via filtered_config) before building the query string, so they're never sent to the API.
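For illustration, the filtering described in this reply might look like the sketch below. Only the `filtered_config` name comes from the reply; the `build_query_string` helper and the example values are assumptions.

```python
from urllib.parse import urlencode


def build_query_string(config: dict) -> str:
    """Drop unset (None) options, then encode the rest as a query string."""
    filtered_config = {k: v for k, v in config.items() if v is not None}
    return urlencode(filtered_config)


# min_turn_silence is unset here, so it never reaches the URL.
query = build_query_string(
    {"min_turn_silence": None, "max_turn_silence": 100, "max_speakers": 2}
)
```

Here `query` is `max_turn_silence=100&max_speakers=2`, with no `null` value sent for the unset field.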

@davidzhao davidzhao merged commit 44c1e85 into livekit:main Mar 2, 2026
9 checks passed