feat(assemblyai): add u3-rt-pro model plus mid-stream updates, SpeechStarted, and ForceEndpoint support by gsharp-aai · Pull Request #4965 · livekit/agents

gsharp-aai · 2026-02-27T01:20:03Z

Summary

Adds Universal-3-Pro (u3-rt-pro) model support to the AssemblyAI streaming plugin with several improvements to the existing streaming implementation.

New model

Add u3-rt-pro to the supported model literals
Accept deprecated u3-pro name with a warning, remapped to u3-rt-pro
Add prompt parameter (u3-rt-pro only) for custom transcription instructions, validated at init
Default language_detection to True for u3-rt-pro model (previously only defaulted for multilingual models)
Default min_end_of_turn_silence_when_confident and max_turn_silence to 100ms for u3-rt-pro for optimal out-of-the-box performance/latency across most LiveKit configurations. This provides finals quickly for third-party turn detection models while still working well with built-in turn detection.
- If a user sets min without setting max, max defaults to match min rather than its API default of 1000ms. Both parameters are fully overridable. For reference, the AssemblyAI API defaults are min=100ms and max=1000ms. We will clearly document the plugin's defaults and how to override them.
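The defaulting rules above can be sketched as a small pure function. This is only an illustration of the described behavior, not the plugin's code: `normalize_model` and `resolve_turn_silence` are hypothetical names, and `None` stands in for "not given" (the plugin itself checks `is_given(...)`).

```python
import warnings
from typing import Optional, Tuple

U3_RT_PRO = "u3-rt-pro"
U3_DEFAULT_SILENCE_MS = 100  # plugin default for u3-rt-pro, per the description above


def normalize_model(model: str) -> str:
    """Remap the deprecated 'u3-pro' name to 'u3-rt-pro' and warn (hypothetical helper)."""
    if model == "u3-pro":
        warnings.warn("'u3-pro' is deprecated; use 'u3-rt-pro'", DeprecationWarning, stacklevel=2)
        return U3_RT_PRO
    return model


def resolve_turn_silence(
    model: str,
    min_ms: Optional[int] = None,
    max_ms: Optional[int] = None,
) -> Tuple[Optional[int], Optional[int]]:
    """Return (min, max) turn silence in ms; None means 'leave it to the API default'."""
    if normalize_model(model) != U3_RT_PRO:
        # Other models: pass through only what the user supplied.
        return min_ms, max_ms
    resolved_min = min_ms if min_ms is not None else U3_DEFAULT_SILENCE_MS
    # max follows min when only min is set, instead of the API default of 1000 ms.
    resolved_max = max_ms if max_ms is not None else resolved_min
    return resolved_min, resolved_max
```

With these assumptions, `resolve_turn_silence("u3-rt-pro", min_ms=300)` yields `(300, 300)`, while an explicit `max_ms` is always kept as given.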

Speaker diarization support

Add speaker_labels parameter (bool) to enable speaker diarization on the streaming connection
Add max_speakers parameter (int, 1-10) to set the maximum number of speakers when diarization is enabled
Both are connection-level parameters sent as WebSocket query params at connect time (not updatable mid-stream via UpdateConfiguration)
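A minimal sketch of how the two diarization options could be validated and turned into query parameters. The function name `diarization_query` and the `"true"`/`"false"` encoding of the boolean are assumptions for illustration; only the parameter names and the 1-10 range come from the description above.

```python
from typing import Optional
from urllib.parse import urlencode


def diarization_query(
    speaker_labels: Optional[bool] = None,
    max_speakers: Optional[int] = None,
) -> str:
    """Build the diarization part of a WebSocket query string (hypothetical helper)."""
    if max_speakers is not None and not (1 <= max_speakers <= 10):
        raise ValueError("max_speakers must be between 1 and 10")

    params = {}
    if speaker_labels is not None:
        # Assumed encoding: lowercase string booleans in the query string.
        params["speaker_labels"] = "true" if speaker_labels else "false"
    if max_speakers is not None:
        params["max_speakers"] = max_speakers
    return urlencode(params)
```

Because these are sent only at connect time, changing them would require a new connection rather than an UpdateConfiguration message.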

Mid-stream configuration updates

Replace reconnect-based update_options() with in-place UpdateConfiguration websocket messages (previously, updating options would tear down and restart the entire websocket connection)
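Queue-based approach (asyncio.Queue) for sync-to-async communication: the synchronous update_options() call enqueues a message, and a dedicated coroutine sends config messages immediately and independently of audio flow. Note that asyncio.Queue is not thread-safe by itself, so a caller on another thread would need to go through loop.call_soon_threadsafe.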
Supported fields: prompt, keyterms_prompt, max_turn_silence, min_end_of_turn_silence_when_confident, end_of_turn_confidence_threshold, vad_threshold
Add keyterms_prompt to update_options() (previously only available at connection time)
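The queue-plus-sender pattern described above might look roughly like the following. This is a sketch under stated assumptions: `ConfigSender` is a hypothetical class, a plain list stands in for the websocket, and the flat layout of the UpdateConfiguration payload (fields next to `"type"`) is assumed rather than confirmed by this PR.

```python
import asyncio
import json


class ConfigSender:
    """Hypothetical stand-in for the stream object; `sent` replaces the websocket."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self.sent: list = []

    def update_options(self, **fields) -> None:
        # Synchronous entry point: enqueue only; called on the event loop thread here.
        payload = {k: v for k, v in fields.items() if v is not None}
        self._queue.put_nowait({"type": "UpdateConfiguration", **payload})

    async def send_config_task(self) -> None:
        # Dedicated coroutine: wakes as soon as a message is queued,
        # whether or not any audio frames are flowing.
        while True:
            msg = await self._queue.get()
            self.sent.append(json.dumps(msg))  # real code would send over the websocket


async def demo() -> list:
    sender = ConfigSender()
    task = asyncio.create_task(sender.send_config_task())
    sender.update_options(max_turn_silence=500, keyterms_prompt=["LiveKit"])
    await asyncio.sleep(0.01)  # give the sender a chance to run
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return sender.sent
```

Running `asyncio.run(demo())` returns the one serialized message, without a reconnect and without waiting for an audio frame.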

Rename `min_end_of_turn_silence_when_confident` → `min_turn_silence`

The AssemblyAI API accepts both names via AliasChoices. This promotes min_turn_silence as the primary parameter — shorter and consistent with max_turn_silence. The old name is still accepted but logs a deprecation warning. If both are provided, min_turn_silence takes precedence.
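The precedence rule can be illustrated with a short helper. `resolve_min_turn_silence` is a hypothetical name, and `None` is used here in place of the plugin's "not given" sentinel.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def resolve_min_turn_silence(
    min_turn_silence: Optional[int] = None,
    min_end_of_turn_silence_when_confident: Optional[int] = None,
) -> Optional[int]:
    """Pick the effective value; the new name wins when both are provided."""
    if min_end_of_turn_silence_when_confident is not None:
        logger.warning(
            "min_end_of_turn_silence_when_confident is deprecated; use min_turn_silence"
        )
    if min_turn_silence is not None:
        return min_turn_silence
    return min_end_of_turn_silence_when_confident
```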

New websocket message support

SpeechStarted: Handle new server event, mapped to SpeechEventType.START_OF_SPEECH for barge-in detection
ForceEndpoint: Add force_endpoint() method to immediately finalize the current turn via {"type": "ForceEndpoint"}
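As a rough illustration of the two message types, the sketch below maps an incoming SpeechStarted message to a start-of-speech event and builds the outgoing ForceEndpoint payload. The local `SpeechEventType` enum is a stand-in for the LiveKit type of the same name, and both function names are hypothetical.

```python
import json
from enum import Enum
from typing import Optional


class SpeechEventType(str, Enum):
    """Local stand-in for the LiveKit enum; only the member used here."""
    START_OF_SPEECH = "start_of_speech"


def map_server_message(raw: str) -> Optional[SpeechEventType]:
    """Return the speech event for a server message, or None if it is not handled here."""
    data = json.loads(raw)
    if data.get("type") == "SpeechStarted":
        return SpeechEventType.START_OF_SPEECH
    return None


def force_endpoint_message() -> str:
    """Serialize the control message that finalizes the current turn."""
    return json.dumps({"type": "ForceEndpoint"})
```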

Fixes

Set interim_results=True in STTCapabilities (was incorrectly False despite emitting INTERIM_TRANSCRIPT events)
Fix send_config_task shutdown hang by separating it from asyncio.gather so it is cancelled in finally instead of blocking graceful shutdown
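The shutdown fix can be sketched as follows: a task that waits on a queue forever is kept out of `asyncio.gather` and cancelled in `finally`. The inner task names besides `send_config_task` are placeholders, and the bodies are reduced to the minimum needed to show the control flow.

```python
import asyncio


async def run_stream(events: list) -> None:
    queue: asyncio.Queue = asyncio.Queue()

    async def send_config_task() -> None:
        # Waits for config messages indefinitely; it never completes on its own.
        while True:
            await queue.get()

    async def send_audio_task() -> None:
        events.append("audio done")

    async def recv_task() -> None:
        events.append("recv done")

    # Started separately so gather() only waits on tasks that actually finish.
    config_task = asyncio.create_task(send_config_task())
    try:
        await asyncio.gather(send_audio_task(), recv_task())
    finally:
        config_task.cancel()
        try:
            await config_task
        except asyncio.CancelledError:
            pass
        events.append("config cancelled")
```

Had `send_config_task()` been passed to `gather()` as well, the `await` would never return once the other two tasks completed, which matches the hang described above.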

…eEndpoint - Rename model from u3-pro to u3-rt-pro - Replace reconnect-based update_options with UpdateConfiguration websocket messages - Add SpeechStarted event handler (maps to START_OF_SPEECH) - Add force_endpoint() to immediately finalize turns - Add keyterms_prompt to update_options/UpdateConfiguration - Fix interim_results capability (True, not False)

Accept 'u3-pro' as a deprecated model name that remaps to 'u3-rt-pro' with a warning. For u3-rt-pro, default min_end_of_turn_silence_when_confident and max_turn_silence to 100ms (max follows min if only min is set) to minimize latency for external turn detectors.

The API accepts both names (via AliasChoices). This promotes min_turn_silence as the primary parameter while keeping backward compatibility: the old name is still accepted but logs a deprecation warning, and min_turn_silence takes precedence when both are provided.

davidzhao

lg. a couple of clarifying questions

davidzhao · 2026-03-02T07:05:29Z

livekit-plugins/livekit-plugins-assemblyai/livekit/plugins/assemblyai/stt.py

+                await ws.close()

    async def _connect_ws(self) -> aiohttp.ClientWebSocketResponse:
+        # u3-rt-pro defaults: min=100, max=min (so both 100 unless overridden)


why does u3-rt-pro work differently here?

davidzhao · 2026-03-02T07:06:28Z

livekit-plugins/livekit-plugins-assemblyai/livekit/plugins/assemblyai/stt.py

+            )
+        else:
+            min_silence = (
+                self._opts.min_turn_silence if is_given(self._opts.min_turn_silence) else None


Setting it to None would lead to min_turn_silence: null being sent to your end. Is that desired/supported?

If not, I would recommend only setting these fields when given.

gsharp-aai · 2026-03-02

None values are filtered out (via filtered_config) before building the query string, so they're never sent to the API.
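For readers following the thread, the filtering step described in this reply amounts to something like the sketch below. `build_ws_query` and the `sample_rate` key are illustrative assumptions; only the `filtered_config` name and the None-dropping behavior come from the reply.

```python
from urllib.parse import urlencode


def build_ws_query(config: dict) -> str:
    """Drop unset (None) options, then encode the rest as a query string."""
    filtered_config = {k: v for k, v in config.items() if v is not None}
    return urlencode(filtered_config)
```

So a config of `{"sample_rate": 16000, "min_turn_silence": None, "max_turn_silence": 100}` produces a query string with no `min_turn_silence` key at all, rather than a null value.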

gsharp-aai added 3 commits February 26, 2026 15:36

Add u3-pro model support with prompt parameter

b02df05

Remove default prompt

2502778

gsharp-aai marked this pull request as draft February 27, 2026 01:20

gsharp-aai added 4 commits February 26, 2026 17:36

Send config/control messages independently of audio flow

4fedb59

Move queue drain into a separate send_config_task coroutine so ForceEndpoint and UpdateConfiguration messages are sent immediately, even when no audio frames are flowing.

Fix send_config_task shutdown hang and default language_detection for…

b03772c

… u3-rt-pro Separate send_config_task from gather so it is cancelled in finally instead of blocking shutdown. Default language_detection to True for u3-rt-pro model.

Fix type annotation for min/max silence variables

34af233

gsharp-aai marked this pull request as ready for review February 27, 2026 20:29

gsharp-aai added 3 commits February 28, 2026 01:23

Add speaker_labels and max_speakers params to AssemblyAI plugin

98258ab

Adds support for the new streaming diarization params (speaker_labels bool, max_speakers 1-10) as connection-level query params on the WebSocket URL.

Apply ruff formatting to stt.py

c0de1f0

davidzhao approved these changes Mar 2, 2026

davidzhao merged commit 44c1e85 into livekit:main Mar 2, 2026
9 checks passed

davidzhao merged 10 commits into livekit:main from gsharp-aai:assemblyai-u3-pro-streaming-new
