Migrate user span #1027
base: main
📝 Walkthrough
Adds new telemetry attribute constants, surfaces STT model/provider and participant information in voice telemetry, creates and propagates user_turn spans in AudioRecognition, exposes RoomIO.linkedParticipant/localParticipant, and passes STT/provider/participant through AgentActivity/AgentSession to AudioRecognition.
Sequence Diagram
sequenceDiagram
participant Speech as Speech/Event
participant AR as AudioRecognition
participant OTel as OpenTelemetry
participant RoomIO as RoomIO
participant Agent as AgentActivity/AgentSession
Speech->>AR: START_OF_SPEECH (STT or VAD)
AR->>OTel: ensureUserTurnSpan()
OTel-->>AR: user_turn Span (bound to context)
AR->>RoomIO: getLinkedParticipant()
RoomIO-->>AR: ParticipantInfo
AR->>AR: attach attributes (participant, sttModel, sttProvider)
AR->>OTel: enter userTurnContext(span) and run hooks (onStartOfSpeech / EOU detection)
Speech->>AR: END_OF_SPEECH
AR->>OTel: end user_turn Span
Agent->>AR: close / cleanup
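The span lifecycle in the diagram can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the PR's implementation: `MiniSpan`, `SketchParticipantInfo`, and `TurnTracker` are hypothetical stand-ins for the OpenTelemetry span and for AudioRecognition, and the attribute keys are placeholders rather than the constants added in trace_types.ts.

```typescript
// Hypothetical stand-in for an OpenTelemetry span; only what the sketch needs.
class MiniSpan {
  readonly attributes: Record<string, string> = {};
  private ended = false;

  constructor(readonly name: string) {}

  isRecording(): boolean {
    return !this.ended;
  }

  end(): void {
    this.ended = true;
  }
}

// Hypothetical participant shape; the PR's real type may carry more fields.
interface SketchParticipantInfo {
  identity?: string;
}

class TurnTracker {
  private userTurnSpan: MiniSpan | undefined;

  constructor(
    private readonly getLinkedParticipant: () => SketchParticipantInfo | undefined,
    private readonly sttModel: string,
    private readonly sttProvider: string,
  ) {}

  // START_OF_SPEECH: reuse a span that is still recording, otherwise open a new one.
  ensureUserTurnSpan(): MiniSpan {
    if (this.userTurnSpan && this.userTurnSpan.isRecording()) {
      return this.userTurnSpan;
    }
    const span = new MiniSpan('user_turn');
    const participant = this.getLinkedParticipant();
    if (participant && participant.identity) {
      span.attributes['participant.identity'] = participant.identity; // placeholder key
    }
    span.attributes['stt.model'] = this.sttModel; // placeholder key
    span.attributes['stt.provider'] = this.sttProvider; // placeholder key
    this.userTurnSpan = span;
    return span;
  }

  // END_OF_SPEECH in the diagram: end the span once and forget it.
  endUserTurn(): void {
    if (this.userTurnSpan && this.userTurnSpan.isRecording()) {
      this.userTurnSpan.end();
    }
    this.userTurnSpan = undefined;
  }
}
```

Calling `ensureUserTurnSpan()` twice within one turn returns the same span, which is the reuse behaviour the review later credits to the `isRecording()` check.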
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
agents/src/voice/audio_recognition.ts (1)
752-759: ⚠️ Potential issue | 🟡 Minor
End any active user_turn span in close() to prevent dangling spans.
If AudioRecognition shuts down mid-turn, the span remains open. Ending it in close() ensures proper cleanup on abnormal shutdown.
Suggested fix
 async close() {
   this.detachInputAudioStream();
   this.silenceAudioWriter.releaseLock();
   await this.commitUserTurnTask?.cancelAndWait();
   await this.sttTask?.cancelAndWait();
   await this.vadTask?.cancelAndWait();
   await this.bounceEOUTask?.cancelAndWait();
+  if (this.userTurnSpan && this.userTurnSpan.isRecording()) {
+    this.userTurnSpan.end();
+    this.userTurnSpan = undefined;
+  }
 }
🤖 Fix all issues with AI agents
In `@agents/src/voice/audio_recognition_span.test.ts`:
- Around line 1-3: Update the SPDX header in the file to use the required 2025
year: replace the current top comment line that reads "//
SPDX-FileCopyrightText: 2026 LiveKit, Inc." with "// SPDX-FileCopyrightText:
2025 LiveKit, Inc." and ensure the existing "// SPDX-License-Identifier:
Apache-2.0" line remains unchanged so the file
(agents/src/voice/audio_recognition_span.test.ts) has the correct SPDX header
pair.
- Around line 4-12: Tests use broad any casts for FakeVAD stream, recognition
hooks, STT node, and VAD instance which triggers lint failures; replace those
any casts with the real interfaces by importing and using RecognitionHooks and
STTNode types and typing the fake VAD as VADStream/VADEvent/VADEventType (and
the VAD instance as VAD), type the recognition hooks using RecognitionHooks, and
the STT node using STTNode; also replace the mocked chat context object with
ChatContext.empty() where appropriate and update usages in AudioRecognition and
_TurnDetector tests and variables named like fakeVAD, hooks, sttNode, and
vadInstance to remove any casts.
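The second item, replacing `any` casts with real interfaces, can be illustrated with a typed fake. This is a sketch only: `SketchHooks` is a hypothetical, reduced stand-in for RecognitionHooks (the real interface is not shown here and likely has more members), and `RecordingHooks` and `runTurn` are invented for the example.

```typescript
// Hypothetical, reduced stand-in for the RecognitionHooks interface.
interface SketchHooks {
  onStartOfSpeech(): void;
  onEndOfSpeech(): void;
  onFinalTranscript(text: string): void;
}

// Typed fake that records calls. Because it implements the interface, the
// compiler flags a missing or misspelled hook, which an `as any` cast would hide.
class RecordingHooks implements SketchHooks {
  readonly calls: string[] = [];

  onStartOfSpeech(): void {
    this.calls.push('start');
  }

  onEndOfSpeech(): void {
    this.calls.push('end');
  }

  onFinalTranscript(text: string): void {
    this.calls.push('final:' + text);
  }
}

// Code under test accepts the interface, so the fake needs no cast at the call site.
function runTurn(hooks: SketchHooks, transcript: string): void {
  hooks.onStartOfSpeech();
  hooks.onFinalTranscript(transcript);
  hooks.onEndOfSpeech();
}
```

The same pattern would apply to the fake VAD stream and STT node: declare the fake against the imported type so the lint rule has nothing to flag.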
🧹 Nitpick comments (1)
agents/src/voice/agent_session.ts (1)
704-706: TODO noted for agent_speaking participant attributes.
If you want parity with Python here, I can help wire local participant attributes into the span (Line 704).
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- agents/src/telemetry/trace_types.ts
- agents/src/voice/agent_activity.ts
- agents/src/voice/agent_session.ts
- agents/src/voice/audio_recognition.ts
- agents/src/voice/audio_recognition_span.test.ts
- agents/src/voice/room_io/room_io.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)
Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'
Files:
- agents/src/voice/agent_activity.ts
- agents/src/voice/audio_recognition_span.test.ts
- agents/src/telemetry/trace_types.ts
- agents/src/voice/audio_recognition.ts
- agents/src/voice/agent_session.ts
- agents/src/voice/room_io/room_io.ts
**/*.{ts,tsx}?(test|example|spec)
📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)
When testing inference LLM, always use full model names from agents/src/inference/models.ts (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')
Files:
- agents/src/voice/agent_activity.ts
- agents/src/voice/audio_recognition_span.test.ts
- agents/src/telemetry/trace_types.ts
- agents/src/voice/audio_recognition.ts
- agents/src/voice/agent_session.ts
- agents/src/voice/room_io/room_io.ts
**/*.{ts,tsx}?(test|example)
📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)
Initialize logger before using any LLM functionality with initializeLogger({ pretty: true }) from '@livekit/agents'
Files:
- agents/src/voice/agent_activity.ts
- agents/src/voice/audio_recognition_span.test.ts
- agents/src/telemetry/trace_types.ts
- agents/src/voice/audio_recognition.ts
- agents/src/voice/agent_session.ts
- agents/src/voice/room_io/room_io.ts
🧠 Learnings (3)
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to examples/src/test_*.ts : For plugin component debugging (STT, TTS, LLM), create test example files prefixed with `test_` under the examples directory and run with `pnpm build && node ./examples/src/test_my_plugin.ts`
Applied to files:
agents/src/voice/audio_recognition_span.test.ts
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to **/{examples,test}/**/*.test.ts : Include both basic streaming and tool calling tests to verify full LLM functionality
Applied to files:
agents/src/voice/audio_recognition_span.test.ts
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to **/*.{ts,tsx}?(test|example|spec) : When testing inference LLM, always use full model names from `agents/src/inference/models.ts` (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')
Applied to files:
agents/src/voice/audio_recognition_span.test.ts
🪛 GitHub Actions: Build
agents/src/voice/audio_recognition_span.test.ts
[warning] 393-393: Unexpected any. Specify a different type. [Warning/@typescript-eslint/no-explicit-any]
[warning] 57-57: Unused import or variable flagged by Prettier: Delete formatting discrepancies. [Error/prettier/prettier]
[warning] 90-90: Unexpected any. Specify a different type. [Warning/@typescript-eslint/no-explicit-any]
[warning] 176-176: Unexpected any. Specify a different type. [Warning/@typescript-eslint/no-explicit-any]
[warning] 234-234: Unexpected any. Specify a different type. [Warning/@typescript-eslint/no-explicit-any]
agents/src/voice/audio_recognition.ts
[error] 5-5: Prettier formatting issue detected. Run 'prettier --write' to fix code style issues in this file.
[warning] 9-9: Replace ·ROOT_CONTEXT,·context·as·otelContext,·trace,·type·Context,·type·Span· with a properly formatted line break sequence [Warning/prettier/prettier]
🔇 Additional comments (15)
agents/src/telemetry/trace_types.ts (1)
23-24: Telemetry attribute additions look consistent.
Namespacing and grouping align with existing attributes; no concerns.
Also applies to: 32-37, 39-39, 49-49, 66-68, 73-73, 101-106
agents/src/voice/room_io/room_io.ts (1)
379-398: Linked participant accessor looks solid.
Fallback to sid/identity and optional kind are appropriate for attribution.
agents/src/voice/audio_recognition_span.test.ts (2)
13-24: In-memory tracing helpers look good.
Nice, compact harness and deterministic turn detector for the assertions.
Also applies to: 62-67
137-155: Span parenting and attribute assertions are solid.
Good coverage for participant + STT metadata in both STT and VAD modes.
Also applies to: 250-259
agents/src/voice/agent_session.ts (3)
134-135: RoomIO encapsulation looks consistent.
Initialization and cleanup are routed through _roomIO, with guards preventing duplicate creation.
Also applies to: 298-325, 874-875
742-751: Linked participant attribution on user_speaking span looks good.
Clear and safe attribute mapping.
795-797: Good guard before starting the away timer.
Avoids starting the timer before a participant is available.
agents/src/voice/agent_activity.ts (2)
302-305: Nice wiring of STT metadata + linked participant into AudioRecognition.
This should improve span attribution without changing core behavior.
341-350: Provider extraction helper is clear and contained.
Good to keep the heuristic in one place.
agents/src/voice/audio_recognition.ts (6)
61-73: Tracing metadata options and wiring look good.
Cleanly threads STT/provider and participant metadata into the class.
Also applies to: 85-88, 125-127
158-191: Span creation + context helper are well-scoped.
Clear separation of span creation and context binding.
340-358: Context wrapping around STT start/end events looks correct.
Should keep downstream hooks consistently attributed.
Also applies to: 366-384
429-431: EOU detection span uses the intended user-turn context.
This keeps eou_detection properly parented.
Also applies to: 470-472
633-639: VAD start/end events are correctly wrapped in user-turn context.
Good parity with the STT path.
Also applies to: 661-665
5-5: The import on line 5 is already properly formatted according to Prettier configuration. The code currently wraps the import statement across multiple lines (lines 5–12) with correct indentation and trailing comma, matching the Prettier rules (printWidth: 100, trailingComma: "all").
Likely an incorrect or invalid review comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
agents/src/voice/audio_recognition.ts (2)
758-765: ⚠️ Potential issue | 🟡 Minor
End userTurnSpan on close to prevent orphaned spans.
If close() is called while a user turn is in progress, the userTurnSpan keeps recording and is never ended, leading to incomplete telemetry data.
🛡️ Proposed fix
 async close() {
+  if (this.userTurnSpan?.isRecording()) {
+    this.userTurnSpan.setStatus({ code: 2, message: 'Session closed' }); // SpanStatusCode.ERROR = 2
+    this.userTurnSpan.end();
+    this.userTurnSpan = undefined;
+  }
   this.detachInputAudioStream();
   this.silenceAudioWriter.releaseLock();
   await this.commitUserTurnTask?.cancelAndWait();
   await this.sttTask?.cancelAndWait();
   await this.vadTask?.cancelAndWait();
   await this.bounceEOUTask?.cancelAndWait();
 }
701-714: ⚠️ Potential issue | 🟡 Minor
Consider ending userTurnSpan when clearing the user turn.
When clearUserTurn() is called, the transcript state is reset but the userTurnSpan may still be active. The next user turn could then reuse the same span (if it is still recording), merging distinct turns into one span and causing incorrect telemetry attribution.
🛡️ Proposed fix
 clearUserTurn() {
+  if (this.userTurnSpan?.isRecording()) {
+    this.userTurnSpan.setStatus({ code: 2, message: 'User turn cleared' });
+    this.userTurnSpan.end();
+    this.userTurnSpan = undefined;
+  }
   this.audioTranscript = '';
   this.audioInterimTranscript = '';
   this.audioPreflightTranscript = '';
   this.finalTranscriptConfidence = [];
   this.userTurnCommitted = false;
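Both proposed fixes repeat the same guard, so one way to keep them in sync is a small shared helper. The following is a sketch, not code from the PR: `SpanLike` is a hypothetical interface covering only the `isRecording`, `setStatus`, and `end` methods used above, and `endSpanIfRecording` is an invented name.

```typescript
// Hypothetical subset of a span: just the three methods the proposed fixes call.
interface SpanLike {
  isRecording(): boolean;
  setStatus(status: { code: number; message?: string }): void;
  end(): void;
}

// Numeric value quoted in the review comment for SpanStatusCode.ERROR.
const STATUS_ERROR = 2;

// Ends the span at most once, optionally recording why, and returns undefined
// so the caller can clear its field in the same statement.
function endSpanIfRecording(span: SpanLike | undefined, reason?: string): undefined {
  if (span && span.isRecording()) {
    if (reason !== undefined) {
      span.setStatus({ code: STATUS_ERROR, message: reason });
    }
    span.end();
  }
  return undefined;
}
```

Inside close() or clearUserTurn() this would read `this.userTurnSpan = endSpanIfRecording(this.userTurnSpan, 'Session closed');`, assuming the field is typed as optional.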
📜 Review details
📒 Files selected for processing (2)
- agents/src/voice/audio_recognition.ts
- agents/src/voice/audio_recognition_span.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- agents/src/voice/audio_recognition_span.test.ts
🧰 Additional context used
🧬 Code graph analysis (1)
agents/src/voice/audio_recognition.ts (4)
plugins/google/src/beta/gemini_tts.ts (1)
opts (165-167)
agents/src/telemetry/traces.ts (1)
tracer (150-150)
agents/src/telemetry/index.ts (1)
tracer (24-24)
agents/src/stt/index.ts (1)
SpeechEventType (11-11)
🔇 Additional comments (6)
agents/src/voice/audio_recognition.ts (6)
5-11: LGTM!
OpenTelemetry imports are correctly structured and all are utilized in the implementation.
67-79: LGTM!
The new tracing options and ParticipantInfo type are well-documented and provide appropriate flexibility for dynamic participant resolution.
164-197: LGTM!
The ensureUserTurnSpan method correctly handles span reuse with the isRecording() check, and properly propagates participant and STT metadata. The userTurnContext helper correctly builds the context hierarchy.
346-364: LGTM!
The span context wrapping for STT-based speech events is correctly implemented with appropriate variable scoping.
435-479: LGTM!
The EOU detection span is correctly linked as a child of the user_turn span through the propagated context.
639-671: LGTM!
The VAD event handling correctly backfills the span start time using speechDuration for accurate timing, and consistently wraps hooks with the appropriate OpenTelemetry context.
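Backfilling the span start time amounts to subtracting the speech already observed from the current time. The sketch below assumes the duration is expressed in milliseconds, which is an assumption (the unit of the real `speechDuration` field is not stated here), and `backfilledStartTime` is a hypothetical helper name.

```typescript
// Hypothetical helper: estimate when speech actually began, given that the VAD
// reports the start of speech only after some speech has already been heard.
function backfilledStartTime(nowMs: number, speechDurationMs: number): number {
  // Treat negative or non-finite durations as zero so the start never lands in the future.
  const elapsed = isFinite(speechDurationMs) ? Math.max(0, speechDurationMs) : 0;
  return nowMs - elapsed;
}
```

The resulting timestamp is the kind of value that can be supplied as the `startTime` span option when the span is created.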
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@agents/src/utils.ts`:
- Around line 146-151: The result getter currently only checks `#done` and returns
`#result` which is undefined for rejected futures; modify the future
implementation to record rejection (e.g., add a private `#error` property or a
`#rejected` flag that is set by the rejection path) and update any reject method
to store the thrown error into that property; then change the get result() in
the Future class to, after checking `#done`, if an error is present (or `#rejected`
is true) throw that stored error instead of returning `#result`, otherwise return
`#result` as before — reference the get result() accessor and the private fields
`#done` and `#result` when applying the change.
In `@agents/src/voice/audio_recognition.ts`:
- Around line 169-193: The user_turn Span created by ensureUserTurnSpan() is
only ended on commit, causing leaks if clearUserTurn() or close() runs mid-turn;
update clearUserTurn() and close() to check if this.userTurnSpan exists and
this.userTurnSpan.isRecording() and call this.userTurnSpan.end() (then null out
this.userTurnSpan) before performing other teardown, ensuring spans are properly
closed/exported; reference ensureUserTurnSpan, clearUserTurn, close, and
this.userTurnSpan when making the change.
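For the agents/src/utils.ts item, the requested behaviour (record the rejection, then rethrow it from the accessor) can be sketched as follows. This is an illustration under assumptions, not the actual Future class: `SketchFuture` uses TypeScript `private` fields in place of the `#done` and `#result` fields named above, omits any promise plumbing, and ignores a second resolve or reject by choice.

```typescript
// Sketch of a settle-once future whose `result` accessor rethrows a stored rejection.
class SketchFuture<T> {
  private isDone = false;
  private value?: T;
  private error?: Error;

  get done(): boolean {
    return this.isDone;
  }

  get result(): T {
    if (!this.isDone) {
      throw new Error('Future is not done');
    }
    // A rejected future surfaces its error instead of returning undefined.
    if (this.error !== undefined) {
      throw this.error;
    }
    return this.value as T;
  }

  resolve(value: T): void {
    if (this.isDone) return; // first settlement wins (a choice made for this sketch)
    this.value = value;
    this.isDone = true;
  }

  reject(error: Error): void {
    if (this.isDone) return;
    this.error = error;
    this.isDone = true;
  }
}
```

With this shape, reading `result` has three outcomes: it throws while pending, throws the stored error after `reject`, and returns the value after `resolve`.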
📜 Review details
📒 Files selected for processing (7)
- agents/src/utils.ts
- agents/src/voice/agent_activity.ts
- agents/src/voice/agent_session.ts
- agents/src/voice/audio_recognition.ts
- agents/src/voice/audio_recognition_span.test.ts
- agents/src/voice/room_io/room_io.ts
- agents/src/voice/utils.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- agents/src/voice/room_io/room_io.ts
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to examples/src/test_*.ts : For plugin component debugging (STT, TTS, LLM), create test example files prefixed with `test_` under the examples directory and run with `pnpm build && node ./examples/src/test_my_plugin.ts`
Applied to files:
agents/src/voice/audio_recognition_span.test.ts
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to **/{examples,test}/**/*.test.ts : Include both basic streaming and tool calling tests to verify full LLM functionality
Applied to files:
agents/src/voice/audio_recognition_span.test.ts
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to **/*.{ts,tsx}?(test|example|spec) : When testing inference LLM, always use full model names from `agents/src/inference/models.ts` (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')
Applied to files:
agents/src/voice/audio_recognition_span.test.ts
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to **/*.{ts,tsx,js,jsx} : Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'
Applied to files:
agents/src/voice/audio_recognition_span.test.ts
🧬 Code graph analysis (2)
agents/src/voice/agent_session.ts (2)
agents/src/voice/room_io/room_io.ts (1)
localParticipant (387-389)
agents/src/voice/utils.ts (1)
setParticipantSpanAttributes (8-17)
agents/src/voice/agent_activity.ts (3)
agents/src/inference/stt.ts (2)
label (170-172)
label (266-268)
agents/src/voice/room_io/room_io.ts (1)
localParticipant (387-389)
agents/src/voice/utils.ts (1)
setParticipantSpanAttributes (8-17)
🔇 Additional comments (8)
agents/src/utils.ts (1)
158-162: LGTM!
The resolve method correctly stores the value in #result before marking the future as done.
agents/src/voice/utils.ts (1)
8-28: Helper looks good for span enrichment.
Cleanly centralizes participant attributes for tracing.
agents/src/voice/agent_activity.ts (2)
293-306: Nice wiring of STT/participant context into tracing.
Good propagation of model/provider + participant attributes into span enrichment.
Also applies to: 1373-1376, 1838-1841
342-351: getSttProvider helper is clear and contained.
The heuristic is documented and keeps the derivation localized.
agents/src/voice/audio_recognition_span.test.ts (1)
68-155: Span parity tests look solid.
Good coverage of user_turn → eou_detection parenting and key attributes across STT and VAD modes.
Also applies to: 157-259
agents/src/voice/agent_session.ts (1)
299-326: RoomIO migration and span attribution look solid.
Internal _roomIO wiring and participant span enrichment are consistent and cleanly guarded.
Also applies to: 705-709, 745-748, 792-793, 871-872
agents/src/voice/audio_recognition.ts (2)
59-84: Good addition of participant/model/provider tracing inputs.
The expanded options and ParticipantLike shape keep span attribution explicit and flexible.
Also applies to: 96-138
347-391: Context propagation into user_turn / eou_detection looks correct.
Wrapping STT/VAD hooks and EOU detection in the user_turn context preserves proper span parenting.
Also applies to: 436-478, 640-672
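The participant span enrichment noted for agents/src/voice/utils.ts could look roughly like the sketch below. Everything here is assumed for illustration: `AttributeSink` models only a span's `setAttribute` method, `SketchParticipant` is a hypothetical shape, the attribute keys are placeholders rather than the constants in trace_types.ts, and the body of the real `setParticipantSpanAttributes` is not shown in this review.

```typescript
// Minimal stand-in for the part of a span this sketch touches.
interface AttributeSink {
  setAttribute(key: string, value: string): void;
}

// Hypothetical participant shape; every field is optional.
interface SketchParticipant {
  sid?: string;
  identity?: string;
  kind?: string;
}

// Placeholder attribute keys, not the constants from trace_types.ts.
const KEY_ID = 'sketch.participant.id';
const KEY_IDENTITY = 'sketch.participant.identity';
const KEY_KIND = 'sketch.participant.kind';

function setParticipantAttributesSketch(
  span: AttributeSink,
  participant: SketchParticipant | undefined,
): void {
  if (!participant) return; // nothing linked yet: leave the span untouched
  if (participant.sid) span.setAttribute(KEY_ID, participant.sid);
  if (participant.identity) span.setAttribute(KEY_IDENTITY, participant.identity);
  if (participant.kind) span.setAttribute(KEY_KIND, participant.kind);
}
```

Keeping this in one function means the user_turn, user_speaking, and agent_speaking spans would all attach participant data the same way.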