5 changes: 5 additions & 0 deletions .changeset/afraid-beds-bet.md
@@ -0,0 +1,5 @@
---
'@livekit/agents-plugin-openai': patch
Contributor:
Given that the now-legacy AudioFormat was exposed to users and we're renaming it, I think this should be a minor version bump.

Contributor Author @toubatbrian · Jan 8, 2026:

I can make it backward compatible, though. One thing that makes a minor bump hard is that we recently synced all plugin packages, along with the main agent package, to the same version on every release. Making a minor bump on one plugin would cause all packages to be minor-bumped. Any ideas on how we should handle this properly?

Contributor Author @toubatbrian · Jan 8, 2026:

Actually, while the AudioFormat type changed from a string ('pcm16') to an interface ({ type: 'audio/pcm', rate: number }), users never interact with this type directly; it's only used internally for the WebSocket protocol.
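As a sketch of that internal wire-format difference (field and event names taken from this PR's api_proto.ts; the 24000 Hz sample rate is an illustrative value, not something specified in the diff):

```ts
// Beta protocol: audio formats are bare strings on the session payload.
const betaSessionUpdate = {
  type: 'session.update',
  session: {
    input_audio_format: 'pcm16',
    output_audio_format: 'pcm16',
  },
};

// GA protocol: audio formats are objects, nested under `audio.input` /
// `audio.output` (per RealtimeAudioConfig in this PR). Rate is illustrative.
const gaSessionUpdate = {
  type: 'session.update',
  session: {
    audio: {
      input: { format: { type: 'audio/pcm', rate: 24000 } },
      output: { format: { type: 'audio/pcm', rate: 24000 } },
    },
  },
};
```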

The public API (constructor options) remains fully backward compatible:
Beta:

```ts
new RealtimeModel({
  model: 'gpt-4o-realtime',
  voice: 'alloy',
  temperature: 0.8,
  inputAudioTranscription: { model: 'whisper-1' },
  turnDetection: { type: 'server_vad' },
  // ... other options
});
```

GA (same code still works):

```ts
new RealtimeModel({
  model: 'gpt-4o-realtime',
  voice: 'alloy',
  temperature: 0.8, // kept for compat, marked @deprecated
  inputAudioTranscription: { model: 'whisper-1' },
  turnDetection: { type: 'server_vad' },
  // ... other options
  // NEW optional: inputAudioNoiseReduction, tracing
});
```

All existing constructor options are preserved, temperature is kept (just deprecated), and the new options (inputAudioNoiseReduction, tracing) are additive. Keeping this as a patch should be fine, since there are no user-facing breaking changes.
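A minimal sketch of that additive-only claim, treating the options as plain data; the shapes of the two new options are assumptions here, mirroring NoiseReduction and TracingConfig from this PR's api_proto.ts rather than confirmed constructor signatures:

```ts
// Options a Beta user already passes today.
const betaOptions = {
  model: 'gpt-4o-realtime',
  voice: 'alloy',
  temperature: 0.8,
  inputAudioTranscription: { model: 'whisper-1' },
  turnDetection: { type: 'server_vad' },
};

// GA accepts the same object, optionally extended with the new fields.
const gaOptions = {
  ...betaOptions, // every existing key remains valid (temperature just deprecated)
  inputAudioNoiseReduction: { type: 'near_field' }, // assumed shape: NoiseReduction
  tracing: { enabled: true }, // assumed shape: TracingConfig
};
```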

Let me know what you think @lukasIO

Contributor Author:

I just verified: both

```ts
llm: new openai.realtime.beta.RealtimeModel(),
```

and

```ts
llm: new openai.realtime.RealtimeModel(),
```

are working properly.

Contributor:

> Actually, while the AudioFormat type changed from a string ('pcm16') to an interface ({ type: 'audio/pcm', rate: number }), users never interact with this type directly; it's only used internally for the WebSocket protocol.

If that's the case, then going forward we should think about not exporting it from index.ts in the first place. Right now it's available for users to interact with as a type, even if they don't directly use it.
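One way to do that (a hypothetical layout, not what this PR ships) is to replace the wildcard re-export with named exports, so wire-level protocol types stay private:

```ts
// plugins/openai/src/realtime/index.ts — hypothetical explicit-export variant.
// `export *` from api_proto.js exposes every protocol type; listing exports by
// name would keep internals like AudioFormat out of the public API surface.
export { RealtimeModel } from './realtime_model.js';
export * as beta from './realtime_model_beta.js';
```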

> One thing that makes it hard to do a minor bump is that we recently synced all plugin packages, along with the main agent package, to the same version on every release. Making a minor bump on one plugin would cause all packages to be minor-bumped.

Yeah, this situation is less than ideal, imo. Let's discuss it in our sync next week.

---

Support OpenAI realtime GA API
2 changes: 1 addition & 1 deletion examples/src/realtime_agent.ts
@@ -53,7 +53,7 @@ export default defineAgent({
});

const session = new voice.AgentSession({
// llm: new openai.realtime.RealtimeModel(),
// llm: new openai.realtime.beta.RealtimeModel(),
llm: new openai.realtime.RealtimeModel(),
// enable to allow chaining of tool calls
voiceOptions: {
93 changes: 76 additions & 17 deletions plugins/openai/src/realtime/api_proto.ts
@@ -20,7 +20,9 @@ export type Voice =
| 'sage'
| 'verse'
| string;
export type AudioFormat = 'pcm16'; // TODO: 'g711-ulaw' | 'g711-alaw'

/** @deprecated Use LegacyAudioFormat for Beta API string format or AudioFormat for GA API object format */
export type LegacyAudioFormat = 'pcm16'; // TODO: 'g711-ulaw' | 'g711-alaw' (Beta format)
export type Role = 'system' | 'assistant' | 'user' | 'tool';
export type GenerationFinishedReason = 'stop' | 'max_tokens' | 'content_filter' | 'interrupt';
export type InputTranscriptionModel = 'whisper-1' | string; // Open-ended, for future models
@@ -44,6 +46,7 @@ export type ClientEventType =
| 'conversation.item.delete'
| 'response.create'
| 'response.cancel';

export type ServerEventType =
| 'error'
| 'session.created'
@@ -53,7 +56,8 @@ export type ServerEventType =
| 'input_audio_buffer.cleared'
| 'input_audio_buffer.speech_started'
| 'input_audio_buffer.speech_stopped'
| 'conversation.item.created'
| 'conversation.item.added' // GA: renamed from conversation.item.created
| 'conversation.item.created' // Beta: kept for backward compatibility
| 'conversation.item.input_audio_transcription.completed'
| 'conversation.item.input_audio_transcription.failed'
| 'conversation.item.truncated'
@@ -64,12 +68,18 @@ export type ServerEventType =
| 'response.output_item.done'
| 'response.content_part.added'
| 'response.content_part.done'
| 'response.text.delta'
| 'response.text.done'
| 'response.audio_transcript.delta'
| 'response.audio_transcript.done'
| 'response.audio.delta'
| 'response.audio.done'
| 'response.output_text.delta' // GA: renamed from response.text.delta
| 'response.output_text.done' // GA: renamed from response.text.done
| 'response.text.delta' // Beta: kept for backward compatibility
| 'response.text.done' // Beta: kept for backward compatibility
| 'response.output_audio_transcript.delta' // GA: renamed from response.audio_transcript.delta
| 'response.output_audio_transcript.done' // GA: renamed from response.audio_transcript.done
| 'response.audio_transcript.delta' // Beta: kept for backward compatibility
| 'response.audio_transcript.done' // Beta: kept for backward compatibility
| 'response.output_audio.delta' // GA: renamed from response.audio.delta
| 'response.output_audio.done' // GA: renamed from response.audio.done
| 'response.audio.delta' // Beta: kept for backward compatibility
| 'response.audio.done' // Beta: kept for backward compatibility
| 'response.function_call_arguments.delta'
| 'response.function_call_arguments.done'
| 'rate_limits.updated';
@@ -113,6 +123,35 @@ export type InputAudioTranscription = {
prompt?: string;
};

export type NoiseReductionType = 'near_field' | 'far_field';

export interface NoiseReduction {
type: NoiseReductionType;
}

export interface AudioFormat {
type: 'audio/pcm';
rate: number;
}

export interface RealtimeAudioConfigInput {
format?: AudioFormat;
noise_reduction?: NoiseReduction | null;
transcription?: InputAudioTranscription | null;
turn_detection?: TurnDetectionType | null;
}

export interface RealtimeAudioConfigOutput {
format?: AudioFormat;
speed?: number;
voice?: Voice;
}

export interface RealtimeAudioConfig {
input?: RealtimeAudioConfigInput;
output?: RealtimeAudioConfigOutput;
}

export interface InputTextContent {
type: 'input_text';
text: string;
@@ -136,9 +175,10 @@ export interface AudioContent {

export type Content = InputTextContent | InputAudioContent | TextContent | AudioContent;
export type ContentPart = {
type: 'text' | 'audio';
type: 'text' | 'audio' | 'output_text' | 'output_audio'; // GA: output_text/output_audio
audio?: AudioBase64Bytes;
transcript?: string;
text?: string; // GA: text field for output_text
};

export interface BaseItem {
Expand Down Expand Up @@ -193,8 +233,8 @@ export interface SessionResource {
modalities: Modality[]; // default: ["text", "audio"]
instructions: string;
voice: Voice; // default: "alloy"
input_audio_format: AudioFormat; // default: "pcm16"
output_audio_format: AudioFormat; // default: "pcm16"
input_audio_format: LegacyAudioFormat; // default: "pcm16"
output_audio_format: LegacyAudioFormat; // default: "pcm16"
input_audio_transcription: InputAudioTranscription | null;
turn_detection: TurnDetectionType | null;
tools: Tool[];
@@ -266,22 +306,34 @@ interface BaseClientEvent {
export interface SessionUpdateEvent extends BaseClientEvent {
type: 'session.update';
session: Partial<{
// GA fields
type?: 'realtime'; // GA: session type
output_modalities?: Modality[]; // GA: renamed from modalities
audio?: RealtimeAudioConfig; // GA: nested audio config
max_output_tokens?: number | 'inf'; // GA: renamed from max_response_output_tokens
tracing?: TracingConfig | null; // GA: tracing config
// Common fields
model: Model;
modalities: Modality[];
instructions: string;
tools: Tool[];
tool_choice: ToolChoice;
// Beta fields (kept for backward compatibility)
modalities: Modality[];
voice: Voice;
input_audio_format: AudioFormat;
output_audio_format: AudioFormat;
input_audio_format: LegacyAudioFormat;
output_audio_format: LegacyAudioFormat;
input_audio_transcription: InputAudioTranscription | null;
turn_detection: TurnDetectionType | null;
tools: Tool[];
tool_choice: ToolChoice;
temperature: number;
max_response_output_tokens?: number | 'inf';
speed?: number;
}>;
}

export interface TracingConfig {
enabled?: boolean;
}

export interface InputAudioBufferAppendEvent extends BaseClientEvent {
type: 'input_audio_buffer.append';
audio: AudioBase64Bytes;
Expand Down Expand Up @@ -353,7 +405,7 @@ export interface ResponseCreateEvent extends BaseClientEvent {
modalities: Modality[];
instructions: string;
voice: Voice;
output_audio_format: AudioFormat;
output_audio_format: LegacyAudioFormat;
tools?: Tool[];
tool_choice: ToolChoice;
temperature: number;
@@ -435,6 +487,12 @@ export interface ConversationItemCreatedEvent extends BaseServerEvent {
item: ItemResource;
}

export interface ConversationItemAddedEvent extends BaseServerEvent {
type: 'conversation.item.added';
previous_item_id: string;
item: ItemResource;
}

export interface ConversationItemInputAudioTranscriptionCompletedEvent extends BaseServerEvent {
type: 'conversation.item.input_audio_transcription.completed';
item_id: string;
@@ -593,6 +651,7 @@ export type ServerEvent =
| InputAudioBufferSpeechStartedEvent
| InputAudioBufferSpeechStoppedEvent
| ConversationItemCreatedEvent
| ConversationItemAddedEvent // GA: renamed from conversation.item.created
| ConversationItemInputAudioTranscriptionCompletedEvent
| ConversationItemInputAudioTranscriptionFailedEvent
| ConversationItemTruncatedEvent
1 change: 1 addition & 0 deletions plugins/openai/src/realtime/index.ts
@@ -3,3 +3,4 @@
// SPDX-License-Identifier: Apache-2.0
export * from './api_proto.js';
export * from './realtime_model.js';
export * as beta from './realtime_model_beta.js';