OpenAI Realtime API

📝 Overview

Introduction

The OpenAI Realtime API provides two connection methods:

  1. WebRTC - For real-time audio/video interaction in browsers and mobile clients

  2. WebSocket - For server-to-server application integration

Use Cases

  • Real-time voice conversations
  • Audio/video conferencing
  • Real-time translation
  • Speech transcription
  • Real-time code generation
  • Server-side real-time integration

Key Features

  • Bidirectional audio streaming
  • Mixed text and audio conversations
  • Function calling support
  • Automatic Voice Activity Detection (VAD)
  • Audio transcription capabilities
  • WebSocket server-side integration

🔐 Authentication & Security

Authentication Methods

  1. Standard API Key (server-side only)
  2. Ephemeral Token (client-side use)

Ephemeral Token

  • Validity: 1 minute
  • Usage limit: Single connection
  • Generation: Created via a server-side API call, as shown below

POST https://your-newapi-server-address/v1/realtime/sessions
Content-Type: application/json
Authorization: Bearer $API_KEY

{
  "model": "gpt-4o-realtime-preview-2024-12-17",
  "voice": "verse"
}
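
For illustration, a minimal Node.js (Express) server route that mints an ephemeral token for a browser client might look like the sketch below. The /token route name and the API_KEY environment variable are assumptions for this example; the returned session object carries the ephemeral key in client_secret.value.

import express from "express";

const app = express();

// Hypothetical endpoint the browser calls to obtain a short-lived key.
// The standard API key never leaves the server. Requires Node 18+ (global fetch).
app.get("/token", async (req, res) => {
  const r = await fetch("https://your-newapi-server-address/v1/realtime/sessions", {
    method: "POST",
    headers: {
      "Authorization": "Bearer " + process.env.API_KEY,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-realtime-preview-2024-12-17",
      voice: "verse",
    }),
  });
  // Forward the session object; the ephemeral key is client_secret.value
  res.json(await r.json());
});

app.listen(3000);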

Security Recommendations

  • Never expose standard API keys on the client side
  • Use HTTPS/WSS for communication
  • Implement appropriate access controls
  • Monitor for unusual activity

🔌 Connection Establishment

WebRTC Connection

  • URL: https://your-newapi-server-address/v1/realtime
  • Query parameters: model
  • Headers:
      • Authorization: Bearer EPHEMERAL_KEY
      • Content-Type: application/sdp

WebSocket Connection

  • URL: wss://your-newapi-server-address/v1/realtime
  • Query parameters: model
  • Headers:
      • Authorization: Bearer YOUR_API_KEY
      • OpenAI-Beta: realtime=v1

Connection Flow

sequenceDiagram
    participant Client
    participant Server
    participant OpenAI

    alt WebRTC Connection
        Client->>Server: Request ephemeral token
        Server->>OpenAI: Create session
        OpenAI-->>Server: Return ephemeral token
        Server-->>Client: Return ephemeral token

        Client->>OpenAI: Create WebRTC offer
        OpenAI-->>Client: Return answer

        Note over Client,OpenAI: Establish WebRTC connection

        Client->>OpenAI: Create data channel
        OpenAI-->>Client: Confirm data channel
    else WebSocket Connection
        Server->>OpenAI: Establish WebSocket connection
        OpenAI-->>Server: Confirm connection

        Note over Server,OpenAI: Begin real-time conversation
    end

Data Channel

  • Name: oai-events
  • Purpose: Event transmission
  • Format: JSON

Audio Stream

  • Input: addTrack()
  • Output: ontrack event
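
Putting the pieces above together, a minimal browser-side sketch of the WebRTC flow might look like this. It assumes a server endpoint such as the /token route sketched earlier, returning a session object whose client_secret.value is the ephemeral key; run it inside an async function or ES module.

// Fetch an ephemeral key from your own server (never ship the standard key)
const session = await (await fetch("/token")).json();
const EPHEMERAL_KEY = session.client_secret.value;

const pc = new RTCPeerConnection();

// Output: remote audio arrives via the ontrack event
const audioEl = new Audio();
audioEl.autoplay = true;
pc.ontrack = (e) => { audioEl.srcObject = e.streams[0]; };

// Input: add the microphone track with addTrack()
const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
pc.addTrack(mic.getTracks()[0]);

// Events travel over the "oai-events" data channel as JSON
const dc = pc.createDataChannel("oai-events");
dc.onmessage = (e) => console.log(JSON.parse(e.data));

// Exchange SDP with the realtime endpoint
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);

const model = "gpt-4o-realtime-preview-2024-12-17";
const sdpResponse = await fetch(
  "https://your-newapi-server-address/v1/realtime?model=" + model,
  {
    method: "POST",
    body: offer.sdp,
    headers: {
      "Authorization": "Bearer " + EPHEMERAL_KEY,
      "Content-Type": "application/sdp",
    },
  }
);
await pc.setRemoteDescription({ type: "answer", sdp: await sdpResponse.text() });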

💬 Conversation Interaction

Conversation Modes

  1. Text-only conversations
  2. Voice conversations
  3. Mixed conversations

Session Management

  • Create session
  • Update session
  • End session
  • Session configuration

Event Types

  • Text events
  • Audio events
  • Function calls
  • Status updates
  • Error events

⚙️ Configuration Options

Audio Configuration

  • Input formats
      • pcm16
      • g711_ulaw
      • g711_alaw
  • Output formats
      • pcm16
      • g711_ulaw
      • g711_alaw
  • Voice types
      • alloy
      • echo
      • shimmer

Model Configuration

  • Temperature
  • Maximum output length
  • System prompt
  • Tool configuration

VAD Configuration

  • Threshold
  • Silence duration
  • Prefix padding
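
All of the options above are applied through a single session.update client event. A representative payload, reusing the values from the Python example later in this document:

{
  "type": "session.update",
  "session": {
    "voice": "alloy",
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm16",
    "instructions": "You are a helpful AI assistant.",
    "temperature": 0.8,
    "max_response_output_tokens": 4096,
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.5,
      "prefix_padding_ms": 300,
      "silence_duration_ms": 500
    }
  }
}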

💡 Request Examples

WebSocket Connection ✅

Node.js (ws module)

import WebSocket from "ws";

const url = "wss://your-newapi-server-address/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17";
const ws = new WebSocket(url, {
  headers: {
    "Authorization": "Bearer " + process.env.API_KEY,
    "OpenAI-Beta": "realtime=v1",
  },
});

ws.on("open", function open() {
  console.log("Connected to server.");
});

ws.on("message", function incoming(message) {
  console.log(JSON.parse(message.toString()));
});

Python (websocket-client)

# Requires websocket-client library:
# pip install websocket-client

import os
import json
import websocket

API_KEY = os.environ.get("API_KEY")

url = "wss://your-newapi-server-address/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17"
headers = [
    "Authorization: Bearer " + API_KEY,
    "OpenAI-Beta: realtime=v1"
]

def on_open(ws):
    print("Connected to server.");

def on_message(ws, message):
    data = json.loads(message)
    print("Received event:", json.dumps(data, indent=2))

ws = websocket.WebSocketApp(
    url,
    header=headers,
    on_open=on_open,
    on_message=on_message,
)

ws.run_forever()

Browser (Standard WebSocket)

/*
Note: in browser environments we recommend using WebRTC. However, in
browser-like server environments such as Deno and Cloudflare Workers,
you can also use the standard WebSocket interface.
*/

const ws = new WebSocket(
  "wss://your-newapi-server-address/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17",
  [
    "realtime",
    // Authentication
    "openai-insecure-api-key." + API_KEY, 
    // Optional
    "openai-organization." + OPENAI_ORG_ID,
    "openai-project." + OPENAI_PROJECT_ID,
    // Beta protocol, required
    "openai-beta.realtime-v1"
  ]
);

ws.on("open", function open() {
  console.log("Connected to server.");
});

ws.on("message", function incoming(message) {
  console.log(message.data);
});

Message Send/Receive Example

Node.js/Browser

// Receive server events
ws.on("message", function incoming(message) {
  // Parse the message payload from JSON (in the browser, read event.data instead)
  const serverEvent = JSON.parse(message.toString());
  console.log(serverEvent);
});

// Send events: build a JSON structure that conforms to the client event format
const event = {
  type: "response.create",
  response: {
    modalities: ["audio", "text"],
    instructions: "Give me a haiku about code.",
  }
};
ws.send(JSON.stringify(event));

Python

# Send client events by serializing a dictionary to JSON
def on_open(ws):
    print("Connected to server.")

    event = {
        "type": "response.create",
        "response": {
            "modalities": ["text"],
            "instructions": "Please assist the user."
        }
    }
    ws.send(json.dumps(event))

# Received messages must be parsed from JSON
def on_message(ws, message):
    data = json.loads(message)
    print("Received event:", json.dumps(data, indent=2))

WebSocket Python Audio Example

Example Documentation

This is a Python example of an OpenAI Realtime WebSocket voice conversation, supporting real-time voice input and output.

Features

  • 🎤 Real-time Voice Recording: Automatically detects voice input and sends it to the server
  • 🔊 Real-time Audio Playback: Plays the AI's voice responses
  • 📝 Text Display: Simultaneously displays the AI's text responses
  • 🎯 Automatic Voice Detection: Uses server-side VAD (Voice Activity Detection)
  • 🔄 Bidirectional Communication: Supports continuous conversation

Requirements

  • Python 3.7+
  • Microphone and speakers
  • Stable network connection

Install Dependencies

pip install -r requirements.txt
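
The contents of requirements.txt are not reproduced in this document; judging by the imports in the example code below, it would need at least the following (an assumption; version pins omitted):

websockets
pyaudio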

System Dependencies: On Linux systems, you may need to install additional audio libraries:

# Ubuntu/Debian
sudo apt-get install portaudio19-dev python3-pyaudio

# CentOS/RHEL
sudo yum install portaudio-devel

Configuration

In the openai_realtime_client.py file, ensure the following configuration is correct:

WEBSOCKET_URL = "wss://your-newapi-server-address/v1/realtime"
API_KEY = "your-api-key"  # use a placeholder; never commit a real key
MODEL = "gpt-4o-realtime-preview-2024-12-17"

Usage
  1. Run the program:

    python openai_realtime_client.py

  2. Start a conversation:
      • The program starts recording automatically after launch
      • Speak into the microphone
      • The AI responds to your voice in real time

  3. Stop the program:
      • Press Ctrl+C to stop the program

Technical Details

Audio Configuration:
  • Sample Rate: 24 kHz (required by the OpenAI Realtime API)
  • Format: PCM16
  • Channels: Mono
  • Encoding: Base64

WebSocket Message Types:
  • session.update: Session configuration
  • input_audio_buffer.append: Send audio data
  • input_audio_buffer.commit: Commit audio buffer
  • response.audio.delta: Receive audio response
  • response.text.delta: Receive text response

Voice Activity Detection (server-side VAD configuration):
  • Threshold: 0.5
  • Prefix padding: 300 ms
  • Silence duration: 500 ms

Troubleshooting

Common Issues:

  1. Audio Device Issues:

    # Check audio devices
    python -c "import pyaudio; p = pyaudio.PyAudio(); print([p.get_device_info_by_index(i) for i in range(p.get_device_count())])"

  2. Permission Issues:
      • Ensure the program has microphone access permissions
      • Linux: Check ALSA/PulseAudio configuration

  3. Network Connection Issues:
      • Check that the WebSocket URL is correct
      • Ensure the API key is valid
      • Check firewall settings

Debug Mode:

Enable verbose logging:

logging.basicConfig(level=logging.DEBUG)

Code Structure

├── openai_realtime_client.py  # Main program file
├── requirements.txt           # Python dependencies
└── README.md                  # Documentation

Main Classes and Methods:

  • OpenAIRealtimeClient: Main client class
      • connect(): Connect to WebSocket
      • start_audio_streams(): Start audio streams
      • start_recording(): Start recording
      • handle_response(): Handle responses
      • start_conversation(): Start conversation

Notes
  1. Audio Quality: Ensure use in a quiet environment for best results
  2. Network Latency: Real-time conversation is sensitive to network latency
  3. Resource Usage: Long-running sessions may consume significant CPU and memory
  4. API Limits: Be aware of OpenAI API usage limits and costs

License

This project is for learning and testing purposes only. Please comply with OpenAI's terms of use.

Example Code

#!/usr/bin/env python3
"""
OpenAI Realtime WebSocket Audio Example
Supports real-time voice conversation, including audio recording, sending, and playback
"""

import asyncio
import json
import base64
import websockets
import pyaudio
import wave
import threading
import time
from typing import Optional
import logging

# Configure logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(),
        logging.FileHandler('websocket_debug.log', encoding='utf-8')
    ]
)
logger = logging.getLogger(__name__)

class OpenAIRealtimeClient:
    def __init__(self, 
                 websocket_url: str,
                 api_key: str,
                 model: str = "gpt-4o-realtime-preview-2024-12-17"):
        self.websocket_url = websocket_url
        self.api_key = api_key
        self.model = model
        self.websocket = None
        self.is_recording = False
        self.is_connected = False

        # Audio configuration
        self.audio_format = pyaudio.paInt16
        self.channels = 1
        self.rate = 24000  # OpenAI Realtime API required sample rate
        self.chunk = 1024
        self.audio = pyaudio.PyAudio()

        # Audio streams
        self.input_stream = None
        self.output_stream = None

    async def connect(self):
        """Connect to WebSocket server"""
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "OpenAI-Beta": "realtime=v1"
        }

        logger.info("=" * 80)
        logger.info("🚀 Starting WebSocket connection")
        logger.info("=" * 80)
        logger.info(f"Connection URL: {self.websocket_url}")
        logger.info(f"API Key: {self.api_key[:10]}...")
        logger.info(f"Headers: {json.dumps(headers, ensure_ascii=False, indent=2)}")

        try:
            self.websocket = await websockets.connect(
                self.websocket_url,
                additional_headers=headers
            )
            self.is_connected = True
            logger.info("✅ WebSocket connection successful")

            # Send session configuration
            await self.send_session_config()

        except Exception as e:
            logger.error(f"❌ WebSocket connection failed: {e}")
            logger.error(f"Error type: {type(e).__name__}")
            raise

    async def send_session_config(self):
        """Send session configuration"""
        config = {
            "type": "session.update",
            "session": {
                "modalities": ["text", "audio"],
                "instructions": "You are a helpful AI assistant that can engage in real-time voice conversations.",
                "voice": "alloy",
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
                "input_audio_transcription": {
                    "model": "whisper-1"
                },
                "turn_detection": {
                    "type": "server_vad",
                    "threshold": 0.5,
                    "prefix_padding_ms": 300,
                    "silence_duration_ms": 500
                },
                "tools": [],
                "tool_choice": "auto",
                "temperature": 0.8,
                "max_response_output_tokens": 4096
            }
        }

        config_json = json.dumps(config, ensure_ascii=False, indent=2)
        logger.info("=" * 60)
        logger.info("📤 Sending session configuration:")
        logger.info(f"Message type: {config['type']}")
        logger.info(f"Configuration content:\n{config_json}")
        logger.info("=" * 60)

        await self.websocket.send(json.dumps(config))
        logger.info("✅ Session configuration sent")

    def start_audio_streams(self):
        """Start audio input and output streams"""
        try:
            # Input stream (microphone)
            self.input_stream = self.audio.open(
                format=self.audio_format,
                channels=self.channels,
                rate=self.rate,
                input=True,
                frames_per_buffer=self.chunk
            )

            # Output stream (speakers)
            self.output_stream = self.audio.open(
                format=self.audio_format,
                channels=self.channels,
                rate=self.rate,
                output=True,
                frames_per_buffer=self.chunk
            )

            logger.info("Audio streams started")

        except Exception as e:
            logger.error(f"Failed to start audio streams: {e}")
            raise

    def stop_audio_streams(self):
        """Stop audio streams"""
        if self.input_stream:
            self.input_stream.stop_stream()
            self.input_stream.close()
            self.input_stream = None

        if self.output_stream:
            self.output_stream.stop_stream()
            self.output_stream.close()
            self.output_stream = None

        logger.info("Audio streams stopped")

    async def start_recording(self):
        """Start recording and send audio data"""
        self.is_recording = True
        logger.info("Starting recording...")

        try:
            while self.is_recording and self.is_connected:
                # Read audio data
                audio_data = self.input_stream.read(self.chunk, exception_on_overflow=False)

                # Encode audio data as base64
                audio_base64 = base64.b64encode(audio_data).decode('utf-8')

                # Send audio data
                message = {
                    "type": "input_audio_buffer.append",
                    "audio": audio_base64
                }

                # Log audio data sending (every 10 times to avoid excessive logging)
                if hasattr(self, '_audio_count'):
                    self._audio_count += 1
                else:
                    self._audio_count = 1

                if self._audio_count % 10 == 0:  # Log every 10 times
                    logger.debug(f"🎤 Sending audio data #{self._audio_count}: length={len(audio_base64)} characters")

                await self.websocket.send(json.dumps(message))

                # Brief delay to avoid excessive sending
                await asyncio.sleep(0.01)

        except Exception as e:
            logger.error(f"Error during recording: {e}")
        finally:
            logger.info("Recording stopped")

    async def stop_recording(self):
        """Stop recording"""
        self.is_recording = False

        # Send recording end signal
        if self.websocket and self.is_connected:
            message = {
                "type": "input_audio_buffer.commit"
            }

            logger.info("=" * 60)
            logger.info("📤 Sending recording end signal:")
            logger.info(f"Message type: {message['type']}")
            logger.info("=" * 60)

            await self.websocket.send(json.dumps(message))
            logger.info("✅ Recording end signal sent")

    async def handle_response(self):
        """Handle WebSocket responses"""
        try:
            async for message in self.websocket:
                data = json.loads(message)
                message_type = data.get("type", "unknown")

                # Log all received messages in detail
                logger.info("=" * 60)
                logger.info("📥 Received WebSocket message:")
                logger.info(f"Message type: {message_type}")

                # Handle different message types
                if message_type == "response.audio.delta":
                    # Handle audio response
                    audio_data = base64.b64decode(data.get("delta", ""))
                    logger.info(f"🎵 Audio data: length={len(audio_data)} bytes")
                    if audio_data and self.output_stream:
                        self.output_stream.write(audio_data)
                        logger.info("✅ Audio data played")

                elif message_type == "response.text.delta":
                    # Handle text response
                    text = data.get("delta", "")
                    logger.info(f"💬 Text delta: '{text}'")
                    if text:
                        print(f"AI: {text}", end="", flush=True)

                elif message_type == "response.text.done":
                    # Text response complete
                    logger.info("✅ Text response complete")
                    print("\n")

                elif message_type == "response.audio.done":
                    # Audio response complete
                    logger.info("✅ Audio response complete")

                elif message_type == "error":
                    # Handle errors
                    error_info = data.get('error', {})
                    logger.error("❌ Server error:")
                    logger.error(f"Error details: {json.dumps(error_info, ensure_ascii=False, indent=2)}")

                elif message_type == "session.created":
                    # Session created successfully
                    logger.info("✅ Session created")

                elif message_type == "session.updated":
                    # Session updated successfully
                    logger.info("✅ Session updated")

                elif message_type == "conversation.item.created":
                    # Conversation item created
                    logger.info("📝 Conversation item created")

                elif message_type == "conversation.item.input_audio_buffer.speech_started":
                    # Speech started
                    logger.info("🎤 Speech start detected")

                elif message_type == "conversation.item.input_audio_buffer.speech_stopped":
                    # Speech stopped
                    logger.info("🔇 Speech stop detected")

                elif message_type == "conversation.item.input_audio_buffer.committed":
                    # Audio buffer committed
                    logger.info("📤 Audio buffer committed")

                else:
                    # Other unknown message types
                    logger.info(f"❓ Unknown message type: {message_type}")

                # Log complete message content (except audio data, as it's too long)
                if message_type != "response.audio.delta":
                    logger.info(f"Complete message content:\n{json.dumps(data, ensure_ascii=False, indent=2)}")

                logger.info("=" * 60)

        except websockets.exceptions.ConnectionClosed:
            logger.info("WebSocket connection closed")
            self.is_connected = False
        except Exception as e:
            logger.error(f"Error handling response: {e}")
            self.is_connected = False

    async def start_conversation(self):
        """Start conversation"""
        try:
            # Start audio streams
            self.start_audio_streams()

            # Create tasks
            response_task = asyncio.create_task(self.handle_response())
            recording_task = asyncio.create_task(self.start_recording())

            logger.info("Conversation started, press Ctrl+C to stop")

            # Wait for tasks to complete
            await asyncio.gather(response_task, recording_task)

        except KeyboardInterrupt:
            logger.info("Stop signal received")
        except Exception as e:
            logger.error(f"Error during conversation: {e}")
        finally:
            await self.cleanup()

    async def cleanup(self):
        """Clean up resources"""
        self.is_recording = False
        self.is_connected = False

        # Stop audio streams
        self.stop_audio_streams()

        # Close WebSocket connection
        if self.websocket:
            await self.websocket.close()
            logger.info("WebSocket connection closed")

        # Terminate PyAudio
        self.audio.terminate()
        logger.info("Resource cleanup complete")

    async def run(self):
        """Run client"""
        try:
            await self.connect()
            await self.start_conversation()
        except Exception as e:
            logger.error(f"Error running client: {e}")
        finally:
            await self.cleanup()


async def main():
    """Main function"""
    # Configuration parameters
    WEBSOCKET_URL = "wss://your-newapi-server-address/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17"
    API_KEY = "your-api-key"  # read this from an environment variable in real code
    MODEL = "gpt-4o-realtime-preview-2024-12-17"

    # Create client
    client = OpenAIRealtimeClient(
        websocket_url=WEBSOCKET_URL,
        api_key=API_KEY,
        model=MODEL
    )

    # Run client
    await client.run()


if __name__ == "__main__":
    print("OpenAI Realtime WebSocket Audio Example")
    print("=" * 50)
    print("Features:")
    print("- Real-time voice conversation")
    print("- Automatic speech recognition")
    print("- Text and audio responses")
    print("- Press Ctrl+C to stop")
    print("=" * 50)

    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        print("\nProgram stopped")
    except Exception as e:
        print(f"Program error: {e}")

⚠️ Error Handling

Common Errors

  1. Connection errors
      • Network issues
      • Authentication failures
      • Configuration errors
  2. Audio errors
      • Device permissions
      • Unsupported formats
      • Codec issues
  3. Session errors
      • Token expiration
      • Session timeout
      • Concurrency limits

Error Recovery

  1. Automatic reconnection
  2. Session recovery
  3. Error retry
  4. Graceful degradation
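
As a sketch of automatic reconnection for the WebSocket case: the connect() factory below stands in for the connection code shown earlier, and the backoff schedule is an illustrative choice, not a prescribed one.

// Reconnect with exponential backoff. `connect` is assumed to return
// a ws-module WebSocket wired up as in the earlier examples.
function connectWithRetry(connect, attempt = 0) {
  const ws = connect();
  ws.on("open", () => {
    attempt = 0; // reset the backoff once the connection is healthy
    // Session recovery: re-send your session.update configuration here
  });
  ws.on("close", () => {
    const delay = Math.min(30000, 1000 * 2 ** attempt); // cap at 30 s
    console.log("Connection lost, retrying in " + delay + " ms");
    setTimeout(() => connectWithRetry(connect, attempt + 1), delay);
  });
}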

📝 Event Reference

Common Request Headers

All API requests must include the following headers:

| Header | Type | Description | Example Value |
|---|---|---|---|
| Authorization | String | Authentication token | Bearer $API_KEY |
| OpenAI-Beta | String | API version | realtime=v1 |

Client Events

session.update

Update the default configuration for the session.

| Parameter | Type | Required | Description | Example Value/Optional Values |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_123 |
| type | String | No | Event type | session.update |
| modalities | String array | No | Modality types the model can respond with | ["text", "audio"] |
| instructions | String | No | System instructions prepended to model calls | "Your knowledge cutoff is 2023-10..." |
| voice | String | No | Voice type used by the model | alloy, echo, shimmer |
| input_audio_format | String | No | Input audio format | pcm16, g711_ulaw, g711_alaw |
| output_audio_format | String | No | Output audio format | pcm16, g711_ulaw, g711_alaw |
| input_audio_transcription.model | String | No | Model used for transcription | whisper-1 |
| turn_detection.type | String | No | Voice detection type | server_vad |
| turn_detection.threshold | Number | No | VAD activation threshold (0.0-1.0) | 0.8 |
| turn_detection.prefix_padding_ms | Integer | No | Audio duration included before speech starts | 500 |
| turn_detection.silence_duration_ms | Integer | No | Silence duration to detect speech stop | 1000 |
| tools | Array | No | List of tools available to the model | [] |
| tool_choice | String | No | How the model chooses tools | auto/none/required |
| temperature | Number | No | Model sampling temperature | 0.8 |
| max_output_tokens | String/Integer | No | Maximum tokens per response | "inf"/4096 |

input_audio_buffer.append

Append audio data to the input audio buffer.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_456 |
| type | String | No | Event type | input_audio_buffer.append |
| audio | String | No | Base64-encoded audio data | Base64EncodedAudioData |

input_audio_buffer.commit

Commit the audio data in the buffer as a user message.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_789 |
| type | String | No | Event type | input_audio_buffer.commit |

input_audio_buffer.clear

Clear all audio data from the input audio buffer.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_012 |
| type | String | No | Event type | input_audio_buffer.clear |

conversation.item.create

Add a new conversation item to the conversation.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_345 |
| type | String | No | Event type | conversation.item.create |
| previous_item_id | String | No | New item will be inserted after this ID | null |
| item.id | String | No | Unique identifier for the conversation item | msg_001 |
| item.type | String | No | Type of conversation item | message/function_call/function_call_output |
| item.status | String | No | Status of conversation item | completed/in_progress/incomplete |
| item.role | String | No | Role of message sender | user/assistant/system |
| item.content | Array | No | Message content | [text/audio/transcript] |
| item.call_id | String | No | ID of function call | call_001 |
| item.name | String | No | Name of called function | function_name |
| item.arguments | String | No | Arguments for function call | {"param": "value"} |
| item.output | String | No | Output result of function call | {"result": "value"} |
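
For example, after the model finishes a function call (see response.function_call_arguments.done later in this reference), the client typically returns the result as a function_call_output item; the call_id and output values below are illustrative:

{
  "type": "conversation.item.create",
  "item": {
    "type": "function_call_output",
    "call_id": "call_001",
    "output": "{\"temperature\": 18, \"unit\": \"celsius\"}"
  }
}

This is normally followed by a response.create event so the model can incorporate the result into its next response.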

conversation.item.truncate

Truncate audio content in assistant messages.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_678 |
| type | String | No | Event type | conversation.item.truncate |
| item_id | String | No | ID of assistant message item to truncate | msg_002 |
| content_index | Integer | No | Index of content part to truncate | 0 |
| audio_end_ms | Integer | No | End time point for audio truncation | 1500 |

conversation.item.delete

Delete the specified conversation item from conversation history.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_901 |
| type | String | No | Event type | conversation.item.delete |
| item_id | String | No | ID of conversation item to delete | msg_003 |

response.create

Trigger response generation.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_234 |
| type | String | No | Event type | response.create |
| response.modalities | String array | No | Modality types for response | ["text", "audio"] |
| response.instructions | String | No | Instructions for the model | "Please assist the user." |
| response.voice | String | No | Voice type used by the model | alloy/echo/shimmer |
| response.output_audio_format | String | No | Output audio format | pcm16 |
| response.tools | Array | No | List of tools available to the model | ["type", "name", "description"] |
| response.tool_choice | String | No | How the model chooses tools | auto |
| response.temperature | Number | No | Sampling temperature | 0.7 |
| response.max_output_tokens | Integer/String | No | Maximum output tokens | 150/"inf" |

response.cancel

Cancel ongoing response generation.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_567 |
| type | String | No | Event type | response.cancel |

Server Events

error

Event returned when an error occurs.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_890 |
| type | String | No | Event type | error |
| error.type | String | No | Error type | invalid_request_error/server_error |
| error.code | String | No | Error code | invalid_event |
| error.message | String | No | Human-readable error message | "The 'type' field is missing." |
| error.param | String | No | Parameter related to error | null |
| error.event_id | String | No | ID of related event | event_567 |

conversation.item.input_audio_transcription.completed

Returned when input audio transcription is enabled and transcription succeeds.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_2122 |
| type | String | No | Event type | conversation.item.input_audio_transcription.completed |
| item_id | String | No | ID of user message item | msg_003 |
| content_index | Integer | No | Index of content part containing audio | 0 |
| transcript | String | No | Transcribed text content | "Hello, how are you?" |

conversation.item.input_audio_transcription.failed

Returned when input audio transcription is configured but transcription request for user message fails.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_2324 |
| type | String | No | Event type | conversation.item.input_audio_transcription.failed |
| item_id | String | No | ID of user message item | msg_003 |
| content_index | Integer | No | Index of content part containing audio | 0 |
| error.type | String | No | Error type | transcription_error |
| error.code | String | No | Error code | audio_unintelligible |
| error.message | String | No | Human-readable error message | "The audio could not be transcribed." |
| error.param | String | No | Parameter related to error | null |

conversation.item.truncated

Returned when client truncates previous assistant audio message item.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_2526 |
| type | String | No | Event type | conversation.item.truncated |
| item_id | String | No | ID of truncated assistant message item | msg_004 |
| content_index | Integer | No | Index of truncated content part | 0 |
| audio_end_ms | Integer | No | Time point when audio was truncated (milliseconds) | 1500 |

conversation.item.deleted

Returned when an item in the conversation is deleted.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_2728 |
| type | String | No | Event type | conversation.item.deleted |
| item_id | String | No | ID of deleted conversation item | msg_005 |

input_audio_buffer.committed

Returned when audio buffer data is committed.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_1121 |
| type | String | No | Event type | input_audio_buffer.committed |
| previous_item_id | String | No | New conversation item will be inserted after this ID | msg_001 |
| item_id | String | No | ID of user message item to be created | msg_002 |

input_audio_buffer.cleared

Returned when client clears input audio buffer.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_1314 |
| type | String | No | Event type | input_audio_buffer.cleared |

input_audio_buffer.speech_started

In server voice detection mode, returned when voice input is detected.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_1516 |
| type | String | No | Event type | input_audio_buffer.speech_started |
| audio_start_ms | Integer | No | Milliseconds from session start to voice detection | 1000 |
| item_id | String | No | ID of user message item to be created when voice stops | msg_003 |

input_audio_buffer.speech_stopped

In server voice detection mode, returned when voice input stops.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_1718 |
| type | String | No | Event type | input_audio_buffer.speech_stopped |
| audio_end_ms | Integer | No | Milliseconds from session start to voice stop detection | 2000 |
| item_id | String | No | ID of user message item to be created | msg_003 |

response.created

Returned when a new response is created.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_2930 |
| type | String | No | Event type | response.created |
| response.id | String | No | Unique identifier for response | resp_001 |
| response.object | String | No | Object type | realtime.response |
| response.status | String | No | Status of response | in_progress |
| response.status_details | Object | No | Additional details about status | null |
| response.output | Array | No | List of output items generated by response | [] |
| response.usage | Object | No | Usage statistics for response | null |

response.done

Returned when response streaming is complete.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_3132 |
| type | String | No | Event type | response.done |
| response.id | String | No | Unique identifier for response | resp_001 |
| response.object | String | No | Object type | realtime.response |
| response.status | String | No | Final status of response | completed/cancelled/failed/incomplete |
| response.status_details | Object | No | Additional details about status | null |
| response.output | Array | No | List of output items generated by response | [...] |
| response.usage.total_tokens | Integer | No | Total tokens | 50 |
| response.usage.input_tokens | Integer | No | Input tokens | 20 |
| response.usage.output_tokens | Integer | No | Output tokens | 30 |

response.output_item.added

Returned when a new output item is created during response generation.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_3334 |
| type | String | No | Event type | response.output_item.added |
| response_id | String | No | ID of response the output item belongs to | resp_001 |
| output_index | Integer | No | Index of output item in response | 0 |
| item.id | String | No | Unique identifier for output item | msg_007 |
| item.object | String | No | Object type | realtime.item |
| item.type | String | No | Type of output item | message/function_call/function_call_output |
| item.status | String | No | Status of output item | in_progress/completed |
| item.role | String | No | Role associated with output item | assistant |
| item.content | Array | No | Content of output item | ["type", "text", "audio", "transcript"] |

response.output_item.done

Returned when output item streaming is complete.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_3536 |
| type | String | No | Event type | response.output_item.done |
| response_id | String | No | ID of response the output item belongs to | resp_001 |
| output_index | Integer | No | Index of output item in response | 0 |
| item.id | String | No | Unique identifier for output item | msg_007 |
| item.object | String | No | Object type | realtime.item |
| item.type | String | No | Type of output item | message/function_call/function_call_output |
| item.status | String | No | Final status of output item | completed/incomplete |
| item.role | String | No | Role associated with output item | assistant |
| item.content | Array | No | Content of output item | ["type", "text", "audio", "transcript"] |

response.content_part.added

Returned when a new content part is added to assistant message item during response generation.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_3738 |
| type | String | No | Event type | response.content_part.added |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item to add content part to | msg_007 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| part.type | String | No | Content type | text/audio |
| part.text | String | No | Text content | "Hello" |
| part.audio | String | No | Base64-encoded audio data | "base64_encoded_audio_data" |
| part.transcript | String | No | Transcribed text of audio | "Hello" |

response.content_part.done

Returned when content part in assistant message item streaming is complete.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_3940 |
| type | String | No | Event type | response.content_part.done |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item the content part belongs to | msg_007 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| part.type | String | No | Content type | text/audio |
| part.text | String | No | Text content | "Hello" |
| part.audio | String | No | Base64-encoded audio data | "base64_encoded_audio_data" |
| part.transcript | String | No | Transcribed text of audio | "Hello" |

response.text.delta

Returned when text value of "text" type content part is updated.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_4142 |
| type | String | No | Event type | response.text.delta |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_007 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| delta | String | No | Text delta update content | "Sure, I can h" |

response.text.done

Returned when "text" type content part text streaming is complete.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_4344 |
| type | String | No | Event type | response.text.done |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_007 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| text | String | No | Final complete text content | "Sure, I can help with that." |

response.audio_transcript.delta

Returned when transcription content of model-generated audio output is updated.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_4546 |
| type | String | No | Event type | response.audio_transcript.delta |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_008 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| delta | String | No | Transcription text delta update content | "Hello, how can I a" |

response.audio_transcript.done

Returned when transcription of model-generated audio output streaming is complete.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_4748 |
| type | String | No | Event type | response.audio_transcript.done |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_008 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| transcript | String | No | Final complete transcribed text of audio | "Hello, how can I assist you today?" |

response.audio.delta

Returned when model-generated audio content is updated.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_4950 |
| type | String | No | Event type | response.audio.delta |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_008 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| delta | String | No | Base64-encoded audio data delta | "Base64EncodedAudioDelta" |

response.audio.done

Returned when model-generated audio is complete.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_5152 |
| type | String | No | Event type | response.audio.done |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_008 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |

Function Calling

response.function_call_arguments.delta

Returned when model-generated function call arguments are updated.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_5354 |
| type | String | No | Event type | response.function_call_arguments.delta |
| response_id | String | No | ID of response | resp_002 |
| item_id | String | No | ID of function call item | fc_001 |
| output_index | Integer | No | Index of output item in response | 0 |
| call_id | String | No | ID of function call | call_001 |
| delta | String | No | JSON-format function call arguments delta | "{\"location\": \"San\"" |

response.function_call_arguments.done

Returned when model-generated function call arguments streaming is complete.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_5556 |
| type | String | No | Event type | response.function_call_arguments.done |
| response_id | String | No | ID of response | resp_002 |
| item_id | String | No | ID of function call item | fc_001 |
| output_index | Integer | No | Index of output item in response | 0 |
| call_id | String | No | ID of function call | call_001 |
| arguments | String | No | Final complete function call arguments (JSON format) | "{\"location\": \"San Francisco\"}" |
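
A client-side handling sketch for the Node.js ws examples above; runLocalFunction is a hypothetical dispatcher you would implement yourself:

ws.on("message", async (message) => {
  const event = JSON.parse(message.toString());
  if (event.type === "response.function_call_arguments.done") {
    // Run the requested function with the parsed arguments...
    const result = await runLocalFunction(event.call_id, JSON.parse(event.arguments));
    // ...return its output as a conversation item...
    ws.send(JSON.stringify({
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: event.call_id,
        output: JSON.stringify(result),
      },
    }));
    // ...and ask the model to continue with the result
    ws.send(JSON.stringify({ type: "response.create" }));
  }
});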

Other Status Updates

rate_limits.updated

Triggered after each "response.done" event to indicate updated rate limits.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_5758 |
| type | String | No | Event type | rate_limits.updated |
| rate_limits | Object array | No | List of rate limit information | [{"name": "requests_per_min", "limit": 60, "remaining": 45, "reset_seconds": 35}] |

conversation.created

Returned when conversation is created.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_9101 |
| type | String | No | Event type | conversation.created |
| conversation | Object | No | Conversation resource object | {"id": "conv_001", "object": "realtime.conversation"} |

conversation.item.created

Returned when conversation item is created.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_1920 |
| type | String | No | Event type | conversation.item.created |
| previous_item_id | String | No | ID of previous conversation item | msg_002 |
| item | Object | No | Conversation item object | {"id": "msg_003", "object": "realtime.item", "type": "message", "status": "completed", "role": "user", "content": [{"type": "text", "text": "Hello"}]} |

session.created

Returned when session is created.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_1234 |
| type | String | No | Event type | session.created |
| session | Object | No | Session object | {"id": "sess_001", "object": "realtime.session", "model": "gpt-4", "modalities": ["text", "audio"]} |

session.updated

Returned when session is updated.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_5678 |
| type | String | No | Event type | session.updated |
| session | Object | No | Updated session object | {"id": "sess_001", "object": "realtime.session", "model": "gpt-4", "modalities": ["text", "audio"]} |

Rate Limit Event Parameter Table

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| name | String | Yes | Limit name | requests_per_min |
| limit | Integer | Yes | Limit value | 60 |
| remaining | Integer | Yes | Remaining available amount | 45 |
| reset_seconds | Integer | Yes | Reset time (seconds) | 35 |

Function Call Parameter Table

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| type | String | Yes | Function type | function |
| name | String | Yes | Function name | get_weather |
| description | String | No | Function description | Get the current weather |
| parameters | Object | Yes | Function parameter definition | {"type": "object", "properties": {...}} |

Audio Format Parameter Table

| Parameter | Type | Description | Optional Values |
|---|---|---|---|
| sample_rate | Integer | Sample rate | 8000, 16000, 24000, 44100, 48000 |
| channels | Integer | Number of channels | 1 (mono), 2 (stereo) |
| bits_per_sample | Integer | Bits per sample | 16 (pcm16), 8 (g711) |
| encoding | String | Encoding method | pcm16, g711_ulaw, g711_alaw |

Voice Detection Parameter Table

| Parameter | Type | Description | Default Value | Range |
|---|---|---|---|---|
| threshold | Float | VAD activation threshold | 0.5 | 0.0-1.0 |
| prefix_padding_ms | Integer | Voice prefix padding (milliseconds) | 500 | 0-5000 |
| silence_duration_ms | Integer | Silence detection duration (milliseconds) | 1000 | 100-10000 |

Tool Selection Parameter Table

| Parameter | Type | Description | Optional Values |
|---|---|---|---|
| tool_choice | String | Tool selection method | auto, none, required |
| tools | Array | Available tools list | [{type, name, description, parameters}] |

Model Configuration Parameter Table

| Parameter | Type | Description | Range/Optional Values | Default Value |
|---|---|---|---|---|
| temperature | Float | Sampling temperature | 0.0-2.0 | 1.0 |
| max_output_tokens | Integer/String | Maximum output length | 1-4096/"inf" | "inf" |
| modalities | String array | Response modalities | ["text", "audio"] | ["text"] |
| voice | String | Voice type | alloy, echo, shimmer | alloy |

Event Common Parameter Table

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | Yes | Unique identifier for event | event_123 |
| type | String | Yes | Event type | session.update |
| timestamp | Integer | No | Event timestamp (milliseconds) | 1677649363000 |

Session Status Parameter Table

| Parameter | Type | Description | Optional Values |
|---|---|---|---|
| status | String | Session status | active, ended, error |
| error | Object | Error information | {"type": "error_type", "message": "error message"} |
| metadata | Object | Session metadata | {"client_id": "web", "session_type": "chat"} |

Conversation Item Status Parameter Table

| Parameter | Type | Description | Optional Values |
|---|---|---|---|
| status | String | Conversation item status | completed, in_progress, incomplete |
| role | String | Sender role | user, assistant, system |
| type | String | Conversation item type | message, function_call, function_call_output |

Content Type Parameter Table

| Parameter | Type | Description | Optional Values |
|---|---|---|---|
| type | String | Content type | text, audio, transcript |
| format | String | Content format | plain, markdown, html |
| encoding | String | Encoding method | utf-8, base64 |

Response Status Parameter Table

| Parameter | Type | Description | Optional Values |
|---|---|---|---|
| status | String | Response status | completed, cancelled, failed, incomplete |
| status_details | Object | Status details | {"reason": "user_cancelled"} |
| usage | Object | Usage statistics | {"total_tokens": 50, "input_tokens": 20, "output_tokens": 30} |

Audio Transcription Parameter Table

| Parameter | Type | Description | Example Value |
|---|---|---|---|
| enabled | Boolean | Whether transcription is enabled | true |
| model | String | Transcription model | whisper-1 |
| language | String | Transcription language | en, zh, auto |
| prompt | String | Transcription prompt | "Transcript of a conversation" |

Audio Stream Parameter Table

| Parameter | Type | Description | Optional Values |
|---|---|---|---|
| chunk_size | Integer | Audio chunk size (bytes) | 1024, 2048, 4096 |
| latency | String | Latency mode | low, balanced, high |
| compression | String | Compression method | none, opus, mp3 |

WebRTC Configuration Parameter Table

| Parameter | Type | Description | Default Value |
|---|---|---|---|
| ice_servers | Array | ICE server list | [{"urls": "stun:stun.l.google.com:19302"}] |
| audio_constraints | Object | Audio constraints | {"echoCancellation": true} |
| connection_timeout | Integer | Connection timeout (milliseconds) | 30000 |
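
Applied to the browser WebRTC sketch earlier in this document, these defaults translate to something like the following (illustrative values only):

const pc = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
});
const mic = await navigator.mediaDevices.getUserMedia({
  audio: { echoCancellation: true },
});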