The Problem
Users hate waiting. A 3-second response feels like an eternity when you’re staring at a blank screen.
Traditional request/response patterns make your agent feel sluggish, even when the LLM is fast. The UI waits for the entire response before showing anything.
Streaming changes everything. Users see tokens appear in real-time, making responses feel instant even when they take seconds to complete.
Quick Start
ElizaOS supports three response modes out of the box:
| Mode | Behavior | Use Case |
| --- | --- | --- |
| Sync | Wait for complete response | Simple integrations, batch processing |
| Stream | Tokens appear in real-time | Chat UIs, interactive experiences |
| WebSocket | Bidirectional, persistent | Voice conversations, multi-turn |
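For comparison, a sync-mode call is the same endpoint without the stream flag. This is a minimal sketch; the assumption that omitting `stream` returns a single JSON body is inferred from the table above, not confirmed API behavior:

```ts
// Sync mode sketch (assumption: no `stream` flag means one complete JSON response)
const res = await fetch(`/api/agents/${agentId}/message`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    entityId: 'user-123',
    roomId: 'room-456',
    content: { text: 'Hello!', source: 'api' }
  })
});
const reply = await res.json(); // full response arrives all at once
```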
HTTP Streaming
Send a message with `stream: true` to receive Server-Sent Events (SSE):
```ts
const response = await fetch(`/api/agents/${agentId}/message`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    entityId: 'user-123',
    roomId: 'room-456',
    content: { text: 'Hello!', source: 'api' },
    stream: true // Enable streaming
  })
});

// Process SSE stream
const reader = response.body!.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Network chunks can split mid-line, so buffer until a full line arrives
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? ''; // keep the trailing partial line for the next read

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = JSON.parse(line.slice(6));
    if (data.type === 'chunk') {
      process.stdout.write(data.text); // Display token immediately
    }
  }
}
```
WebSocket Connection
For bidirectional communication and voice conversations:
```ts
const socket = new WebSocket(`ws://localhost:3000/api/agents/${agentId}/ws`);

socket.onopen = () => {
  socket.send(JSON.stringify({
    type: 'message',
    entityId: 'user-123',
    roomId: 'room-456',
    content: { text: 'Hello!', source: 'websocket' }
  }));
};

socket.onmessage = (event) => {
  const data = JSON.parse(event.data);
  switch (data.type) {
    case 'chunk':
      process.stdout.write(data.text);
      break;
    case 'complete':
      console.log('\n--- Response complete ---');
      break;
    case 'error':
      console.error('Error:', data.message);
      break;
  }
};
```
Stream Events
The streaming API emits these event types:
| Event | Description |
| --- | --- |
| `chunk` | A token or text fragment to display |
| `complete` | Response finished; includes full text and actions |
| `error` | Something went wrong |
| `control` | Backend control messages (typing indicators, etc.) |
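If you consume these events in TypeScript, a discriminated union makes the switch statements above type-safe. This type is hypothetical, inferred from the event examples in this section rather than exported by @elizaos/core:

```ts
// Hypothetical event shapes, inferred from the examples below
type StreamEvent =
  | { type: 'chunk'; text: string; timestamp: number }
  | { type: 'complete'; text: string; actions: string[]; messageId: string; timestamp: number }
  | { type: 'error'; message: string }
  | { type: 'control'; [key: string]: unknown };
```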
Chunk Event
```ts
{
  type: 'chunk',
  text: 'Hello',           // Text fragment to append
  timestamp: 1703001234567
}
```
Complete Event
```ts
{
  type: 'complete',
  text: 'Hello! How can I help you today?', // Full response
  actions: ['REPLY'],                       // Executed actions
  messageId: 'msg-uuid',
  timestamp: 1703001234890
}
```
Stream Extractors
ElizaOS uses stream extractors to filter LLM output for streaming. The framework provides several built-in extractors:
PassthroughExtractor
Streams everything as-is. Use for plain text responses.

```ts
import { PassthroughExtractor } from '@elizaos/core';

const extractor = new PassthroughExtractor();
extractor.push('Hello '); // Returns: 'Hello '
extractor.push('world!'); // Returns: 'world!'
```
XmlTagExtractor
Extracts content from a specific XML tag. Use when the LLM outputs structured XML.

```ts
import { XmlTagExtractor } from '@elizaos/core';

const extractor = new XmlTagExtractor('text');

// LLM output: <response><text>Hello world!</text></response>
extractor.push('<response><text>Hello ');   // Returns: 'Hel' (keeps margin)
extractor.push('world!</text></response>'); // Returns: 'lo world!'
```
ResponseStreamExtractor
Action-aware extraction used by DefaultMessageService. Understands `<actions>` tags to decide what to stream.

```ts
import { ResponseStreamExtractor } from '@elizaos/core';

const extractor = new ResponseStreamExtractor();

// Only streams <text> when the action is REPLY
extractor.push('<actions>REPLY</actions><text>Hello!'); // Returns: 'Hel'
extractor.push('</text>');                              // Returns: 'lo!'

// Skips <text> when the action is something else (the action handler will respond)
extractor.push('<actions>SEARCH</actions><text>Ignored</text>'); // Returns: ''
```
Custom Extractors
Implement `IStreamExtractor` for custom filtering logic:
```ts
import type { IStreamExtractor } from '@elizaos/core';

class JsonValueExtractor implements IStreamExtractor {
  private buffer = '';
  private _done = false;

  get done() { return this._done; }

  push(chunk: string): string {
    this.buffer += chunk;
    // Try to parse and extract the "response" field
    try {
      const json = JSON.parse(this.buffer);
      this._done = true;
      return json.response || '';
    } catch {
      return ''; // Wait for complete JSON
    }
  }

  // Clear state between conversations (see best practices below)
  reset(): void {
    this.buffer = '';
    this._done = false;
  }
}
```
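A quick usage sketch for the extractor above; the input and the `response` field name come from the class itself, not from any ElizaOS convention:

```ts
const jsonExtractor = new JsonValueExtractor();
jsonExtractor.push('{"response": "Hel'); // Returns: '' (JSON still incomplete)
jsonExtractor.push('lo!"}');             // Returns: 'Hello!'
console.log(jsonExtractor.done);         // true
```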
Stream Error Handling
The streaming system provides typed errors for robust handling:
```ts
import { StreamError } from '@elizaos/core';

try {
  const result = extractor.push(hugeChunk);
} catch (error) {
  if (StreamError.isStreamError(error)) {
    switch (error.code) {
      case 'CHUNK_TOO_LARGE':
        console.error('Chunk exceeded 1MB limit');
        break;
      case 'BUFFER_OVERFLOW':
        console.error('Buffer exceeded 100KB');
        break;
      case 'PARSE_ERROR':
        console.error('Malformed content');
        break;
      case 'TIMEOUT':
        console.error('Stream timed out');
        break;
      case 'ABORTED':
        console.error('Stream was cancelled');
        break;
    }
  }
}
```
Best Practices
- **Keep extractors simple.** Complex parsing logic in `push()` blocks the stream. Do heavy processing after streaming completes.
- **Use appropriate margins.** XML extractors keep a safety margin to avoid splitting closing tags; the default is 10 characters (see the sketch after this list).
- **Handle backpressure.** If your UI can't keep up, chunks queue up in memory. Consider throttling or dropping old chunks.
- **Clean up resources.** Call `extractor.reset()` between conversations to clear buffers and state.
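To make the margin behavior concrete, here is a minimal sketch of margin buffering. It assumes a simple hold-back strategy and is illustrative only, not the actual @elizaos/core implementation:

```ts
// Hold back the last `margin` characters so a closing tag split across
// chunks is never emitted early.
function emitWithMargin(buffer: string, margin = 10): { out: string; held: string } {
  const cut = Math.max(0, buffer.length - margin);
  return { out: buffer.slice(0, cut), held: buffer.slice(cut) };
}

// A '</text>' tag arriving in two pieces never leaks into the output:
const { out, held } = emitWithMargin('Hello world!</te');
console.log(out);  // 'Hello '
console.log(held); // 'world!</te' (kept until the tag closes or more text arrives)
```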
Architecture
Data Flow:
1. **LLM Provider** generates tokens via an async iterator
2. **Stream Extractor** filters output, extracts streamable content, and buffers for tag boundaries
3. **SSE/WebSocket** sends chunks to the client progressively (sketched below)
4. **UI** updates in real-time as chunks arrive
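A minimal server-side sketch of steps 1-3, assuming a token iterator `llmTokens` and a writable SSE response `res` (both hypothetical; only PassthroughExtractor comes from @elizaos/core):

```ts
import { PassthroughExtractor } from '@elizaos/core';

// Hypothetical wiring: pipe LLM tokens through an extractor and out as SSE frames
async function pipeToSse(
  llmTokens: AsyncIterable<string>,   // step 1: the provider's async iterator
  res: { write(data: string): void }  // any SSE-capable response object
): Promise<void> {
  const extractor = new PassthroughExtractor(); // step 2: filter/buffer output
  for await (const token of llmTokens) {
    const text = extractor.push(token);
    if (text) {
      // step 3: one SSE frame per chunk, matching the event shape above
      res.write(`data: ${JSON.stringify({ type: 'chunk', text, timestamp: Date.now() })}\n\n`);
    }
  }
  // Simplified complete event; the real one also carries text, actions, and messageId
  res.write(`data: ${JSON.stringify({ type: 'complete', timestamp: Date.now() })}\n\n`);
}
```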
Next Steps