The Problem

Users hate waiting. A 3-second response feels like an eternity when you’re staring at a blank screen. Traditional request/response patterns make your agent feel sluggish, even when the LLM is fast. The UI waits for the entire response before showing anything.
Streaming changes everything. Users see tokens appear in real-time, making responses feel instant even when they take seconds to complete.

Quick Start

ElizaOS supports three response modes out of the box:
| Mode | Delivery | Use Case |
| --- | --- | --- |
| Sync | Wait for the complete response | Simple integrations, batch processing |
| Stream | Tokens appear in real-time | Chat UIs, interactive experiences |
| WebSocket | Bidirectional, persistent connection | Voice conversations, multi-turn |

HTTP Streaming

Send a message with `stream: true` to receive Server-Sent Events:
const response = await fetch(`/api/agents/${agentId}/message`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    entityId: 'user-123',
    roomId: 'room-456',
    content: { text: 'Hello!', source: 'api' },
    stream: true // Enable streaming
  })
});

// Process SSE stream, buffering partial lines across chunk boundaries
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? ''; // keep the last, possibly incomplete, line

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = JSON.parse(line.slice(6));
    if (data.type === 'chunk') {
      process.stdout.write(data.text); // Display token immediately
    }
  }
}

WebSocket Connection

For bidirectional communication and voice conversations:
const socket = new WebSocket(`ws://localhost:3000/api/agents/${agentId}/ws`);

socket.onopen = () => {
  socket.send(JSON.stringify({
    type: 'message',
    entityId: 'user-123',
    roomId: 'room-456',
    content: { text: 'Hello!', source: 'websocket' }
  }));
};

socket.onmessage = (event) => {
  const data = JSON.parse(event.data);

  switch (data.type) {
    case 'chunk':
      process.stdout.write(data.text);
      break;
    case 'complete':
      console.log('\n--- Response complete ---');
      break;
    case 'error':
      console.error('Error:', data.message);
      break;
  }
};

Stream Events

The streaming API emits these event types:
| Event | Description |
| --- | --- |
| `chunk` | A token or text fragment to display |
| `complete` | Response finished; includes the full text and actions |
| `error` | Something went wrong |
| `control` | Backend control messages (typing indicators, etc.) |
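The event shapes above can be modeled as a discriminated union on the client. This is an illustrative sketch, not an official `@elizaos/core` export; field names are taken from the payload examples below.

```typescript
// Hypothetical client-side type for the stream events (not an official export).
type StreamEvent =
  | { type: 'chunk'; text: string; timestamp: number }
  | { type: 'complete'; text: string; actions: string[]; messageId: string; timestamp: number }
  | { type: 'error'; message: string }
  | { type: 'control'; [key: string]: unknown };

// Narrowing on `type` lets the compiler check each branch's fields.
function render(event: StreamEvent): string {
  switch (event.type) {
    case 'chunk':
      return event.text;
    case 'complete':
      return `\n[done: ${event.actions.join(', ')}]`;
    case 'error':
      return `\n[error: ${event.message}]`;
    default:
      return ''; // control messages are not rendered
  }
}

console.log(render({ type: 'chunk', text: 'Hi', timestamp: Date.now() })); // 'Hi'
```

Using a union like this means a typo in `event.type` or a missing field becomes a compile-time error instead of a silent runtime bug.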

Chunk Event

{
  type: 'chunk',
  text: 'Hello',      // Text fragment to append
  timestamp: 1703001234567
}

Complete Event

{
  type: 'complete',
  text: 'Hello! How can I help you today?',  // Full response
  actions: ['REPLY'],                         // Executed actions
  messageId: 'msg-uuid',
  timestamp: 1703001234890
}

Custom Stream Extractors

ElizaOS uses stream extractors to filter LLM output for streaming. The framework provides several built-in extractors:

PassthroughExtractor

Streams everything as-is. Use for plain text responses.
import { PassthroughExtractor } from '@elizaos/core';

const extractor = new PassthroughExtractor();
extractor.push('Hello ');  // Returns: 'Hello '
extractor.push('world!');  // Returns: 'world!'

XmlTagExtractor

Extracts content from a specific XML tag. Use when the LLM outputs structured XML.
import { XmlTagExtractor } from '@elizaos/core';

const extractor = new XmlTagExtractor('text');

// LLM output: <response><text>Hello world!</text></response>
extractor.push('<response><text>Hello ');  // Returns: 'Hel' (keeps margin)
extractor.push('world!</text></response>'); // Returns: 'lo world!'

ResponseStreamExtractor

Action-aware extraction used by `DefaultMessageService`. Reads the `<actions>` tag to decide what to stream.
import { ResponseStreamExtractor } from '@elizaos/core';

const extractor = new ResponseStreamExtractor();

// Only streams <text> when action is REPLY
extractor.push('<actions>REPLY</actions><text>Hello!');  // Returns: 'Hel'
extractor.push('</text>');                               // Returns: 'lo!'

// Skips <text> when action is something else (action handler will respond)
extractor.push('<actions>SEARCH</actions><text>Ignored</text>');  // Returns: ''

Custom Extractor

Implement `IStreamExtractor` for custom filtering logic:
import type { IStreamExtractor } from '@elizaos/core';

class JsonValueExtractor implements IStreamExtractor {
  private buffer = '';
  private _done = false;

  get done() { return this._done; }

  push(chunk: string): string {
    this.buffer += chunk;

    // Try to parse and extract "response" field
    try {
      const json = JSON.parse(this.buffer);
      this._done = true;
      return json.response || '';
    } catch {
      return ''; // Wait for complete JSON
    }
  }
}
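The extractor above can be exercised chunk by chunk. This self-contained sketch restates the class and a minimal interface shape (the real `IStreamExtractor` lives in `@elizaos/core`), then feeds it a JSON payload split across two pushes:

```typescript
// Minimal restatement so the snippet runs standalone; the real interface
// is imported from @elizaos/core in practice.
interface IStreamExtractor {
  readonly done: boolean;
  push(chunk: string): string;
}

class JsonValueExtractor implements IStreamExtractor {
  private buffer = '';
  private _done = false;

  get done(): boolean { return this._done; }

  push(chunk: string): string {
    this.buffer += chunk;
    try {
      const json = JSON.parse(this.buffer);
      this._done = true;
      return json.response ?? '';
    } catch {
      return ''; // JSON still incomplete; keep buffering
    }
  }
}

const jsonExtractor = new JsonValueExtractor();
console.log(jsonExtractor.push('{"response": "Hel')); // '' (JSON incomplete)
console.log(jsonExtractor.push('lo!"}'));             // 'Hello!'
console.log(jsonExtractor.done);                      // true
```

Note the trade-off: this extractor emits nothing until the JSON parses, so the user sees no partial output. It fits structured tool responses better than conversational text.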

Stream Error Handling

The streaming system provides typed errors for robust handling:
import { StreamError } from '@elizaos/core';

try {
  const result = extractor.push(hugeChunk);
} catch (error) {
  if (StreamError.isStreamError(error)) {
    switch (error.code) {
      case 'CHUNK_TOO_LARGE':
        console.error('Chunk exceeded 1MB limit');
        break;
      case 'BUFFER_OVERFLOW':
        console.error('Buffer exceeded 100KB');
        break;
      case 'PARSE_ERROR':
        console.error('Malformed content');
        break;
      case 'TIMEOUT':
        console.error('Stream timed out');
        break;
      case 'ABORTED':
        console.error('Stream was cancelled');
        break;
    }
  }
}

Performance Tips

Keep extractors simple

Complex parsing logic in `push()` blocks the stream. Do heavy processing after streaming completes.

Use appropriate margins

XML extractors keep a safety margin to avoid splitting closing tags. Default is 10 characters.
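The margin idea can be shown in isolation. This is an illustrative helper, not ElizaOS source: it holds back the last `margin` characters of the buffer so a closing tag split across chunks is never emitted to the client.

```typescript
// Hypothetical sketch of margin buffering (not the actual extractor code):
// emit everything except the trailing `margin` characters, which stay
// buffered in case they are the start of a closing tag.
function emitWithMargin(buffer: string, margin = 10): { emit: string; keep: string } {
  const cut = Math.max(0, buffer.length - margin);
  return { emit: buffer.slice(0, cut), keep: buffer.slice(cut) };
}

// A chunk ends mid-way through '</text>'; the partial tag stays buffered.
const { emit, keep } = emitWithMargin('Hello worl</te', 10);
console.log(emit); // 'Hell'
console.log(keep); // 'o worl</te'
```

A larger margin is safer for long tag names but delays output by that many characters.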

Handle backpressure

If your UI can’t keep up, chunks queue up in memory. Consider throttling or dropping old chunks.
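One way to cap memory under backpressure is a bounded queue that drops the oldest chunks when the consumer falls behind. This is a hedged sketch of that idea, not an ElizaOS API:

```typescript
// Illustrative bounded queue (not part of ElizaOS): oldest chunks are
// dropped once the cap is reached, so memory stays bounded even if the
// UI stalls.
class ChunkQueue {
  private queue: string[] = [];

  constructor(private readonly maxSize = 256) {}

  enqueue(chunk: string): void {
    this.queue.push(chunk);
    if (this.queue.length > this.maxSize) {
      this.queue.shift(); // drop the oldest chunk under backpressure
    }
  }

  // Flush everything accumulated so far in one UI update.
  drain(): string {
    const text = this.queue.join('');
    this.queue = [];
    return text;
  }
}

const q = new ChunkQueue(3);
for (const c of ['a', 'b', 'c', 'd']) q.enqueue(c);
console.log(q.drain()); // 'bcd' ('a' was dropped)
```

Dropping old chunks is fine for typing-indicator-style UIs; if every token must be shown, throttle the render (e.g. batch chunks per animation frame) instead of dropping.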

Clean up resources

Call extractor.reset() between conversations to clear buffers and state.

Architecture

Data Flow:
  1. LLM Provider generates tokens via async iterator
  2. Stream Extractor filters output, extracts streamable content, buffers for tag boundaries
  3. SSE/WebSocket sends chunks to client progressively
  4. UI updates in real-time as chunks arrive
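The four steps above can be sketched end to end. All names here are illustrative stand-ins, not ElizaOS internals: a mock async iterator plays the LLM provider, a pass-through object plays the extractor, and `send` stands in for the SSE/WebSocket transport.

```typescript
// Illustrative pipeline (hypothetical names, not ElizaOS internals).
interface Extractor { push(chunk: string): string; }

// Step 1: the LLM provider yields tokens via an async iterator.
async function* mockLlm(): AsyncGenerator<string> {
  for (const token of ['Hel', 'lo ', 'world!']) yield token;
}

async function streamToClient(
  llm: AsyncIterable<string>,
  extractor: Extractor,
  send: (frame: string) => void,
): Promise<void> {
  for await (const token of llm) {
    const text = extractor.push(token); // Step 2: filter/buffer the output
    if (text) {
      // Step 3: ship the chunk to the client progressively
      send(`data: ${JSON.stringify({ type: 'chunk', text })}\n\n`);
    }
  }
  send(`data: ${JSON.stringify({ type: 'complete' })}\n\n`);
}

void (async () => {
  const frames: string[] = [];
  // Step 4: the client appends each frame as it arrives.
  await streamToClient(mockLlm(), { push: (c) => c }, (f) => frames.push(f));
  console.log(frames.length); // 4 (3 chunks + 1 complete)
})();
```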

Next Steps