The Problem

Users hate waiting. A 3-second response feels like an eternity when you’re staring at a blank screen. Traditional request/response patterns make your agent feel sluggish, even when the LLM is fast. The UI waits for the entire response before showing anything.
Streaming changes everything. Users see tokens appear in real-time, making responses feel instant even when they take seconds to complete.

Quick Start

ElizaOS supports three response modes out of the box:
| Mode | Delivery | Use Case |
| --- | --- | --- |
| Sync | Wait for the complete response | Simple integrations, batch processing |
| Stream | Tokens appear in real-time | Chat UIs, interactive experiences |
| WebSocket | Bidirectional, persistent connection | Voice conversations, multi-turn |

HTTP Streaming

Send a message with `stream: true` to receive Server-Sent Events:
const response = await fetch(`/api/agents/${agentId}/message`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    entityId: 'user-123',
    roomId: 'room-456',
    content: { text: 'Hello!', source: 'api' },
    stream: true // Enable streaming
  })
});

// Process SSE stream, buffering partial lines across chunk boundaries
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? ''; // keep the last, possibly incomplete, line

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = JSON.parse(line.slice(6));
    if (data.type === 'chunk') {
      process.stdout.write(data.text); // Display token immediately
    }
  }
}

WebSocket Connection

For bidirectional communication and voice conversations:
const socket = new WebSocket(`ws://localhost:3000/api/agents/${agentId}/ws`);

socket.onopen = () => {
  socket.send(JSON.stringify({
    type: 'message',
    entityId: 'user-123',
    roomId: 'room-456',
    content: { text: 'Hello!', source: 'websocket' }
  }));
};

socket.onmessage = (event) => {
  const data = JSON.parse(event.data);

  switch (data.type) {
    case 'chunk':
      process.stdout.write(data.text);
      break;
    case 'complete':
      console.log('\n--- Response complete ---');
      break;
    case 'error':
      console.error('Error:', data.message);
      break;
  }
};

Stream Events

The streaming API emits these event types:
| Event | Description |
| --- | --- |
| `chunk` | A token or text fragment to display |
| `complete` | Response finished; includes the full text and actions |
| `error` | Something went wrong |
| `control` | Backend control messages (typing indicators, etc.) |
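The event shapes above can be modeled as a discriminated union on the client. This is an illustrative sketch, not an official `@elizaos/core` export; field names are taken from the payload examples below.

```typescript
// Hypothetical client-side type for the stream events (not an official export).
type StreamEvent =
  | { type: 'chunk'; text: string; timestamp: number }
  | { type: 'complete'; text: string; actions: string[]; messageId: string; timestamp: number }
  | { type: 'error'; message: string }
  | { type: 'control'; [key: string]: unknown };

// Narrowing on `type` lets the compiler check each branch's fields.
function render(event: StreamEvent): string {
  switch (event.type) {
    case 'chunk':
      return event.text;
    case 'complete':
      return `\n[done: ${event.actions.join(', ')}]`;
    case 'error':
      return `\n[error: ${event.message}]`;
    default:
      return ''; // control messages are not rendered
  }
}

console.log(render({ type: 'chunk', text: 'Hi', timestamp: Date.now() })); // 'Hi'
```

Using a union like this means a typo in `event.type` or a missing field becomes a compile-time error instead of a silent runtime bug.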

Chunk Event

{
  type: 'chunk',
  text: 'Hello',      // Text fragment to append
  timestamp: 1703001234567
}

Complete Event

{
  type: 'complete',
  text: 'Hello! How can I help you today?',  // Full response
  actions: ['REPLY'],                         // Executed actions
  messageId: 'msg-uuid',
  timestamp: 1703001234890
}

Custom Stream Extractors

ElizaOS uses stream extractors to filter LLM output for streaming. The framework provides several built-in extractors:

PassthroughExtractor

Streams everything as-is. Use for plain text responses.
import { PassthroughExtractor } from '@elizaos/core';

const extractor = new PassthroughExtractor();
extractor.push('Hello ');  // Returns: 'Hello '
extractor.push('world!');  // Returns: 'world!'

XmlTagExtractor

Extracts content from a specific XML tag. Use when the LLM outputs structured XML.
import { XmlTagExtractor } from '@elizaos/core';

const extractor = new XmlTagExtractor('text');

// LLM output: <response><text>Hello world!</text></response>
extractor.push('<response><text>Hello ');  // Returns: 'Hel' (keeps margin)
extractor.push('world!</text></response>'); // Returns: 'lo world!'

ResponseStreamExtractor

Action-aware extraction used by `DefaultMessageService`. Reads the `<actions>` tag to decide what to stream.
import { ResponseStreamExtractor } from '@elizaos/core';

const extractor = new ResponseStreamExtractor();

// Only streams <text> when action is REPLY
extractor.push('<actions>REPLY</actions><text>Hello!');  // Returns: 'Hel'
extractor.push('</text>');                               // Returns: 'lo!'

// Skips <text> when action is something else (action handler will respond)
extractor.push('<actions>SEARCH</actions><text>Ignored</text>');  // Returns: ''

Custom Extractor

Implement `IStreamExtractor` for custom filtering logic:
import type { IStreamExtractor } from '@elizaos/core';

class JsonValueExtractor implements IStreamExtractor {
  private buffer = '';
  private _done = false;

  get done() { return this._done; }

  push(chunk: string): string {
    this.buffer += chunk;

    // Try to parse and extract "response" field
    try {
      const json = JSON.parse(this.buffer);
      this._done = true;
      return json.response || '';
    } catch {
      return ''; // Wait for complete JSON
    }
  }
}
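The extractor above can be exercised chunk by chunk. This self-contained sketch restates the class and a minimal interface shape (the real `IStreamExtractor` lives in `@elizaos/core`), then feeds it a JSON payload split across two pushes:

```typescript
// Minimal restatement so the snippet runs standalone; the real interface
// is imported from @elizaos/core in practice.
interface IStreamExtractor {
  readonly done: boolean;
  push(chunk: string): string;
}

class JsonValueExtractor implements IStreamExtractor {
  private buffer = '';
  private _done = false;

  get done(): boolean { return this._done; }

  push(chunk: string): string {
    this.buffer += chunk;
    try {
      const json = JSON.parse(this.buffer);
      this._done = true;
      return json.response ?? '';
    } catch {
      return ''; // JSON still incomplete; keep buffering
    }
  }
}

const jsonExtractor = new JsonValueExtractor();
console.log(jsonExtractor.push('{"response": "Hel')); // '' (JSON incomplete)
console.log(jsonExtractor.push('lo!"}'));             // 'Hello!'
console.log(jsonExtractor.done);                      // true
```

Note the trade-off: this extractor emits nothing until the JSON parses, so the user sees no partial output. It fits structured tool responses better than conversational text.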

Stream Error Handling

The streaming system provides typed errors for robust handling:
import { StreamError } from '@elizaos/core';

try {
  const result = extractor.push(hugeChunk);
} catch (error) {
  if (StreamError.isStreamError(error)) {
    switch (error.code) {
      case 'CHUNK_TOO_LARGE':
        console.error('Chunk exceeded 1MB limit');
        break;
      case 'BUFFER_OVERFLOW':
        console.error('Buffer exceeded 100KB');
        break;
      case 'PARSE_ERROR':
        console.error('Malformed content');
        break;
      case 'TIMEOUT':
        console.error('Stream timed out');
        break;
      case 'ABORTED':
        console.error('Stream was cancelled');
        break;
    }
  }
}

Performance Tips

Keep extractors simple

Complex parsing logic in `push()` blocks the stream. Do heavy processing after streaming completes.

Use appropriate margins

XML extractors keep a safety margin to avoid splitting closing tags. Default is 10 characters.
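The margin idea can be shown in isolation. This is an illustrative helper, not ElizaOS source: it holds back the last `margin` characters of the buffer so a closing tag split across chunks is never emitted to the client.

```typescript
// Hypothetical sketch of margin buffering (not the actual extractor code):
// emit everything except the trailing `margin` characters, which stay
// buffered in case they are the start of a closing tag.
function emitWithMargin(buffer: string, margin = 10): { emit: string; keep: string } {
  const cut = Math.max(0, buffer.length - margin);
  return { emit: buffer.slice(0, cut), keep: buffer.slice(cut) };
}

// A chunk ends mid-way through '</text>'; the partial tag stays buffered.
const { emit, keep } = emitWithMargin('Hello worl</te', 10);
console.log(emit); // 'Hell'
console.log(keep); // 'o worl</te'
```

A larger margin is safer for long tag names but delays output by that many characters.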

Handle backpressure

If your UI can’t keep up, chunks queue up in memory. Consider throttling or dropping old chunks.
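One way to cap memory under backpressure is a bounded queue that drops the oldest chunks when the consumer falls behind. This is a hedged sketch of that idea, not an ElizaOS API:

```typescript
// Illustrative bounded queue (not part of ElizaOS): oldest chunks are
// dropped once the cap is reached, so memory stays bounded even if the
// UI stalls.
class ChunkQueue {
  private queue: string[] = [];

  constructor(private readonly maxSize = 256) {}

  enqueue(chunk: string): void {
    this.queue.push(chunk);
    if (this.queue.length > this.maxSize) {
      this.queue.shift(); // drop the oldest chunk under backpressure
    }
  }

  // Flush everything accumulated so far in one UI update.
  drain(): string {
    const text = this.queue.join('');
    this.queue = [];
    return text;
  }
}

const q = new ChunkQueue(3);
for (const c of ['a', 'b', 'c', 'd']) q.enqueue(c);
console.log(q.drain()); // 'bcd' ('a' was dropped)
```

Dropping old chunks is fine for typing-indicator-style UIs; if every token must be shown, throttle the render (e.g. batch chunks per animation frame) instead of dropping.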

Clean up resources

Call extractor.reset() between conversations to clear buffers and state.

Architecture

Data Flow:
  1. LLM Provider generates tokens via async iterator
  2. Stream Extractor filters output, extracts streamable content, buffers for tag boundaries
  3. SSE/WebSocket sends chunks to client progressively
  4. UI updates in real-time as chunks arrive
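The four steps above can be sketched end to end. All names here are illustrative stand-ins, not ElizaOS internals: a mock async iterator plays the LLM provider, a pass-through object plays the extractor, and `send` stands in for the SSE/WebSocket transport.

```typescript
// Illustrative pipeline (hypothetical names, not ElizaOS internals).
interface Extractor { push(chunk: string): string; }

// Step 1: the LLM provider yields tokens via an async iterator.
async function* mockLlm(): AsyncGenerator<string> {
  for (const token of ['Hel', 'lo ', 'world!']) yield token;
}

async function streamToClient(
  llm: AsyncIterable<string>,
  extractor: Extractor,
  send: (frame: string) => void,
): Promise<void> {
  for await (const token of llm) {
    const text = extractor.push(token); // Step 2: filter/buffer the output
    if (text) {
      // Step 3: ship the chunk to the client progressively
      send(`data: ${JSON.stringify({ type: 'chunk', text })}\n\n`);
    }
  }
  send(`data: ${JSON.stringify({ type: 'complete' })}\n\n`);
}

void (async () => {
  const frames: string[] = [];
  // Step 4: the client appends each frame as it arrives.
  await streamToClient(mockLlm(), { push: (c) => c }, (f) => frames.push(f));
  console.log(frames.length); // 4 (3 chunks + 1 complete)
})();
```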

Next Steps