## Usage

```bash
elizaos scenario <subcommand> [options]
```

The `scenario` command provides a framework for defining, executing, and evaluating agent behavior through structured test scenarios.

## Subcommands

| Subcommand | Description |
| --- | --- |
| `run` | Execute a single scenario from a YAML file |
| `matrix` | Execute a scenario matrix for parameter exploration |

## run

Execute a scenario defined in a YAML file.

### Usage

```bash
elizaos scenario run <filePath> [options]
```

### Arguments

| Argument | Description |
| --- | --- |
| `<filePath>` | Path to the `.scenario.yaml` file |

### Options

| Option | Description | Default |
| --- | --- | --- |
| `-l, --live` | Run in live mode, ignoring mocks | `false` |

### Examples

```bash
# Run a scenario
elizaos scenario run ./tests/greeting.scenario.yaml

# Run in live mode (no mocking)
elizaos scenario run ./tests/api-test.scenario.yaml --live
```

## matrix

Execute a scenario matrix for exploring parameter combinations.

### Usage

```bash
elizaos scenario matrix <configPath> [options]
```

### Arguments

| Argument | Description |
| --- | --- |
| `<configPath>` | Path to the matrix configuration YAML file |

### Options

| Option | Description | Default |
| --- | --- | --- |
| `--dry-run` | Show matrix analysis without executing | `false` |
| `--parallel <number>` | Maximum parallel test runs | `1` |
| `--filter <pattern>` | Filter parameter combinations by pattern | - |
| `--verbose` | Show detailed progress information | `false` |

### Examples

```bash
# Analyze the matrix without executing
elizaos scenario matrix ./matrix-config.yaml --dry-run

# Execute the matrix with parallel runs
elizaos scenario matrix ./matrix-config.yaml --parallel 4

# Filter specific combinations
elizaos scenario matrix ./matrix-config.yaml --filter "model=gpt-4"

# Verbose execution
elizaos scenario matrix ./matrix-config.yaml --verbose
```
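These options can be combined; for example, a common workflow is to preview a filtered slice of the matrix with `--dry-run`, then execute only that slice in parallel:

```bash
# Preview the gpt-4 combinations, then run them with 4 parallel workers
elizaos scenario matrix ./matrix-config.yaml --filter "model=gpt-4" --dry-run
elizaos scenario matrix ./matrix-config.yaml --filter "model=gpt-4" --parallel 4 --verbose
```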

## Scenario YAML Format

### Basic Structure

```yaml
name: greeting-test
description: Test agent greeting behavior

setup:
  mocks:
    - type: llm
      response: "Hello! How can I help you today?"

run:
  - action: send_message
    content: "Hello"
    evaluations:
      - type: string_contains
        value: "Hello"

judgment:
  strategy: all_pass
```
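The scenario file is passed directly to `scenario run`; the filename below is just an example:

```bash
# Execute the greeting scenario defined above
elizaos scenario run ./greeting-test.scenario.yaml
```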

### Scenario Fields

| Field | Description | Required |
| --- | --- | --- |
| `name` | Scenario name | Yes |
| `description` | Scenario description | No |
| `plugins` | List of plugins to load | No |
| `setup` | Setup configuration (mocks, files) | No |
| `run` | List of execution steps | Yes |
| `judgment` | How to determine pass/fail | No |

### Evaluation Types

| Type | Description |
| --- | --- |
| `string_contains` | Check if the output contains a string |
| `regex_match` | Match the output against a regex pattern |
| `llm_evaluation` | Use an LLM to evaluate response quality |
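A sketch of how the three evaluator types might appear on a step. The `value` field for `string_contains` follows the basic example above; the field names shown for `regex_match` and `llm_evaluation` (`pattern`, `prompt`) are illustrative assumptions, so check your scenario schema before relying on them:

```yaml
run:
  - action: send_message
    content: "What is 2 + 2?"
    evaluations:
      - type: string_contains          # substring check, as in the basic example
        value: "4"
      - type: regex_match              # 'pattern' is an assumed field name
        pattern: "\\b4\\b"
      - type: llm_evaluation           # 'prompt' is an assumed field name
        prompt: "Does the reply answer the arithmetic question correctly?"
```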

### Judgment Strategies

| Strategy | Description |
| --- | --- |
| `all_pass` | All evaluations must pass |
| `any_pass` | At least one evaluation must pass |
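For example, to pass the scenario when at least one evaluation succeeds:

```yaml
judgment:
  strategy: any_pass
```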

## Matrix Configuration

### Basic Structure

```yaml
name: model-comparison
description: Compare agent behavior across models

base_scenario: ./base.scenario.yaml
runs_per_combination: 3

matrix:
  - parameter: setup.mocks[0].model
    values:
      - gpt-4
      - gpt-3.5-turbo
      - claude-3-opus

  - parameter: run[0].content
    values:
      - "Hello"
      - "Hi there"
      - "Good morning"
```

### Matrix Fields

| Field | Description | Required |
| --- | --- | --- |
| `name` | Matrix name | Yes |
| `description` | Matrix description | No |
| `base_scenario` | Path to the base scenario file | Yes |
| `runs_per_combination` | Runs per parameter combination | Yes |
| `matrix` | List of parameter axes | Yes |

## Output Structure

Scenario runs write their output to the `_logs_` directory:

```text
_logs_/
├── run-001-execution-0.json    # Execution result for step 0
├── run-001-evaluation-0.json   # Evaluation result for step 0
├── run-001.json                # Centralized run result
└── matrix-YYYYMMDD-HHMM/       # Matrix run output
    ├── run-001.json
    ├── run-002.json
    └── ...
```
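Each result file is plain JSON, so any JSON tool can inspect it; for example, with `jq` installed:

```bash
# Pretty-print the centralized result of the first run
jq . _logs_/run-001.json
```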

## Mocking

Scenarios support mocking for deterministic testing:

```yaml
setup:
  mocks:
    - type: llm
      model: gpt-4
      response: "Mocked response"

    - type: action
      name: SEND_MESSAGE
      result:
        success: true
        message: "Mocked action result"
```
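Mocks apply by default; pass `--live` to `scenario run` to bypass them and exercise the real services instead.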

## Plugins

Specify plugins to load for the scenario (package names start with `@`, so they must be quoted to be valid YAML):

```yaml
plugins:
  - "@elizaos/plugin-bootstrap"
  - "@elizaos/plugin-sql"
  - name: "@elizaos/plugin-discord"
    enabled: false  # Disable a specific plugin
```

The default plugins (`plugin-sql`, `plugin-bootstrap`, `plugin-openai`) are always loaded.
## Related Commands

- `report`: Generate reports from scenario output
- `test`: Run project tests