Contextual embeddings are an advanced Knowledge plugin feature that improves retrieval accuracy by enriching text chunks with surrounding context before generating embeddings. This implementation is based on Anthropic’s contextual retrieval techniques.
## What are Contextual Embeddings?
Traditional RAG systems embed isolated text chunks, losing important context. Contextual embeddings solve this by using an LLM to add relevant context to each chunk before embedding.
### Traditional vs. Contextual

**Traditional embedding**

Original chunk:

> "The deployment process requires authentication."

The chunk is embedded as-is, missing context about:

- Which deployment process?
- What kind of authentication?
- For which system?

**Contextual embedding**

Enriched chunk:

> "In the Kubernetes deployment section for the payment service, the deployment process requires authentication using OAuth2 tokens obtained from the identity provider."

Now the embedding captures that this chunk is about:

- Kubernetes deployments
- The payment service specifically
- OAuth2 authentication
## How It Works

The Knowledge plugin uses a prompt-based approach to enrich chunks:

1. **Document analysis**: The full document is passed to an LLM along with each chunk.
2. **Context generation**: The LLM identifies context in the document that is relevant to the chunk.
3. **Chunk enrichment**: The original chunk is preserved, with the generated context added.
4. **Embedding**: The enriched chunk is embedded using your configured embedding model.
The implementation is based on Anthropic's Contextual Retrieval cookbook example, which reported reducing retrieval failure rates by up to 49%.
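For illustration, the enrichment step can be sketched as below. The prompt wording and the `generateText` helper are assumptions made for this sketch, not the plugin's actual internals:

```typescript
// Illustrative sketch only; the plugin's real prompt templates differ.
const CONTEXT_PROMPT = `<document>
{{document}}
</document>

Here is the chunk we want to situate within the document:
<chunk>
{{chunk}}
</chunk>

Give a short, succinct context (roughly 60-200 tokens) situating this
chunk within the overall document. Answer with only the context.`;

// `generateText` stands in for a call to your configured TEXT_PROVIDER.
async function enrichChunk(
  document: string,
  chunk: string,
  generateText: (prompt: string) => Promise<string>
): Promise<string> {
  const prompt = CONTEXT_PROMPT.replace('{{document}}', document).replace(
    '{{chunk}}',
    chunk
  );
  const context = await generateText(prompt);
  // The original chunk text is preserved; the context is prepended.
  return `${context.trim()}\n\n${chunk}`;
}
```

The key property is that the model sees the whole document but returns only a short context, which is prepended to the unmodified chunk before embedding.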
## Configuration

### Enable Contextual Embeddings

```bash
# Enable contextual embeddings
CTX_KNOWLEDGE_ENABLED=true

# Configure your text generation provider
TEXT_PROVIDER=openrouter             # or openai, anthropic, google
TEXT_MODEL=anthropic/claude-3-haiku  # or any supported model

# Required API keys
OPENROUTER_API_KEY=your-key   # If using OpenRouter
# or
OPENAI_API_KEY=your-key       # If using OpenAI
# or
ANTHROPIC_API_KEY=your-key    # If using Anthropic
# or
GOOGLE_API_KEY=your-key       # If using Google
```
**Important:** Embeddings always use the model configured via `useModel(TEXT_EMBEDDING)` in your agent setup. Do not mix embedding models: all of your documents must use the same embedding model for consistency.
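For reference, a chunk embedding call inside the runtime looks roughly like this. `ModelType.TEXT_EMBEDDING` comes from `@elizaos/core`; treat the exact parameter shape as an assumption, since it can vary between versions:

```typescript
import { ModelType, type IAgentRuntime } from '@elizaos/core';

// Sketch: whichever plugin registers the TEXT_EMBEDDING model handler
// (e.g. plugin-openai or plugin-google) serves this call, so every
// chunk is embedded with the same configured model.
async function embedChunk(
  runtime: IAgentRuntime,
  enrichedChunk: string
): Promise<number[]> {
  return runtime.useModel(ModelType.TEXT_EMBEDDING, {
    text: enrichedChunk,
  });
}
```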
### Recommended Setup: OpenRouter with a Separate Embedding Provider

Since OpenRouter doesn't support embeddings, you need a separate embedding provider:
#### OpenRouter + OpenAI

```typescript
export const character = {
  name: 'MyAgent',
  plugins: [
    '@elizaos/plugin-openrouter', // For text generation
    '@elizaos/plugin-openai',     // For embeddings
    '@elizaos/plugin-knowledge',  // Knowledge plugin
  ],
};
```

```bash
# Enable contextual embeddings
CTX_KNOWLEDGE_ENABLED=true

# Text generation (for context enrichment)
TEXT_PROVIDER=openrouter
TEXT_MODEL=anthropic/claude-3-haiku
OPENROUTER_API_KEY=your-openrouter-key

# Embeddings (automatically used)
OPENAI_API_KEY=your-openai-key
```
#### OpenRouter + Google

```typescript
export const character = {
  name: 'MyAgent',
  plugins: [
    '@elizaos/plugin-openrouter', // For text generation
    '@elizaos/plugin-google',     // For embeddings
    '@elizaos/plugin-knowledge',  // Knowledge plugin
  ],
};
```

```bash
# Enable contextual embeddings
CTX_KNOWLEDGE_ENABLED=true

# Text generation (for context enrichment)
TEXT_PROVIDER=openrouter
TEXT_MODEL=anthropic/claude-3-haiku
OPENROUTER_API_KEY=your-openrouter-key

# Embeddings (Google will be used automatically)
GOOGLE_API_KEY=your-google-key
```
### Alternative Providers

#### OpenAI Only

```bash
CTX_KNOWLEDGE_ENABLED=true
TEXT_PROVIDER=openai
TEXT_MODEL=gpt-4o-mini
OPENAI_API_KEY=your-key
```

#### Anthropic + OpenAI

```bash
CTX_KNOWLEDGE_ENABLED=true
TEXT_PROVIDER=anthropic
TEXT_MODEL=claude-3-haiku-20240307
ANTHROPIC_API_KEY=your-anthropic-key
OPENAI_API_KEY=your-openai-key  # Still needed for embeddings
```

#### Google Only

```bash
CTX_KNOWLEDGE_ENABLED=true
TEXT_PROVIDER=google
TEXT_MODEL=gemini-1.5-flash
GOOGLE_API_KEY=your-google-key
# Google embeddings will be used automatically
```

#### OpenRouter + Google

```bash
CTX_KNOWLEDGE_ENABLED=true
TEXT_PROVIDER=openrouter
TEXT_MODEL=anthropic/claude-3-haiku
OPENROUTER_API_KEY=your-openrouter-key
GOOGLE_API_KEY=your-google-key  # For embeddings
# Requires @elizaos/plugin-google for embeddings
```
## Technical Details

### Chunk Processing
The plugin uses fixed chunk sizes optimized for contextual enrichment:
- Chunk Size: 500 tokens (approximately 1,750 characters)
- Chunk Overlap: 100 tokens
- Context Target: 60-200 tokens of added context
These values are based on research showing that smaller chunks with rich context perform better than larger chunks without context.
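As a rough sketch of these parameters, a character-based splitter using the ~3.5 characters-per-token heuristic implied above might look like the following (the plugin's actual tokenizer-aware splitter is more precise):

```typescript
// Simplified sketch: character-based approximation of 500-token chunks
// with 100-token overlap, assuming ~3.5 characters per token.
const CHARS_PER_TOKEN = 3.5;
const CHUNK_TOKENS = 500;
const OVERLAP_TOKENS = 100;

function chunkText(text: string): string[] {
  const chunkChars = Math.floor(CHUNK_TOKENS * CHARS_PER_TOKEN); // ~1750
  const stepChars = Math.floor(
    (CHUNK_TOKENS - OVERLAP_TOKENS) * CHARS_PER_TOKEN
  ); // advance by chunk size minus overlap
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += stepChars) {
    chunks.push(text.slice(start, start + chunkChars));
    if (start + chunkChars >= text.length) break;
  }
  return chunks;
}
```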
### Content-Aware Templates

The plugin automatically detects content types and uses specialized prompts for each:

- General text documents
- PDF documents (with special handling for corrupted text)
- Mathematical content (preserves equations and notation)
- Code files (includes imports and function signatures)
- Technical documentation (preserves terminology)
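A hypothetical sketch of how such detection and template selection could work; the heuristics and template strings here are illustrative, not the plugin's actual ones:

```typescript
// Hypothetical sketch; the plugin's real heuristics and templates
// are internal and more elaborate.
type ContentType = 'text' | 'pdf' | 'math' | 'code' | 'technical';

function detectContentType(filename: string, content: string): ContentType {
  if (/\.(ts|js|py|go|rs|java|c|cpp)$/i.test(filename)) return 'code';
  if (/\.pdf$/i.test(filename)) return 'pdf';
  if (/\\begin\{|\$\$/.test(content)) return 'math';
  if (/\bAPI\b|\bendpoint\b|\bRFC \d+/.test(content)) return 'technical';
  return 'text';
}

const TEMPLATES: Record<ContentType, string> = {
  text: 'Situate this chunk within the overall document...',
  pdf: 'The text may contain extraction artifacts; situate this chunk...',
  math: 'Preserve all equations and notation; situate this chunk...',
  code: 'Note relevant imports and function signatures; situate this chunk...',
  technical: 'Preserve domain terminology; situate this chunk...',
};

const template = TEMPLATES[detectContentType('guide.pdf', '')]; // pdf template
```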
### OpenRouter Caching
When using OpenRouter with Claude or Gemini models, the plugin automatically leverages caching:
- First document chunk: Caches the full document
- Subsequent chunks: Reuses cached document (90% cost reduction)
- Cache duration: 5 minutes (automatic)
In practice, this means enriching every chunk of a 100-page document costs roughly a tenth of what it would without caching.
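Conceptually, each request places the identical full document first so that it forms a cacheable prefix. A hedged sketch using Anthropic-style `cache_control` markers, which OpenRouter forwards for supported models (the exact request shape may vary):

```typescript
// Hedged sketch: the full document is marked cacheable so subsequent
// chunk requests reuse the cached prefix.
const fullDocument = '...entire document text...';
const chunk = '...current chunk text...';

const request = {
  model: 'anthropic/claude-3-haiku',
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: fullDocument, // identical prefix across all chunk requests
          cache_control: { type: 'ephemeral' }, // cached for ~5 minutes
        },
        {
          type: 'text',
          text: `Situate this chunk within the document:\n${chunk}`,
        },
      ],
    },
  ],
};
```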
## Example: How Context Improves Retrieval

### Without Contextual Embeddings

Query: "How do I configure the timeout?"

Retrieved chunk:

> "Set the timeout value to 30 seconds."

Problem: Which timeout? Database? API? Cache?

### With Contextual Embeddings

Query: "How do I configure the timeout?"

Retrieved chunk:

> "In the Redis configuration section, when setting up the caching layer, set the timeout value to 30 seconds for optimal performance with session data."

Result: It is clear that this chunk is about the Redis cache timeout.
## Processing Time
- Initial processing: 1-3 seconds per chunk (includes LLM call)
- With caching: 0.1-0.3 seconds per chunk
- Batch processing: Up to 30 chunks concurrently (see the sketch below)
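A minimal sketch of that bounded concurrency, assuming an `enrichChunk`-style function like the earlier sketch:

```typescript
// Sketch of bounded-concurrency batch enrichment (up to 30 at a time).
async function enrichAll(
  chunks: string[],
  enrich: (chunk: string) => Promise<string>,
  concurrency = 30
): Promise<string[]> {
  const results: string[] = new Array(chunks.length);
  let next = 0;
  async function worker() {
    while (next < chunks.length) {
      const i = next++; // synchronous claim, so no two workers share an index
      results[i] = await enrich(chunks[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(concurrency, chunks.length) }, worker)
  );
  return results;
}
```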
## Cost Estimation

| Document Size | Pages | Chunks | Without Caching | With OpenRouter Cache |
|---|---|---|---|---|
| Small | 10 | ~20 | $0.02 | $0.002 |
| Medium | 50 | ~100 | $0.10 | $0.01 |
| Large | 200 | ~400 | $0.40 | $0.04 |

Costs are estimates based on Claude 3 Haiku pricing; actual costs depend on your chosen model.
## Monitoring

The plugin provides detailed logging:

```bash
# Enable debug logging to see enrichment details
LOG_LEVEL=debug elizaos start
```

This will show:
- Context enrichment progress
- Cache hit/miss rates
- Processing times per document
- Token usage
## Common Issues and Solutions

### Context Not Being Added

Check whether contextual embeddings are enabled:

```bash
# Look for this in your logs:
"CTX enrichment ENABLED"
# or
"CTX enrichment DISABLED"
```

Verify your configuration:

- `CTX_KNOWLEDGE_ENABLED=true` (lowercase, not `TRUE` or `True`)
- `TEXT_PROVIDER` and `TEXT_MODEL` are both set
- The API key required by your provider is set
### Slow Processing
Solutions:
- Use OpenRouter with Claude/Gemini for automatic caching
- Process smaller batches of documents
- Use faster models (Claude 3 Haiku, Gemini 1.5 Flash)
### High Costs
Solutions:
- Enable OpenRouter caching (90% cost reduction)
- Use smaller models for context generation
- Process documents in batches during off-peak hours
## Best Practices

### Use OpenRouter for Cost Efficiency
OpenRouter’s caching makes contextual embeddings 90% cheaper when processing multiple chunks from the same document.
### Keep Default Settings

The default chunk sizes and overlap are optimized based on research. Change them only if you have specific requirements.
### Monitor Your Logs
Enable debug logging when first setting up to ensure context is being added properly.
### Use Appropriate Models
- Claude 3 Haiku: Best balance of quality and cost
- Gemini 1.5 Flash: Fastest processing
- GPT-4o-mini: Good quality, moderate cost
## Summary
Contextual embeddings significantly improve retrieval accuracy by:
- Adding document context to each chunk before embedding
- Using intelligent templates based on content type
- Preserving the original text while enriching with context
- Leveraging caching for cost-efficient processing
The implementation is based on Anthropic's proven approach and integrates seamlessly with elizaOS's existing infrastructure. Set `CTX_KNOWLEDGE_ENABLED=true` and configure your text generation provider to get started!