## Features
- Local execution - No API keys or internet required
- Multiple models - Support for Llama, Mistral, Gemma, and more
- Full model coverage - Text generation, embeddings, and structured (JSON) objects
- Cost-free - No API charges
- Fallback option - Can serve as a local fallback when cloud providers are unavailable
## Prerequisites
- Install Ollama from https://ollama.com
- Pull desired models:
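```bash
# Text-generation models (choose sizes that fit your hardware)
ollama pull llama3
ollama pull mistral

# An embedding model, needed for EMBEDDING support
ollama pull nomic-embed-text
```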
## Installation
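Assuming the plugin is distributed as an npm package (the package name below is an assumption; substitute the actual name used by your project):

```bash
npm install @elizaos/plugin-ollama
```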
## Configuration
### Environment Variables
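A minimal `.env` sketch. The variable names below are assumptions based on common conventions; verify them against the plugin's own source or docs:

```bash
# Where the plugin reaches the Ollama server (11434 is Ollama's default port)
OLLAMA_API_ENDPOINT=http://localhost:11434

# Tier-to-model mapping (variable names are assumptions; check the plugin docs)
OLLAMA_SMALL_MODEL=llama3.2
OLLAMA_MEDIUM_MODEL=llama3.1
OLLAMA_LARGE_MODEL=llama3.3:70b
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
```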
### Character Configuration
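A sketch of a character file that enables the plugin, assuming a conventional `plugins` array and the package name above (both are assumptions; match your project's actual schema):

```json
{
  "name": "LocalAgent",
  "plugins": ["@elizaos/plugin-ollama"]
}
```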
## Supported Operations
| Operation | Models | Notes |
|---|---|---|
| TEXT_GENERATION | llama3, mistral, gemma | Various sizes available |
| EMBEDDING | nomic-embed-text, mxbai-embed-large | Local embeddings |
| OBJECT_GENERATION | All text models | JSON generation |
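You can sanity-check these operations against the Ollama HTTP API directly; for object (JSON) generation, Ollama accepts a `format` parameter:

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Reply with a JSON object containing a greeting field.",
  "format": "json",
  "stream": false
}'
```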
## Model Configuration
The plugin uses three model tiers:
- SMALL_MODEL: Quick responses, lower resource usage
- MEDIUM_MODEL: Balanced performance
- LARGE_MODEL: Best quality, highest resource needs
Any text model available in your local Ollama library can be assigned to a tier, including:
- Llama models (3, 3.1, 3.2, 3.3)
- Mistral/Mixtral models
- Gemma models
- Phi models
- Any custom models you’ve created
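For custom models, Ollama's standard Modelfile workflow applies; a minimal sketch:

```bash
cat > Modelfile <<'EOF'
FROM llama3
PARAMETER temperature 0.7
SYSTEM "You are a concise assistant."
EOF

# Register the model so it can be assigned to a tier by name
ollama create my-custom-model -f Modelfile
```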
Embedding model options:
- nomic-embed-text - Balanced performance
- mxbai-embed-large - Higher quality
- all-minilm - Lightweight option
## Performance Tips
- GPU Acceleration - Dramatically improves speed
- Model Quantization - Use Q4/Q5 quantized variants for better speed and lower memory use (see the example below)
- Context Length - Limit context for faster responses
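Quantized variants are published as model tags, and context can be capped per request through Ollama's `num_ctx` option (the exact tag below may differ; check the model's page on ollama.com):

```bash
# Pull a 4-bit quantized variant (exact tags vary by model)
ollama pull llama3:8b-instruct-q4_K_M

# Cap the context window for one request via the options field
curl http://localhost:11434/api/generate -d '{
  "model": "llama3:8b-instruct-q4_K_M",
  "prompt": "Summarize Ollama in one sentence.",
  "options": { "num_ctx": 2048 },
  "stream": false
}'
```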
## Hardware Requirements
| Model Size | RAM Required | GPU Recommended |
|---|---|---|
| 7B | 8GB | Optional |
| 13B | 16GB | Yes |
| 70B | 64GB+ | Required |
## Common Issues
### "Connection refused"
Ensure Ollama is running:
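```bash
# Start the server if it isn't already running
ollama serve

# In another terminal: prints "Ollama is running" when the server is up
curl http://localhost:11434
```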
### Slow Performance
- Use smaller models or quantized versions
- Enable GPU acceleration
- Reduce context length

