Overview
This template provides a production‑ready Ollama instance as a Monk runnable. You can:
- Run it directly to get a managed Ollama server for local LLM inference
- Inherit it in your own AI applications to add language model capabilities
What this template manages
- Ollama container (`ollama/ollama` image)
- REST API service on port 11434
- Model storage and caching
- GPU acceleration support (optional)
- Multiple model management
Quick start (run directly)
- Load templates
- Run Ollama with defaults
- Pull and run a model
Once running, the Ollama REST API is available at `localhost:11434` (or at the runnable hostname inside Monk networks).
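A minimal sketch of these steps, assuming the template manifest filename (`MANIFEST` below is a placeholder), an `ollama/ollama` runnable path, and `llama2` as the example model:

```bash
# Load the template definitions (MANIFEST is a placeholder for the actual file)
monk load MANIFEST

# Start the Ollama runnable with default settings (runnable path is an assumption)
monk run ollama/ollama

# Pull a model, then run a test generation through the REST API
curl http://localhost:11434/api/pull -d '{"name": "llama2"}'
curl http://localhost:11434/api/generate -d '{"model": "llama2", "prompt": "Why is the sky blue?"}'
```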
Configuration
Key variables can be customized in this template; by default, model data is stored under `${monk-volume-path}/ollama` on the host.
Use by inheritance (recommended for AI apps)
Inherit the Ollama runnable in your application and declare a connection to it. Example:
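A minimal sketch of such a manifest, assuming Monk's standard `connections`/`variables` syntax, that this template is loaded under the `ollama/ollama` path, and a hypothetical application image and variable names:

```yaml
namespace: my-app

llm:
  defines: runnable
  inherits: ollama/ollama            # assumed path of this template

api:
  defines: runnable
  containers:
    api:
      image: my-org/my-api:latest    # hypothetical application image
  connections:
    llm:
      runnable: my-app/llm
      service: ollama                # the Ollama service on TCP port 11434
  variables:
    ollama-host:
      type: string
      value: <- connection-hostname("llm")
      env: OLLAMA_HOST
```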
Ports and connectivity
- Service: `ollama` on TCP port `11434`
- From other runnables in the same process group, use `connection-hostname("<connection-name>")` to resolve the Ollama host.
Persistence and configuration
- Models path: `${monk-volume-path}/ollama:/root/.ollama`
- Downloaded models are cached and reused across restarts
Features
- Run LLMs locally without cloud dependencies
- Multiple model support (Llama 2, Mistral, Code Llama, etc.)
- Simple REST API (see the example after this list)
- Model customization and fine-tuning
- GPU acceleration (CUDA, Metal)
- Streaming responses
- Model library and registry
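The REST API and streaming support listed above can be exercised directly with curl; a minimal sketch, assuming the `llama2` model has already been pulled:

```bash
# Streaming is the default: the response arrives as newline-delimited JSON chunks.
# Set "stream": false to receive a single JSON object instead.
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Write a haiku about containers",
  "stream": false
}'
```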
Available Models
Popular models you can run:
- `llama2` - Meta’s Llama 2 (7B, 13B, 70B)
- `mistral` - Mistral 7B
- `codellama` - Code Llama for code generation
- `phi` - Microsoft Phi-2
- `vicuna` - Vicuna chat model
- And many more at ollama.ai/library
Use cases
Ollama excels at:
- Local AI assistants
- Code generation and completion
- Document summarization
- Question answering systems
- Text classification
- Privacy-focused AI applications
Related templates
- Combine with vector databases (`qdrant/`, `chroma/`) for RAG
- Use with `langfuse/` for LLM observability and tracing
- Integrate with application frameworks for AI-powered apps
Troubleshooting
- List available models, check Ollama status, and inspect logs: see the command sketches after this list.
- For GPU support, ensure NVIDIA drivers and Docker GPU runtime are installed.
- Large models (70B+) require significant RAM/VRAM.
- First model download can take several minutes depending on model size.
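Command sketches for those checks, assuming the API is reachable on localhost and the runnable is loaded under the `ollama/ollama` path:

```bash
# List models already pulled into this instance
curl http://localhost:11434/api/tags

# Quick health check: Ollama answers "Ollama is running" on its root endpoint
curl http://localhost:11434/

# Check the runnable's status and inspect its logs via the Monk CLI (path assumed)
monk ps
monk logs ollama/ollama
```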