# Ollama (local LLM)
Ollama runs open-source models locally on your machine and exposes an OpenAI-compatible HTTP API. The FCC Ollama plugin makes it discoverable as a first-class provider in the framework.
This is the recommended provider for:
- Personal development and experimentation
- Educational settings (no API key required, no usage costs)
- Privacy-sensitive workflows (no data leaves your machine)
- Reproducible research (pin a model tag)
## 5-minute walkthrough
### 1. Install Ollama
Follow the Ollama install guide for your OS. On Linux:
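```bash
# Official install script (other platforms: https://ollama.com/download)
curl -fsSL https://ollama.com/install.sh | sh
```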
### 2. Pull a model
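Pull the starter model used throughout this guide:

```bash
ollama pull llama3.2:latest
ollama list   # confirm the model is available locally
```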
Other recommended starter models:
| Model | Size | Use case |
|---|---|---|
| `llama3.2:latest` | ~2 GB | Fast, general-purpose |
| `llama3.1:8b` | ~5 GB | Higher quality |
| `phi4:latest` | ~9 GB | Strong reasoning, smaller than Llama |
| `qwen2.5:7b` | ~5 GB | Multilingual |
| `mistral:latest` | ~4 GB | Good code/text balance |
### 3. Install the FCC Ollama plugin
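The exact package name depends on how your checkout distributes plugins; assuming it ships as a pip extra alongside the bundled [full] group, a minimal sketch:

```bash
# Assumed extra name -- check pyproject.toml for the plugin's real extra or package
pip install "fcc[ollama]"
```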
(or make install-dev, which installs all bundled plugins automatically)
### 4. Tell FCC where Ollama lives
export OLLAMA_BASE_URL=http://localhost:11434/v1
export OLLAMA_DEFAULT_MODEL=llama3.2:latest
export FCC_DEFAULT_PROVIDER=ollama
Or copy them into a .env file in your project root — FCC reads it
automatically via python-dotenv.
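For example:

```bash
# .env (dotenv syntax, no `export` keyword)
OLLAMA_BASE_URL=http://localhost:11434/v1
OLLAMA_DEFAULT_MODEL=llama3.2:latest
FCC_DEFAULT_PROVIDER=ollama
```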
### 5. Run a scenario with Ollama
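The scenario runner varies by checkout; a hypothetical invocation, assuming an fcc CLI entry point and a bundled demo scenario (substitute your project's actual command and scenario name):

```bash
# Hypothetical command and scenario name -- adjust to your setup
FCC_DEFAULT_PROVIDER=ollama fcc run --scenario demo
```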
You should see live token output from Ollama in the simulation trace.
## Programmatic use
from fcc.simulation.ai_client import AIClient
client = AIClient(provider="ollama")
response = client.complete_simple(
    system_prompt="You are a helpful assistant.",
    user_message="What is the capital of France?",
)
print(response.content)
print(f"Tokens: {response.usage}")
## Per-persona model selection
Different personas can use different models within the same simulation
by setting preferred_model in the persona's prompt YAML:
# src/fcc/data/prompts/core.yaml
persona_prompts:
  RC:
    system_suffix: "You are the Research Crafter."
    preferred_model: "llama3.1:8b"   # Big model for research
    temperature: 0.3
  DE:
    system_suffix: "You are the Documentation Evangelist."
    preferred_model: "llama3.2:latest"   # Faster model for docs
    temperature: 0.5
## Auto-detection rule
The Ollama plugin only opts in when OLLAMA_BASE_URL is explicitly set
in your environment. It does NOT probe localhost:11434 automatically —
this is a deliberate v1.1.0 design choice so that having Ollama installed
on your machine does not silently change FCC's default behavior.
To enable Ollama auto-detection without editing each command:
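For example, persist the variable in your shell profile (profile path assumed; adapt for your shell):

```bash
# Append to ~/.bashrc (or ~/.zshrc) so the Ollama plugin always opts in
echo 'export OLLAMA_BASE_URL=http://localhost:11434/v1' >> ~/.bashrc
source ~/.bashrc
```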
## Inside Docker
The bundled docker/Dockerfile.backend ships with the Ollama plugin
already installed via the [full] umbrella dependency group. To point
the containerized backend at an Ollama instance running on the host:
# docker-compose.yml override
services:
  backend:
    environment:
      OLLAMA_BASE_URL: http://host.docker.internal:11434/v1
      OLLAMA_DEFAULT_MODEL: llama3.2:latest
      FCC_DEFAULT_PROVIDER: ollama
(host.docker.internal resolves to the host on Docker Desktop and
recent Docker Engine releases. On older Linux Docker, use the host's
LAN IP instead.)
## Troubleshooting
- FCC keeps falling back to mock: Confirm `OLLAMA_BASE_URL` is set (`echo $OLLAMA_BASE_URL`). If it is empty, the plugin opts out of auto-detection; set it explicitly.
- `ConnectionError: [Errno 111] Connection refused`: Ollama isn't running. Start it with `ollama serve` (or check the systemd service: `systemctl status ollama`).
- `model not found`: Pull the model first: `ollama pull llama3.2:latest`.
- Slow first call: Ollama loads the model into memory on the first request; subsequent calls are much faster. Use `ollama run llama3.2 ""` to pre-warm.
## See also
- LiteLLM — if you want to switch between Ollama and other backends without changing FCC code
- Provider matrix — all supported providers
- Ollama documentation — full OpenAI compatibility reference