Ollama (local LLM)

Ollama runs open-source models locally on your machine and exposes an OpenAI-compatible HTTP API. The FCC Ollama plugin makes it discoverable as a first-class provider in the framework.

This is the recommended provider for:

  • Personal development and experimentation
  • Educational settings (no API key required, no usage costs)
  • Privacy-sensitive workflows (no data leaves your machine)
  • Reproducible research (pin a model tag)

5-minute walkthrough

1. Install Ollama

Follow the Ollama install guide for your OS. On Linux:

curl -fsSL https://ollama.com/install.sh | sh

2. Pull a model

ollama pull llama3.2:latest

Other recommended starter models:

Model              Size     Use case
llama3.2:latest    ~2 GB    Fast, general-purpose
llama3.1:8b        ~5 GB    Higher quality
phi4:latest        ~9 GB    Strong reasoning, smaller than Llama
qwen2.5:7b         ~5 GB    Multilingual
mistral:latest     ~4 GB    Good code/text balance
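
To confirm a model finished downloading, you can list what the local daemon has on disk. The snippet below queries Ollama's native /api/tags endpoint directly; it is a standalone sanity check, not part of FCC.

import json
import urllib.request

# Ask the local Ollama daemon which models it has pulled.
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    models = json.load(resp)["models"]

for model in models:
    print(model["name"])  # e.g. "llama3.2:latest"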

3. Install the FCC Ollama plugin

pip install -e ./plugins/fcc-ollama-plugin

(or make install-dev, which installs all bundled plugins automatically)

4. Tell FCC where Ollama lives

export OLLAMA_BASE_URL=http://localhost:11434/v1
export OLLAMA_DEFAULT_MODEL=llama3.2:latest
export FCC_DEFAULT_PROVIDER=ollama

Or copy them into a .env file in your project root — FCC reads it automatically via python-dotenv.
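
Before involving FCC at all, you can sanity-check that the OpenAI-compatible endpoint answers. This snippet uses the standalone openai Python package pointed at Ollama; the api_key value is a placeholder because Ollama does not check it.

import os

from openai import OpenAI

# Point the standard OpenAI client at the local Ollama endpoint.
client = OpenAI(
    base_url=os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434/v1"),
    api_key="ollama",  # ignored by Ollama, but the client requires a value
)

reply = client.chat.completions.create(
    model=os.environ.get("OLLAMA_DEFAULT_MODEL", "llama3.2:latest"),
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(reply.choices[0].message.content)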

5. Run a scenario with Ollama

fcc scenarios run --scenario basic_routing

You should see live token output from Ollama in the simulation trace.

Programmatic use

from fcc.simulation.ai_client import AIClient

client = AIClient(provider="ollama")
response = client.complete_simple(
    system_prompt="You are a helpful assistant.",
    user_message="What is the capital of France?",
)
print(response.content)
print(f"Tokens: {response.usage}")

Per-persona model selection

Different personas can use different models within the same simulation by setting preferred_model in the persona's prompt YAML:

# src/fcc/data/prompts/core.yaml
persona_prompts:
  RC:
    system_suffix: "You are the Research Crafter."
    preferred_model: "llama3.1:8b"   # Big model for research
    temperature: 0.3
  DE:
    system_suffix: "You are the Documentation Evangelist."
    preferred_model: "llama3.2:latest"  # Faster model for docs
    temperature: 0.5
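
The exact resolution logic lives inside FCC, but the intent is simple: a persona's preferred_model wins, and anything unset falls back to the provider default. A rough, illustrative sketch of that lookup (the function and fallback here are examples, not FCC's actual code):

import os

import yaml  # pyyaml

def model_for_persona(persona: str, path: str = "src/fcc/data/prompts/core.yaml") -> str:
    """Illustrative only: persona-level preferred_model, else the provider default."""
    with open(path) as f:
        prompts = yaml.safe_load(f)
    persona_cfg = prompts.get("persona_prompts", {}).get(persona, {})
    return persona_cfg.get(
        "preferred_model",
        os.environ.get("OLLAMA_DEFAULT_MODEL", "llama3.2:latest"),
    )

print(model_for_persona("RC"))  # "llama3.1:8b" with the YAML above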

Auto-detection rule

The Ollama plugin only opts in when OLLAMA_BASE_URL is explicitly set in your environment. It does NOT probe localhost:11434 automatically — this is a deliberate v1.1.0 design choice so that having Ollama installed on your machine does not silently change FCC's default behavior.
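
In practice, detection reduces to a plain environment check, roughly equivalent to the following (illustrative only, not the plugin's actual source):

import os

def ollama_opted_in() -> bool:
    # Participate only when OLLAMA_BASE_URL is explicitly set;
    # never probe localhost:11434 on its own.
    return bool(os.environ.get("OLLAMA_BASE_URL"))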

To enable Ollama auto-detection in every shell session without exporting the variable each time:

echo 'export OLLAMA_BASE_URL=http://localhost:11434/v1' >> ~/.bashrc

Inside Docker

The bundled docker/Dockerfile.backend ships with the Ollama plugin already installed via the [full] umbrella dependency group. To point the containerized backend at an Ollama instance running on the host:

# docker-compose.yml override
services:
  backend:
    environment:
      OLLAMA_BASE_URL: http://host.docker.internal:11434/v1
      OLLAMA_DEFAULT_MODEL: llama3.2:latest
      FCC_DEFAULT_PROVIDER: ollama

(host.docker.internal resolves to the host out of the box on Docker Desktop. On Linux, Docker Engine 20.10+ supports it if you add extra_hosts: ["host.docker.internal:host-gateway"] to the service; otherwise, use the host's LAN IP instead.)

Troubleshooting

Symptom: FCC keeps falling back to mock
Fix: Confirm OLLAMA_BASE_URL is set (echo $OLLAMA_BASE_URL). If it is empty, the plugin opts out of auto-detection; set it explicitly.

Symptom: ConnectionError: [Errno 111] Connection refused
Fix: Ollama isn't running. Start it with ollama serve, or check the systemd service with systemctl status ollama.

Symptom: model not found
Fix: Pull the model first: ollama pull llama3.2:latest.

Symptom: Slow first call
Fix: Ollama loads the model into memory on the first request; subsequent calls are much faster. Pre-warm with ollama run llama3.2 "".

See also