Ollama (local LLM)

Ollama runs open-source models locally on your machine and exposes an OpenAI-compatible HTTP API. The FCC Ollama plugin makes it discoverable as a first-class provider in the framework.

This is the recommended provider for:

  • Personal development and experimentation
  • Educational settings (no API key required, no usage costs)
  • Privacy-sensitive workflows (no data leaves your machine)
  • Reproducible research (pin a model tag)

5-minute walkthrough

1. Install Ollama

Follow the Ollama install guide for your OS. On Linux:

curl -fsSL https://ollama.com/install.sh | sh

2. Pull a model

ollama pull llama3.2:latest

Other recommended starter models:

Model              Size     Use case
llama3.2:latest    ~2 GB    Fast, general-purpose
llama3.1:8b        ~5 GB    Higher quality
phi4:latest        ~9 GB    Strong reasoning, smaller than Llama
qwen2.5:7b         ~5 GB    Multilingual
mistral:latest     ~4 GB    Good code/text balance
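
To confirm a model finished downloading, you can list what the local daemon has on disk. The snippet below queries Ollama's native /api/tags endpoint directly; it is a standalone sanity check, not part of FCC.

import json
import urllib.request

# Ask the local Ollama daemon which models it has pulled.
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    models = json.load(resp)["models"]

for model in models:
    print(model["name"])  # e.g. "llama3.2:latest"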

3. Install the FCC Ollama plugin

pip install -e ./plugins/fcc-ollama-plugin

(or make install-dev, which installs all bundled plugins automatically)

4. Tell FCC where Ollama lives

export OLLAMA_BASE_URL=http://localhost:11434/v1
export OLLAMA_DEFAULT_MODEL=llama3.2:latest
export FCC_DEFAULT_PROVIDER=ollama

Or copy them into a .env file in your project root — FCC reads it automatically via python-dotenv.
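
Before involving FCC at all, you can sanity-check that the OpenAI-compatible endpoint answers. This snippet uses the standalone openai Python package pointed at Ollama; the api_key value is a placeholder because Ollama does not check it.

import os

from openai import OpenAI

# Point the standard OpenAI client at the local Ollama endpoint.
client = OpenAI(
    base_url=os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434/v1"),
    api_key="ollama",  # ignored by Ollama, but the client requires a value
)

reply = client.chat.completions.create(
    model=os.environ.get("OLLAMA_DEFAULT_MODEL", "llama3.2:latest"),
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(reply.choices[0].message.content)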

5. Run a scenario with Ollama

fcc scenarios run --scenario basic_routing

You should see live token output from Ollama in the simulation trace.

Programmatic use

from fcc.simulation.ai_client import AIClient

client = AIClient(provider="ollama")
response = client.complete_simple(
    system_prompt="You are a helpful assistant.",
    user_message="What is the capital of France?",
)
print(response.content)
print(f"Tokens: {response.usage}")

Per-persona model selection

Different personas can use different models within the same simulation by setting preferred_model in the persona's prompt YAML:

# src/fcc/data/prompts/core.yaml
persona_prompts:
  RC:
    system_suffix: "You are the Research Crafter."
    preferred_model: "llama3.1:8b"   # Big model for research
    temperature: 0.3
  DE:
    system_suffix: "You are the Documentation Evangelist."
    preferred_model: "llama3.2:latest"  # Faster model for docs
    temperature: 0.5
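
The exact resolution logic lives inside FCC, but the intent is simple: a persona's preferred_model wins, and anything unset falls back to the provider default. A rough, illustrative sketch of that lookup (the function and fallback here are examples, not FCC's actual code):

import os

import yaml  # pyyaml

def model_for_persona(persona: str, path: str = "src/fcc/data/prompts/core.yaml") -> str:
    """Illustrative only: persona-level preferred_model, else the provider default."""
    with open(path) as f:
        prompts = yaml.safe_load(f)
    persona_cfg = prompts.get("persona_prompts", {}).get(persona, {})
    return persona_cfg.get(
        "preferred_model",
        os.environ.get("OLLAMA_DEFAULT_MODEL", "llama3.2:latest"),
    )

print(model_for_persona("RC"))  # "llama3.1:8b" with the YAML above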

Auto-detection rule

The Ollama plugin only opts in when OLLAMA_BASE_URL is explicitly set in your environment. It does NOT probe localhost:11434 automatically — this is a deliberate v1.1.0 design choice so that having Ollama installed on your machine does not silently change FCC's default behavior.
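
In practice, detection reduces to a plain environment check, roughly equivalent to the following (illustrative only, not the plugin's actual source):

import os

def ollama_opted_in() -> bool:
    # Participate only when OLLAMA_BASE_URL is explicitly set;
    # never probe localhost:11434 on its own.
    return bool(os.environ.get("OLLAMA_BASE_URL"))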

To enable Ollama auto-detection in every shell session without exporting the variable each time:

echo 'export OLLAMA_BASE_URL=http://localhost:11434/v1' >> ~/.bashrc

Inside Docker

The bundled docker/Dockerfile.backend ships with the Ollama plugin already installed via the [full] umbrella dependency group. To point the containerized backend at an Ollama instance running on the host:

# docker-compose.yml override
services:
  backend:
    environment:
      OLLAMA_BASE_URL: http://host.docker.internal:11434/v1
      OLLAMA_DEFAULT_MODEL: llama3.2:latest
      FCC_DEFAULT_PROVIDER: ollama

(host.docker.internal resolves to the host out of the box on Docker Desktop. On Linux, Docker Engine 20.10+ supports it if you add extra_hosts: ["host.docker.internal:host-gateway"] to the service; otherwise, use the host's LAN IP instead.)

Troubleshooting

Symptom: FCC keeps falling back to mock
Fix: Confirm OLLAMA_BASE_URL is set (echo $OLLAMA_BASE_URL). If it is empty, the plugin opts out of auto-detection; set it explicitly.

Symptom: ConnectionError: [Errno 111] Connection refused
Fix: Ollama isn't running. Start it with ollama serve, or check the systemd service with systemctl status ollama.

Symptom: model not found
Fix: Pull the model first: ollama pull llama3.2:latest.

Symptom: Slow first call
Fix: Ollama loads the model into memory on the first request; subsequent calls are much faster. Pre-warm with ollama run llama3.2 "".

See also