Government vertical — scientific tutorial

Released in FCC v1.2.0. You are running controlled experiments on LLM behavior in the Government domain. This tutorial shows how to instrument a scenario with CLEAR+ benchmarks, swap providers via the ai_config scenario override, and measure risk-classification stability across runs.

The Government pack in one paragraph

The government vertical pack (at src/fcc/data/verticals/government.yaml) contains 6 personas for open data stewardship (DCAT-US 3.0), NIEM information exchange, FedRAMP compliance, privacy impact assessment, civic service research, and zero-trust identity architecture. Headline compliance frameworks: DCAT-US 3.0, NIEM 6.0, FedRAMP Rev 5, NIST SP 800-53 Rev 5, OMB M-22-09.
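
If you want to see the full roster before committing to one persona, a quick enumeration works. This uses the same registry API as the snippet in the next section; every field printed here appears elsewhere in this tutorial's examples:

from fcc.verticals.registry import VerticalRegistry

# List every persona in the government pack with its risk tier.
reg = VerticalRegistry.from_builtin()
pack = reg.get("government")
for p in pack.personas:
    print(f"{p.id}: {p.name} ({p.risk_category or 'minimal'})")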

Focus persona: NIE — NIEM Information Exchange Architect

We'll anchor this tutorial on NIE because it's the persona most relevant to a scientific audience in the Government domain.

from fcc.verticals.registry import VerticalRegistry

# Load the built-in packs and pull the government vertical.
reg = VerticalRegistry.from_builtin()
pack = reg.get("government")

# NIE is the NIEM Information Exchange Architect persona.
persona = next(p for p in pack.personas if p.id == "NIE")

print(persona.name)
print(persona.risk_category or "minimal")

# The riscear block may be absent, so fall back to an empty dict.
riscear = persona.riscear or {}
print("Archetype:", riscear.get("archetype"))
print("Role:", riscear.get("role"))

Experiment design

You want to answer questions like "does swapping Anthropic for Ollama change how NIE classifies risk?" or "does LiteLLM routing add latency variance I should report?"

The v1.1.0+ ai_config scenario override lets you pin the provider and model per scenario without editing the vertical pack YAML:

# scenarios/government_rct.yaml
scenario_id: GOV-RCT
ai_config:
  provider: litellm
  model: ollama/llama3.2
  temperature: 0.0
  max_tokens: 2000
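
Before wiring this into a run, it can be worth a quick sanity check that the file parses and carries the fields you expect. This uses plain PyYAML, nothing FCC-specific:

import yaml

# Parse the scenario override and confirm the fields the runner will see.
with open("scenarios/government_rct.yaml") as f:
    scenario = yaml.safe_load(f)

assert scenario["scenario_id"] == "GOV-RCT"
assert scenario["ai_config"]["provider"] == "litellm"
print(scenario["ai_config"])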

With the scenario file in place, run the CLEAR+ benchmark runner in --mock mode first to get a deterministic baseline:

fcc benchmark run --scenario GOV-RCT --mock --output _output/benchmarks/baseline.json

Then swap to a real provider and compare:

fcc benchmark run --scenario GOV-RCT --output _output/benchmarks/live.json
fcc benchmark compare baseline live
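
The second experiment-design question, whether LiteLLM routing adds latency variance, can be answered from the live output, provided your benchmark JSON records per-item latencies. The layout below (a results list with a latency_ms field) is an assumption, not FCC's documented schema; adjust the keys to match what your runner actually writes:

import json
import statistics

# Assumed layout: {"results": [{"latency_ms": ...}, ...]} -- verify
# against your actual benchmark output before trusting the numbers.
with open("_output/benchmarks/live.json") as f:
    data = json.load(f)

latencies = [r["latency_ms"] for r in data.get("results", [])]
if len(latencies) > 1:
    print(f"n={len(latencies)}  mean={statistics.mean(latencies):.1f} ms  "
          f"stdev={statistics.stdev(latencies):.1f} ms")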

Stable risk classification under model swaps

The AIActClassifier in FCC is deterministic — it doesn't call the LLM. But persona outputs change across providers, so downstream classifiers that inspect outputs may drift.

from fcc.compliance.classifier import AIActClassifier

classifier = AIActClassifier()

# `persona` comes from the registry snippet above. The label is stable
# across runs because classification reads persona data, not model output:
risk = classifier.classify_persona(persona, vertical_domain="government")
assert risk.value in {"minimal", "limited", "high", "unacceptable"}

This gives you a ground-truth label you can use as a reference in your experiments.
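
One way to put that reference label to work: score each live trace's classifications against it and report an agreement rate. As above, the "results" and "risk_classification" keys are hypothetical placeholders for whatever your runner actually emits:

import json
from pathlib import Path

reference = risk.value  # ground-truth label from the classifier snippet above

# Hypothetical trace layout; swap "results" and "risk_classification"
# for the field names your benchmark runner actually writes.
matches, total = 0, 0
for trace in Path("_output/benchmarks").glob("*.json"):
    data = json.loads(trace.read_text())
    for item in data.get("results", []):
        label = item.get("risk_classification")
        if label is not None:
            total += 1
            matches += label == reference
if total:
    print(f"Agreement with ground truth: {matches / total:.1%} ({total} items)")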

Verify what you did

Run the vertical test suite to make sure your changes didn't break anything:

pytest tests/test_verticals.py -k "government" -v
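
You can also freeze your experiment's ground-truth assumption as a test of its own. This sketch uses only APIs shown earlier in this tutorial; the file name test_gov_experiment.py is our suggestion, not part of FCC:

import pytest

from fcc.compliance.classifier import AIActClassifier
from fcc.verticals.registry import VerticalRegistry


@pytest.fixture(scope="module")
def government_pack():
    return VerticalRegistry.from_builtin().get("government")


def test_every_government_persona_classifies(government_pack):
    # Every persona in the pack should land in a valid AI Act risk tier.
    classifier = AIActClassifier()
    for persona in government_pack.personas:
        risk = classifier.classify_persona(persona, vertical_domain="government")
        assert risk.value in {"minimal", "limited", "high", "unacceptable"}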

All scientific-path steps in this tutorial leave your working tree unchanged — the pack YAML is read-only from your perspective. The only state that accumulates is in _output/ (scenario run traces) and docs/model-cards/ (if you regenerated cards).

Next steps