Chapter 4: Simulation and Traces¶
Learning Objectives¶
By the end of this chapter you will be able to:
- Run simulations in both mock and AI-powered modes.
- Configure simulation budgets to control cost and token usage.
- Read and interpret simulation traces for debugging and analysis.
- Replay traces to reproduce past simulation runs.
- Compare traces across different configurations to evaluate quality.
The sequence diagram below walks through a single simulation run, showing how SimulationEngine traverses the workflow graph, activates personas via ActionEngine, and records each step on the trace.
```mermaid
sequenceDiagram
    participant S as Scenario
    participant E as SimulationEngine
    participant G as WorkflowGraph
    participant P as Persona (via ActionEngine)
    participant T as Trace
    S->>E: run(scenario)
    E->>G: load graph
    loop For each node in BFS order
        G->>E: next node
        E->>P: activate(persona_id, action_type)
        P-->>E: ActionResult
        E->>T: record step
        alt Feedback edge triggered
            E->>G: re-queue upstream node
        end
    end
    E-->>S: SimulationResult + Trace
```
Because the trace is append-only and deterministic under mock mode, two runs of the same scenario produce byte-identical traces — useful when diffing configuration changes or regression-testing new personas.
What Is a Simulation?¶
A simulation is a complete execution of a workflow graph. The simulation engine traverses the graph from start to finish, activating personas at each node, collecting outputs, and following forward and feedback edges until all nodes have been processed or the iteration limit is reached.
The engine supports two modes:
- Mock mode: Deterministic outputs generated from templates. No API calls, no cost, instant execution. Use this for development, testing, and CI pipelines.
- AI mode: Real outputs generated by LLM providers (Anthropic, OpenAI). Rich, varied outputs but requires API keys and incurs cost. Use this for production simulations and quality evaluation.
Running a Mock Simulation¶
```python
from fcc.simulation.engine import SimulationEngine
from fcc.scenarios.loader import ScenarioLoader

loader = ScenarioLoader()
scenario = loader.load("competitive_analysis")

engine = SimulationEngine(mode="mock")
result = engine.run(scenario)

print(f"Status: {result.status}")
print(f"Steps: {len(result.trace.steps)}")
print(f"Duration: {result.duration_ms}ms")
```
Mock simulations are deterministic: the same scenario with the same configuration always produces the same trace. This is critical for testing -- you can assert on specific output content without worrying about LLM variability.
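Because the outputs never change between runs, mock traces slot neatly into a test suite. A minimal pytest sketch is shown below; the asserted status string and output substring are illustrative assumptions, so adjust them to whatever your engine and mock templates actually produce.

```python
from fcc.simulation.engine import SimulationEngine
from fcc.scenarios.loader import ScenarioLoader


def test_competitive_analysis_mock_run():
    scenario = ScenarioLoader().load("competitive_analysis")
    result = SimulationEngine(mode="mock").run(scenario)

    assert result.status == "completed"        # assumed success value -- check your engine
    assert len(result.trace.steps) > 0
    # Deterministic mock output lets us assert on content without flakiness.
    assert "competitive" in result.trace.steps[0].output.lower()
```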
Mock Output Templates¶
Mock outputs are generated from templates that combine the persona's R.I.S.C.E.A.R. specification with the action type's expected output format. The templates produce plausible-looking outputs that follow the persona's style constraints. They are not intelligent -- they are structured fill-in-the-blank documents -- but they are good enough for testing workflow logic, governance rules, and collaboration flows.
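For a rough mental model, a mock template behaves like the fill-in-the-blank sketch below. The placeholder names are purely illustrative and do not reflect FCC's actual template keys.

```python
# Illustrative only: a fill-in-the-blank template of the kind described above.
# The placeholder names are hypothetical, not FCC's real template fields.
MOCK_ANALYSIS_TEMPLATE = """\
# {action_type} by {persona_role}

Scope: {persona_scope}
Constraints honored: {persona_constraints}

1. Key observation about {topic} (mock)
2. Supporting detail consistent with the persona's style (mock)
3. Recommended next step (mock)
"""

output = MOCK_ANALYSIS_TEMPLATE.format(
    action_type="Market Analysis",
    persona_role="Competitive Analyst",
    persona_scope="public market data only",
    persona_constraints="cite sources; neutral tone",
    topic="competitor pricing",
)
```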
Running an AI Simulation¶
```python
engine = SimulationEngine(
    mode="ai",
    provider="anthropic",
    model="claude-sonnet-4-20250514",
)

result = engine.run(scenario)
```
AI simulations produce richer outputs because each persona activation is a real LLM call with the persona's R.I.S.C.E.A.R. specification embedded in the system prompt. The quality and variety of outputs depend on the model, temperature, and prompt engineering.
Provider Configuration¶
The engine supports multiple providers:
| Provider | Models | Configuration |
|---|---|---|
| `anthropic` | Claude family | `ANTHROPIC_API_KEY` env var |
| `openai` | GPT family | `OPENAI_API_KEY` env var |
Provider-specific settings (model, temperature, max tokens per call) can be configured in fcc.yaml or passed as parameters to the engine.
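A per-run override might look like the sketch below. The `temperature` and `max_tokens_per_call` keyword names are assumptions based on the settings listed above, so confirm the exact parameter names for your engine version.

```python
# Hypothetical per-run provider overrides. The keyword names mirror the
# settings described in the text (model, temperature, max tokens per call)
# but are assumptions -- confirm them against the engine API.
engine = SimulationEngine(
    mode="ai",
    provider="anthropic",
    model="claude-sonnet-4-20250514",
    temperature=0.3,            # assumed parameter name
    max_tokens_per_call=2048,   # assumed parameter name
)
```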
Budget Management¶
AI simulations can be expensive. The budget system prevents runaway costs:
```python
engine = SimulationEngine(
    mode="ai",
    provider="anthropic",
    budget={
        "max_tokens": 100000,
        "max_cost_usd": 5.00,
    },
)
```
The engine tracks cumulative token usage and estimated cost after each node execution. If either limit is exceeded, the simulation halts with a budget_exceeded status. The partial trace is still available for inspection.
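A typical pattern is to check the status and inspect the partial trace before deciding whether to raise the budget and rerun. The sketch below uses only the result and step fields shown in this chapter.

```python
result = engine.run(scenario)

if result.status == "budget_exceeded":
    # The partial trace is still available for inspection.
    print(f"Halted after {len(result.trace.steps)} steps, "
          f"{result.trace.total_tokens} tokens used")
    # Show the last few node activations before the halt.
    for step in result.trace.steps[-3:]:
        print(step.node_id, step.tokens_used)
```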
Budget Strategy¶
A practical budgeting approach:
- Run mock first. Establish a baseline trace length and identify which nodes produce the most output.
- Estimate AI cost. Multiply the trace length by the average tokens per node, then apply your provider's pricing (a worked example follows this list).
- Set the budget at 2x the estimate. This provides headroom for feedback loops without risking runaway costs.
- Monitor and adjust. After a few AI runs, refine the budget based on actual usage.
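Here is the estimate from step 2 worked through in code. The token counts and per-token price are placeholders, so substitute your own mock baseline and your provider's current pricing.

```python
# Worked example of the cost estimate and 2x budget rule described above.
steps = 24                  # trace length from the mock run
avg_tokens_per_node = 1500  # input + output tokens, averaged across nodes

estimated_tokens = steps * avg_tokens_per_node            # 36,000 tokens
price_per_1k_tokens_usd = 0.01                            # placeholder rate
estimated_cost = estimated_tokens / 1000 * price_per_1k_tokens_usd  # $0.36

budget = {
    "max_tokens": estimated_tokens * 2,                   # 2x headroom
    "max_cost_usd": round(estimated_cost * 2, 2),
}
print(budget)  # {'max_tokens': 72000, 'max_cost_usd': 0.72}
```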
Understanding Traces¶
Every simulation produces a trace -- a chronological record of every node activation, output, and event. The trace is the primary debugging and analysis tool.
Trace Structure¶
```python
trace = result.trace

# Top-level metadata
print(trace.simulation_id)
print(trace.scenario_id)
print(trace.started_at)
print(trace.completed_at)
print(trace.total_tokens)

# Steps (one per node activation)
for step in trace.steps:
    print(f"Node: {step.node_id}")
    print(f"Persona: {step.persona_id}")
    print(f"Phase: {step.phase}")
    print(f"Action: {step.action_type}")
    print(f"Tokens: {step.tokens_used}")
    print(f"Duration: {step.duration_ms}ms")
    print(f"Output: {step.output[:200]}...")
    print(f"Quality gates: {step.gate_results}")
    print()
```
Reading a Trace¶
When debugging a simulation, focus on three things:
- Feedback cycles. Look for nodes that appear multiple times in the trace. Each reappearance means a feedback edge fired. Check the Critique node's output to understand what triggered the re-run.
- Quality gate results. Each step includes the results of any quality gates evaluated at that node. A failing gate explains why a feedback edge fired or why the simulation escalated.
- Token usage. If the simulation hit a budget limit, identify which nodes consumed the most tokens. Long outputs from verbose personas are the usual culprit; the sketch after this list surfaces both re-run nodes and the heaviest token consumers.
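The first and third checks are easy to automate with the step fields shown earlier. The snippet below flags nodes that appear more than once and ranks nodes by token consumption.

```python
from collections import Counter

def summarize_trace(trace):
    # Nodes that ran more than once indicate a feedback edge fired.
    runs = Counter(step.node_id for step in trace.steps)
    repeated = {node: n for node, n in runs.items() if n > 1}

    # Sum token usage per node to find the heaviest consumers.
    tokens = Counter()
    for step in trace.steps:
        tokens[step.node_id] += step.tokens_used

    print("Re-run nodes (feedback edges fired):", repeated or "none")
    print("Top token consumers:", tokens.most_common(3))

summarize_trace(result.trace)
```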
Trace Persistence¶
Traces can be saved to JSON for later analysis:
```python
from fcc.simulation.engine import SimulationEngine
from fcc.simulation.trace import SimulationTrace

engine = SimulationEngine(mode="mock")
result = engine.run(scenario)

# Save trace
result.trace.save("traces/my_simulation.json")

# Load trace
trace = SimulationTrace.load("traces/my_simulation.json")
```
Saved traces include all metadata, steps, outputs, and quality gate results. They are self-contained -- you do not need the original scenario or configuration to inspect a saved trace.
Trace Replay¶
The replay system re-executes a saved trace step by step, emitting events to the event bus at each step. This is useful for:
- Debugging: Step through a failing simulation to identify the root cause.
- Demonstration: Show stakeholders how a simulation progressed.
- Testing: Verify that event subscribers handle all event types correctly.
```python
from fcc.collaboration.recording import SessionRecorder

recorder = SessionRecorder()
recorder.replay("traces/my_simulation.json")
```
During replay, the recorder emits the same events that the original simulation emitted, in the same order. Any event subscribers (loggers, dashboards, metrics collectors) process them as if the simulation were running live.
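The subscription mechanism itself is outside the scope of this chapter, so treat the sketch below as a shape rather than a recipe: a counting callback you would register on your event bus before starting the replay. The registration call is deliberately left as a placeholder because the actual API is not documented here.

```python
from collections import Counter
from fcc.collaboration.recording import SessionRecorder

seen = Counter()

def on_event(event):
    # Tally replayed events by type; swap in whatever handling you need.
    seen[type(event).__name__] += 1

# Hypothetical registration -- replace with your event bus's real API.
# event_bus.subscribe(on_event)

recorder = SessionRecorder()
recorder.replay("traces/my_simulation.json")
print(seen.most_common())
```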
Comparing Traces¶
To evaluate different configurations, run the same scenario with different settings and compare the traces:
```python
# Run with two different models
engine_a = SimulationEngine(mode="ai", provider="anthropic", model="claude-sonnet-4-20250514")
engine_b = SimulationEngine(mode="ai", provider="openai", model="gpt-4")

result_a = engine_a.run(scenario)
result_b = engine_b.run(scenario)

# Compare key metrics
print(f"Model A -- tokens: {result_a.trace.total_tokens}, "
      f"gates passed: {result_a.trace.gates_passed}")
print(f"Model B -- tokens: {result_b.trace.total_tokens}, "
      f"gates passed: {result_b.trace.gates_passed}")
```
Trace comparison is also the foundation for the evaluation framework covered in Book 3 -- automated comparison of trace quality across models, configurations, and persona variations.
Persona-Aware Prompts¶
The simulation engine generates persona-aware prompts for each node. The prompt structure follows a consistent pattern:
- System prompt: Persona identity (R.I.S.C.E.A.R. specification).
- Context: Outputs from upstream nodes.
- Task: The action type description and expected output format.
- Constraints: The persona's constraints plus any governance rules from the constitution.
This prompt structure is key to FCC's quality: by grounding every LLM call in the persona's specification, the engine ensures consistent, constrained, and auditable outputs.
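As a rough illustration of how those four parts come together, consider the sketch below; the function and attribute names are hypothetical, not the engine's internal API.

```python
# Hypothetical sketch of the four-part prompt assembly described above.
# The attribute names (riscear_spec, constraints, etc.) are illustrative.
def build_prompt(persona, upstream_outputs, action, governance_rules):
    system_prompt = persona.riscear_spec                      # 1. persona identity
    context = "\n\n".join(upstream_outputs)                   # 2. upstream outputs
    task = (f"{action.description}\n\n"
            f"Expected format: {action.output_format}")       # 3. task
    constraints = "\n".join(
        list(persona.constraints) + list(governance_rules)    # 4. constraints
    )

    user_prompt = (
        f"## Context\n{context}\n\n"
        f"## Task\n{task}\n\n"
        f"## Constraints\n{constraints}"
    )
    return system_prompt, user_prompt
```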
Key Takeaways¶
- Mock mode is deterministic and free; AI mode is rich and costly. Start with mock.
- Budget management prevents runaway costs in AI mode. Set budgets at 2x estimated usage.
- Traces are the primary debugging tool. Focus on feedback cycles, gate results, and token usage.
- Traces can be saved, loaded, replayed, and compared.
- Persona-aware prompts ground every LLM call in the R.I.S.C.E.A.R. specification.
Cross-References¶
- Chapter 5: Plugin Development -- extend the simulation engine with custom plugins
- FCC Guidebook, Chapter 5 -- workflow and action engine reference
- Notebook 06: Simulation Deep Dive -- interactive simulation exploration
- Book 1, Chapter 3: Workflow Thinking -- conceptual foundation