# Research Methodology Guide
This guide describes how to use the FCC framework as a research instrument for studying multi-agent workflows, AI governance, team collaboration, and knowledge management. It covers experiment design, data collection, analysis, reproducibility, and ethical considerations.
## FCC as a Research Instrument
The FCC framework provides several properties that make it suitable for controlled experiments:
- Deterministic mock mode -- Simulations produce identical outputs for identical inputs, enabling reproducible experiments without API costs
- Configurable personas -- Researchers can systematically vary persona attributes (dimensions, constraints, archetypes) as independent variables
- Structured traces -- Every simulation produces a trace with timestamped steps, persona assignments, and phase labels
- Scoring engine -- Deliverable quality can be evaluated numerically (1-5 scale) with configurable criteria
- Event bus -- All system events are captured and replayable for post-hoc analysis
- Knowledge graphs -- Relationships between entities are explicit and queryable
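The first property is easy to verify directly. A minimal sketch, assuming the `event_bus` argument is optional and that `workflow_graph` and `persona_registry` are loaded as in the data-collection examples later in this guide:

```python
from fcc.simulation.engine import SimulationEngine

# Two mock-mode runs over identical inputs should yield identical traces.
# workflow_graph and persona_registry are assumed to be loaded already
# (see "Collecting Data" below).
engine = SimulationEngine(mock=True)
trace_a = engine.run(workflow_graph, persona_registry)
trace_b = engine.run(workflow_graph, persona_registry)
assert trace_a.to_dict() == trace_b.to_dict()
```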
## Experiment Design with Persona Configurations

### Independent Variables
FCC experiments typically manipulate one or more of the following:
| Variable | How to Manipulate | Example Hypothesis |
|---|---|---|
| Number of personas | Add/remove personas from the registry | More personas improve output diversity |
| R.I.S.C.E.A.R. constraints | Modify the constraints field | Stricter constraints reduce errors |
| Governance tier | Change from preferred to hard-stop | Hard-stop rules reduce quality variance |
| Workflow graph size | Use 5-node vs. 24-node graph | Longer workflows improve completeness |
| Persona dimensions | Vary dimension attribute values | Higher curiosity scores correlate with broader output |
| Cross-reference density | Add/remove cross-reference entries | Denser collaboration networks improve coherence |
### Dependent Variables
| Variable | How to Measure | FCC Source |
|---|---|---|
| Output quality | ScoringEngine evaluation (1-5 scale) | fcc.collaboration.scoring |
| Task completion rate | Percentage of workflow nodes completed | Simulation trace |
| Collaboration turn count | Number of turns in collaboration session | Session recording |
| Gate pass rate | Percentage of approval gates passed | Collaboration engine |
| Event diversity | Number of distinct event types emitted | Event bus log |
| Processing time | Duration from first to last trace step | Trace timestamps |
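Several of these measures fall directly out of a saved trace. A minimal sketch, assuming the trace JSON structure shown under Trace Structure below:

```python
import json
from datetime import datetime

# Load a saved simulation trace (see "Trace Structure" below for the schema)
with open("experiment_trace.json") as f:
    trace = json.load(f)

steps = trace["steps"]

# Processing time: duration from first to last trace step
t0 = datetime.fromisoformat(steps[0]["timestamp"].replace("Z", "+00:00"))
t1 = datetime.fromisoformat(steps[-1]["timestamp"].replace("Z", "+00:00"))

# Phase coverage: how many distinct workflow phases were visited
distinct_phases = len({s["phase"] for s in steps})

print(f"{len(steps)} steps, {(t1 - t0).total_seconds():.1f}s, {distinct_phases} phases")
```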
### Control Variables
To isolate the effect of your independent variable, hold constant:
- Python version and FCC version
- Workflow graph (unless graph size is the independent variable)
- Mock mode vs. AI mode
- Random seeds (mock mode is deterministic by default)
- Scoring criteria and thresholds
### Experimental Design Template

```text
Title: [Effect of X on Y in FCC-mediated workflows]
Hypothesis: [H1: ...]
Independent Variable: [e.g., number of governance hard-stop rules]
Levels: [e.g., 0, 2, 5, 10]
Dependent Variable: [e.g., quality gate pass rate]
Control Variables:
  - Workflow: extended_sequence (20 nodes)
  - Personas: core_personas.yaml (unmodified)
  - Mode: mock
  - Scoring threshold: 3.5
Procedure:
  1. For each level of the independent variable:
     a. Configure governance with N hard-stop rules
     b. Run 10 simulation iterations
     c. Record quality scores and gate outcomes
  2. Aggregate results and perform statistical analysis
Expected Results: [...]
```
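The procedure lends itself to a simple driver script. A skeleton sketch, where `build_governance` and `run_once` are hypothetical placeholders for your own setup code, not FCC APIs:

```python
LEVELS = [0, 2, 5, 10]   # number of hard-stop rules (independent variable)
RUNS_PER_LEVEL = 10

def build_governance(n_rules):
    ...  # hypothetical: construct a governance config with n_rules hard-stop rules

def run_once(governance):
    ...  # hypothetical: run one simulation, return quality score and gate outcome

results = []
for n_rules in LEVELS:
    governance = build_governance(n_rules)
    for run_id in range(RUNS_PER_LEVEL):
        results.append({
            "level": n_rules,
            "run_id": run_id,
            "outcome": run_once(governance),
        })
```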
## Data Collection via Simulation Traces

### Trace Structure
Each simulation run produces a trace containing:
```json
{
  "trace_id": "uuid-...",
  "workflow_graph": "extended_sequence",
  "steps": [
    {
      "step_index": 0,
      "node_id": "n1",
      "phase": "Find",
      "persona_id": "RC",
      "response": "...",
      "timestamp": "2026-03-30T10:00:00Z"
    },
    ...
  ]
}
```
### Collecting Data

```python
import json

from fcc.simulation.engine import SimulationEngine
from fcc.messaging.bus import EventBus

bus = EventBus()
collected_events = []

def collector(event):
    collected_events.append(event.to_dict())

bus.subscribe(collector)

# workflow_graph and persona_registry are assumed to be loaded beforehand
engine = SimulationEngine(mock=True, event_bus=bus)
trace = engine.run(workflow_graph, persona_registry)

# Save the trace for analysis
with open("experiment_trace.json", "w") as f:
    json.dump(trace.to_dict(), f, indent=2)

# Save the collected events
with open("experiment_events.json", "w") as f:
    json.dump(collected_events, f, indent=2)
```
### Multi-Run Data Collection

```python
results = []
for run_id in range(10):
    trace = engine.run(workflow_graph, persona_registry)
    results.append({
        "run_id": run_id,
        "steps": len(trace.steps),
        "phases": [s.phase for s in trace.steps],
    })
```
## Analysis via Scoring Engine

### Quality Scoring

```python
from fcc.collaboration.scoring import ScoringEngine

scorer = ScoringEngine()

# Score a deliverable against the configured criteria
score = scorer.evaluate(
    deliverable="Generated API documentation...",
    criteria=["completeness", "accuracy", "clarity"],
)
print(f"Quality score: {score.overall} / 5.0")
```
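The overall score can then be checked against the experiment's quality gate, e.g. `passed = score.overall >= 3.5` for the 3.5 threshold used in the design template above.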
### Statistical Analysis

With collected data, apply standard statistical methods. The snippet below assumes each run record includes a "quality_score" field (for example, added to the multi-run loop above by scoring each trace's deliverable with the ScoringEngine):

```python
import statistics

scores = [run["quality_score"] for run in results]
print(f"Mean: {statistics.mean(scores):.2f}")
print(f"Std Dev: {statistics.stdev(scores):.2f}")
print(f"Median: {statistics.median(scores):.2f}")
```
For comparing experimental conditions, use:

- t-test (2 conditions): Compare means between two persona configurations
- ANOVA (3+ conditions): Compare means across multiple governance levels
- Chi-squared (categorical): Compare gate pass/fail rates across conditions
- Correlation (continuous): Relate dimension scores to quality outcomes
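For instance, with SciPy (an assumption; any statistics package works) and hypothetical example data:

```python
from scipy import stats

# Hypothetical quality scores from two persona configurations
scores_a = [3.8, 4.1, 3.9, 4.0, 3.7, 4.2, 3.9, 4.0, 3.8, 4.1]
scores_b = [3.2, 3.5, 3.1, 3.4, 3.3, 3.6, 3.2, 3.5, 3.4, 3.3]

# Independent-samples t-test: do the two configurations differ in mean quality?
t_stat, p_value = stats.ttest_ind(scores_a, scores_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Chi-squared on gate pass/fail counts across two conditions
# (rows: conditions; columns: [passed, failed])
chi2, p, dof, expected = stats.chi2_contingency([[18, 2], [12, 8]])
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```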
## Publication Preparation Workflow

### Recommended Structure
- Introduction: Motivation for studying multi-agent workflows with FCC
- Related Work: Position relative to existing multi-agent frameworks (AutoGen, CrewAI, MetaGPT)
- Methodology: FCC configuration, experimental design, data collection procedure
- Results: Statistical analysis of simulation traces and quality scores
- Discussion: Interpretation, limitations, threats to validity
- Reproducibility Package: Link to FCC version, configuration files, and analysis scripts
### Reproducibility Package Contents

```text
reproducibility/
  README.md
  requirements.txt       # Pinned FCC version
  config/
    personas.yaml        # Exact persona configurations used
    workflow.json        # Workflow graph used
    governance.yaml      # Governance rules used
  scripts/
    run_experiment.py    # Data collection script
    analyze_results.py   # Statistical analysis
  data/
    raw_traces/          # Simulation traces (JSON)
    processed/           # Aggregated results (CSV)
```
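Once the package is complete, it can be bundled for upload with the standard library (paths are illustrative):

```python
import shutil

# Zip the reproducibility package for upload to Zenodo or similar
shutil.make_archive("reproducibility_v1", "zip", "reproducibility")
```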
### Citation

```bibtex
@software{fcc_framework,
  title   = {FCC Agent Team Framework},
  author  = {Information Collective, LLC},
  year    = {2026},
  url     = {https://github.com/rollingthunderfourtytwo-afk/l2_fcc_agent_team_ext},
  version = {0.8.0}
}
```
## IRB Considerations for AI-Mediated Research

### When IRB Review Is Needed

Whether review is required depends on the study design:

- Human participants interacting with FCC agents through the collaboration engine: IRB review is typically required
- Purely computational experiments (mock simulations with no human participants): IRB review is generally not required
- Surveys or interviews about FCC usage: standard human subjects protocols apply
### Key Considerations
| Concern | Mitigation |
|---|---|
| Informed consent | Participants must know they are interacting with AI agents |
| Data privacy | Session recordings may contain personally identifiable information; anonymize before publication |
| Deception | If participants are unaware that "agents" are AI-generated, this constitutes deception |
| Power dynamics | In classroom settings, student participation should be voluntary and not affect grades |
| Data retention | Define retention policies for collaboration session recordings |
### Recommended Protocol
- Obtain IRB approval before collecting data from human participants
- Provide informed consent forms that explain: the role of AI agents, data collection scope, how data will be stored and shared
- Allow participants to withdraw at any time and have their data deleted
- Anonymize session recordings before analysis: replace names with participant codes (a minimal sketch follows this list)
- Store data on encrypted, access-controlled systems
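A minimal sketch for the anonymization step (the names, codes, and file paths are hypothetical; a real pipeline should also catch emails, usernames, and other identifiers):

```python
# Hypothetical mapping from participant names to anonymous codes
CODES = {"Alice Example": "P01", "Bob Example": "P02"}

with open("session_recording.json") as f:
    text = f.read()

# Replace each known name with its participant code
for name, code in CODES.items():
    text = text.replace(name, code)

with open("session_recording_anon.json", "w") as f:
    f.write(text)
```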
## Reproducibility Guidelines

### Pinning Versions
Always record exact versions:
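A minimal sketch (assuming the installed fcc package exposes `__version__`, as the config example below also does):

```python
import platform

import fcc  # assumes the installed package exposes __version__

# Write the exact versions used for this experiment alongside the data
with open("experiment/VERSIONS.txt", "w") as f:
    f.write(f"fcc=={fcc.__version__}\n")
    f.write(f"python=={platform.python_version()}\n")
```

Running `pip freeze > requirements.txt` additionally captures the full dependency set for the reproducibility package.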
### Configuration Archival
Save the exact YAML/JSON configuration used for each experiment:
```python
import shutil

shutil.copy("src/fcc/data/personas/core_personas.yaml", "experiment/config/")
shutil.copy("src/fcc/data/workflows/extended_sequence.json", "experiment/config/")
```
### Mock Mode for Reproducibility

Mock mode produces identical results across runs, making it ideal for reproducible experiments. When using AI mode, record the model name, temperature, and any other sampling parameters:

```python
import sys

import fcc

config = {
    "mode": "mock",  # or "ai"
    "model": "claude-sonnet-4-20250514",
    "temperature": 0.7,
    "max_tokens": 4096,
    "fcc_version": fcc.__version__,
    "python_version": sys.version,
}
```
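Saving this record alongside the traces keeps run parameters and results together; a minimal sketch:

```python
import json

# Archive the run parameters next to the collected traces
with open("experiment/config_record.json", "w") as f:
    json.dump(config, f, indent=2)
```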
### Sharing Data
Share raw traces and analysis scripts alongside publications. Use stable identifiers (DOI, Zenodo) for long-term accessibility.
## See Also
- Educators Guide -- Course syllabus for teaching with FCC
- Student Workbook -- Weekly exercises
- Case Study Template -- Template and examples
- Citation -- How to cite FCC