Research Methodology Guide

This guide describes how to use the FCC framework as a research instrument for studying multi-agent workflows, AI governance, team collaboration, and knowledge management. It covers experiment design, data collection, analysis, reproducibility, and ethical considerations.


FCC as a Research Instrument

The FCC framework provides several properties that make it suitable for controlled experiments:

  1. Deterministic mock mode -- Simulations produce identical outputs for identical inputs, enabling reproducible experiments without API costs (a determinism check is sketched after this list)
  2. Configurable personas -- Researchers can systematically vary persona attributes (dimensions, constraints, archetypes) as independent variables
  3. Structured traces -- Every simulation produces a trace with timestamped steps, persona assignments, and phase labels
  4. Scoring engine -- Deliverable quality can be evaluated numerically (1-5 scale) with configurable criteria
  5. Event bus -- All system events are captured and replayable for post-hoc analysis
  6. Knowledge graphs -- Relationships between entities are explicit and queryable
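
Determinism can be checked directly: run the same inputs twice and compare traces. This is a minimal sketch assuming the SimulationEngine API shown under "Collecting Data" below, with workflow_graph and persona_registry already loaded:

from fcc.simulation.engine import SimulationEngine

engine = SimulationEngine(mock=True)

# Two mock runs over identical inputs should produce identical steps;
# trace_id is excluded because it is a fresh UUID on every run.
trace_a = engine.run(workflow_graph, persona_registry)
trace_b = engine.run(workflow_graph, persona_registry)
assert trace_a.to_dict()["steps"] == trace_b.to_dict()["steps"]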

Experiment Design with Persona Configurations

Independent Variables

FCC experiments typically manipulate one or more of the following:

| Variable | How to Manipulate | Example Hypothesis |
| --- | --- | --- |
| Number of personas | Add/remove personas from the registry | More personas improve output diversity |
| R.I.S.C.E.A.R. constraints | Modify the constraints field | Stricter constraints reduce errors |
| Governance tier | Change from preferred to hard-stop | Hard-stop rules reduce quality variance |
| Workflow graph size | Use 5-node vs. 24-node graph | Longer workflows improve completeness |
| Persona dimensions | Vary dimension attribute values | Higher curiosity scores correlate with broader output |
| Cross-reference density | Add/remove cross-reference entries | Denser collaboration networks improve coherence |
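
As one concrete manipulation, persona constraints can be varied programmatically to produce one configuration file per condition. This sketch assumes personas sit under a top-level personas key in core_personas.yaml, each with a constraints list; both names are assumptions, so adapt them to your schema:

import copy
import yaml

with open("src/fcc/data/personas/core_personas.yaml") as f:
    base = yaml.safe_load(f)

# One persona file per level of the independent variable
for n_constraints in (0, 2, 5):
    variant = copy.deepcopy(base)
    for persona in variant["personas"]:
        persona["constraints"] = (persona.get("constraints") or [])[:n_constraints]
    with open(f"experiment/config/personas_{n_constraints}.yaml", "w") as f:
        yaml.safe_dump(variant, f)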

Dependent Variables

| Variable | How to Measure | FCC Source |
| --- | --- | --- |
| Output quality | ScoringEngine evaluation (1-5 scale) | fcc.collaboration.scoring |
| Task completion rate | Percentage of workflow nodes completed | Simulation trace |
| Collaboration turn count | Number of turns in a collaboration session | Session recording |
| Gate pass rate | Percentage of approval gates passed | Collaboration engine |
| Event diversity | Number of distinct event types emitted | Event bus log |
| Processing time | Duration from first to last trace step | Trace timestamps |
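
Two of these measures can be computed straight from saved artifacts. The sketch below follows the trace structure shown under "Trace Structure" and assumes each collected event dict carries a "type" field (an assumption; check your event schema):

import json
from datetime import datetime

with open("experiment_trace.json") as f:
    trace = json.load(f)
with open("experiment_events.json") as f:
    events = json.load(f)

# Processing time: duration from first to last trace step
ts = [datetime.fromisoformat(s["timestamp"].replace("Z", "+00:00"))
      for s in trace["steps"]]
processing_time = (max(ts) - min(ts)).total_seconds()

# Event diversity: number of distinct event types emitted
event_diversity = len({e.get("type") for e in events})

print(f"Processing time: {processing_time:.1f}s, event diversity: {event_diversity}")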

Control Variables

To isolate the effect of your independent variable, hold constant:

  • Python version and FCC version
  • Workflow graph (unless graph size is the independent variable)
  • Mock mode vs. AI mode
  • Random seeds (mock mode is deterministic by default)
  • Scoring criteria and thresholds

Experimental Design Template

Title: [Effect of X on Y in FCC-mediated workflows]

Hypothesis: [H1: ...]

Independent Variable: [e.g., number of governance hard-stop rules]
Levels: [e.g., 0, 2, 5, 10]

Dependent Variable: [e.g., quality gate pass rate]

Control Variables:
  - Workflow: extended_sequence (20 nodes)
  - Personas: core_personas.yaml (unmodified)
  - Mode: mock
  - Scoring threshold: 3.5

Procedure:
  1. For each level of the independent variable:
     a. Configure governance with N hard-stop rules
     b. Run 10 simulation iterations
     c. Record quality scores and gate outcomes
  2. Aggregate results and perform statistical analysis

Expected Results: [...]
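
A sketch of the template's procedure, reusing the engine and scorer from the examples that follow. The governance-configuration call is left out because this guide does not document that API, and treating the final step's response as the deliverable is likewise an assumption:

from fcc.collaboration.scoring import ScoringEngine
from fcc.simulation.engine import SimulationEngine

engine = SimulationEngine(mock=True)
scorer = ScoringEngine()

results = []
for n_rules in (0, 2, 5, 10):            # levels of the independent variable
    # Configure governance with n_rules hard-stop rules here; the call
    # depends on your FCC setup and is not shown in this guide.
    for i in range(10):                   # 10 iterations per level
        trace = engine.run(workflow_graph, persona_registry)
        score = scorer.evaluate(
            deliverable=trace.steps[-1].response,  # assumed deliverable
            criteria=["completeness", "accuracy", "clarity"],
        )
        results.append({"level": n_rules, "iteration": i,
                        "quality_score": score.overall})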

Data Collection via Simulation Traces

Trace Structure

Each simulation run produces a trace containing:

{
    "trace_id": "uuid-...",
    "workflow_graph": "extended_sequence",
    "steps": [
        {
            "step_index": 0,
            "node_id": "n1",
            "phase": "Find",
            "persona_id": "RC",
            "response": "...",
            "timestamp": "2026-03-30T10:00:00Z"
        },
        ...
    ]
}
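
A saved trace can be summarized with only the standard library, for example by counting steps per workflow phase:

import json
from collections import Counter

with open("experiment_trace.json") as f:
    trace = json.load(f)

phase_counts = Counter(step["phase"] for step in trace["steps"])
for phase, count in phase_counts.most_common():
    print(f"{phase}: {count} steps")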

Collecting Data

import json

from fcc.simulation.engine import SimulationEngine
from fcc.messaging.bus import EventBus

# Subscribe a collector so every event emitted during the run is captured
bus = EventBus()
collected_events = []

def collector(event):
    collected_events.append(event.to_dict())

bus.subscribe(collector)

# workflow_graph and persona_registry are assumed to be loaded from the
# configuration files archived with your experiment
engine = SimulationEngine(mock=True, event_bus=bus)
trace = engine.run(workflow_graph, persona_registry)

# Save trace for analysis
with open("experiment_trace.json", "w") as f:
    json.dump(trace.to_dict(), f, indent=2)

# Save events
with open("experiment_events.json", "w") as f:
    json.dump(collected_events, f, indent=2)

Multi-Run Data Collection

from fcc.collaboration.scoring import ScoringEngine  # introduced below

scorer = ScoringEngine()
results = []
for run_id in range(10):
    trace = engine.run(workflow_graph, persona_registry)
    # Treating the final step's response as the deliverable is an
    # assumption; substitute your workflow's actual artifact
    score = scorer.evaluate(deliverable=trace.steps[-1].response,
                            criteria=["completeness", "accuracy", "clarity"])
    results.append({
        "run_id": run_id,
        "steps": len(trace.steps),
        "phases": [s.phase for s in trace.steps],
        "quality_score": score.overall,
    })
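
Per-run results can then be flattened to CSV for the processed/ directory of a reproducibility package (the phases list is omitted because CSV cells hold scalars):

import csv

with open("processed_runs.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["run_id", "steps", "quality_score"])
    writer.writeheader()
    for run in results:
        writer.writerow({k: run[k] for k in writer.fieldnames})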

Analysis via Scoring Engine

Quality Scoring

from fcc.collaboration.scoring import ScoringEngine

scorer = ScoringEngine()

# Score a deliverable
score = scorer.evaluate(
    deliverable="Generated API documentation...",
    criteria=["completeness", "accuracy", "clarity"],
)
print(f"Quality score: {score.overall} / 5.0")

Statistical Analysis

With collected data, apply standard statistical methods:

import statistics

scores = [run["quality_score"] for run in results]
print(f"Mean: {statistics.mean(scores):.2f}")
print(f"Std Dev: {statistics.stdev(scores):.2f}")
print(f"Median: {statistics.median(scores):.2f}")

For comparing experimental conditions, use one of the following (sketched below):

  • t-test (2 conditions) -- compare means between two persona configurations
  • ANOVA (3+ conditions) -- compare means across multiple governance levels
  • Chi-squared (categorical) -- compare gate pass/fail rates across conditions
  • Correlation (continuous) -- relate dimension scores to quality outcomes
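
A sketch of these four tests with scipy.stats, using illustrative placeholder numbers in place of real experimental results:

from scipy import stats

# Illustrative placeholder quality scores for three conditions
scores_a = [3.2, 3.8, 4.1, 3.5, 3.9]
scores_b = [4.0, 4.4, 4.2, 4.6, 4.1]
scores_c = [3.0, 3.1, 3.4, 3.6, 3.3]

# t-test (2 conditions)
t_stat, p = stats.ttest_ind(scores_a, scores_b)

# ANOVA (3+ conditions)
f_stat, p = stats.f_oneway(scores_a, scores_b, scores_c)

# Chi-squared on gate pass/fail counts per condition
chi2, p, dof, expected = stats.chi2_contingency([[42, 8], [35, 15]])

# Correlation between a dimension score and quality
r, p = stats.pearsonr([0.2, 0.5, 0.6, 0.8, 0.9], scores_a)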


Publication Preparation Workflow

  1. Introduction: Motivation for studying multi-agent workflows with FCC
  2. Related Work: Position relative to existing multi-agent frameworks (AutoGen, CrewAI, MetaGPT)
  3. Methodology: FCC configuration, experimental design, data collection procedure
  4. Results: Statistical analysis of simulation traces and quality scores
  5. Discussion: Interpretation, limitations, threats to validity
  6. Reproducibility Package: Link to FCC version, configuration files, and analysis scripts

Reproducibility Package Contents

reproducibility/
  README.md
  requirements.txt           # Pinned FCC version
  config/
    personas.yaml            # Exact persona configurations used
    workflow.json            # Workflow graph used
    governance.yaml          # Governance rules used
  scripts/
    run_experiment.py        # Data collection script
    analyze_results.py       # Statistical analysis
  data/
    raw_traces/              # Simulation traces (JSON)
    processed/               # Aggregated results (CSV)
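
A minimal sketch for assembling this layout; the source paths follow the "Configuration Archival" example later in this guide:

import shutil
from pathlib import Path

root = Path("reproducibility")
for sub in ("config", "scripts", "data/raw_traces", "data/processed"):
    (root / sub).mkdir(parents=True, exist_ok=True)

shutil.copy("src/fcc/data/personas/core_personas.yaml",
            root / "config" / "personas.yaml")
shutil.copy("src/fcc/data/workflows/extended_sequence.json",
            root / "config" / "workflow.json")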

Citation

@software{fcc_framework,
  title = {FCC Agent Team Framework},
  author = {Information Collective, LLC},
  year = {2026},
  url = {https://github.com/rollingthunderfourtytwo-afk/l2_fcc_agent_team_ext},
  version = {0.8.0}
}

IRB Considerations for AI-Mediated Research

When IRB Review Is Needed

Whether review is needed depends on the study design:

  • Human participants interacting with FCC agents through the collaboration engine: IRB review is typically required
  • Purely computational experiments (mock simulations without human participants): IRB review is generally not required
  • Surveys or interviews about FCC usage: standard human subjects protocols apply

Key Considerations

| Concern | Mitigation |
| --- | --- |
| Informed consent | Participants must know they are interacting with AI agents |
| Data privacy | Session recordings may contain personally identifiable information; anonymize before publication |
| Deception | If participants are unaware that "agents" are AI-generated, this constitutes deception |
| Power dynamics | In classroom settings, student participation should be voluntary and not affect grades |
| Data retention | Define retention policies for collaboration session recordings |

Recommended protocol:

  1. Obtain IRB approval before collecting data from human participants
  2. Provide informed consent forms that explain the role of AI agents, the scope of data collection, and how data will be stored and shared
  3. Allow participants to withdraw at any time and have their data deleted
  4. Anonymize session recordings before analysis by replacing names with participant codes (a sketch follows this list)
  5. Store data on encrypted, access-controlled systems
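
A minimal pseudonymization sketch. It assumes session recordings are JSON with a "turns" list whose entries carry a "participant" field; both names are hypothetical, so adapt them to your recording schema:

import json

with open("session_recording.json") as f:
    session = json.load(f)

codes = {}  # stable name -> participant code mapping
for turn in session["turns"]:
    name = turn["participant"]
    codes.setdefault(name, f"P{len(codes) + 1:03d}")
    turn["participant"] = codes[name]

with open("session_recording_anon.json", "w") as f:
    json.dump(session, f, indent=2)
# Store the name-to-code mapping separately, on an access-controlled system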

Reproducibility Guidelines

Pinning Versions

Always record exact versions:

import sys

import fcc

print(f"FCC version: {fcc.__version__}")
print(f"Python version: {sys.version}")

Configuration Archival

Save the exact YAML/JSON configuration used for each experiment:

import shutil
shutil.copy("src/fcc/data/personas/core_personas.yaml", "experiment/config/")
shutil.copy("src/fcc/data/workflows/extended_sequence.json", "experiment/config/")

Mock Mode for Reproducibility

Mock mode produces identical results across runs, making it ideal for reproducible experiments. When using AI mode, record the model name, temperature, and any other parameters:

import sys

import fcc

# Model parameters apply only in AI mode; mock mode ignores them
config = {
    "mode": "mock",  # or "ai"
    "model": "claude-sonnet-4-20250514",
    "temperature": 0.7,
    "max_tokens": 4096,
    "fcc_version": fcc.__version__,
    "python_version": sys.version,
}

Sharing Data

Share raw traces and analysis scripts alongside publications, and deposit them with a service that issues stable identifiers (e.g., a DOI via Zenodo) for long-term accessibility.


See Also