Model Selection Competition

Duration: 60 minutes · Difficulty: Intermediate · Pattern: Parallel Fan-out + Sequential Chain

This scenario demonstrates parallel model training with multiple specialist personas, followed by performance comparison and documented model selection.

Scenario Overview

Problem: A classification task requires selecting the best model architecture from several candidates. Each candidate model type has a specialist persona who trains and optimizes it, and a comparative evaluation then selects the winner.

Goal: Execute parallel model training across three model specialists, compare results, select the best model, and generate model card documentation.

Persona Team

Persona                   | ID  | Role                                            | Category
Model Architect           | MAR | Defines evaluation criteria and selects winner  | ml_models
Neural Network Specialist | NNS | Trains deep learning models                     | ml_models
Gradient Boosted Trees    | GBT | Trains gradient boosting models                 | ml_models
Random Forest Specialist  | RFS | Trains ensemble tree models                     | ml_models

Setup

from fcc.personas.registry import PersonaRegistry
from fcc.simulation.engine import SimulationEngine
from fcc.simulation.messages import SimulationMessage
from fcc.messaging.bus import EventBus
from fcc.messaging.events import Event, EventType

registry = PersonaRegistry.from_yaml_directory("src/fcc/data/personas")
bus = EventBus()
engine = SimulationEngine(registry=registry, mode="deterministic")

competition = {
    "task": "Customer churn binary classification",
    "dataset": "telecom_churn_10k",
    "features": 45,
    "train_size": 8000,
    "test_size": 2000,
    "primary_metric": "AUC-ROC",
    "secondary_metrics": ["precision@10%", "recall", "F1"],
}
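
Before running the phases, it is worth confirming that all four competition personas loaded from the YAML directory. This is a minimal sanity check that uses only registry.get and the persona's name attribute, the same calls the later phases rely on:

# Sanity check: all four competition personas should be registered.
# Uses registry.get(...) and persona.name, as in the comparison and summary phases.
for pid in ("MAR", "NNS", "GBT", "RFS"):
    persona = registry.get(pid)
    print(f"Loaded persona: {persona.name} ({pid})")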

Phase 1: Define Evaluation Criteria

The Model Architect sets the competition rules:

mar_message = SimulationMessage(
    sender="orchestrator",
    receiver="MAR",
    content=(
        f"Define the model selection criteria for:\n"
        f"Task: {competition['task']}\n"
        f"Dataset: {competition['dataset']} "
        f"({competition['features']} features, "
        f"{competition['train_size']} train / {competition['test_size']} test)\n"
        f"Primary metric: {competition['primary_metric']}\n"
        f"Secondary metrics: {', '.join(competition['secondary_metrics'])}\n\n"
        "Define:\n"
        "- Evaluation protocol (cross-validation strategy)\n"
        "- Minimum performance thresholds\n"
        "- Tie-breaking criteria\n"
        "- Training time and inference latency constraints\n"
        "- Model interpretability requirements"
    ),
    phase="find",
)

eval_criteria = engine.step(mar_message)
print(f"Evaluation criteria: {len(eval_criteria.content)} chars")

Phase 2: Parallel Model Training

Each specialist trains their model type independently:

specialists = {
    "NNS": (
        "Train a neural network for this classification task:\n\n"
        f"Task: {competition['task']}\n"
        f"Features: {competition['features']}\n\n"
        "Try architectures: MLP (2-4 hidden layers), "
        "with dropout, batch normalization. "
        "Optimize: learning rate, batch size, architecture depth. "
        "Use early stopping on validation loss. "
        "Report: best architecture, hyperparameters, and all metrics."
    ),
    "GBT": (
        "Train a gradient boosted tree model for this task:\n\n"
        f"Task: {competition['task']}\n"
        f"Features: {competition['features']}\n\n"
        "Try: XGBoost, LightGBM, CatBoost. "
        "Optimize: n_estimators, max_depth, learning_rate, "
        "subsample, colsample_bytree. "
        "Use Bayesian hyperparameter optimization. "
        "Report: best framework, hyperparameters, and all metrics."
    ),
    "RFS": (
        "Train a random forest ensemble for this task:\n\n"
        f"Task: {competition['task']}\n"
        f"Features: {competition['features']}\n\n"
        "Try: sklearn RandomForest, ExtraTrees. "
        "Optimize: n_estimators, max_depth, min_samples_split, "
        "max_features, bootstrap. "
        "Evaluate feature importance rankings. "
        "Report: best configuration, hyperparameters, and all metrics."
    ),
}

model_results = {}
for persona_id, task_content in specialists.items():
    message = SimulationMessage(
        sender="MAR",
        receiver=persona_id,
        content=task_content,
        phase="create",
    )
    result = engine.step(message)
    model_results[persona_id] = result

    bus.publish(Event(
        event_type=EventType.SIMULATION_STEP_COMPLETED,
        source=f"model_competition.{persona_id}",
        payload={
            "persona": persona_id,
            "output_length": len(result.content),
        },
    ))
    print(f"  {persona_id} training complete: {len(result.content)} chars")

Phase 3: Performance Comparison

The Model Architect compares all results:

# Prepare comparison input
comparison_input = "Model training results:\n\n"
for persona_id, result in model_results.items():
    persona = registry.get(persona_id)
    comparison_input += f"## {persona.name} ({persona_id})\n"
    comparison_input += f"{result.content[:400]}\n\n"

comparison_message = SimulationMessage(
    sender="orchestrator",
    receiver="MAR",
    content=(
        f"Compare the following model results and select the winner:\n\n"
        f"{comparison_input}\n\n"
        f"Evaluation criteria:\n{eval_criteria.content[:400]}\n\n"
        "Produce:\n"
        "- Side-by-side metric comparison table\n"
        "- Winner selection with justification\n"
        "- Runner-up analysis\n"
        "- Trade-off discussion (accuracy vs latency vs interpretability)\n"
        "- Recommendations for production deployment"
    ),
    phase="critique",
)

comparison_result = engine.step(comparison_message)
print(f"Comparison report: {len(comparison_result.content)} chars")

Phase 4: Model Card Documentation

Generate comprehensive model card documentation:

from fcc.collaboration.scoring import ScoringEngine

scorer = ScoringEngine()

# Score each model's documentation
model_scores = {}
for persona_id, result in model_results.items():
    score = scorer.score_text(result.content)
    model_scores[persona_id] = score
    print(f"  {persona_id} documentation quality: {score:.2f}")

# Build the model card
import json

model_card = {
    "competition": competition,
    "candidates": [
        {
            "model_type": pid,
            "persona": registry.get(pid).name,
            "output_length": len(result.content),
            "documentation_score": model_scores[pid],
        }
        for pid, result in model_results.items()
    ],
    "evaluation_criteria_length": len(eval_criteria.content),
    "comparison_report_length": len(comparison_result.content),
    "comparison_score": scorer.score_text(comparison_result.content),
}

print("\nModel Card:")
print(json.dumps(model_card, indent=2))
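
If you want to keep the model card alongside other run artifacts, it can be written to disk with the standard library. The artifacts/ directory below is an arbitrary choice:

from pathlib import Path

# Persist the model card as JSON next to other run artifacts.
out_dir = Path("artifacts")
out_dir.mkdir(exist_ok=True)
card_path = out_dir / "model_card.json"
card_path.write_text(json.dumps(model_card, indent=2))
print(f"Model card written to {card_path}")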

Competition Summary

print("\n" + "=" * 50)
print("MODEL SELECTION COMPETITION SUMMARY")
print("=" * 50)
print(f"Task: {competition['task']}")
print(f"Primary metric: {competition['primary_metric']}")
print(f"Candidates: {len(specialists)}")
print()
for pid in specialists:
    persona = registry.get(pid)
    print(f"  {persona.name} ({pid}): "
          f"doc_quality={model_scores[pid]:.2f}")
print(f"\nComparison report quality: "
      f"{scorer.score_text(comparison_result.content):.2f}")

Exercises

  1. Add ensemble: After selection, create a stacking ensemble combining the top two models and compare with individual results.
  2. Governance gate: Add a GCA review step before the winner is approved for production.
  3. Time tracking: Record training time for each specialist and include it in the comparison criteria (see the timing sketch after this list).
  4. Knowledge graph: Build a KG with model nodes, metric edges, and comparison relationships.
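
For exercise 3, a minimal approach is to time each engine.step call with time.perf_counter and keep the durations for the comparison prompt. This sketch is a variant of the Phase 2 loop:

import time

training_times = {}
model_results = {}
for persona_id, task_content in specialists.items():
    message = SimulationMessage(
        sender="MAR",
        receiver=persona_id,
        content=task_content,
        phase="create",
    )
    start = time.perf_counter()
    model_results[persona_id] = engine.step(message)
    # Wall-clock seconds spent on this specialist's training step.
    training_times[persona_id] = time.perf_counter() - start

The recorded times can then be appended to comparison_input so MAR can weigh training cost alongside the metrics.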

Summary

In this scenario you executed a model selection competition:

  • MAR defined evaluation criteria and competition rules
  • NNS, GBT, and RFS trained models in parallel (fan-out)
  • MAR compared all results with a structured evaluation (fan-in)
  • Model card documentation was generated for the competition

Next Steps