Model Selection Competition¶
Duration: 60 minutes | Difficulty: Intermediate | Pattern: Parallel Fan-out + Sequential Chain
This scenario demonstrates parallel model training with multiple specialist personas, followed by performance comparison and documented model selection.
Scenario Overview¶
Problem: A classification task requires selecting the best model architecture from several candidates. Each model type has a specialist persona who trains and optimizes it, followed by a comparative evaluation.
Goal: Execute parallel model training across three specialists, compare results, select the best model, and generate model card documentation.
Persona Team¶
| Persona | ID | Role | Category |
|---|---|---|---|
| Model Architect | MAR | Defines evaluation criteria and selects winner | ml_models |
| Neural Network Specialist | NNS | Trains deep learning models | ml_models |
| Gradient Boosted Trees | GBT | Trains gradient boosting models | ml_models |
| Random Forest Specialist | RFS | Trains ensemble tree models | ml_models |
Setup¶
from fcc.personas.registry import PersonaRegistry
from fcc.simulation.engine import SimulationEngine
from fcc.simulation.messages import SimulationMessage
from fcc.messaging.bus import EventBus
from fcc.messaging.events import Event, EventType
registry = PersonaRegistry.from_yaml_directory("src/fcc/data/personas")
bus = EventBus()
engine = SimulationEngine(registry=registry, mode="deterministic")
competition = {
    "task": "Customer churn binary classification",
    "dataset": "telecom_churn_10k",
    "features": 45,
    "train_size": 8000,
    "test_size": 2000,
    "primary_metric": "AUC-ROC",
    "secondary_metrics": ["precision@10%", "recall", "F1"],
}
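Of the secondary metrics, precision@10% is the least standard: it measures precision among the top 10% of customers ranked by predicted churn probability, which matters when a retention team can only contact a slice of the base. A minimal NumPy sketch of the computation (the arrays below are illustrative, not scenario outputs):

import numpy as np

def precision_at_fraction(y_true, y_scores, fraction=0.10):
    """Precision among the top `fraction` of examples ranked by score."""
    k = max(1, int(len(y_scores) * fraction))
    top_k = np.argsort(y_scores)[::-1][:k]  # indices of the highest scores
    return float(np.mean(np.asarray(y_true)[top_k]))

# Illustrative arrays only, not outputs of the competition
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])
y_scores = np.array([0.9, 0.2, 0.8, 0.4, 0.3, 0.1, 0.7, 0.6, 0.5, 0.05])
print(precision_at_fraction(y_true, y_scores))  # top decile is one customer -> 1.0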
Phase 1: Define Evaluation Criteria¶
The Model Architect sets the competition rules:
mar_message = SimulationMessage(
    sender="orchestrator",
    receiver="MAR",
    content=(
        f"Define the model selection criteria for:\n"
        f"Task: {competition['task']}\n"
        f"Dataset: {competition['dataset']} "
        f"({competition['features']} features, "
        f"{competition['train_size']} train / {competition['test_size']} test)\n"
        f"Primary metric: {competition['primary_metric']}\n"
        f"Secondary metrics: {', '.join(competition['secondary_metrics'])}\n\n"
        "Define:\n"
        "- Evaluation protocol (cross-validation strategy)\n"
        "- Minimum performance thresholds\n"
        "- Tie-breaking criteria\n"
        "- Training time and inference latency constraints\n"
        "- Model interpretability requirements"
    ),
    phase="find",
)
eval_criteria = engine.step(mar_message)
print(f"Evaluation criteria: {len(eval_criteria.content)} chars")
Phase 2: Parallel Model Training¶
Each specialist trains their model type independently:
specialists = {
    "NNS": (
        "Train a neural network for this classification task:\n\n"
        f"Task: {competition['task']}\n"
        f"Features: {competition['features']}\n\n"
        "Try architectures: MLP (2-4 hidden layers) "
        "with dropout and batch normalization. "
        "Optimize: learning rate, batch size, architecture depth. "
        "Use early stopping on validation loss. "
        "Report: best architecture, hyperparameters, and all metrics."
    ),
    "GBT": (
        "Train a gradient boosted tree model for this task:\n\n"
        f"Task: {competition['task']}\n"
        f"Features: {competition['features']}\n\n"
        "Try: XGBoost, LightGBM, CatBoost. "
        "Optimize: n_estimators, max_depth, learning_rate, "
        "subsample, colsample_bytree. "
        "Use Bayesian hyperparameter optimization. "
        "Report: best framework, hyperparameters, and all metrics."
    ),
    "RFS": (
        "Train a random forest ensemble for this task:\n\n"
        f"Task: {competition['task']}\n"
        f"Features: {competition['features']}\n\n"
        "Try: sklearn RandomForest, ExtraTrees. "
        "Optimize: n_estimators, max_depth, min_samples_split, "
        "max_features, bootstrap. "
        "Evaluate feature importance rankings. "
        "Report: best configuration, hyperparameters, and all metrics."
    ),
}
model_results = {}
for persona_id, task_content in specialists.items():
    message = SimulationMessage(
        sender="MAR",
        receiver=persona_id,
        content=task_content,
        phase="create",
    )
    result = engine.step(message)
    model_results[persona_id] = result

    # Emit a progress event for any observers subscribed to the bus
    bus.publish(Event(
        event_type=EventType.SIMULATION_STEP_COMPLETED,
        source=f"model_competition.{persona_id}",
        payload={
            "persona": persona_id,
            "output_length": len(result.content),
        },
    ))
    print(f" {persona_id} training complete: {len(result.content)} chars")
Phase 3: Performance Comparison¶
The Model Architect compares all results:
# Prepare comparison input (each result truncated to keep the prompt compact)
comparison_input = "Model training results:\n\n"
for persona_id, result in model_results.items():
    persona = registry.get(persona_id)
    comparison_input += f"## {persona.name} ({persona_id})\n"
    comparison_input += f"{result.content[:400]}\n\n"
comparison_message = SimulationMessage(
    sender="orchestrator",
    receiver="MAR",
    content=(
        f"Compare the following model results and select the winner:\n\n"
        f"{comparison_input}\n\n"
        f"Evaluation criteria:\n{eval_criteria.content[:400]}\n\n"
        "Produce:\n"
        "- Side-by-side metric comparison table\n"
        "- Winner selection with justification\n"
        "- Runner-up analysis\n"
        "- Trade-off discussion (accuracy vs latency vs interpretability)\n"
        "- Recommendations for production deployment"
    ),
    phase="critique",
)
comparison_result = engine.step(comparison_message)
print(f"Comparison report: {len(comparison_result.content)} chars")
Phase 4: Model Card Documentation¶
Generate comprehensive model card documentation:
from fcc.collaboration.scoring import ScoringEngine
scorer = ScoringEngine()
# Score each model's documentation
model_scores = {}
for persona_id, result in model_results.items():
    score = scorer.score_text(result.content)
    model_scores[persona_id] = score
    print(f" {persona_id} documentation quality: {score:.2f}")
# Build the model card
import json
model_card = {
    "competition": competition,
    "candidates": [
        {
            "model_type": pid,
            "persona": registry.get(pid).name,
            "output_length": len(result.content),
            "documentation_score": model_scores[pid],
        }
        for pid, result in model_results.items()
    ],
    "evaluation_criteria_length": len(eval_criteria.content),
    "comparison_report_length": len(comparison_result.content),
    "comparison_score": scorer.score_text(comparison_result.content),
}
print("\nModel Card:")
print(json.dumps(model_card, indent=2))
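The card is also worth persisting next to the run so the selection decision stays auditable. A short sketch; the output path is an arbitrary choice, not something the scenario prescribes:

from pathlib import Path

# Arbitrary output location, not prescribed by the scenario
card_path = Path("artifacts/model_competition_card.json")
card_path.parent.mkdir(parents=True, exist_ok=True)
card_path.write_text(json.dumps(model_card, indent=2))
print(f"Model card written to {card_path}")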
Competition Summary¶
print("\n" + "=" * 50)
print("MODEL SELECTION COMPETITION SUMMARY")
print("=" * 50)
print(f"Task: {competition['task']}")
print(f"Primary metric: {competition['primary_metric']}")
print(f"Candidates: {len(specialists)}")
print()
for pid in specialists:
    persona = registry.get(pid)
    print(f" {persona.name} ({pid}): "
          f"doc_quality={model_scores[pid]:.2f}")
print(f"\nComparison report quality: "
      f"{scorer.score_text(comparison_result.content):.2f}")
Exercises¶
- Add ensemble: After selection, create a stacking ensemble combining the top two models and compare with individual results.
- Governance gate: Add a GCA review step before the winner is approved for production.
- Time tracking: Record training time for each specialist and include it in the comparison criteria (a starting point is sketched after this list).
- Knowledge graph: Build a KG with model nodes, metric edges, and comparison relationships.
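As a starting point for the time-tracking exercise, wall-clock timing with the standard library is enough; this hypothetical variant of the Phase 2 loop records per-specialist durations:

import time

training_times = {}
for persona_id, task_content in specialists.items():
    start = time.perf_counter()
    model_results[persona_id] = engine.step(SimulationMessage(
        sender="MAR",
        receiver=persona_id,
        content=task_content,
        phase="create",
    ))
    training_times[persona_id] = time.perf_counter() - start
print(training_times)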
Summary¶
In this scenario you executed a model selection competition:
- MAR defined evaluation criteria and competition rules
- NNS, GBT, and RFS trained models in parallel (fan-out)
- MAR compared all results with a structured evaluation (fan-in)
- Model card documentation was generated for the competition
Next Steps¶
- ML Pipeline Handoff -- Deploy the selected model
- DevOps Deployment Chain -- CI/CD for model deployment