Evaluation Overview Demo

Overview

This demo provides a comprehensive overview of the FCC evaluation ecosystem, including CLEAR+ benchmarking, model cards, compliance auditing, and the evaluation dashboard.

What You'll Learn

  • The seven CLEAR+ evaluation dimensions: Cost, Latency, Efficacy, Assurance, Reliability (the CLEAR core), plus Coverage and Explainability (the "+")
  • How to run benchmarks and generate model cards
  • How the evaluation pipeline integrates with compliance

Prerequisites

  • FCC framework installed (pip install -e .)
  • Basic understanding of personas and workflows

The FCC Evaluation Ecosystem

flowchart TD
    subgraph Evaluation["Evaluation Framework"]
        CLEAR["CLEAR+ Metrics\n7 dimensions"]
        Bench["BenchmarkRunner\nMock + AI modes"]
        Cards["ModelCardGenerator\n128 cards"]
    end

    subgraph Compliance["Compliance Framework"]
        EUAI["EU AI Act\n256+ requirements"]
        NIST["NIST AI RMF\n29 subcategories"]
        Audit["ComplianceAuditor"]
    end

    CLEAR --> Bench
    Bench --> Cards
    Cards --> Audit
    EUAI --> Audit
    NIST --> Audit

    style Evaluation fill:#E3F2FD,stroke:#1565C0
    style Compliance fill:#FFF3E0,stroke:#E65100
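
The same flow can be driven from Python instead of the CLI. The sketch below mirrors the diagram above; the import paths, constructor arguments, and method names are assumptions for illustration, not FCC's documented API.

# Sketch only: module paths and method names are assumed, not FCC's
# documented API. The flow mirrors the diagram above.
from fcc.evaluation import BenchmarkRunner, ModelCardGenerator  # assumed imports
from fcc.compliance import ComplianceAuditor                    # assumed import

runner = BenchmarkRunner(suite="baseline", mode="mock")  # mock or AI mode
results = runner.run()                                   # CLEAR+ scores per persona

cards = ModelCardGenerator().generate(results)           # one card per persona

auditor = ComplianceAuditor(frameworks=["eu-ai-act", "nist-ai-rmf"])
report = auditor.audit(cards)
print(report.summary())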

Quick Start

Run a Benchmark

fcc benchmark --suite baseline --format markdown
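
Under the hood, a benchmark run times each scenario and records resource usage, which later feeds the Cost and Latency dimensions. The standalone loop below illustrates that measurement pattern in generic terms; evaluate_scenario is a hypothetical stand-in for a persona call, not part of FCC.

import time

def evaluate_scenario(scenario):
    # Hypothetical stand-in for invoking a persona on one scenario.
    time.sleep(0.01)
    return {"tokens": 42}

latencies_ms, token_counts = [], []
for scenario in ["s1", "s2", "s3"]:
    start = time.perf_counter()
    result = evaluate_scenario(scenario)
    latencies_ms.append((time.perf_counter() - start) * 1000)  # Latency samples
    token_counts.append(result["tokens"])                      # Cost samples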

Generate Model Cards

fcc model-card --personas all --output docs/model-cards/
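
A model card bundles a persona's identity with its evaluation results. The hypothetical dataclass below sketches the kind of fields such a card might carry and how it could render to markdown; the schema is illustrative, not FCC's actual card format.

from dataclasses import dataclass, field

@dataclass
class ModelCard:
    # Hypothetical schema; field names are illustrative, not FCC's.
    persona: str
    clear_scores: dict = field(default_factory=dict)  # dimension name -> score

    def to_markdown(self) -> str:
        rows = "\n".join(f"| {d} | {s} |" for d, s in self.clear_scores.items())
        return f"# Model Card: {self.persona}\n\n| Dimension | Score |\n|---|---|\n{rows}"

card = ModelCard("reviewer", {"Efficacy": 4.2, "Assurance": 0.97})
print(card.to_markdown())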

Run a Full Evaluation

fcc compliance-audit --personas all --include-benchmarks --format markdown
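
Conceptually, the audit step maps each regulatory requirement to a pass/fail check over the generated model cards and reports an aggregate pass rate (the Assurance metric in the table below). The toy sketch that follows assumes a plain-dict card representation; the requirement IDs and checks are placeholders, not real EU AI Act articles or NIST AI RMF subcategories.

# Toy sketch: requirement IDs and checks are placeholders, not actual
# EU AI Act articles or NIST AI RMF subcategories.
cards = [{"persona": "reviewer", "Assurance": 0.97, "Efficacy": 4.2}]

checks = {
    "EU-PLACEHOLDER-1": lambda c: "Efficacy" in c,                 # efficacy is documented
    "NIST-PLACEHOLDER-1": lambda c: c.get("Assurance", 0) >= 0.9,  # assurance threshold met
}

results = [check(card) for card in cards for check in checks.values()]
print(f"Pass rate: {100 * sum(results) / len(results):.0f}%")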

CLEAR+ Dimensions

| Dimension      | Description                          | Metric             |
|----------------|--------------------------------------|--------------------|
| Cost           | Resource consumption per evaluation  | Tokens, API calls  |
| Latency        | Response time under load             | p50, p95, p99      |
| Efficacy       | Quality of persona outputs           | Score 0-5          |
| Assurance      | Compliance and safety guarantees     | Pass rate %        |
| Reliability    | Consistency across runs              | Std deviation      |
| Coverage       | Breadth of scenario coverage         | % of scenarios     |
| Explainability | Transparency of decisions            | Trace completeness |
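
To make the Latency and Reliability rows concrete, the snippet below computes p50/p95/p99 and a standard deviation from a list of latency samples using only the Python standard library. (The std deviation is taken over latencies here for brevity; in practice Reliability would be measured across repeated evaluation scores.)

from statistics import quantiles, stdev

# Example per-run latency samples in milliseconds (made-up numbers).
latencies_ms = [110, 120, 118, 250, 130, 115, 140, 122, 119, 480]

# quantiles(n=100) returns the 1st..99th percentile cut points.
pct = quantiles(latencies_ms, n=100)
p50, p95, p99 = pct[49], pct[94], pct[98]
print(f"p50={p50:.0f}ms  p95={p95:.0f}ms  p99={p99:.0f}ms")  # Latency
print(f"std dev={stdev(latencies_ms):.1f}ms")                # Reliability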