FCC Total Cost of Ownership (TCO)

A worked cost model for enterprise FCC deployments. Use it to budget, compare against commercial alternatives, and build an ROI case for your CFO. For security posture see the companion security review.

Cost components

FCC's TCO breaks into six components:

| Component | Typical share | Who owns it |
|---|---|---|
| Licensing | 0% | MIT license - no cost |
| Compute (hosting) | 10-40% | Infra / platform team |
| AI provider tokens | 40-80% | Platform + FinOps |
| Observability & storage | 5-15% | SRE |
| Human support & admin | 5-20% | Engineering management |
| Training & onboarding | 2-10% | Learning & Dev |

Exact mix depends on whether you use hosted providers (high token spend) or local models via Ollama/vLLM (high compute, near-zero tokens).

Scaling factors

Four knobs drive TCO:

  1. Number of personas in active use: registry load is in-memory and negligible up to ~500 personas.
  2. Simulations per day: the linear driver of token cost.
  3. Concurrent users: drives the sizing of the backend container / K8s replicas.
  4. Average tokens per call: workflow size (5-node vs 55-node) multiplies tokens accordingly.

The formula:

daily_token_cost = simulations_per_day
                 * avg_steps_per_sim
                 * avg_tokens_per_step
                 * provider_usd_per_1k_tokens
                 / 1000
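The formula translates directly into code; a minimal sketch (parameter names follow the formula above):

```python
def daily_token_cost(simulations_per_day: int,
                     avg_steps_per_sim: int,
                     avg_tokens_per_step: int,
                     provider_usd_per_1k_tokens: float) -> float:
    """Estimated daily token spend in USD, mirroring the formula above."""
    return (simulations_per_day
            * avg_steps_per_sim
            * avg_tokens_per_step
            * provider_usd_per_1k_tokens
            / 1000)
```

For example, 200 simulations/day at 15 steps each and 2,500 blended tokens per step, at a blended $0.03/1k tokens, comes to $225/day.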

Provider price sheet (as of v1.3.3)

| Provider | Input $/1M | Output $/1M | Notes |
|---|---|---|---|
| Anthropic Claude Opus 4.6 | $15 | $75 | Hosted, high quality |
| OpenAI GPT-4 class | $10 | $30 | Hosted |
| Azure OpenAI | $10 | $30 | Hosted, private networking extras |
| Ollama (local) | $0 | $0 | Infra-only cost |
| vLLM (local) | $0 | $0 | Infra-only cost (GPU) |
| LiteLLM (router) | Pass-through | Pass-through | Adds ops surface |
| Mock | $0 | $0 | Dev/test/CI only |

Always benchmark your workload; these are list prices.
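The price sheet can be encoded once and reused across the scenarios below; a sketch using the hosted list prices from the table (local and mock providers are $0 at the token level):

```python
# USD per 1M tokens (input, output), from the price sheet above.
PRICES = {
    "anthropic": (15.0, 75.0),
    "openai":    (10.0, 30.0),
    "azure":     (10.0, 30.0),
    "ollama":    (0.0, 0.0),
    "vllm":      (0.0, 0.0),
    "mock":      (0.0, 0.0),
}

def cost_per_step(provider: str, tokens_in: int, tokens_out: int) -> float:
    """USD cost of a single workflow step on the given provider."""
    usd_in, usd_out = PRICES[provider]
    return tokens_in / 1e6 * usd_in + tokens_out / 1e6 * usd_out
```

For instance, a step with 2,000 tokens in and 500 out on Anthropic costs $0.0675.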

Worked scenario A: Small team (10 researchers)

Profile

  • 10 users, 20 simulations/day each
  • Average 15 steps/simulation, 2,000 tokens in + 500 tokens out per step
  • Provider mix: 80% mock (CI + iteration), 20% Anthropic Claude Opus 4.6

Monthly cost

| Line | Calculation | $/month |
|---|---|---|
| Compute | 1x backend container on existing dev server | $0 |
| Tokens | 10 * 20 * 15 * 0.2 * 22 workdays * ((2000/1M)*$15 + (500/1M)*$75) | ~$891 |
| Storage | Event logs, 5 GB/mo | ~$1 |
| Observability | Console/JSON exporter | $0 |
| Total | | ~$892 |

ROI: at a conservative $20/hour loaded labor rate, FCC pays back if it saves ~45 hours/month across the team (roughly 4.5 hours per user).
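The token line can be sanity-checked in a few lines (per-step rate from the Claude prices; only the 20% non-mock share is billed):

```python
steps_per_month = 10 * 20 * 15 * 22                 # users * sims/day * steps/sim * workdays
per_step_usd = 2000 / 1e6 * 15 + 500 / 1e6 * 75     # Claude input + output cost per step
token_cost = steps_per_month * 0.2 * per_step_usd   # 20% of traffic hits the paid provider
print(round(token_cost))  # 891
```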

Worked scenario B: Mid-size org (100 users)

Profile

  • 100 users, 2 simulations/day each
  • 20 steps/sim, 3,000 tokens in + 800 tokens out per step
  • Provider mix: 50% Anthropic, 30% OpenAI, 20% mock
  • Deployment: Kubernetes, 3 backend replicas + 2 Streamlit replicas

Monthly cost

| Line | Calculation | $/month |
|---|---|---|
| Compute (K8s, AWS) | 5 pods @ 1 vCPU / 2 GB + LB | ~$380 |
| Token spend (Anthropic, 50%) | 100 * 2 * 20 * 22 * 0.5 * ((3000/1M)*$15 + (800/1M)*$75) | ~$4,620 |
| Token spend (OpenAI, 30%) | 100 * 2 * 20 * 22 * 0.3 * ((3000/1M)*$10 + (800/1M)*$30) | ~$1,430 |
| Storage (S3, 200 GB) | | ~$6 |
| Observability (managed OTel) | | ~$120 |
| Total | | ~$6,550 |
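The per-provider pattern in the table generalizes to any provider mix; a sketch (the usage numbers and prices passed in are illustrative):

```python
def monthly_token_cost(users: int, sims_per_user_day: float, steps_per_sim: int,
                       tokens_in: int, tokens_out: int,
                       mix: dict[str, float],                    # provider -> traffic share
                       prices: dict[str, tuple[float, float]],   # provider -> ($/1M in, $/1M out)
                       workdays: int = 22) -> float:
    """Monthly USD token spend for a workload split across providers."""
    steps = users * sims_per_user_day * steps_per_sim * workdays
    total = 0.0
    for provider, share in mix.items():
        usd_in, usd_out = prices[provider]
        total += steps * share * (tokens_in / 1e6 * usd_in + tokens_out / 1e6 * usd_out)
    return total
```

Example: 10 users at 5 sims/day, 10 steps/sim, 1,000 in + 250 out per step, 100% on Anthropic list prices, is ~$371/month.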

Cost optimization suggestions:

  • Move 50% of exploration-phase traffic to Ollama (70B model on a shared GPU node) - projected savings ~$2,600/mo.
  • Enable scenario caching - deduplicates identical runs; typical hit rate 20-30%.

Worked scenario C: Enterprise (1,000+ users)

Profile

  • 1,500 users, 2 simulations/day each
  • 25 steps/sim, 4,000 tokens in + 1,200 tokens out per step
  • Provider mix: 20% Claude Opus, 10% OpenAI, 70% local vLLM (8x H100)
  • Deployment: Kubernetes HA, 10 backend replicas, vLLM cluster, multi-region

Monthly cost

| Line | Calculation | $/month |
|---|---|---|
| Compute - FCC pods | 20 pods x 2 vCPU / 4 GB | ~$1,900 |
| Compute - vLLM (8x H100, AWS) | 730 hrs x $32/hr | ~$23,400 |
| Token spend (Claude, 20%) | 1,500 * 2 * 25 * 22 * 0.2 * ((4000/1M)*$15 + (1200/1M)*$75) | ~$49,500 |
| Token spend (OpenAI, 10%) | 1,500 * 2 * 25 * 22 * 0.1 * ((4000/1M)*$10 + (1200/1M)*$30) | ~$12,540 |
| Storage (S3 + long-term) | 10 TB | ~$230 |
| Observability (managed) | | ~$1,200 |
| Support staff (2 FTE platform) | Amortized loaded cost | ~$35,000 |
| Total | | ~$123,770 |

Annualized: ~$1.49M for 1,500 daily-active users = ~$83/user/month.

Cost optimization suggestions:

  • Shift to on-prem vLLM: swaps opex for capex; typical 12-18 month payback.
  • Aggressive scenario caching + embedding cache: 10-20% token reduction.
  • Tier workloads: use cheaper models for FIND phase, premium for CRITIQUE.
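The tiering suggestion can be expressed as a simple routing policy; a minimal sketch, assuming FIND and CRITIQUE are the workflow phases named above (the model identifiers are placeholders, not real FCC configuration):

```python
# Hypothetical phase -> model routing; model IDs are illustrative.
TIER_POLICY = {
    "FIND":     "local-llama-70b",   # cheap, high-volume exploration
    "CRITIQUE": "claude-opus",       # premium judgment, low volume
}

def model_for_phase(phase: str, default: str = "local-llama-70b") -> str:
    """Route each workflow phase to the cheapest model that is good enough."""
    return TIER_POLICY.get(phase, default)
```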

Cost optimization strategies

| Strategy | Typical savings | Effort |
|---|---|---|
| Mock mode in CI | 100% of CI token cost | Low (default) |
| Ollama/vLLM for dev + exploration | 60-90% of tokens | Medium |
| Scenario deduplication cache | 15-30% | Medium |
| Embedding + retrieval cache | 10-20% | Medium |
| Workflow pruning (smaller graph when tolerable) | 20-50% | Low |
| Temperature=0 + seed-based reuse | 5-15% | Low |
| LiteLLM routing to cheapest-capable model | 10-30% | Medium |

Comparison vs commercial agent frameworks

| Framework | Annual license | TCO for scenario B | Notes |
|---|---|---|---|
| FCC (open source, MIT) | $0 | ~$79k/yr | Self-hosted, full control |
| Commercial Framework X | $80-200k | ~$160-300k/yr | Platform fee + per-seat |
| Commercial Framework Y | $120-400k | ~$220-500k/yr | Usage-based on top of platform fee |

FCC's "cost" is the engineering hours to run it. For organizations already running modern Python stacks on Kubernetes, that marginal cost is small.

ROI template

Estimate ROI over one year:

value_delivered = hours_saved_per_user_month
                * 12
                * avg_loaded_hourly_rate
                * num_users
                * adoption_rate

total_cost = annual_fcc_tco

roi = (value_delivered - total_cost) / total_cost

Example (Scenario B, 100 users, 3 hrs saved/user/mo, $80/hr loaded, 70% adoption):

value_delivered = 3 * 12 * 80 * 100 * 0.7 = $201,600
total_cost      = 6,550 * 12               = $78,600
roi             = (201,600 - 78,600) / 78,600 ≈ 156%
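The template can be wrapped in a small helper for plugging in your own numbers; a sketch (the $100,800 annual TCO in the example below is illustrative):

```python
def roi(hours_saved_per_user_month: float, loaded_hourly_rate: float,
        num_users: int, adoption_rate: float, annual_tco: float) -> float:
    """One-year ROI as a fraction (1.0 == 100%)."""
    value_delivered = (hours_saved_per_user_month * 12
                       * loaded_hourly_rate * num_users * adoption_rate)
    return (value_delivered - annual_tco) / annual_tco
```

With 3 hrs saved/user/month at $80/hr, 100 users, 70% adoption, and a $100,800 annual TCO, ROI is exactly 100%.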

A sub-1-year payback is realistic when FCC replaces ad-hoc prompt engineering with structured persona teams.

FinOps checklist

  • Per-team tags on Kubernetes namespaces for chargeback
  • Scenario-level token accounting via the event bus (LLM_CALL events)
  • Monthly cost report pipeline (events -> data warehouse -> dashboard)
  • Budget alerts per provider
  • Model-tier policy (which personas may use premium models)
  • Scheduled caching audit (hit rate, stale entries)
  • Annual TCO review + provider mix re-evaluation
Related

  • Security review -- risk posture that complements this cost model
  • Enterprise deployment -- deployment topologies referenced above
  • Quickstart: enterprise -- zero-to-running checklist
  • AI compliance -- regulatory costs (EU AI Act, NIST RMF)
  • charts/fcc/ -- Helm chart used in scenarios B and C
  • docker/Dockerfile.backend -- base image used for pod sizing above
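Scenario-level token accounting (the second checklist item) can be prototyped by folding LLM_CALL events into per-scenario spend; a sketch, assuming a hypothetical event shape with `type`, `scenario_id`, `tokens_in`, and `tokens_out` fields (check your actual event-bus schema):

```python
from collections import defaultdict

def spend_by_scenario(events, usd_in_per_1m: float, usd_out_per_1m: float) -> dict:
    """Aggregate LLM_CALL events into USD spend per scenario (event shape assumed)."""
    totals: dict[str, float] = defaultdict(float)
    for e in events:
        if e.get("type") != "LLM_CALL":
            continue  # ignore non-LLM events on the bus
        cost = (e["tokens_in"] / 1e6 * usd_in_per_1m
                + e["tokens_out"] / 1e6 * usd_out_per_1m)
        totals[e["scenario_id"]] += cost
    return dict(totals)
```

Feeding this aggregate into the monthly report pipeline gives per-scenario chargeback without touching provider invoices.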