FCC Total Cost of Ownership (TCO)¶
A worked cost model for enterprise FCC deployments. Use it to budget, compare against commercial alternatives, and build an ROI case for your CFO. For security posture see the companion security review.
Cost components¶
FCC's TCO breaks into six components:
| Component | Typical share | Who owns it |
|---|---|---|
| Licensing | 0% | MIT license - no cost |
| Compute (hosting) | 10-40% | Infra / platform team |
| AI provider tokens | 40-80% | Platform + FinOps |
| Observability & storage | 5-15% | SRE |
| Human support & admin | 5-20% | Engineering management |
| Training & onboarding | 2-10% | Learning & Dev |
Exact mix depends on whether you use hosted providers (high token spend) or local models via Ollama/vLLM (high compute, near-zero tokens).
Scaling factors¶
Four knobs drive TCO:
- Number of personas in active use
  - Registry load is in-memory and negligible up to ~500 personas.
- Simulations per day
  - Linear driver of token cost.
- Concurrent users
  - Drives the sizing of the backend container / K8s replicas.
- Average tokens per call
  - Workflow size (5-node vs 55-node) multiplies tokens accordingly.
The formula:
daily_token_cost = simulations_per_day
* avg_steps_per_sim
* avg_tokens_per_step
* provider_usd_per_1k_tokens
/ 1000
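The same formula as a runnable sketch (the function and variable names are illustrative, not part of FCC's API; `provider_usd_per_1k_tokens` is a blended input+output rate):

```python
def daily_token_cost(simulations_per_day: int, avg_steps_per_sim: int,
                     avg_tokens_per_step: int,
                     provider_usd_per_1k_tokens: float) -> float:
    """Estimate daily LLM spend in USD from the scaling knobs above."""
    total_tokens = simulations_per_day * avg_steps_per_sim * avg_tokens_per_step
    return total_tokens * provider_usd_per_1k_tokens / 1000

# 200 simulations of 15 steps at 2,500 tokens/step, $0.027/1k blended rate
print(round(daily_token_cost(200, 15, 2500, 0.027), 2))  # -> 202.5
```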
Provider price sheet (as of v1.3.3)¶
| Provider | Input $/1M | Output $/1M | Notes |
|---|---|---|---|
| Anthropic Claude Opus 4.6 | $15 | $75 | Hosted, high quality |
| OpenAI GPT-4 class | $10 | $30 | Hosted |
| Azure OpenAI | $10 | $30 | Hosted, private networking extras |
| Ollama (local) | $0 | $0 | Infra-only cost |
| vLLM (local) | $0 | $0 | Infra-only cost (GPU) |
| LiteLLM (router) | Pass-through | Pass-through | Adds ops surface |
| Mock | $0 | $0 | Dev/test/CI only |
Always benchmark your workload; these are list prices.
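The scenario tables below price input and output tokens separately rather than using one blended rate. A small helper for that per-step figure (a sketch for illustration, not an FCC API):

```python
def usd_per_step(tokens_in: int, tokens_out: int,
                 in_usd_per_1m: float, out_usd_per_1m: float) -> float:
    """Cost of one workflow step, pricing input and output tokens separately."""
    return tokens_in / 1e6 * in_usd_per_1m + tokens_out / 1e6 * out_usd_per_1m

# 2,000 tokens in / 500 out on a $15-in / $75-out model (Claude Opus row above)
print(round(usd_per_step(2000, 500, 15, 75), 4))  # -> 0.0675
```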
Worked scenario A: Small team (10 researchers)¶
Profile
- 10 users, 20 simulations/day each
- Average 15 steps/simulation, 2,000 tokens in + 500 tokens out per step
- Provider mix: 80% mock (CI + iteration), 20% Anthropic Claude Opus 4.6
Monthly cost
| Line | Calculation | $/month |
|---|---|---|
| Compute | 1x backend container on existing dev server | $0 |
| Tokens | 10 * 20 * 15 * 0.2 * 22 workdays * ((2000/1M) * $15 + (500/1M) * $75) | ~$891 |
| Storage | Event logs 5 GB/mo | ~$1 |
| Observability | Console/JSON exporter | $0 |
| Total | | ~$892 |
ROI: at a conservative $20/hour loaded labor rate, FCC pays back if it saves ~45 hours/month across the team (roughly 4.5 hours per user).
Worked scenario B: Mid-size org (100 users)¶
Profile
- 100 users, 50 simulations/day each
- 20 steps/sim, 3,000 in + 800 out per step
- Provider mix: 50% Anthropic, 30% OpenAI, 20% mock
- Deployment: Kubernetes, 3 replicas backend + 2 replicas streamlit
Monthly cost
| Line | Calculation | $/month |
|---|---|---|
| Compute (K8s, AWS) | 5 pods @ 1 vCPU / 2 GB + LB | ~$380 |
| Token spend (Anthropic, 50%) | 100 * 50 * 20 * 22 * 0.5 * ((3000/1M) * $15 + (800/1M) * $75) | ~$5,280 |
| Token spend (OpenAI, 30%) | 100 * 50 * 20 * 22 * 0.3 * ((3000/1M) * $10 + (800/1M) * $30) | ~$1,075 |
| Storage (S3, 200 GB) | | ~$6 |
| Observability (managed OTel) | | ~$120 |
| Total | | ~$6,860 |
Cost optimization suggestions:
- Move 50% of exploration-phase traffic to Ollama (70B model on a shared GPU node) - projected savings ~$2,600/mo.
- Enable scenario caching - deduplicates identical runs; typical hit rate 20-30%.
Worked scenario C: Enterprise (1,000+ users)¶
Profile
- 1,500 users, 30 simulations/day each
- 25 steps/sim, 4,000 in + 1,200 out per step
- Provider mix: 20% Claude Opus, 10% OpenAI, 70% local vLLM (8x H100)
- Deployment: Kubernetes HA, 10 backend replicas, vLLM cluster, multi-region
Monthly cost
| Line | Calculation | $/month |
|---|---|---|
| Compute - FCC pods | 20 pods x 2 vCPU / 4 GB | ~$1,900 |
| Compute - vLLM (8x H100, AWS) | 730 hrs x $32/hr | ~$23,400 |
| Token spend (Claude, 20%) | 1500 * 30 * 25 * 22 * 0.2 * ((4000/1M) * $15 + (1200/1M) * $75) | ~$37,100 |
| Token spend (OpenAI, 10%) | 1500 * 30 * 25 * 22 * 0.1 * ((4000/1M) * $10 + (1200/1M) * $30) | ~$7,920 |
| Storage (S3 + long-term) | 10 TB | ~$230 |
| Observability (managed) | | ~$1,200 |
| Support staff (2 FTE platform) | Amortized loaded cost | ~$35,000 |
| Total | | ~$106,750 |
Annualized: ~$1.28M for 1,500 daily-active users = ~$71/user/month.
Cost optimization suggestions:
- Shift to on-prem vLLM: swaps opex for capex; typical 12-18 month payback.
- Aggressive scenario caching + embedding cache: 10-20% token reduction.
- Tier workloads: use cheaper models for FIND phase, premium for CRITIQUE.
Cost optimization strategies¶
| Strategy | Typical savings | Effort |
|---|---|---|
| Mock mode in CI | 100% of CI token cost | Low (default) |
| Ollama/vLLM for dev + exploration | 60-90% tokens | Medium |
| Scenario deduplication cache | 15-30% | Medium |
| Embedding + retrieval cache | 10-20% | Medium |
| Workflow pruning (smaller graph when tolerable) | 20-50% | Low |
| Temperature=0 + seed-based reuse | 5-15% | Low |
| LiteLLM routing to cheapest-capable model | 10-30% | Medium |
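The deduplication-cache row can be sketched as a content-addressed memo over the scenario definition. The key derivation and in-process store below are a generic pattern, not FCC's actual cache interface:

```python
import hashlib
import json
from typing import Callable

_cache: dict[str, str] = {}

def scenario_key(config: dict) -> str:
    """Stable hash of a scenario config; identical runs map to one key."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def run_cached(config: dict, run_fn: Callable[[dict], str]) -> str:
    """Return the cached result for an identical config, else run and store."""
    key = scenario_key(config)
    if key not in _cache:
        _cache[key] = run_fn(config)  # only cache misses spend tokens
    return _cache[key]
```

Because the key is computed over a canonically serialized config, two runs that differ only in key ordering share one entry; at the 15-30% hit rates cited above, that share of token spend disappears.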
Comparison vs commercial agent frameworks¶
| Framework | Annual license | TCO for scenario B | Notes |
|---|---|---|---|
| FCC (open source MIT) | $0 | ~$82k/yr | Self-hosted, full control |
| Commercial Framework X | $80-200k | ~$160-300k/yr | Platform fee + per-seat |
| Commercial Framework Y | $120-400k | ~$220-500k/yr | Usage-based on top of platform fee |
FCC's "cost" is the engineering hours to run it. For organizations already running modern Python stacks on Kubernetes, that marginal cost is small.
ROI template¶
Estimate ROI over one year:
value_delivered = hours_saved_per_user_month
* 12
* avg_loaded_hourly_rate
* num_users
* adoption_rate
total_cost = annual_fcc_tco
roi = (value_delivered - total_cost) / total_cost
Example (Scenario B, 100 users, 3 hrs saved/user/mo, $80/hr loaded, 70% adoption):
value_delivered = 3 * 12 * 80 * 100 * 0.7 = $201,600
total_cost = 6,860 * 12 = $82,320
roi = (201600 - 82320) / 82320 = 145%
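The template as executable Python, reproducing the Scenario B example (function and parameter names are illustrative):

```python
def annual_roi(hours_saved_per_user_month: float, loaded_hourly_rate: float,
               num_users: int, adoption_rate: float, monthly_tco: float) -> float:
    """One-year ROI as a fraction (1.45 == 145%)."""
    value_delivered = (hours_saved_per_user_month * 12
                       * loaded_hourly_rate * num_users * adoption_rate)
    total_cost = monthly_tco * 12
    return (value_delivered - total_cost) / total_cost

# Scenario B: 3 hrs saved/user/mo, $80/hr loaded, 100 users, 70% adoption, $6,860/mo TCO
print(f"{annual_roi(3, 80, 100, 0.7, 6860):.0%}")  # -> 145%
```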
A sub-1-year payback is realistic when FCC replaces ad-hoc prompt engineering with structured persona teams.
FinOps checklist¶
- Per-team tags on Kubernetes namespaces for chargeback
- Scenario-level token accounting via the event bus (`LLM_CALL` events)
- Monthly cost report pipeline (events -> data warehouse -> dashboard)
- Budget alerts per provider
- Model-tier policy (which personas may use premium models)
- Scheduled caching audit (hit rate, stale entries)
- Annual TCO review + provider mix re-evaluation
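The scenario-level token accounting item amounts to a fold over event-bus records. The event fields and price map below are assumptions for illustration, not FCC's actual `LLM_CALL` schema:

```python
from collections import defaultdict

# provider -> (input $/1M tokens, output $/1M tokens), per the price sheet above
PRICES: dict[str, tuple[float, float]] = {
    "anthropic": (15.0, 75.0),
    "openai": (10.0, 30.0),
    "ollama": (0.0, 0.0),
}

def cost_by_team(events: list[dict]) -> dict[str, float]:
    """Aggregate USD spend per team from LLM_CALL-style event records."""
    totals: dict[str, float] = defaultdict(float)
    for e in events:
        if e.get("type") != "LLM_CALL":
            continue  # ignore non-billing events on the bus
        in_price, out_price = PRICES[e["provider"]]
        totals[e["team"]] += (e["tokens_in"] / 1e6 * in_price
                              + e["tokens_out"] / 1e6 * out_price)
    return dict(totals)
```

The resulting per-team totals feed the chargeback tags and budget alerts listed above.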
Related resources¶
- Security review -- risk posture that complements this cost model
- Enterprise deployment -- deployment topologies referenced above
- Quickstart: enterprise -- zero-to-running checklist
- AI compliance -- regulatory costs (EU AI Act, NIST RMF)
- `charts/fcc/` -- Helm chart used in scenarios B and C
- `docker/Dockerfile.backend` -- base image used for pod sizing above