# Observability
FCC ships with a structured tracing + metrics layer at
src/fcc/observability/
that plugs into OpenTelemetry when the [observability] extra is
installed (the v1.1.1 backend image includes it by default).
## What gets emitted

### Traces (OpenTelemetry spans)
The @traced decorator at fcc.observability.tracing.traced wraps
key framework operations:
| Operation | Span name | Attributes |
|---|---|---|
| AISimulationEngine.run | fcc.simulation.run | scenario_id, start_node, max_steps |
| ActionEngine.run | fcc.action_engine.run | action_type, persona_id |
| Each AI provider call | fcc.ai_client.complete | provider, model, latency_ms, usage.total_tokens |
| Workflow node traversal | fcc.workflow.step | step, node_id, actor |
| Plugin discovery | fcc.plugins.discover | plugin_type, count |
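The decorator can also be applied to extension code so custom spans nest under the framework spans above. The exact signature of traced isn't reproduced here, so the span-name argument, the attributes keyword, and the score_persona function below are illustrative assumptions rather than documented API:

```python
# Sketch only: assumes @traced accepts a span name and static attributes.
from fcc.observability.tracing import traced


@traced("fcc.custom.score_persona", attributes={"component": "scoring"})
def score_persona(persona_id: str) -> float:
    # Executes inside its own span, nested under whichever framework
    # span (e.g. fcc.simulation.run) is active when it is called.
    return 0.87
```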
### Metrics (OpenTelemetry metrics)
Pre-defined metrics at fcc.observability.metrics.FccMetrics:
| Metric | Type | Labels |
|---|---|---|
| fcc.simulation.duration | Histogram (ms) | scenario_id, success |
| fcc.simulation.ai_calls | Counter | scenario_id, provider |
| fcc.simulation.tokens | Counter | scenario_id, provider, direction |
| fcc.action_engine.actions | Counter | action_type, persona_id |
| fcc.collaboration.sessions | Counter | outcome |
| fcc.workflow.steps | Counter | workflow_id |
| fcc.plugin.load_duration | Histogram (ms) | plugin_type |
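FccMetrics' own API isn't shown here, but emitting compatible metrics from extension code needs nothing beyond the plain OpenTelemetry metrics API. The instrument names below come from the table above; the meter name and attribute values are illustrative:

```python
from opentelemetry import metrics

meter = metrics.get_meter("fcc.observability")

# Instruments mirroring two rows of the table above (illustrative;
# FccMetrics' actual construction may differ).
simulation_duration = meter.create_histogram(
    "fcc.simulation.duration",
    unit="ms",
    description="Wall-clock duration of a simulation run",
)
ai_calls = meter.create_counter(
    "fcc.simulation.ai_calls",
    description="Number of AI provider calls made during a simulation",
)

simulation_duration.record(1823.4, {"scenario_id": "demo", "success": True})
ai_calls.add(1, {"scenario_id": "demo", "provider": "mock"})
```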
## Local dev: console exporter
The simplest way to see traces during development:
```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

# Now run any FCC simulation; spans print to stdout
from fcc.simulation.ai_client import AIClient

client = AIClient(provider="mock")
client.complete_simple("You are helpful.", "Hello!")
```
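Metrics can be printed the same way during development. This uses the stock OpenTelemetry SDK readers rather than anything FCC-specific; the export interval is an arbitrary choice:

```python
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

# Dump accumulated metrics to stdout every 5 seconds.
reader = PeriodicExportingMetricReader(
    ConsoleMetricExporter(), export_interval_millis=5000
)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
```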
## Kubernetes: OTLP → Tempo/Jaeger/Honeycomb
Set the OTLP endpoint env var in the Helm install:
```bash
helm upgrade fcc ./charts/fcc \
  --reuse-values \
  --set backend.env.OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo.observability.svc.cluster.local:4318 \
  --set backend.env.OTEL_SERVICE_NAME=fcc-backend \
  --set backend.env.OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production
```
The OTLP Python exporter is bundled in the [observability] extra and
loaded lazily: if OTEL_EXPORTER_OTLP_ENDPOINT is unset, no exporter
is created and there is zero overhead.
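The lazy wiring amounts to roughly the following; this is a sketch of the pattern, not the actual source of the observability module:

```python
import os

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor


def _maybe_configure_otlp() -> None:
    # Only build an exporter when an endpoint is configured; otherwise the
    # tracer provider stays a no-op and nothing is exported.
    endpoint = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if not endpoint:
        return

    # Imported lazily so the dependency is only needed when exporting.
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

    provider = TracerProvider()
    # OTLPSpanExporter picks up OTEL_EXPORTER_OTLP_ENDPOINT from the
    # environment by itself, so no explicit endpoint argument is needed.
    provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
    trace.set_tracer_provider(provider)
```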
### Tempo (Grafana Labs)
```yaml
# values.yaml for the tempo Helm chart in a separate release
tempo:
  receivers:
    otlp:
      protocols:
        http:
          endpoint: 0.0.0.0:4318
        grpc:
          endpoint: 0.0.0.0:4317
```
Then set OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo.observability:4318.
### Jaeger
```bash
kubectl apply -f https://github.com/jaegertracing/jaeger-operator/releases/latest/download/jaeger-operator.yaml

kubectl apply -f - <<EOF
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: fcc-jaeger
EOF
```
Then set OTEL_EXPORTER_OTLP_ENDPOINT=http://fcc-jaeger-collector:4318.
### Honeycomb
```bash
helm upgrade fcc ./charts/fcc \
  --reuse-values \
  --set backend.env.OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io \
  --set backend.env.OTEL_EXPORTER_OTLP_HEADERS=x-honeycomb-team=$HONEYCOMB_API_KEY \
  --set backend.env.OTEL_SERVICE_NAME=fcc-backend
```
## Prometheus metrics (pull model)
FCC does not expose a /metrics Prometheus endpoint by default: the
WebSocket bridge is the only HTTP-ish surface, and it already handles
/health. For Prometheus scraping, use an OTLP → Prometheus bridge
(e.g. the Grafana Alloy collector with otelcol.exporter.prometheus).
Alternatively, some adopters run a sidecar that scrapes the OpenTelemetry SDK's in-process metrics directly; that is outside the scope of the bundled Helm chart, but a minimal in-process variant is sketched below.
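For reference, the in-process variant looks roughly like this with the stock OpenTelemetry Prometheus exporter. Nothing here is wired into FCC, and the port is an arbitrary choice (9464 is the exporter's conventional default):

```python
from prometheus_client import start_http_server

from opentelemetry import metrics
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.metrics import MeterProvider

# Expose OpenTelemetry metrics in Prometheus text format on :9464/metrics.
start_http_server(port=9464)
metrics.set_meter_provider(MeterProvider(metric_readers=[PrometheusMetricReader()]))
```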
## Log aggregation
FCC uses Python's stdlib logging module configured via the
FCC_LOG_LEVEL env var (default INFO). All 4 containers log to
stdout/stderr, so any log collector that scrapes container logs works:
- Loki (Grafana): helm install loki grafana/loki-stack
- Elastic (ELK): a Filebeat DaemonSet
- Datadog: the Datadog Agent DaemonSet
- Fluent Bit: a fluent/fluent-bit DaemonSet
Structured logging with JSON format (coming in v1.2.0) will make parsing easier. Today's format is human-readable text.
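For orientation, the kind of stdlib setup that FCC_LOG_LEVEL implies looks like this; the actual configuration lives inside FCC and the format string is only an example of today's text output:

```python
import logging
import os

# Honour FCC_LOG_LEVEL (default INFO) and log to stderr so whatever
# collector scrapes the container's output picks everything up.
logging.basicConfig(
    level=os.environ.get("FCC_LOG_LEVEL", "INFO").upper(),
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
```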
## Health endpoints
| Service | Endpoint | Protocol | Purpose |
|---|---|---|---|
| backend | /health | HTTP on port 8765 | Liveness + readiness |
| frontend | / | HTTP on port 80 | Liveness + readiness |
| streamlit | /_stcore/health | HTTP on port 8501 | Streamlit's built-in |
| jupyter | /api | HTTP on port 8888 | Jupyter's built-in |
The backend's /health is served by the websockets library's
process_request callback; it doesn't require a separate HTTP server
and pulls in zero additional dependencies.
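As an illustration of that pattern (not the backend's literal code), a websockets server can answer /health before the WebSocket handshake via process_request. The callback signature depends on the websockets version; this sketch uses the modern asyncio API, and the echo handler stands in for the real bridge:

```python
import asyncio
import http

from websockets.asyncio.server import serve


def health_check(connection, request):
    # Answer plain HTTP for /health; returning None lets ordinary
    # WebSocket upgrade requests proceed as usual.
    if request.path == "/health":
        return connection.respond(http.HTTPStatus.OK, "OK\n")


async def echo(websocket):
    # Placeholder handler; the FCC bridge does its own message routing.
    async for message in websocket:
        await websocket.send(message)


async def main():
    async with serve(echo, "0.0.0.0", 8765, process_request=health_check):
        await asyncio.Future()  # run forever


if __name__ == "__main__":
    asyncio.run(main())
```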
## See also
- Security defaults
- Upgrade procedure
- src/fcc/observability/ (tracing + metrics source)
- OpenTelemetry Python docs