# Observability
FCC ships with a structured tracing + metrics layer at
src/fcc/observability/
that plugs into OpenTelemetry when the [observability] extra is
installed (the v1.1.1 backend image includes it by default).
## What gets emitted

### Traces (OpenTelemetry spans)
The @traced decorator at fcc.observability.tracing.traced wraps
key framework operations:
| Operation | Span name | Attributes |
|---|---|---|
| AISimulationEngine.run | fcc.simulation.run | scenario_id, start_node, max_steps |
| ActionEngine.run | fcc.action_engine.run | action_type, persona_id |
| Each AI provider call | fcc.ai_client.complete | provider, model, latency_ms, usage.total_tokens |
| Workflow node traversal | fcc.workflow.step | step, node_id, actor |
| Plugin discovery | fcc.plugins.discover | plugin_type, count |
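The decorator can also be applied to extension code so custom spans nest under the framework spans above. The exact signature of traced isn't reproduced here, so the span-name argument, the attributes keyword, and the score_persona function below are illustrative assumptions rather than documented API:

```python
# Sketch only: assumes @traced accepts a span name and static attributes.
from fcc.observability.tracing import traced


@traced("fcc.custom.score_persona", attributes={"component": "scoring"})
def score_persona(persona_id: str) -> float:
    # Executes inside its own span, nested under whichever framework
    # span (e.g. fcc.simulation.run) is active when it is called.
    return 0.87
```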
### Metrics (OpenTelemetry metrics)
Pre-defined metrics at fcc.observability.metrics.FccMetrics:
| Metric | Type | Labels |
|---|---|---|
| fcc.simulation.duration | Histogram (ms) | scenario_id, success |
| fcc.simulation.ai_calls | Counter | scenario_id, provider |
| fcc.simulation.tokens | Counter | scenario_id, provider, direction |
| fcc.action_engine.actions | Counter | action_type, persona_id |
| fcc.collaboration.sessions | Counter | outcome |
| fcc.workflow.steps | Counter | workflow_id |
| fcc.plugin.load_duration | Histogram (ms) | plugin_type |
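FccMetrics' own API isn't shown here, but emitting compatible metrics from extension code needs nothing beyond the plain OpenTelemetry metrics API. The instrument names below come from the table above; the meter name and attribute values are illustrative:

```python
from opentelemetry import metrics

meter = metrics.get_meter("fcc.observability")

# Instruments mirroring two rows of the table above (illustrative;
# FccMetrics' actual construction may differ).
simulation_duration = meter.create_histogram(
    "fcc.simulation.duration",
    unit="ms",
    description="Wall-clock duration of a simulation run",
)
ai_calls = meter.create_counter(
    "fcc.simulation.ai_calls",
    description="Number of AI provider calls made during a simulation",
)

simulation_duration.record(1823.4, {"scenario_id": "demo", "success": True})
ai_calls.add(1, {"scenario_id": "demo", "provider": "mock"})
```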
## Local dev: console exporter
The simplest way to see traces during development:
```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

# Now run any FCC simulation; spans print to stdout
from fcc.simulation.ai_client import AIClient

client = AIClient(provider="mock")
client.complete_simple("You are helpful.", "Hello!")
```
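Metrics can be printed the same way during development. This uses the stock OpenTelemetry SDK readers rather than anything FCC-specific; the export interval is an arbitrary choice:

```python
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

# Dump accumulated metrics to stdout every 5 seconds.
reader = PeriodicExportingMetricReader(
    ConsoleMetricExporter(), export_interval_millis=5000
)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
```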
## Kubernetes: OTLP → Tempo/Jaeger/Honeycomb
Set the OTLP endpoint env var in the Helm install:
```bash
helm upgrade fcc ./charts/fcc \
  --reuse-values \
  --set backend.env.OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo.observability.svc.cluster.local:4318 \
  --set backend.env.OTEL_SERVICE_NAME=fcc-backend \
  --set backend.env.OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production
```
The OTLP Python exporter is bundled in the [observability] extra and
loaded lazily: if OTEL_EXPORTER_OTLP_ENDPOINT is unset, no exporter
is created and there is zero overhead.
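The lazy wiring amounts to roughly the following; this is a sketch of the pattern, not the actual source of the observability module:

```python
import os

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor


def _maybe_configure_otlp() -> None:
    # Only build an exporter when an endpoint is configured; otherwise the
    # tracer provider stays a no-op and nothing is exported.
    endpoint = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if not endpoint:
        return

    # Imported lazily so the dependency is only needed when exporting.
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

    provider = TracerProvider()
    # OTLPSpanExporter picks up OTEL_EXPORTER_OTLP_ENDPOINT from the
    # environment by itself, so no explicit endpoint argument is needed.
    provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
    trace.set_tracer_provider(provider)
```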
### Tempo (Grafana Labs)
```yaml
# values.yaml for the tempo Helm chart in a separate release
tempo:
  receivers:
    otlp:
      protocols:
        http:
          endpoint: 0.0.0.0:4318
        grpc:
          endpoint: 0.0.0.0:4317
```
Then set OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo.observability:4318.
### Jaeger
```bash
kubectl apply -f https://github.com/jaegertracing/jaeger-operator/releases/latest/download/jaeger-operator.yaml

kubectl apply -f - <<EOF
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: fcc-jaeger
EOF
```
Then set OTEL_EXPORTER_OTLP_ENDPOINT=http://fcc-jaeger-collector:4318.
### Honeycomb
```bash
helm upgrade fcc ./charts/fcc \
  --reuse-values \
  --set backend.env.OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io \
  --set backend.env.OTEL_EXPORTER_OTLP_HEADERS=x-honeycomb-team=$HONEYCOMB_API_KEY \
  --set backend.env.OTEL_SERVICE_NAME=fcc-backend
```
## Prometheus metrics (pull model)
FCC does not expose a /metrics Prometheus endpoint by default: the
WebSocket bridge is the only HTTP-ish surface, and it already handles
/health. For Prometheus scraping, use an OTLP → Prometheus bridge
(e.g. the Grafana Alloy collector with otelcol.exporter.prometheus).
Alternatively, some adopters run a sidecar that scrapes the OpenTelemetry SDK's in-process metrics directly; that is outside the scope of the bundled Helm chart, but a minimal in-process variant is sketched below.
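For reference, the in-process variant looks roughly like this with the stock OpenTelemetry Prometheus exporter. Nothing here is wired into FCC, and the port is an arbitrary choice (9464 is the exporter's conventional default):

```python
from prometheus_client import start_http_server

from opentelemetry import metrics
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.metrics import MeterProvider

# Expose OpenTelemetry metrics in Prometheus text format on :9464/metrics.
start_http_server(port=9464)
metrics.set_meter_provider(MeterProvider(metric_readers=[PrometheusMetricReader()]))
```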
## Log aggregation
FCC uses Python's stdlib logging module configured via the
FCC_LOG_LEVEL env var (default INFO). All 4 containers log to
stdout/stderr, so any log collector that scrapes container logs works:
- Loki (Grafana): helm install loki grafana/loki-stack
- Elastic (ELK): a Filebeat DaemonSet
- Datadog: the Datadog Agent DaemonSet
- Fluent Bit: a fluent/fluent-bit DaemonSet
Structured logging with JSON format (coming in v1.2.0) will make parsing easier. Today's format is human-readable text.
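For orientation, the kind of stdlib setup that FCC_LOG_LEVEL implies looks like this; the actual configuration lives inside FCC and the format string is only an example of today's text output:

```python
import logging
import os

# Honour FCC_LOG_LEVEL (default INFO) and log to stderr so whatever
# collector scrapes the container's output picks everything up.
logging.basicConfig(
    level=os.environ.get("FCC_LOG_LEVEL", "INFO").upper(),
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
```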
## Health endpoints
| Service | Endpoint | Protocol | Purpose |
|---|---|---|---|
| backend | /health | HTTP on port 8765 | Liveness + readiness |
| frontend | / | HTTP on port 80 | Liveness + readiness |
| streamlit | /_stcore/health | HTTP on port 8501 | Streamlit's built-in |
| jupyter | /api | HTTP on port 8888 | Jupyter's built-in |
The backend's /health is served by the websockets library's
process_request callback; it doesn't require a separate HTTP server
and pulls in zero additional dependencies.
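As an illustration of that pattern (not the backend's literal code), a websockets server can answer /health before the WebSocket handshake via process_request. The callback signature depends on the websockets version; this sketch uses the modern asyncio API, and the echo handler stands in for the real bridge:

```python
import asyncio
import http

from websockets.asyncio.server import serve


def health_check(connection, request):
    # Answer plain HTTP for /health; returning None lets ordinary
    # WebSocket upgrade requests proceed as usual.
    if request.path == "/health":
        return connection.respond(http.HTTPStatus.OK, "OK\n")


async def echo(websocket):
    # Placeholder handler; the FCC bridge does its own message routing.
    async for message in websocket:
        await websocket.send(message)


async def main():
    async with serve(echo, "0.0.0.0", 8765, process_request=health_check):
        await asyncio.Future()  # run forever


if __name__ == "__main__":
    asyncio.run(main())
```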
## See also
- Security defaults
- Upgrade procedure
- src/fcc/observability/ (tracing + metrics source)
- OpenTelemetry Python docs