Observability

FCC ships with a structured tracing + metrics layer at `src/fcc/observability/` that plugs into OpenTelemetry when the `[observability]` extra is installed (the v1.1.1 backend image includes it by default).

What gets emitted

Traces (OpenTelemetry spans)

The @traced decorator at fcc.observability.tracing.traced wraps key framework operations:

| Operation | Span name | Attributes |
|---|---|---|
| `AISimulationEngine.run` | `fcc.simulation.run` | `scenario_id`, `start_node`, `max_steps` |
| `ActionEngine.run` | `fcc.action_engine.run` | `action_type`, `persona_id` |
| Each AI provider call | `fcc.ai_client.complete` | `provider`, `model`, `latency_ms`, `usage.total_tokens` |
| Workflow node traversal | `fcc.workflow.step` | `step`, `node_id`, `actor` |
| Plugin discovery | `fcc.plugins.discover` | `plugin_type`, `count` |
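The decorator itself is a thin wrapper over an OpenTelemetry tracer. Here is an illustrative sketch of the pattern — the real `fcc.observability.tracing.traced` may accept different arguments, and the span/attribute names below are taken from the table, not from the source. The `try`/`except` fallback is only there so the sketch runs without the OTel API installed:

```python
import functools
from contextlib import contextmanager

try:
    from opentelemetry import trace
    _tracer = trace.get_tracer("fcc")
except ImportError:  # no-op stand-ins so the sketch runs without the OTel API
    class _NoopSpan:
        def set_attribute(self, key, value):
            pass

    class _NoopTracer:
        @contextmanager
        def start_as_current_span(self, name):
            yield _NoopSpan()

    _tracer = _NoopTracer()


def traced(span_name, **static_attrs):
    """Run the wrapped function inside a span named span_name (sketch)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            with _tracer.start_as_current_span(span_name) as span:
                for key, value in static_attrs.items():
                    span.set_attribute(key, value)
                return fn(*args, **kwargs)
        return wrapper
    return decorator


@traced("fcc.simulation.run", scenario_id="demo-001")
def run_simulation():
    return "done"
```

If no tracer provider is configured, `trace.get_tracer` returns a no-op tracer, so the decorator costs essentially nothing outside an instrumented deployment.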

Metrics (OpenTelemetry metrics)

Pre-defined metrics at fcc.observability.metrics.FccMetrics:

| Metric | Type | Labels |
|---|---|---|
| `fcc.simulation.duration` | Histogram (ms) | `scenario_id`, `success` |
| `fcc.simulation.ai_calls` | Counter | `scenario_id`, `provider` |
| `fcc.simulation.tokens` | Counter | `scenario_id`, `provider`, `direction` |
| `fcc.action_engine.actions` | Counter | `action_type`, `persona_id` |
| `fcc.collaboration.sessions` | Counter | `outcome` |
| `fcc.workflow.steps` | Counter | `workflow_id` |
| `fcc.plugin.load_duration` | Histogram (ms) | `plugin_type` |
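Recording against these instruments follows the standard OpenTelemetry metrics API. The sketch below is illustrative rather than FCC's actual code — instrument names come from the table, the label values are made up, and the `try`/`except` shim exists only so the example runs without the OTel API installed:

```python
try:
    from opentelemetry import metrics
    meter = metrics.get_meter("fcc")
except ImportError:  # minimal stand-in so the sketch runs without the OTel API
    class _Instrument:
        def record(self, value, attributes=None):
            pass

        def add(self, value, attributes=None):
            pass

    class _Meter:
        def create_histogram(self, name, unit=""):
            return _Instrument()

        def create_counter(self, name):
            return _Instrument()

    meter = _Meter()

# Instrument names mirror the table above; labels travel with each data point.
sim_duration = meter.create_histogram("fcc.simulation.duration", unit="ms")
tokens = meter.create_counter("fcc.simulation.tokens")

sim_duration.record(152.3, {"scenario_id": "demo-001", "success": True})
tokens.add(87, {"scenario_id": "demo-001", "provider": "mock", "direction": "output"})
```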

Local dev: console exporter

The simplest way to see traces during development:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

# Now run any FCC simulation — spans print to stdout
from fcc.simulation.ai_client import AIClient
client = AIClient(provider="mock")
client.complete_simple("You are helpful.", "Hello!")
```

Kubernetes: OTLP → Tempo/Jaeger/Honeycomb

Set the OTLP endpoint env var in the Helm install:

```bash
helm upgrade fcc ./charts/fcc \
  --reuse-values \
  --set backend.env.OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo.observability.svc.cluster.local:4318 \
  --set backend.env.OTEL_SERVICE_NAME=fcc-backend \
  --set backend.env.OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production
```

The OTLP Python exporter is bundled in the [observability] extras and loaded lazily — if OTEL_EXPORTER_OTLP_ENDPOINT is unset, no exporter is created and there's zero overhead.
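The lazy-loading pattern described above can be sketched as follows. This is not FCC's actual code — the function name `maybe_configure_otlp` is hypothetical — but it shows the gating behavior: nothing is imported or constructed unless the endpoint variable is set:

```python
import os


def maybe_configure_otlp():
    """Configure an OTLP trace exporter only if the endpoint is set (sketch)."""
    endpoint = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if not endpoint:
        return None  # unset: create nothing, zero overhead

    # Lazy imports: the [observability] extra is only needed on this path
    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import (
        OTLPSpanExporter,
    )

    provider = TracerProvider()
    provider.add_span_processor(
        BatchSpanProcessor(OTLPSpanExporter(endpoint=endpoint))
    )
    trace.set_tracer_provider(provider)
    return provider
```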

Tempo (Grafana Labs)

```yaml
# values.yaml for the tempo Helm chart in a separate release
tempo:
  receivers:
    otlp:
      protocols:
        http:
          endpoint: 0.0.0.0:4318
        grpc:
          endpoint: 0.0.0.0:4317
```

Then set OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo.observability:4318.

Jaeger

```bash
kubectl apply -f https://github.com/jaegertracing/jaeger-operator/releases/latest/download/jaeger-operator.yaml
kubectl apply -f - <<EOF
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: fcc-jaeger
EOF
```

Then set OTEL_EXPORTER_OTLP_ENDPOINT=http://fcc-jaeger-collector:4318.

Honeycomb

```bash
helm upgrade fcc ./charts/fcc \
  --reuse-values \
  --set backend.env.OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io \
  --set backend.env.OTEL_EXPORTER_OTLP_HEADERS=x-honeycomb-team=$HONEYCOMB_API_KEY \
  --set backend.env.OTEL_SERVICE_NAME=fcc-backend
```

Prometheus metrics (pull model)

FCC does not expose a /metrics Prometheus endpoint by default — the WebSocket bridge is the only HTTP-ish surface and it's already handling /health. For Prometheus scraping, use the OTLP → Prometheus bridge (e.g. the Alloy collector with otelcol.exporter.prometheus).
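A bridge of that shape might look like the following Alloy configuration fragment. This is an illustrative sketch, not part of the bundled chart — the component labels and the Prometheus remote-write URL are placeholders you would adapt to your cluster:

```
// Receive OTLP metrics from the FCC backend
otelcol.receiver.otlp "fcc" {
  http {
    endpoint = "0.0.0.0:4318"
  }
  output {
    metrics = [otelcol.exporter.prometheus.fcc.input]
  }
}

// Convert OTLP metrics to Prometheus format
otelcol.exporter.prometheus "fcc" {
  forward_to = [prometheus.remote_write.default.receiver]
}

// Push to a Prometheus-compatible endpoint (placeholder URL)
prometheus.remote_write "default" {
  endpoint {
    url = "http://prometheus.monitoring:9090/api/v1/write"
  }
}
```

With this in place, point `OTEL_EXPORTER_OTLP_ENDPOINT` at the Alloy receiver instead of a tracing backend (or fan out to both from the collector).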

Alternatively, some adopters run a sidecar that scrapes the OpenTelemetry SDK's in-process metrics directly. That's outside the scope of the bundled Helm chart.

Log aggregation

FCC uses Python's stdlib `logging` module configured via the `FCC_LOG_LEVEL` env var (default `INFO`). All four containers log to stdout/stderr, so any log collector that scrapes container logs works:

- Loki (Grafana): `helm install loki grafana/loki-stack`
- Elastic (ELK): Filebeat DaemonSet
- Datadog: the Datadog Agent DaemonSet
- Fluent Bit: `fluent/fluent-bit` DaemonSet

Structured logging with JSON format (coming in v1.2.0) will make parsing easier. Today's format is human-readable text.
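Until then, you can get JSON logs yourself with a small stdlib formatter. This is a stopgap sketch, not FCC's planned v1.2.0 format — the field names below are made up:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line (stopgap sketch)."""

    def format(self, record):
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger("fcc").addHandler(handler)
```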

Health endpoints

| Service | Endpoint | Protocol | Purpose |
|---|---|---|---|
| backend | `/health` | HTTP on port 8765 | Liveness + readiness |
| frontend | `/` | HTTP on port 80 | Liveness + readiness |
| streamlit | `/_stcore/health` | HTTP on port 8501 | Streamlit's built-in health check |
| jupyter | `/api` | HTTP on port 8888 | Jupyter's built-in health check |

The backend's /health is served by the websockets library's process_request callback — it doesn't require a separate HTTP server and has zero additional dependencies.

See also