Full-Stack Ecosystem -- v1.3.5.2 Addendum

This addendum extends the Full-Stack Ecosystem Demo with the v1.3.5.2 end-to-end scenario. It wires together the four v1.3.5.2 addendum scenarios -- Web Frontend stress test, Distiller Bridge vocabulary evolution, Open Science cross-project FAIR audit, and Sky-Parlour context-enricher integration -- under a single OpenTelemetry trace and a single event bus.

The goal is operational: validate that the four subsystems behave correctly when their failure and recovery cycles overlap, not just when exercised in isolation.


Scenario Overview

Topology

  +--------------------+       vocabulary.mismatch
  |  Distiller Bridge  |---------------------+
  +--------------------+                     |
            |                                v
            | workflow.step           +--------------+
            v                         | EventBus     |
  +--------------------+  subscribe   |  (central)   |
  |   Web Frontend     |<-------------|              |
  |   stress panel     |              +--------------+
  +--------------------+                ^         ^
            |                           |         |
            | trace (OTel)              |         |
            v                           |         |
  +--------------------+       fair.*   |         |   workflow.step
  |   Open Science     |----------------+         +----------------+
  |   FAIR audit       |                                           |
  +--------------------+                                           |
                                                                   v
                                                        +--------------------+
                                                        |   Sky-Parlour      |
                                                        |   context enricher |
                                                        +--------------------+

Participants

Subsystem          Role                          Primary Event Contribution   Addendum
Web Frontend       Stress publisher + observer   workflow.step at ~1000/sec   web-frontend-v1352-addendum.md
Distiller Bridge   Vocabulary change source      vocabulary.mismatch          distiller-bridge-v1352-addendum.md
Open Science       Subscriber + auditor          fair.*, cross_project.*      open-science-v1352-addendum.md
Sky-Parlour        Downstream visualizer         consumes everything above    skyparlour-v1352-addendum.md

Reference Notebook

The scenario is captured as a runnable walkthrough in notebooks/32_full_stack_ecosystem_demo.ipynb. The notebook imports each subsystem's setup helper, registers subscribers in the correct order, drives the scenario from the top of the topology down, and records metrics and spans for post-run inspection. Use the notebook as the authoritative starting point; the sections below describe the same run at prose level.


Sequencing the End-to-End Run

The scenario runs in four phases. The phase windows are consecutive, but their effects overlap: each phase introduces events that the downstream phases consume.

Phase A -- Quiescent Baseline (t+0 to t+5 s)

The event bus is started, all four subsystems attach their subscribers, and the OTel tracer is initialized. No synthetic traffic yet. The baseline captures idle metrics so that later deltas are interpretable.

Phase B -- Vocabulary Evolution (t+5 s to t+10 s)

The Distiller Bridge receives a Fornax class map with two additions and one removal (see the Distiller Bridge addendum for the exact deltas). The bridge emits three vocabulary.mismatch events. The Open Science subscriber schedules a cross-project reconciliation check and begins collecting evidence.

Phase C -- Stress Burst (t+10 s to t+20 s)

The Web Frontend stress harness begins publishing workflow.step events at 1000/sec for 10 seconds. Sky-Parlour's bridge forwards the events to the Phase-17 enricher. Open Science remains subscribed; its fair.check cadence (every 30 seconds) is infrequent enough that the burst does not trigger a duplicate check.

Phase D -- Cross-Project FAIR Audit (t+20 s to t+35 s)

With the vocabulary change reconciled and the stress burst concluded, Open Science triggers a full assess_cross_project run against FCC, PAOM, and AOME. Sky-Parlour renders the resulting fair.principle.evaluated events live. The final cross_project.assessment.completed event carries the headline scores.
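The phase timings above can be captured as data, which is handy when a driver script needs to know which phase window a given offset falls in. The helper below is an illustrative sketch (the names PHASES and active_phase are not part of the FCC API); the phase names match the span names used later in the trace layout.

```python
# Illustrative sketch: the v1.3.5.2 phase schedule as (name, start_s, end_s)
# tuples, matching the span names used in the trace layout below.
PHASES = [
    ("phase.baseline",        0,  5),
    ("phase.vocab_evolution", 5, 10),
    ("phase.stress_burst",   10, 20),
    ("phase.fair_audit",     20, 35),
]

def active_phase(t_s):
    """Return the name of the phase active at offset t_s seconds,
    or None once the run window has ended."""
    for name, start, end in PHASES:
        if start <= t_s < end:
            return name
    return None
```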


OpenTelemetry Trace Layout

A single root span -- ecosystem.v1352.run -- wraps the entire scenario. Each phase and each subsystem adds nested spans. The resulting trace is self-contained: operators can filter by span name to zoom into any subsystem.

Top-Level Span Hierarchy

Level  Span Name              Parent  Typical Duration
0      ecosystem.v1352.run    (root)  ~35 s
1      phase.baseline         root    5 s
1      phase.vocab_evolution  root    5 s
1      phase.stress_burst     root    10 s
1      phase.fair_audit       root    15 s

Subsystem-Level Span Names

Subsystem          Span Name                Emitted Per
Event bus          event.publish            Event
Subscribers        subscriber.invoke        Delivery
Action engine      workflow.step            Step
Vocabulary loader  vocab.verify             Verification run
Compliance         compliance.reaudit       Affected requirement
FAIR audit         fair.principle.evaluate  Principle per project
Cross-project      federation.assess        Assessment run
Visualization      viz.bridge.forward       Forwarded payload

Correlating Cross-Subsystem Spans

Every event carries a trace_id attribute generated at publish time by the event bus. Subscribers attach their spans to that trace ID, which means a single event produced in phase B can be followed through its reception in phase C (Sky-Parlour) and its contribution to phase D (Open Science). The trace viewer is the correct tool for walking this path; logs alone are insufficient because the bus is asynchronous.
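The propagation mechanism can be sketched in a few lines. This is not the FCC EventBus implementation, only a minimal illustration of the contract described above: the bus stamps a trace_id at publish time, and every subscriber reads the same identifier back, so all spans for one event correlate.

```python
import uuid
from dataclasses import dataclass, field

# Minimal sketch (not fcc.messaging.bus) of trace_id propagation:
# the trace_id is generated once, at publish time, and travels with
# the event to every subscriber.
@dataclass
class Event:
    topic: str
    payload: dict
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)

def deliver(event, subscribers):
    """Deliver an event; return (handler_name, trace_id) pairs showing
    that every subscriber saw the same correlating identifier."""
    seen = []
    for handler in subscribers:
        # In the real system each handler opens a span parented on
        # event.trace_id; here we only record the correlation.
        handler(event)
        seen.append((handler.__name__, event.trace_id))
    return seen
```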


Metric Names and Healthy Ranges

The scenario emits the following metrics. All are standard FccMetrics types.

Metric                           Type       Phase Where Relevant  Healthy Range
events.published                 Counter    all                   rising monotonically
events.delivered                 Counter    all                   within 1% of published
events.dropped                   Counter    C                     < 1% of published
events.dlq.size                  Gauge      B, C                  < 100
vocab.verify.diff.count          Counter    B                     matches injected delta (3)
compliance.reaudit.scheduled     Counter    B                     matches affected requirements
fair.principle.score             Histogram  D                     centred above 0.80
federation.cross_project_score   Gauge      D                     >= 0.85 for reference run
viz.bridge.delivered             Counter    C, D                  tracks publishes to Sky-Parlour
viz.bridge.delivered.latency_ms  Histogram  C, D                  p95 < 50 ms
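The two headline delivery checks in the table (delivered within 1% of published, dropped below 1% of published) are simple enough to encode directly. The function below is a hypothetical helper for post-run scripting, not part of FccMetrics.

```python
def delivery_healthy(published, delivered, dropped):
    """Sketch of the table's headline delivery checks:
    delivered must be within 1% of published, and dropped
    must stay below 1% of published."""
    if published == 0:
        return False
    within_one_pct = (published - delivered) <= 0.01 * published
    low_drop = dropped < 0.01 * published
    return within_one_pct and low_drop
```

Applied to the reference run numbers below (~10,030 published, ~9,995 delivered, 35 dropped), both checks pass.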

Subscriber Ordering

The subscriber set is registered in a specific order to keep the scenario deterministic. Do not reorder these unless the scenario is being deliberately modified.

  1. ComplianceSubscriber on vocabulary.mismatch.
  2. FAIRAuditSubscriber on compliance.reaudit.scheduled.
  3. VisualizationBridge subscribing to the bus (broad filter).
  4. StressPanelSubscriber on events.*.
  5. CrossProjectReporter on cross_project.assessment.completed.

Ordering matters because (1) must see the mismatch before (2) is asked to schedule, and (3) must be attached before (4) starts publishing so that the enricher receives the stress traffic.
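The determinism guarantee rests on the bus invoking subscribers in registration order. The toy bus below (not fcc.messaging.bus.EventBus) illustrates that property: for a matching topic, handlers fire in exactly the order they were registered, so the ComplianceSubscriber always sees a mismatch before any later subscriber.

```python
class OrderedBus:
    """Toy bus illustrating registration-order delivery; not the
    FCC EventBus implementation."""

    def __init__(self):
        self._subs = []  # (topic_prefix, handler), in registration order

    def subscribe(self, topic_prefix, handler):
        # A trailing "*" (or an empty prefix) acts as a broad filter.
        self._subs.append((topic_prefix, handler))

    def publish(self, topic, payload):
        invoked = []
        for prefix, handler in self._subs:
            if topic.startswith(prefix.rstrip("*")):
                handler(topic, payload)
                invoked.append(handler.__name__)
        return invoked
```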


Full Run -- Minimal Python

The following is the condensed form of the notebook's setup cell. It is not a substitute for the notebook, which adds inspection tooling between phases.

from fcc.messaging.bus import EventBus
from fcc.observability.tracing import FccTracer
from fcc.observability.metrics import FccMetrics
from fcc.compliance.subscriber import ComplianceSubscriber
from fcc.visualization.bridge import VisualizationBridge
from fcc.objectmodel.federation import assess_cross_project
from fcc.objectmodel.examples import create_sample_model

bus     = EventBus.default()
tracer  = FccTracer.default()
metrics = FccMetrics.default()

with tracer.start_as_root("ecosystem.v1352.run"):
    # Phase A -- baseline
    with tracer.start_as_current("phase.baseline"):
        bus.subscribe("vocabulary.mismatch", ComplianceSubscriber().handle)
        bridge = VisualizationBridge.default()
        bus.subscribe_all(lambda ev: bridge.forward(ev, target="skyparlour"))

    # Phase B -- vocabulary evolution
    # (run_distiller_vocab_scenario and run_stress_harness are the subsystem
    #  setup helpers imported in the reference notebook; not shown here)
    with tracer.start_as_current("phase.vocab_evolution"):
        run_distiller_vocab_scenario()

    # Phase C -- stress burst
    with tracer.start_as_current("phase.stress_burst"):
        run_stress_harness(events_per_second=1000, duration_s=10)

    # Phase D -- FAIR audit
    with tracer.start_as_current("phase.fair_audit"):
        fcc_f  = create_sample_model("fcc")
        paom_f = create_sample_model("paom")
        aome_f = create_sample_model("aome")
        assessment = assess_cross_project([fcc_f, paom_f, aome_f])
        bus.publish_event("cross_project.assessment.completed",
                          payload=assessment.to_dict())

Reference Run Results

The reference run on the v1.3.5.2 developer machine produced the following headline numbers. Treat these as rough expectations rather than fixed targets -- they are hardware- and load-dependent.

Metric                               Value
Total events published               ~10,030
Events delivered                     ~9,995
Events dropped                       35 (Sky-Parlour stress backpressure)
DLQ peak depth                       12
Vocabulary mismatch events           3
Compliance requirements re-audited   4
FAIR principles evaluated            30 (10 x 3 projects)
cross_project_score                  0.89
Run wall-clock time                  ~35 s
Root span count                      1
Total span count                     ~11,800

Failure Modes and Recovery

The scenario is designed to surface the common failure modes that occur in production when these subsystems are deployed together.

Failure Mode                      Where Introduced      Expected Recovery Path
Slow Sky-Parlour enricher         Phase C stress burst  events.dropped rises; scenario continues
Invalid vocabulary YAML           Phase B               Loader raises, scenario aborts; fix YAML and re-run
Compliance subscriber exception   Phase B handler       DLQ captures the event; scenario continues
Cross-project facade unreachable  Phase D               assess_cross_project returns partial result with recommendations
OTel exporter failure             any phase             Spans dropped silently; metrics still recorded
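The "subscriber exception" row is worth sketching, because it is the recovery path most often misunderstood: the bus isolates the failing handler, routes the event to the DLQ, and keeps delivering to the remaining subscribers. The class below is a toy illustration of that contract, not the FCC implementation.

```python
class DlqBus:
    """Toy illustration of DLQ capture: a handler exception parks the
    event in a dead-letter queue instead of aborting the run."""

    def __init__(self):
        self.handlers = []
        self.dlq = []

    def subscribe(self, handler):
        self.handlers.append(handler)

    def publish(self, topic, payload):
        for handler in self.handlers:
            try:
                handler(topic, payload)
            except Exception as exc:
                # Capture enough context to replay the event later.
                self.dlq.append({"topic": topic,
                                 "payload": payload,
                                 "error": repr(exc)})
```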

Observability Checklist for Operators

When reviewing a completed run, an operator should look at the following indicators in order. Each has a single clear yes-or-no answer.

  1. Was the root span emitted? (trace viewer)
  2. Did events.published and events.delivered stay within 1%? (metrics)
  3. Was DLQ depth stable and bounded? (gauge)
  4. Did vocabulary mismatch trigger the expected re-audits? (counter match)
  5. Did federation.cross_project_score exceed the configured threshold? (gauge)
  6. Did Sky-Parlour receive payloads throughout the stress phase? (viz metric)
  7. Are there any spans with unexpectedly long durations? (trace viewer)

If any answer is "no", the per-subsystem addendum contains the detailed diagnostic procedure for that subsystem.


Tips

  • Capture a clean baseline run before introducing any workload. The delta between baseline and full run is far more useful than a full run in isolation.
  • Use the trace ID from a single representative event to walk the full cross-subsystem path. This is the highest-leverage debugging technique in the ecosystem.
  • The reference notebook commits its run artefacts under notebooks/_runs/; keep those committed so that regressions show up in code review rather than production.
  • Align OTel clocks across subsystems; even a 100 ms skew makes trace layout confusing.
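The first tip, comparing baseline against full run, amounts to a per-metric subtraction. A hypothetical helper (metric_deltas is not an FCC function) might look like:

```python
def metric_deltas(baseline, full_run):
    """Per-metric delta between a baseline snapshot and a full-run
    snapshot; metrics absent from the baseline count from zero."""
    return {name: full_run[name] - baseline.get(name, 0)
            for name in full_run}
```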

See also