Full-Stack Ecosystem -- v1.3.5.2 Addendum

This addendum extends the Full-Stack Ecosystem Demo with the v1.3.5.2 end-to-end scenario. It wires together the four v1.3.5.2 addendum scenarios -- Web Frontend stress test, Distiller Bridge vocabulary evolution, Open Science cross-project FAIR audit, and Sky-Parlour context-enricher integration -- under a single OpenTelemetry trace and a single event bus.

The goal is operational: validate that the four subsystems behave correctly when their failure and recovery cycles overlap, not just when exercised in isolation.


Scenario Overview

Topology

  +--------------------+       vocabulary.mismatch
  |  Distiller Bridge  |---------------------+
  +--------------------+                     |
            |                                v
            | workflow.step           +--------------+
            v                         | EventBus     |
  +--------------------+  subscribe   |  (central)   |
  |   Web Frontend     |<-------------|              |
  |   stress panel     |              +--------------+
  +--------------------+                ^         ^
            |                           |         |
            | trace (OTel)              |         |
            v                           |         |
  +--------------------+       fair.*   |         |   workflow.step
  |   Open Science     |----------------+         +----------------+
  |   FAIR audit       |                                           |
  +--------------------+                                           |
                                                                   v
                                                        +--------------------+
                                                        |   Sky-Parlour      |
                                                        |   context enricher |
                                                        +--------------------+

Participants

Subsystem          Role                          Primary Event Contribution   Addendum
Web Frontend       Stress publisher + observer   workflow.step at ~1000/sec   web-frontend-v1352-addendum.md
Distiller Bridge   Vocabulary change source      vocabulary.mismatch          distiller-bridge-v1352-addendum.md
Open Science       Subscriber + auditor          fair.*, cross_project.*      open-science-v1352-addendum.md
Sky-Parlour        Downstream visualizer         consumes everything above    skyparlour-v1352-addendum.md

Reference Notebook

The scenario is captured as a runnable walkthrough in notebooks/32_full_stack_ecosystem_demo.ipynb. The notebook imports each subsystem's setup helper, registers subscribers in the correct order, drives the scenario from the top of the topology down, and records metrics and spans for post-run inspection. Use the notebook as the authoritative starting point; the sections below describe the same run at prose level.


Sequencing the End-to-End Run

The scenario runs in four phases. The phase windows are consecutive, but their effects overlap: each phase introduces events that the downstream phases consume.

Phase A -- Quiescent Baseline (t+0 to t+5 s)

The event bus is started, all four subsystems attach their subscribers, and the OTel tracer is initialized. No synthetic traffic yet. The baseline captures idle metrics so that later deltas are interpretable.

Phase B -- Vocabulary Evolution (t+5 s to t+10 s)

The Distiller Bridge receives a Fornax class map with two additions and one removal (see the Distiller Bridge addendum for the exact deltas). The bridge emits three vocabulary.mismatch events. The Open Science subscriber schedules a cross-project reconciliation check and begins collecting evidence.

Phase C -- Stress Burst (t+10 s to t+20 s)

The Web Frontend stress harness begins publishing workflow.step events at 1000/sec for 10 seconds. Sky-Parlour's bridge forwards the events to the Phase-17 enricher. Open Science remains subscribed; its fair.check cadence (every 30 seconds) is infrequent enough that the burst does not trigger a duplicate check.

Phase D -- Cross-Project FAIR Audit (t+20 s to t+35 s)

With the vocabulary change reconciled and the stress burst concluded, Open Science triggers a full assess_cross_project run against FCC, PAOM, and AOME. Sky-Parlour renders the resulting fair.principle.evaluated events live. The final cross_project.assessment.completed event carries the headline scores.
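The phase timings above can be captured as data, which is handy when a driver script needs to know which phase window a given offset falls in. The helper below is an illustrative sketch (the names PHASES and active_phase are not part of the FCC API); the phase names match the span names used later in the trace layout.

```python
# Illustrative sketch: the v1.3.5.2 phase schedule as (name, start_s, end_s)
# tuples, matching the span names used in the trace layout below.
PHASES = [
    ("phase.baseline",        0,  5),
    ("phase.vocab_evolution", 5, 10),
    ("phase.stress_burst",   10, 20),
    ("phase.fair_audit",     20, 35),
]

def active_phase(t_s):
    """Return the name of the phase active at offset t_s seconds,
    or None once the run window has ended."""
    for name, start, end in PHASES:
        if start <= t_s < end:
            return name
    return None
```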


OpenTelemetry Trace Layout

A single root span -- ecosystem.v1352.run -- wraps the entire scenario. Each phase and each subsystem adds nested spans. The resulting trace is self-contained: operators can filter by span name to zoom into any subsystem.

Top-Level Span Hierarchy

Level  Span Name              Parent  Typical Duration
0      ecosystem.v1352.run    (root)  ~35 s
1      phase.baseline         root    5 s
1      phase.vocab_evolution  root    5 s
1      phase.stress_burst     root    10 s
1      phase.fair_audit       root    15 s

Subsystem-Level Span Names

Subsystem          Span Name                Emitted Per
Event bus          event.publish            Event
Subscribers        subscriber.invoke        Delivery
Action engine      workflow.step            Step
Vocabulary loader  vocab.verify             Verification run
Compliance         compliance.reaudit       Affected requirement
FAIR audit         fair.principle.evaluate  Principle per project
Cross-project      federation.assess        Assessment run
Visualization      viz.bridge.forward       Forwarded payload

Correlating Cross-Subsystem Spans

Every event carries a trace_id attribute generated at publish time by the event bus. Subscribers attach their spans to that trace ID, which means a single event produced in phase B can be followed through its reception in phase C (Sky-Parlour) and its contribution to phase D (Open Science). The trace viewer is the correct tool for walking this path; logs alone are insufficient because the bus is asynchronous.
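The propagation mechanism can be sketched in a few lines. This is not the FCC EventBus implementation, only a minimal illustration of the contract described above: the bus stamps a trace_id at publish time, and every subscriber reads the same identifier back, so all spans for one event correlate.

```python
import uuid
from dataclasses import dataclass, field

# Minimal sketch (not fcc.messaging.bus) of trace_id propagation:
# the trace_id is generated once, at publish time, and travels with
# the event to every subscriber.
@dataclass
class Event:
    topic: str
    payload: dict
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)

def deliver(event, subscribers):
    """Deliver an event; return (handler_name, trace_id) pairs showing
    that every subscriber saw the same correlating identifier."""
    seen = []
    for handler in subscribers:
        # In the real system each handler opens a span parented on
        # event.trace_id; here we only record the correlation.
        handler(event)
        seen.append((handler.__name__, event.trace_id))
    return seen
```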


Metric Names and Healthy Ranges

The scenario emits the following metrics. All are standard FccMetrics types.

Metric                           Type       Phase Where Relevant  Healthy Range
events.published                 Counter    all                   rising monotonically
events.delivered                 Counter    all                   within 1% of published
events.dropped                   Counter    C                     < 1% of published
events.dlq.size                  Gauge      B, C                  < 100
vocab.verify.diff.count          Counter    B                     matches injected delta (3)
compliance.reaudit.scheduled     Counter    B                     matches affected requirements
fair.principle.score             Histogram  D                     centred above 0.80
federation.cross_project_score   Gauge      D                     >= 0.85 for reference run
viz.bridge.delivered             Counter    C, D                  tracks publishes to Sky-Parlour
viz.bridge.delivered.latency_ms  Histogram  C, D                  p95 < 50 ms
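The two headline delivery checks in the table (delivered within 1% of published, dropped below 1% of published) are simple enough to encode directly. The function below is a hypothetical helper for post-run scripting, not part of FccMetrics.

```python
def delivery_healthy(published, delivered, dropped):
    """Sketch of the table's headline delivery checks:
    delivered must be within 1% of published, and dropped
    must stay below 1% of published."""
    if published == 0:
        return False
    within_one_pct = (published - delivered) <= 0.01 * published
    low_drop = dropped < 0.01 * published
    return within_one_pct and low_drop
```

Applied to the reference run numbers below (~10,030 published, ~9,995 delivered, 35 dropped), both checks pass.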

Subscriber Ordering

The subscriber set is registered in a specific order to keep the scenario deterministic. Do not reorder these unless the scenario is being deliberately modified.

  1. ComplianceSubscriber on vocabulary.mismatch.
  2. FAIRAuditSubscriber on compliance.reaudit.scheduled.
  3. VisualizationBridge subscribing to the bus (broad filter).
  4. StressPanelSubscriber on events.*.
  5. CrossProjectReporter on cross_project.assessment.completed.

Ordering matters because (1) must see the mismatch before (2) is asked to schedule, and (3) must be attached before (4) starts publishing so that the enricher receives the stress traffic.
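The determinism guarantee rests on the bus invoking subscribers in registration order. The toy bus below (not fcc.messaging.bus.EventBus) illustrates that property: for a matching topic, handlers fire in exactly the order they were registered, so the ComplianceSubscriber always sees a mismatch before any later subscriber.

```python
class OrderedBus:
    """Toy bus illustrating registration-order delivery; not the
    FCC EventBus implementation."""

    def __init__(self):
        self._subs = []  # (topic_prefix, handler), in registration order

    def subscribe(self, topic_prefix, handler):
        # A trailing "*" (or an empty prefix) acts as a broad filter.
        self._subs.append((topic_prefix, handler))

    def publish(self, topic, payload):
        invoked = []
        for prefix, handler in self._subs:
            if topic.startswith(prefix.rstrip("*")):
                handler(topic, payload)
                invoked.append(handler.__name__)
        return invoked
```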


Full Run -- Minimal Python

The following is the condensed form of the notebook's setup cell. It is not a substitute for the notebook, which adds inspection tooling between phases.

from fcc.messaging.bus import EventBus
from fcc.observability.tracing import FccTracer
from fcc.observability.metrics import FccMetrics
from fcc.compliance.subscriber import ComplianceSubscriber
from fcc.visualization.bridge import VisualizationBridge
from fcc.objectmodel.federation import assess_cross_project
from fcc.objectmodel.examples import create_sample_model

bus     = EventBus.default()
tracer  = FccTracer.default()
metrics = FccMetrics.default()

with tracer.start_as_root("ecosystem.v1352.run"):
    # Phase A -- baseline
    with tracer.start_as_current("phase.baseline"):
        bus.subscribe("vocabulary.mismatch", ComplianceSubscriber().handle)
        bridge = VisualizationBridge.default()
        bus.subscribe_all(lambda ev: bridge.forward(ev, target="skyparlour"))

    # Phase B -- vocabulary evolution
    # (run_distiller_vocab_scenario and run_stress_harness are the subsystem
    #  setup helpers imported in the reference notebook; not shown here)
    with tracer.start_as_current("phase.vocab_evolution"):
        run_distiller_vocab_scenario()

    # Phase C -- stress burst
    with tracer.start_as_current("phase.stress_burst"):
        run_stress_harness(events_per_second=1000, duration_s=10)

    # Phase D -- FAIR audit
    with tracer.start_as_current("phase.fair_audit"):
        fcc_f  = create_sample_model("fcc")
        paom_f = create_sample_model("paom")
        aome_f = create_sample_model("aome")
        assessment = assess_cross_project([fcc_f, paom_f, aome_f])
        bus.publish_event("cross_project.assessment.completed",
                          payload=assessment.to_dict())

Reference Run Results

The reference run on the v1.3.5.2 developer machine produced the following headline numbers. Treat these as rough expectations rather than fixed targets -- they are hardware- and load-dependent.

Metric                               Value
Total events published               ~10,030
Events delivered                     ~9,995
Events dropped                       35 (Sky-Parlour stress backpressure)
DLQ peak depth                       12
Vocabulary mismatch events           3
Compliance requirements re-audited   4
FAIR principles evaluated            30 (10 x 3 projects)
cross_project_score                  0.89
Run wall-clock time                  ~35 s
Root span count                      1
Total span count                     ~11,800

Failure Modes and Recovery

The scenario is designed to surface the common failure modes that occur in production when these subsystems are deployed together.

Failure Mode                      Where Introduced      Expected Recovery Path
Slow Sky-Parlour enricher         Phase C stress burst  events.dropped rises; scenario continues
Invalid vocabulary YAML           Phase B               Loader raises, scenario aborts; fix YAML and re-run
Compliance subscriber exception   Phase B handler       DLQ captures the event; scenario continues
Cross-project facade unreachable  Phase D               assess_cross_project returns partial result with recommendations
OTel exporter failure             any phase             Spans dropped silently; metrics still recorded
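The "subscriber exception" row is worth sketching, because it is the recovery path most often misunderstood: the bus isolates the failing handler, routes the event to the DLQ, and keeps delivering to the remaining subscribers. The class below is a toy illustration of that contract, not the FCC implementation.

```python
class DlqBus:
    """Toy illustration of DLQ capture: a handler exception parks the
    event in a dead-letter queue instead of aborting the run."""

    def __init__(self):
        self.handlers = []
        self.dlq = []

    def subscribe(self, handler):
        self.handlers.append(handler)

    def publish(self, topic, payload):
        for handler in self.handlers:
            try:
                handler(topic, payload)
            except Exception as exc:
                # Capture enough context to replay the event later.
                self.dlq.append({"topic": topic,
                                 "payload": payload,
                                 "error": repr(exc)})
```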

Observability Checklist for Operators

When reviewing a completed run, an operator should look at the following indicators in order. Each has a single clear yes-or-no answer.

  1. Was the root span emitted? (trace viewer)
  2. Did events.published and events.delivered stay within 1%? (metrics)
  3. Was DLQ depth stable and bounded? (gauge)
  4. Did vocabulary mismatch trigger the expected re-audits? (counter match)
  5. Did federation.cross_project_score exceed the configured threshold? (gauge)
  6. Did Sky-Parlour receive payloads throughout the stress phase? (viz metric)
  7. Are there any spans with unexpectedly long durations? (trace viewer)

If any answer is "no", the per-subsystem addendum contains the detailed diagnostic procedure for that subsystem.


Tips

  • Capture a clean baseline run before introducing any workload. The delta between baseline and full run is far more useful than a full run in isolation.
  • Use the trace ID from a single representative event to walk the full cross-subsystem path. This is the highest-leverage debugging technique in the ecosystem.
  • The reference notebook commits its run artefacts under notebooks/_runs/; keep those committed so that regressions show up in code review rather than production.
  • Align OTel clocks across subsystems; even a 100 ms skew makes trace layout confusing.
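The first tip, comparing baseline against full run, amounts to a per-metric subtraction. A hypothetical helper (metric_deltas is not an FCC function) might look like:

```python
def metric_deltas(baseline, full_run):
    """Per-metric delta between a baseline snapshot and a full-run
    snapshot; metrics absent from the baseline count from zero."""
    return {name: full_run[name] - baseline.get(name, 0)
            for name in full_run}
```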

See also