Distiller Bridge -- v1.3.5.2 Addendum

This addendum extends the Distiller Bridge Demo, the Phase 14 Addendum, and the Phase 15 Addendum with the v1.3.5.2 NanoCube vocabulary-evolution scenario. The scenario demonstrates what happens when a sister project -- distiller_ex (codename Fornax) -- ships a schema update and the FCC side has to detect, reconcile, and audit the change.

Scenario Overview

The scenario has four phases:

Phase	Activity	Artefact Changed
1. Detect	FCC loads the plugin and compares class map against YAML	In-memory diff report
2. Reconcile	Operator updates the packaged YAML mapping	`src/fcc/data/objectmodel/distiller_vocabulary_mappings.yaml`
3. Emit	Bus publishes `vocabulary.mismatch` events	`EventBus` log, `ComplianceSubscriber` audit record
4. Verify	Re-run loader; coverage returns to 100%	Diff report empty

The Vocabulary Contract

Sister projects do not ship runtime imports into FCC. Instead, each exposes a VocabularyProviderPlugin that declares the entity classes it owns and the string IDs FCC should map to them. The plugin is the single source of truth at runtime; the packaged YAML under src/fcc/data/objectmodel/ is the persistent, reviewable contract.

# Abbreviated plugin contract -- see src/fcc/plugins/base.py
from abc import ABC, abstractmethod

class VocabularyProviderPlugin(ABC):
    @abstractmethod
    def get_class_map(self) -> dict[str, type]:
        """Return {local_id: class} for every entity this project owns."""

Fornax registers a concrete implementation -- DistillerVocabProvider -- that returns roughly 175 entries covering NanoCube, Slice, Dimension, Measure, and a number of derived aggregation classes.
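
To make the contract concrete, here is a minimal sketch of what a provider could look like. It is not the real DistillerVocabProvider: the base class is redefined locally so the example runs on its own, and ExampleVocabProvider, the two entity classes, and the IDs nano_cube and nano_slice are hypothetical stand-ins.

```python
from abc import ABC, abstractmethod


class VocabularyProviderPlugin(ABC):
    """Local stand-in for the contract in src/fcc/plugins/base.py."""

    @abstractmethod
    def get_class_map(self) -> dict[str, type]:
        """Return {local_id: class} for every entity this project owns."""


# Hypothetical entity classes; the real ones live in distiller_ex.
class NanoCube:
    pass


class Slice:
    pass


class ExampleVocabProvider(VocabularyProviderPlugin):
    """Illustrative provider, not the real DistillerVocabProvider."""

    def get_class_map(self) -> dict[str, type]:
        # Keys are the string IDs FCC maps; values are the owning classes.
        return {"nano_cube": NanoCube, "nano_slice": Slice}


class_map = ExampleVocabProvider().get_class_map()
print(sorted(class_map))
```

Because the base class is abstract, a provider that forgets to implement get_class_map fails at instantiation rather than at diff time.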

Phase 1 -- Detect

Loading the Current Mapping

The VocabularyMappingLoader reads the packaged YAML and then diffs the live plugin class map against it. The diff is a simple frozen dataclass that lists additions (IDs in the plugin but not in the YAML) and removals (IDs in the YAML but not in the plugin).

from fcc.objectmodel.vocabulary_loader import VocabularyMappingLoader
from fcc.plugins import load_plugin

loader = VocabularyMappingLoader()
store = loader.load_project("distiller")  # reads distiller_vocabulary_mappings.yaml

plugin = load_plugin("distiller_ex.DistillerVocabProvider")
result = loader.verify_against_plugin(store, plugin)

print("Missing from YAML:", result.additions)    # Plugin has it, YAML doesn't
print("Stale in YAML:",   result.removals)       # YAML has it, plugin doesn't
print("Coverage:",        result.coverage_ratio)  # 0.0 -- 1.0
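
The diff described above can be sketched as a frozen dataclass built from two set differences. This is an illustration under assumptions, not the loader's actual result type: VocabularyDiff, diff_ids, and the nano_cube ID are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VocabularyDiff:
    """Hypothetical stand-in for the loader's diff result."""

    additions: frozenset[str]  # IDs in the plugin but not in the YAML
    removals: frozenset[str]   # IDs in the YAML but not in the plugin

    @property
    def is_empty(self) -> bool:
        return not self.additions and not self.removals


def diff_ids(yaml_ids: set[str], plugin_ids: set[str]) -> VocabularyDiff:
    """Compare the persisted YAML IDs against the live plugin IDs."""
    return VocabularyDiff(
        additions=frozenset(plugin_ids - yaml_ids),
        removals=frozenset(yaml_ids - plugin_ids),
    )


yaml_ids = {"nano_cube", "nano_legacy_cube"}
plugin_ids = {"nano_cube", "nano_slice_sparse", "nano_measure_percentile"}
diff = diff_ids(yaml_ids, plugin_ids)
print(sorted(diff.additions), sorted(diff.removals))
```

Freezing the dataclass keeps a diff report from being mutated between detection and audit, which matters if the same object is later attached to an event payload.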

A Typical Fornax Update

A realistic NanoCube schema update might touch 3-12 IDs per release. The v1.3.5.2 reference run uses a synthetic update that adds two new classes and removes one deprecated class:

Change	Local ID	Python Class	Reason
Addition	`nano_slice_sparse`	`distiller_ex.models.SparseSlice`	New sparse representation
Addition	`nano_measure_percentile`	`distiller_ex.models.PercentileMeasure`	New aggregation type
Removal	`nano_legacy_cube`	(removed from plugin)	Deprecated since Fornax v0.9

Detect-Time Coverage

Before reconciliation, the loader reports:

Metric	Value
Plugin class map size	176
YAML mapping entries	175
Additions detected	2
Removals detected	1
Coverage ratio	0.983
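
The addendum does not state how the coverage ratio is computed. One definition that reproduces the reported 0.983 from the counts in the table is shared IDs divided by the union of YAML and plugin IDs; the arithmetic below works through it. Treat the formula as an inference from the numbers, not as the loader's documented behaviour.

```python
yaml_entries = 175
plugin_entries = 176
additions = 2   # in the plugin, missing from the YAML
removals = 1    # in the YAML, gone from the plugin

# IDs present on both sides, derived two ways as a consistency check.
shared = yaml_entries - removals
assert shared == plugin_entries - additions

# Every distinct ID seen on either side.
union = shared + additions + removals

coverage = shared / union
print(shared, union, round(coverage, 3))  # 174 177 0.983
```

Dividing by the plugin size instead (174 / 176) would give roughly 0.989, which does not match the table.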

Phase 2 -- Reconcile

Editing the YAML

Reconciliation is a direct edit of the packaged YAML. Each entry in the file is a small block with five fields; additions are appended to the relevant section and removals are deleted.

# src/fcc/data/objectmodel/distiller_vocabulary_mappings.yaml
mappings:
  - local_id: nano_slice_sparse
    class_path: distiller_ex.models.SparseSlice
    category: slice
    confidence: 1.00
    source: distiller_ex@1.4.0

  - local_id: nano_measure_percentile
    class_path: distiller_ex.models.PercentileMeasure
    category: measure
    confidence: 1.00
    source: distiller_ex@1.4.0

# (the nano_legacy_cube entry is deleted)

Re-verifying

Running verify_against_plugin after the edit should produce an empty diff and a coverage ratio of 1.0. If it does not, the most common cause is a typo in class_path: the loader imports the class lazily, so a typo is caught only on verification, not when the YAML is parsed.
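
A quick way to catch class_path typos before running the full verification is to resolve each dotted path with importlib. The sketch below is a standalone helper, not part of the loader; resolve_class_path and find_unresolvable are hypothetical names, and standard-library paths stand in for distiller_ex classes so the example runs without the sister project installed.

```python
import importlib


def resolve_class_path(class_path: str) -> type:
    """Import 'package.module.ClassName' and return the class object."""
    module_name, _, class_name = class_path.rpartition(".")
    module = importlib.import_module(module_name)  # fails on a bad module name
    return getattr(module, class_name)             # fails on a bad class name


def find_unresolvable(class_paths: list[str]) -> list[str]:
    """Return the paths that cannot be imported, in input order."""
    bad = []
    for path in class_paths:
        try:
            resolve_class_path(path)
        except (ImportError, AttributeError, ValueError):
            # ValueError covers a path with no dot (empty module name).
            bad.append(path)
    return bad


paths = [
    "collections.OrderedDict",    # resolves
    "collections.OrderedDcit",    # typo in the class name
    "no_such_module_xyz.Thing",   # module not installed
]
print(find_unresolvable(paths))
```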

Reconciliation Checklist

Step	Check
1	New IDs appended with `confidence: 1.00`
2	`source` tag updated to reflect new sister-project version
3	Removed IDs fully deleted (no commented-out entries)
4	Categories match the existing taxonomy
5	`make test` green, `make lint` green
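
Parts of the checklist can be checked mechanically before review. The sketch below lints already-parsed mapping entries (plain dicts, so no YAML library is needed) for the five fields, a known category, and a confidence in range. lint_entry is a hypothetical helper, and KNOWN_CATEGORIES is an assumption limited to the two categories that appear in the example above; the real taxonomy is not listed in this addendum.

```python
REQUIRED_FIELDS = {"local_id", "class_path", "category", "confidence", "source"}
KNOWN_CATEGORIES = {"slice", "measure"}  # assumption: only the categories shown above


def lint_entry(entry: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the entry passes."""
    problems = []
    missing = REQUIRED_FIELDS - entry.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if entry.get("category") not in KNOWN_CATEGORIES:
        problems.append(f"unknown category: {entry.get('category')!r}")
    confidence = entry.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        problems.append(f"confidence out of range: {confidence!r}")
    return problems


good = {
    "local_id": "nano_slice_sparse",
    "class_path": "distiller_ex.models.SparseSlice",
    "category": "slice",
    "confidence": 1.00,
    "source": "distiller_ex@1.4.0",
}
bad = {"local_id": "nano_measure_percentile", "category": "measur", "confidence": 1.5}

print(lint_entry(good))
print(lint_entry(bad))
```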

Phase 3 -- Emit

Mismatch Events

Every diff surfaced in Phase 1 produces a vocabulary.mismatch event on the bus. The event carries enough context for a downstream auditor to reconstruct what changed without re-reading the YAML.

Event Field	Example
`event_type`	`vocabulary.mismatch`
`source`	`vocabulary_loader`
`payload.project`	`distiller`
`payload.additions`	`["nano_slice_sparse", "nano_measure_percentile"]`
`payload.removals`	`["nano_legacy_cube"]`
`payload.coverage_before`	`0.983`
`payload.plugin_source`	`distiller_ex@1.4.0`
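
The fields in the table can be modelled as a small immutable record that round-trips through JSON. This is a sketch of the event shape only: MismatchEvent is a hypothetical class, not the type the bus actually publishes, and the field names are taken from the table.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MismatchEvent:
    """Hypothetical container mirroring the event fields listed above."""

    payload: dict
    event_type: str = "vocabulary.mismatch"
    source: str = "vocabulary_loader"


event = MismatchEvent(
    payload={
        "project": "distiller",
        "additions": ["nano_slice_sparse", "nano_measure_percentile"],
        "removals": ["nano_legacy_cube"],
        "coverage_before": 0.983,
        "plugin_source": "distiller_ex@1.4.0",
    }
)

# Serialise to one JSON line and read it back, as an audit log might.
line = json.dumps(asdict(event), sort_keys=True)
restored = json.loads(line)
print(restored["event_type"], restored["payload"]["removals"])
```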

ComplianceSubscriber Handling

The compliance subscriber (src/fcc/compliance/subscriber.py) listens for vocabulary.mismatch and performs two actions:

Record an audit-log entry with the full payload and a UTC timestamp.
Schedule a re-audit of any compliance requirement that references the affected entity classes.

The audit log is appended to the evidence graph so that a subsequent compliance report can reference the vocabulary change as provenance for why a re-audit was triggered.

from fcc.compliance.subscriber import ComplianceSubscriber
from fcc.messaging.bus import EventBus

bus = EventBus.default()
subscriber = ComplianceSubscriber()
bus.subscribe("vocabulary.mismatch", subscriber.handle)
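
To show the two actions end to end without FCC installed, the sketch below pairs a tiny in-memory bus with a subscriber that records a timestamped audit entry and queues the affected IDs for re-audit. InMemoryBus and AuditingSubscriber are hypothetical stand-ins; they are not the real EventBus or ComplianceSubscriber, whose internals this addendum does not show.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Callable


class InMemoryBus:
    """Hypothetical minimal bus: handlers keyed by event type."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: dict) -> None:
        for handler in self._handlers[event["event_type"]]:
            handler(event)


class AuditingSubscriber:
    """Hypothetical subscriber performing the two actions described above."""

    def __init__(self) -> None:
        self.audit_log: list[dict] = []
        self.reaudit_queue: list[str] = []

    def handle(self, event: dict) -> None:
        payload = event["payload"]
        # Action 1: audit entry with the full payload and a UTC timestamp.
        self.audit_log.append(
            {"recorded_at": datetime.now(timezone.utc).isoformat(), "payload": payload}
        )
        # Action 2: queue every affected ID for re-audit.
        self.reaudit_queue.extend(payload["additions"] + payload["removals"])


demo_bus = InMemoryBus()
demo_subscriber = AuditingSubscriber()
demo_bus.subscribe("vocabulary.mismatch", demo_subscriber.handle)
demo_bus.publish(
    {
        "event_type": "vocabulary.mismatch",
        "payload": {"additions": ["nano_slice_sparse"], "removals": ["nano_legacy_cube"]},
    }
)
print(len(demo_subscriber.audit_log), demo_subscriber.reaudit_queue)
```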

Replay and Traceability

Because events are captured by the standard EventSerializer, the entire reconciliation session can be replayed from a JSON log. This is the recommended way to test the subscriber in isolation:

from fcc.messaging.serialization import EventReplay

replay = EventReplay.from_json_file("./tests/fixtures/vocab_mismatch_run.json")
replay.into_bus(bus)   # subscribers receive the same sequence
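
The idea behind replay is simple enough to sketch without the FCC classes: parse the logged events and hand them to the registered handlers in their original order. The replay function and the inline log below are hypothetical; the real EventReplay file format is not shown in this addendum.

```python
import json
from typing import Callable

# Invented two-event log used only for this illustration.
LOG_TEXT = """
[
  {"event_type": "vocabulary.mismatch",
   "payload": {"project": "distiller", "additions": ["nano_slice_sparse"], "removals": []}},
  {"event_type": "vocabulary.mismatch",
   "payload": {"project": "distiller", "additions": [], "removals": ["nano_legacy_cube"]}}
]
"""


def replay(json_text: str, handlers: dict[str, list[Callable[[dict], None]]]) -> int:
    """Deliver logged events to matching handlers in order; return the delivery count."""
    delivered = 0
    for event in json.loads(json_text):
        for handler in handlers.get(event["event_type"], []):
            handler(event)
            delivered += 1
    return delivered


seen: list[dict] = []
count = replay(LOG_TEXT, {"vocabulary.mismatch": [seen.append]})
print(count, [e["payload"]["additions"] for e in seen])
```

Because the handler receives exactly the logged sequence, a subscriber test built this way is deterministic and needs neither the plugin nor the YAML on disk.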

Phase 4 -- Verify

Before / After Coverage

After applying the YAML edits from Phase 2 and re-running the loader, coverage returns to 100%.

Metric	Before	After
Plugin class map size	176	176
YAML mapping entries	175	176
Additions detected	2	0
Removals detected	1	0
Coverage ratio	0.983	1.000
`vocabulary.mismatch` events emitted	3	0

Audit Evidence

The compliance evidence graph now contains a node for each recorded mismatch event with edges back to:

The persona categories that reference the affected entity classes.
The compliance requirements scheduled for re-audit.
The YAML commit SHA (if FCC_EVIDENCE_GIT=1 is set) that closed the mismatch.

Operational Guidance

When to Run the Loader

The recommended cadence is:

Trigger	Action
Sister-project release tag	Full verify + reconcile cycle
FCC pre-merge CI	`make verify-vocabularies` target (fail on diff)
Pre-release gate	Regenerate evidence graph
Manual audit	`fcc vocabulary verify --project distiller`

CI Integration

A dedicated target runs the verification headlessly in CI. Failure modes are explicit and reportable.

make verify-vocabularies
# Runs loader.verify_against_plugin across all 12 sister projects.
# Exits non-zero if any project reports a non-empty diff.
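
The gate logic behind such a target can be sketched as a function that maps per-project diffs to a process exit code. This is an assumption about the shape of the check, not the contents of the real Makefile target; gate and the other_project name are hypothetical.

```python
def gate(diffs: dict[str, tuple[list[str], list[str]]]) -> int:
    """Return 1 if any project has additions or removals, else 0.

    `diffs` maps project name to (additions, removals).
    """
    failing = [name for name, (adds, rems) in sorted(diffs.items()) if adds or rems]
    for name in failing:
        adds, rems = diffs[name]
        # One line per failing project keeps the CI log easy to scan.
        print(f"{name}: {len(adds)} addition(s), {len(rems)} removal(s)")
    return 1 if failing else 0


before = {
    "distiller": (["nano_slice_sparse", "nano_measure_percentile"], ["nano_legacy_cube"]),
    "other_project": ([], []),
}
after = {"distiller": ([], []), "other_project": ([], [])}

exit_before = gate(before)
exit_after = gate(after)
print(exit_before, exit_after)
```

A wrapper script would pass the returned value to sys.exit so that CI marks the job as failed.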

Troubleshooting

Symptom	Probable Cause	Next Step
Loader import error on `class_path`	Typo or sister project not installed	`pip install -e path/to/distiller_ex` and re-run
Coverage ratio below 1.0 after edit	YAML still missing a new ID	Compare `result.additions` with the edited YAML and append the missing block
Compliance subscriber not triggered	Subscriber not registered on the bus	Ensure `ComplianceSubscriber` is subscribed to `vocabulary.mismatch`
Evidence graph stale	Re-audit not scheduled	Explicitly call `CompliancePipeline.run_affected()`
Recurring removals across releases	Sister project deprecates classes without a migration note	Coordinate with sister-project owner before editing YAML

Tips

Treat the YAML as a reviewable contract. A pull request that edits a vocabulary mapping should link to the upstream sister-project commit that drove the change.
Prefer atomic commits: one PR per sister-project version bump. Mixed reconciliations are difficult to audit later.
Use confidence: 1.00 only for direct class-name matches. Any inferred or fuzzy mapping should start below 0.90 and be manually reviewed.