
Distiller Bridge -- v1.3.5.2 Addendum

This addendum extends the Distiller Bridge Demo, the Phase 14 Addendum, and the Phase 15 Addendum with the v1.3.5.2 NanoCube vocabulary-evolution scenario. The scenario demonstrates what happens when a sister project -- distiller_ex (codename Fornax) -- ships a schema update and the FCC side has to detect, reconcile, and audit the change.


Scenario Overview

The scenario has four phases:

| Phase | Activity | Artefact Changed |
| --- | --- | --- |
| 1. Detect | FCC loads the plugin and compares class map against YAML | In-memory diff report |
| 2. Reconcile | Operator updates the packaged YAML mapping | src/fcc/data/objectmodel/distiller_vocabulary_mappings.yaml |
| 3. Emit | Bus publishes vocabulary.mismatch events | EventBus log, ComplianceSubscriber audit record |
| 4. Verify | Re-run loader; coverage returns to 100% | Diff report empty |

The Vocabulary Contract

Sister projects do not ship runtime imports into FCC. Instead, they expose a VocabularyProviderPlugin that declares the entity classes they own and the string IDs FCC should map to them. The plugin is the single source of truth at runtime; the packaged YAML under src/fcc/data/objectmodel/ is the persistent, reviewable contract.

# Abbreviated plugin contract -- see src/fcc/plugins/base.py
from abc import ABC, abstractmethod

class VocabularyProviderPlugin(ABC):
    @abstractmethod
    def get_class_map(self) -> dict[str, type]:
        """Return {local_id: class} for every entity this project owns."""

Fornax registers a concrete implementation -- DistillerVocabProvider -- that returns roughly 175 entries covering NanoCube, Slice, Dimension, Measure, and a number of derived aggregation classes.
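A minimal sketch of what a concrete provider might look like, assuming the abbreviated contract above. ToyVocabProvider and the two stand-in entity classes are hypothetical; the real DistillerVocabProvider lives in distiller_ex and is not reproduced here.

```python
from abc import ABC, abstractmethod


class VocabularyProviderPlugin(ABC):
    """Local copy of the abbreviated contract so the sketch runs on its own."""

    @abstractmethod
    def get_class_map(self) -> dict[str, type]:
        """Return {local_id: class} for every entity this project owns."""


# Hypothetical stand-ins for sister-project entity classes.
class NanoCube: ...
class Slice: ...


class ToyVocabProvider(VocabularyProviderPlugin):
    """Illustrative provider; not the real DistillerVocabProvider."""

    def get_class_map(self) -> dict[str, type]:
        return {"nano_cube": NanoCube, "nano_slice": Slice}


class_map = ToyVocabProvider().get_class_map()
print(sorted(class_map))
```

FCC only ever sees the returned dictionary, which is why the sister project's modules never need to be imported by FCC code directly.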


Phase 1 -- Detect

Loading the Current Mapping

The VocabularyMappingLoader reads the packaged YAML and then diffs the live plugin class map against it. The diff is a simple frozen dataclass that lists additions (IDs in the plugin but not in the YAML) and removals (IDs in the YAML but not in the plugin).

from fcc.objectmodel.vocabulary_loader import VocabularyMappingLoader
from fcc.plugins import load_plugin

loader = VocabularyMappingLoader()
store = loader.load_project("distiller")  # reads distiller_vocabulary_mappings.yaml

plugin = load_plugin("distiller_ex.DistillerVocabProvider")
result = loader.verify_against_plugin(store, plugin)

print("Missing from YAML:", result.additions)    # Plugin has it, YAML doesn't
print("Stale in YAML:",   result.removals)       # YAML has it, plugin doesn't
print("Coverage:",        result.coverage_ratio)  # 0.0 -- 1.0
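The diff itself can be pictured as two set differences. The sketch below is an assumption about the shape of the result, not the loader's actual implementation; VocabularyDiff and diff_ids are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VocabularyDiff:
    """Hypothetical stand-in for the loader's diff result."""
    additions: frozenset[str]  # in the plugin, missing from the YAML
    removals: frozenset[str]   # in the YAML, missing from the plugin

    @property
    def is_empty(self) -> bool:
        return not self.additions and not self.removals


def diff_ids(plugin_ids: set[str], yaml_ids: set[str]) -> VocabularyDiff:
    """Compare the live plugin IDs against the IDs recorded in the YAML."""
    return VocabularyDiff(
        additions=frozenset(plugin_ids - yaml_ids),
        removals=frozenset(yaml_ids - plugin_ids),
    )


diff = diff_ids(
    plugin_ids={"nano_cube", "nano_slice_sparse"},
    yaml_ids={"nano_cube", "nano_legacy_cube"},
)
print(sorted(diff.additions), sorted(diff.removals))
```

Freezing the dataclass keeps a diff report from being mutated after it has been handed to the event bus or an auditor.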

A Typical Fornax Update

A realistic NanoCube schema update might touch 3-12 IDs per release. The v1.3.5.2 reference run uses a synthetic update that adds two new classes and removes one deprecated class:

| Change | Local ID | Python Class | Reason |
| --- | --- | --- | --- |
| Addition | nano_slice_sparse | distiller_ex.models.SparseSlice | New sparse representation |
| Addition | nano_measure_percentile | distiller_ex.models.PercentileMeasure | New aggregation type |
| Removal | nano_legacy_cube | (removed from plugin) | Deprecated since Fornax v0.9 |

Detect-Time Coverage

Before reconciliation, the loader reports:

| Metric | Value |
| --- | --- |
| Plugin class map size | 176 |
| YAML mapping entries | 175 |
| Additions detected | 2 |
| Removals detected | 1 |
| Coverage ratio | 0.983 |
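The addendum does not state how the coverage ratio is computed. One reading that reproduces the reported 0.983 is shared IDs divided by the union of plugin and YAML IDs; treat this as an assumption rather than the loader's confirmed formula.

```python
plugin_size = 176   # plugin class map size
yaml_size = 175     # YAML mapping entries
additions = 2       # in the plugin, not in the YAML
removals = 1        # in the YAML, not in the plugin

shared = plugin_size - additions          # 174 IDs present on both sides
assert shared == yaml_size - removals     # same count seen from the YAML side
union = shared + additions + removals     # 177 distinct IDs overall

coverage = shared / union                 # assumed formula: shared / union
print(f"{coverage:.3f}")
```

Dividing by the plugin size alone would give 174 / 176, roughly 0.989, which does not match the table.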

Phase 2 -- Reconcile

Editing the YAML

Reconciliation is a direct edit of the packaged YAML. Each entry in the file is a small block with five fields; additions are appended to the relevant section and removals are deleted.

# src/fcc/data/objectmodel/distiller_vocabulary_mappings.yaml
mappings:
  - local_id: nano_slice_sparse
    class_path: distiller_ex.models.SparseSlice
    category: slice
    confidence: 1.00
    source: distiller_ex@1.4.0

  - local_id: nano_measure_percentile
    class_path: distiller_ex.models.PercentileMeasure
    category: measure
    confidence: 1.00
    source: distiller_ex@1.4.0

# (the nano_legacy_cube entry is deleted)

Re-verifying

Running verify_against_plugin after the edit should produce an empty diff and a coverage ratio of 1.0. If it does not, the most common cause is a typo in class_path: the loader imports classes lazily, so a typo is caught only on verification, not when the YAML is parsed.
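To illustrate why a class_path typo only surfaces at verification time, here is a sketch of lazy resolution with importlib. resolve_class_path is a hypothetical helper, and standard-library paths stand in for distiller_ex classes; the real loader may resolve paths differently.

```python
import importlib


def resolve_class_path(class_path: str) -> type:
    """Import 'pkg.module.ClassName' and return the class object."""
    module_name, _, class_name = class_path.rpartition(".")
    module = importlib.import_module(module_name)  # ModuleNotFoundError on a bad module
    return getattr(module, class_name)             # AttributeError on a bad class name


# A valid path resolves to the class.
ok = resolve_class_path("collections.OrderedDict")

# A typo parses fine as a string and fails only when resolution is attempted.
try:
    resolve_class_path("collections.OrderedDcit")  # deliberate typo
    typo_caught = False
except (ImportError, AttributeError):
    typo_caught = True

print(ok.__name__, typo_caught)
```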

Reconciliation Checklist

| Step | Check |
| --- | --- |
| 1 | New IDs appended with confidence: 1.00 |
| 2 | source tag updated to reflect new sister-project version |
| 3 | Removed IDs fully deleted (no commented-out entries) |
| 4 | Categories match the existing taxonomy |
| 5 | make test green, make lint green |
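Parts of the checklist can be checked mechanically. The sketch below works on already-parsed mapping entries (plain dicts) and is not part of FCC; check_entries and the KNOWN_CATEGORIES set are assumptions for illustration.

```python
REQUIRED_FIELDS = {"local_id", "class_path", "category", "confidence", "source"}
KNOWN_CATEGORIES = {"slice", "measure"}  # assumed subset of the real taxonomy


def check_entries(entries: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the checks pass."""
    problems = []
    seen = set()
    for entry in entries:
        local_id = entry.get("local_id", "<missing>")
        missing = REQUIRED_FIELDS - entry.keys()
        if missing:
            problems.append(f"{local_id}: missing fields {sorted(missing)}")
        if entry.get("category") not in KNOWN_CATEGORIES:
            problems.append(f"{local_id}: unknown category {entry.get('category')!r}")
        if local_id in seen:
            problems.append(f"{local_id}: duplicate local_id")
        seen.add(local_id)
    return problems


good = {
    "local_id": "nano_slice_sparse",
    "class_path": "distiller_ex.models.SparseSlice",
    "category": "slice",
    "confidence": 1.00,
    "source": "distiller_ex@1.4.0",
}
bad = {"local_id": "nano_x", "class_path": "a.B", "category": "cube", "confidence": 1.00}

print(check_entries([good]))
print(check_entries([bad]))
```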

Phase 3 -- Emit

Mismatch Events

Every diff surfaced in Phase 1 produces a vocabulary.mismatch event on the bus. The event carries enough context for a downstream auditor to reconstruct what changed without re-reading the YAML.

| Event Field | Example |
| --- | --- |
| event_type | vocabulary.mismatch |
| source | vocabulary_loader |
| payload.project | distiller |
| payload.additions | ["nano_slice_sparse", "nano_measure_percentile"] |
| payload.removals | ["nano_legacy_cube"] |
| payload.coverage_before | 0.983 |
| payload.plugin_source | distiller_ex@1.4.0 |
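As a rough picture of the event shape, the fields above can be modelled as a small frozen dataclass. MismatchEvent is a hypothetical name used only for this sketch; the real event class in fcc.messaging is not shown in this addendum.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MismatchEvent:
    """Hypothetical event shape mirroring the fields in the table above."""
    payload: dict
    event_type: str = "vocabulary.mismatch"
    source: str = "vocabulary_loader"


event = MismatchEvent(
    payload={
        "project": "distiller",
        "additions": ["nano_slice_sparse", "nano_measure_percentile"],
        "removals": ["nano_legacy_cube"],
        "coverage_before": 0.983,
        "plugin_source": "distiller_ex@1.4.0",
    }
)

# asdict() yields plain dicts and lists, so the event serializes directly to JSON.
wire = json.dumps(asdict(event), sort_keys=True)
print(wire)
```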

ComplianceSubscriber Handling

The compliance subscriber (src/fcc/compliance/subscriber.py) listens for vocabulary.mismatch and performs two actions:

  1. Record an audit-log entry with the full payload and a UTC timestamp.
  2. Schedule a re-audit of any compliance requirement that references the affected entity classes.

The audit log is appended to the evidence graph so that a subsequent compliance report can reference the vocabulary change as provenance for why a re-audit was triggered.

from fcc.compliance.subscriber import ComplianceSubscriber
from fcc.messaging.bus import EventBus

bus = EventBus.default()
subscriber = ComplianceSubscriber()
bus.subscribe("vocabulary.mismatch", subscriber.handle)
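The two subscriber actions can be sketched without FCC installed. ToyBus, ToyComplianceSubscriber, and the requirement ID "REQ-7" are all hypothetical; the real EventBus and ComplianceSubscriber may differ in signatures and behaviour.

```python
from collections import defaultdict
from datetime import datetime, timezone


class ToyBus:
    """Minimal in-process publish/subscribe bus for illustration."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self._handlers[event_type]:
            handler(payload)


class ToyComplianceSubscriber:
    def __init__(self, requirements_by_id):
        self.requirements_by_id = requirements_by_id  # local_id -> requirement IDs
        self.audit_log = []
        self.reaudit_queue = set()

    def handle(self, payload):
        # Action 1: record the full payload with a UTC timestamp.
        self.audit_log.append(
            {"recorded_at": datetime.now(timezone.utc).isoformat(), "payload": payload}
        )
        # Action 2: queue every requirement that references an affected ID.
        for local_id in payload["additions"] + payload["removals"]:
            self.reaudit_queue.update(self.requirements_by_id.get(local_id, ()))


bus = ToyBus()
subscriber = ToyComplianceSubscriber({"nano_legacy_cube": ["REQ-7"]})
bus.subscribe("vocabulary.mismatch", subscriber.handle)
bus.publish(
    "vocabulary.mismatch",
    {"additions": ["nano_slice_sparse"], "removals": ["nano_legacy_cube"]},
)
print(len(subscriber.audit_log), sorted(subscriber.reaudit_queue))
```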

Replay and Traceability

Because events are captured by the standard EventSerializer, the entire reconciliation session can be replayed from a JSON log. This is the recommended way to test the subscriber in isolation:

from fcc.messaging.serialization import EventReplay

replay = EventReplay.from_json_file("./tests/fixtures/vocab_mismatch_run.json")
replay.into_bus(bus)   # subscribers receive the same sequence
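Conceptually, replay reads the logged events and delivers them to handlers in their recorded order. The sketch below assumes a log that is a JSON array of records with event_type and payload keys; the actual EventSerializer format is not described here, and the replay function is hypothetical.

```python
import json

# Assumed log layout: a JSON array of {"event_type", "payload"} records.
log_text = json.dumps([
    {"event_type": "vocabulary.mismatch",
     "payload": {"project": "distiller", "additions": ["nano_slice_sparse"], "removals": []}},
    {"event_type": "vocabulary.mismatch",
     "payload": {"project": "distiller", "additions": [], "removals": ["nano_legacy_cube"]}},
])


def replay(log_json: str, handlers: dict) -> int:
    """Deliver each logged event to its handlers in recorded order."""
    delivered = 0
    for record in json.loads(log_json):
        for handler in handlers.get(record["event_type"], []):
            handler(record["payload"])
            delivered += 1
    return delivered


received = []
count = replay(log_text, {"vocabulary.mismatch": [received.append]})
print(count, [p["additions"] or p["removals"] for p in received])
```

Because the handler only sees payloads, a subscriber under test cannot tell a replayed session from a live one.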

Phase 4 -- Verify

Before / After Coverage

After applying the YAML edits from Phase 2 and re-running the loader, coverage returns to 100%.

| Metric | Before | After |
| --- | --- | --- |
| Plugin class map size | 176 | 176 |
| YAML mapping entries | 175 | 176 |
| Additions detected | 2 | 0 |
| Removals detected | 1 | 0 |
| Coverage ratio | 0.983 | 1.000 |
| vocabulary.mismatch events emitted | 3 | 0 |

Audit Evidence

The compliance evidence graph now contains a node for each recorded mismatch event with edges back to:

  • The persona categories that reference the affected entity classes.
  • The compliance requirements scheduled for re-audit.
  • The YAML commit SHA (if FCC_EVIDENCE_GIT=1 is set) that closed the mismatch.
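One way to picture such a node is a record with typed edges. The structure, edge labels, persona category, requirement ID, and placeholder SHA below are all hypothetical; only the FCC_EVIDENCE_GIT=1 condition comes from the text above.

```python
def build_evidence_node(event_id, persona_categories, requirements, commit_sha, env):
    """Assemble one evidence node with edges for a recorded mismatch event."""
    edges = [("references_persona_category", c) for c in persona_categories]
    edges += [("scheduled_reaudit", r) for r in requirements]
    # The commit edge is only recorded when git evidence is enabled.
    if env.get("FCC_EVIDENCE_GIT") == "1" and commit_sha:
        edges.append(("closed_by_commit", commit_sha))
    return {"id": event_id, "edges": edges}


with_git = build_evidence_node(
    "mismatch-1", ["analyst"], ["REQ-7"], "abc1234", {"FCC_EVIDENCE_GIT": "1"}
)
without_git = build_evidence_node("mismatch-1", ["analyst"], ["REQ-7"], "abc1234", {})

print(len(with_git["edges"]), len(without_git["edges"]))
```

Passing the environment in as a mapping, rather than reading os.environ inside the function, keeps the sketch easy to test.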

Operational Guidance

When to Run the Loader

The recommended cadence is:

| Trigger | Action |
| --- | --- |
| Sister-project release tag | Full verify + reconcile cycle |
| FCC pre-merge CI | make verify-vocabularies target (fail on diff) |
| Pre-release gate | Regenerate evidence graph |
| Manual audit | fcc vocabulary verify --project distiller |

CI Integration

A dedicated target runs the verification headlessly in CI. Failure modes are explicit and reportable.

make verify-vocabularies
# Runs loader.verify_against_plugin across all 12 sister projects.
# Exits non-zero if any project reports a non-empty diff.
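The exit-code behaviour can be sketched as a small gate function. This is an illustration of the fail-on-diff rule, not the contents of the Makefile target; verify_all and the project name other_project are hypothetical.

```python
def verify_all(diffs_by_project: dict[str, tuple[set, set]]) -> int:
    """Return a process exit code: 0 if every diff is empty, 1 otherwise."""
    failed = []
    for project, (additions, removals) in sorted(diffs_by_project.items()):
        if additions or removals:
            failed.append(project)
            print(f"FAIL {project}: +{sorted(additions)} -{sorted(removals)}")
    return 1 if failed else 0


exit_code = verify_all({
    "distiller": ({"nano_slice_sparse"}, {"nano_legacy_cube"}),
    "other_project": (set(), set()),  # hypothetical project with a clean diff
})
print("exit code:", exit_code)
# A real CI entry point would finish with: raise SystemExit(exit_code)
```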

Troubleshooting

| Symptom | Probable Cause | Next Step |
| --- | --- | --- |
| Loader import error on class_path | Typo or sister project not installed | pip install -e path/to/distiller_ex and re-run |
| Coverage ratio below 1.0 after edit | YAML still missing a new ID | Inspect result.additions and append the missing block to the YAML |
| Compliance subscriber not triggered | Subscriber not registered on the bus | Ensure ComplianceSubscriber is subscribed to vocabulary.mismatch |
| Evidence graph stale | Re-audit not scheduled | Explicitly call CompliancePipeline.run_affected() |
| Recurring removals across releases | Sister project deprecated without a migration note | Coordinate with sister-project owner before editing YAML |

Tips

  • Treat the YAML as a reviewable contract. A pull request that edits a vocabulary mapping should link to the upstream sister-project commit that drove the change.
  • Prefer atomic commits: one PR per sister-project version bump. Mixed reconciliations are difficult to audit later.
  • Use confidence: 1.00 only for direct class-name matches. Any inferred or fuzzy mapping should start below 0.90 and be manually reviewed.

See also