
Distiller Bridge -- Phase 15 Addendum

This addendum extends the Distiller Bridge Demo and Phase 14 Addendum with Phase 15 unified vocabulary distillation capabilities that leverage the object model abstraction layer and federated knowledge graphs.


Unified Vocabulary Distillation

Overview

Phase 15 introduces unified vocabulary distillation: a process that takes raw terminology from every ecosystem project and produces a single, normalized vocabulary using the VocabularyMapping infrastructure. The Distiller bridge now supports bidirectional mapping, so a project-local term can be resolved to its unified FCC term and a unified FCC term can be traced back to the project-local terms it covers.

Architecture

Project A Terms ─┐
Project B Terms ──┼── VocabularyMapping ── Unified FCC Vocabulary
Project C Terms ──┘         │
                            ├── ModelFacade cross-model search
                            ├── FederatedKG edge normalization
                            └── Compliance report term alignment
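
To make the bidirectional mapping concrete, here is a minimal sketch of a two-way lookup table. The class `BidirectionalVocabulary` and its methods are hypothetical and are not the VocabularyMapping API; the term `machine-learning` in the `constel` project is likewise invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BidirectionalVocabulary:
    """Hypothetical stand-in for a two-way term mapping (not the fcc API)."""

    # (project, local_term) -> unified term
    to_unified: dict = field(default_factory=dict)
    # unified term -> set of (project, local_term) pairs
    to_local: dict = field(default_factory=dict)

    def add(self, project: str, local_term: str, unified_term: str) -> None:
        key = (project, local_term)
        # If this local term was mapped before, drop the stale reverse entry.
        previous = self.to_unified.get(key)
        if previous is not None:
            self.to_local[previous].discard(key)
        self.to_unified[key] = unified_term
        self.to_local.setdefault(unified_term, set()).add(key)

    def unify(self, project: str, local_term: str) -> str | None:
        """Project-local term -> unified term, or None if unmapped."""
        return self.to_unified.get((project, local_term))

    def localize(self, unified_term: str, project: str) -> list[str]:
        """Unified term -> sorted local terms used by one project."""
        pairs = self.to_local.get(unified_term, set())
        return sorted(term for proj, term in pairs if proj == project)


vocab = BidirectionalVocabulary()
vocab.add("distiller", "ML", "fcc:machine_learning")
vocab.add("constel", "machine-learning", "fcc:machine_learning")  # invented term

print(vocab.unify("distiller", "ML"))
print(vocab.localize("fcc:machine_learning", "constel"))
```

Keeping both directions in one object means a remapped term cannot leave a stale reverse entry behind, which is the main risk of maintaining two separate dictionaries.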

Distiller Integration Points

Vocabulary Extraction

The Distiller bridge extracts raw vocabulary from each project's domain model using the ModelFacade.stats() API:

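from fcc.objectmodel.facade import ModelFacade

# get_project_facade is assumed to be in scope already; its import path is not
# given in this addendum. It is expected to return the project's ModelFacade.
facade: ModelFacade = get_project_facade("distiller")

# stats() returns summary counts for the project's domain model.
stats = facade.stats()
print(f"Terms: {stats['total_terms']}")
print(f"Mapped: {stats['mapped_terms']}")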

Term Normalization Pipeline

The normalization pipeline applies three stages:

| Stage | Description | Example |
|---|---|---|
| Tokenization | Split compound terms | data_engineer -> data, engineer |
| Synonym Resolution | Map synonyms to canonical form | ML -> machine_learning |
| Namespace Qualification | Add project prefix | architect -> fcc:architect |
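
The three stages can be sketched as plain functions. This is an illustration only, not the Distiller implementation: the `SYNONYMS` table, the function names, the lowercasing step, and the way the stages are chained in `normalize` are all assumptions.

```python
import re

# Hypothetical synonym table; a real one would come from VocabularyMapping.
SYNONYMS = {"ml": "machine_learning"}


def tokenize(term: str) -> list:
    """Stage 1: split a compound term on underscores, hyphens, or whitespace."""
    return [tok for tok in re.split(r"[_\-\s]+", term.strip().lower()) if tok]


def resolve_synonym(token: str) -> str:
    """Stage 2: map a token to its canonical form, if one is known."""
    token = token.lower()
    return SYNONYMS.get(token, token)


def qualify(token: str, namespace: str = "fcc") -> str:
    """Stage 3: add the namespace prefix unless the token already has one."""
    return token if ":" in token else f"{namespace}:{token}"


def normalize(term: str, namespace: str = "fcc") -> list:
    """Assumed chaining: tokenize, then resolve and qualify each token."""
    return [qualify(resolve_synonym(tok), namespace) for tok in tokenize(term)]


for raw in ("data_engineer", "ML", "architect"):
    print(raw, "->", normalize(raw))
```

Running synonym resolution after tokenization keeps a canonical form such as `machine_learning` from being split again, since it is only produced once the splitting stage has finished.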

Mapping Confidence

Each vocabulary mapping carries a confidence score:

| Range | Meaning | Action |
|---|---|---|
| 0.9 to 1.0 | Exact match | Auto-apply |
| 0.7 to 0.9 | High confidence | Review recommended |
| 0.5 to 0.7 | Partial match | Manual review required |
| < 0.5 | Low confidence | Flagged for human review |
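
The confidence bands translate naturally into a small lookup function. The sketch below is hypothetical (`action_for` is not part of the fcc API), and because the table does not say which band owns a boundary value such as 0.9, it assumes each band includes its lower bound.

```python
def action_for(confidence: float) -> str:
    """Return the action for a mapping confidence score in [0.0, 1.0].

    Assumption: each band includes its lower bound, so 0.9 is auto-applied
    and 0.7 falls in the "review recommended" band.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0.0, 1.0], got {confidence}")
    if confidence >= 0.9:
        return "auto-apply"
    if confidence >= 0.7:
        return "review recommended"
    if confidence >= 0.5:
        return "manual review required"
    return "flagged for human review"


for score in (0.95, 0.75, 0.55, 0.2):
    print(score, "->", action_for(score))
```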

Cross-Project Distillation

Federated Term Graph

The Distiller bridge now builds a federated term graph that connects vocabulary terms across project namespaces:

from fcc.knowledge.federation import FederatedKnowledgeGraph

# distiller_graph and constel_graph are assumed to be per-project knowledge
# graphs built earlier; their construction is not shown here.
fkg = FederatedKnowledgeGraph()
fkg.add_namespace("distiller", distiller_graph)
fkg.add_namespace("constel", constel_graph)

# Cross-namespace term edges are automatically resolved
cross_edges = fkg.cross_namespace_edges()
print(f"Cross-namespace term links: {len(cross_edges)}")

Vocabulary Coverage Report

A summary report shows distillation coverage:

| Metric | Value |
|---|---|
| Total unique terms | 2,400+ |
| Mapped to unified vocabulary | 2,100+ |
| Unmapped (pending review) | 300 |
| Cross-project synonyms resolved | 180 |
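
As a rough illustration of how such a report could be derived, the sketch below counts mapped and unmapped terms from a plain dictionary. The function `coverage_report` and the sample terms are hypothetical; only the key names `total_terms` and `mapped_terms` mirror the `stats()` output shown earlier.

```python
from __future__ import annotations


def coverage_report(mappings: dict[str, str | None]) -> dict[str, float]:
    """Summarize coverage for a {source_term: unified_term or None} mapping."""
    total = len(mappings)
    mapped = sum(1 for target in mappings.values() if target is not None)
    return {
        "total_terms": total,
        "mapped_terms": mapped,
        "unmapped_terms": total - mapped,
        # Guard against division by zero for an empty vocabulary.
        "coverage": mapped / total if total else 0.0,
    }


# Invented sample data: two mapped terms, two still pending review.
sample = {
    "ML": "fcc:machine_learning",
    "architect": "fcc:architect",
    "widget": None,
    "gizmo": None,
}
report = coverage_report(sample)
print(f"Coverage: {report['coverage']:.0%}")
```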

Event Integration

Vocabulary distillation emits events through the EventBus:

| Event Type | Payload |
|---|---|
| vocabulary.extraction.started | project, term_count |
| vocabulary.mapping.created | source_term, target_term, confidence |
| vocabulary.distillation.completed | total_mapped, total_unmapped |
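
A consumer of these events might collect the mappings that still need review. The real EventBus API is not shown in this addendum, so the sketch below uses a hypothetical in-memory stand-in (`InMemoryEventBus`, with assumed `subscribe` and `publish` methods); only the event names and payload fields come from the table.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class InMemoryEventBus:
    """Hypothetical stand-in; the real EventBus interface may differ."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Deliver the payload to every handler registered for this type.
        for handler in self._handlers.get(event_type, []):
            handler(payload)


needs_review = []


def on_mapping_created(payload: dict) -> None:
    # Mappings below 0.9 are not auto-applied, so queue them for review.
    if payload["confidence"] < 0.9:
        needs_review.append(payload["source_term"])


bus = InMemoryEventBus()
bus.subscribe("vocabulary.mapping.created", on_mapping_created)

bus.publish("vocabulary.extraction.started", {"project": "distiller", "term_count": 2})
bus.publish(
    "vocabulary.mapping.created",
    {"source_term": "ML", "target_term": "fcc:machine_learning", "confidence": 0.95},
)
bus.publish(
    "vocabulary.mapping.created",
    {"source_term": "architect", "target_term": "fcc:architect", "confidence": 0.6},
)
bus.publish("vocabulary.distillation.completed", {"total_mapped": 2, "total_unmapped": 0})

print(needs_review)
```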

Tips

  • Run vocabulary distillation after any project model update
  • Use the confidence threshold to control auto-apply behaviour; per the Mapping Confidence table, only mappings scoring 0.9 or higher are applied without review
  • Review the federated term graph for synonym clusters that may indicate vocabulary drift across projects