
Vocabulary Provider Load

This diagram traces how a sister project's vocabulary (for example athenium or mnemosyne) is discovered, validated, and registered against FCC's canonical object model. The entry point is VocabularyMappingLoader.load(namespace) in src/fcc/objectmodel/vocabulary_loader.py. It is invoked at startup, or on demand when a federated query needs a namespace that has not yet been indexed. Developers read this trace to understand the VocabularyProviderPlugin contract introduced in v1.2.1: the pattern by which sister repositories contribute their class mappings without creating runtime import cycles. The 175 packaged YAML mappings under src/fcc/data/objectmodel/ are the canonical source; plugins either wrap those or contribute their own.
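The sketch below shows one possible shape for the provider contract. It assumes an abstract base class with the two methods the diagram calls, get_namespace() and get_class_map(). The real VocabularyProviderPlugin in src/fcc/plugins/base.py may define more than this. AtheniumVocabularyPlugin, AtheniumDocument, AtheniumAuthor and the canonical keys "Document" and "Person" are hypothetical. The dependency runs one way: the sister project subclasses FCC's contract, and FCC reaches the plugin only through discovery. That is presumably how the import cycle is avoided.

```python
from abc import ABC, abstractmethod


class VocabularyProviderPlugin(ABC):
    """Hypothetical shape of the provider contract; the real base class may differ."""

    @abstractmethod
    def get_namespace(self) -> str:
        """Return the namespace this plugin contributes, e.g. "athenium"."""

    @abstractmethod
    def get_class_map(self) -> dict[str, type]:
        """Return a mapping of canonical FCC class name -> target type."""


# Stand-in target types; a real sister project would define these in its own package.
class AtheniumDocument:
    pass


class AtheniumAuthor:
    pass


class AtheniumVocabularyPlugin(VocabularyProviderPlugin):
    """Hypothetical plugin that a sister repository might ship."""

    def get_namespace(self) -> str:
        return "athenium"

    def get_class_map(self) -> dict[str, type]:
        # Keys are canonical FCC class names (hypothetical), not athenium's own names.
        return {"Document": AtheniumDocument, "Person": AtheniumAuthor}
```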

The sequence below shows discovery of provider plugins, validation of each mapping, and the success and mismatch branches.

sequenceDiagram
    participant Caller
    participant VocabularyMappingLoader
    participant PluginRegistry
    participant Plugin as VocabularyProviderPlugin
    participant VocabularyMapping
    participant MappingStore
    participant Logger
    participant EventBus

    Caller->>VocabularyMappingLoader: load(namespace)
    VocabularyMappingLoader->>PluginRegistry: discover(VOCABULARY_PROVIDERS)
    PluginRegistry-->>VocabularyMappingLoader: list[VocabularyProviderPlugin]
    loop for each plugin
        VocabularyMappingLoader->>Plugin: get_namespace()
        Plugin-->>VocabularyMappingLoader: namespace_id
        alt namespace matches
            VocabularyMappingLoader->>Plugin: get_class_map()
            Plugin-->>VocabularyMappingLoader: dict[str, type]
            loop for each entry
                VocabularyMappingLoader->>VocabularyMapping: validate(entry)
                alt valid
                    VocabularyMapping-->>VocabularyMappingLoader: ok
                    VocabularyMappingLoader->>MappingStore: add(mapping)
                else missing source_id
                    VocabularyMappingLoader->>Logger: info(missing_source_ids)
                    Note over EventBus: emits vocabulary.mismatch
                    VocabularyMappingLoader->>EventBus: publish(vocabulary.mismatch)
                end
            end
            Note over EventBus: emits vocabulary.loaded
            VocabularyMappingLoader->>EventBus: publish(vocabulary.loaded)
        end
    end
    VocabularyMappingLoader-->>Caller: dict[str, type]
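The sequence above can be condensed into a single load() method. This is a simplified stand-in and not the code in vocabulary_loader.py. The following are all assumptions:

- PluginRegistry, EventBus, and the per-namespace dict used as the MappingStore.
- validate_entry, which stands in for VocabularyMapping.validate.
- The logger name and the event payload keys.

The method follows the diagram's control flow. It also applies the soft-failure rule for get_class_map(): a plugin that raises is logged and skipped, and the remaining plugins still load.

```python
import logging
from collections import defaultdict
from enum import Enum
from typing import Optional

# Hypothetical logger name; the real module may configure logging differently.
logger = logging.getLogger("fcc.objectmodel.vocabulary_loader")


class PluginType(Enum):
    VOCABULARY_PROVIDERS = "vocabulary_providers"


class PluginRegistry:
    """Minimal in-memory registry keyed by plugin type (hypothetical)."""

    def __init__(self) -> None:
        self._plugins: dict[PluginType, list] = defaultdict(list)

    def register(self, kind: PluginType, plugin: object) -> None:
        self._plugins[kind].append(plugin)

    def discover(self, kind: PluginType) -> list:
        return list(self._plugins[kind])


class EventBus:
    """Records (event_name, payload) pairs in publish order (hypothetical stand-in)."""

    def __init__(self) -> None:
        self.published: list[tuple[str, dict]] = []

    def publish(self, name: str, payload: dict) -> None:
        self.published.append((name, payload))


def validate_entry(name: str, target: object) -> Optional[str]:
    """Stand-in for VocabularyMapping.validate: return a rejection reason, or None if valid."""
    if not name:
        return "missing source_id"
    if not isinstance(target, type):
        return "target is not a type"
    return None


class VocabularyMappingLoader:
    def __init__(self, registry: PluginRegistry, bus: EventBus) -> None:
        self.registry = registry
        self.bus = bus
        # MappingStore stand-in: namespace -> {canonical class name -> target type}.
        self.store: dict[str, dict[str, type]] = defaultdict(dict)

    def load(self, namespace: str) -> dict[str, type]:
        for plugin in self.registry.discover(PluginType.VOCABULARY_PROVIDERS):
            if plugin.get_namespace() != namespace:
                continue  # only providers for the requested namespace proceed
            try:
                class_map = plugin.get_class_map()
            except Exception:
                # Soft failure: log and skip this plugin; the others still load.
                logger.exception("vocabulary provider failed for namespace %s", namespace)
                continue
            missing_source_ids = []
            for name, target in class_map.items():
                reason = validate_entry(name, target)
                if reason is None:
                    self.store[namespace][name] = target
                else:
                    missing_source_ids.append(name)
                    self.bus.publish(
                        "vocabulary.mismatch",
                        {"namespace": namespace, "entry": name, "reason": reason},
                    )
            if missing_source_ids:
                logger.info("missing_source_ids=%s", missing_source_ids)
            accepted = len(class_map) - len(missing_source_ids)
            self.bus.publish("vocabulary.loaded", {"namespace": namespace, "count": accepted})
        # Return a copy so callers cannot mutate the store directly.
        return dict(self.store[namespace])
```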

Failure modes are designed to be soft. A plugin that raises from get_class_map() is caught by the loader, logged, and skipped, and the remaining plugins still load. A row is rejected if it lacks a source_id or target_id, or if its similarity_score falls below the configured floor. Rejected rows are collected into a missing_source_ids report written at INFO level and are also emitted as vocabulary.mismatch events. Callers typically aggregate these events into a weekly data-quality digest. The loader never raises on partial failure. This matters because federated queries depend on best-effort availability of cross-project vocabularies. Instrumentation usually subscribes to both events: vocabulary.loaded for namespace-registration metrics and vocabulary.mismatch for provenance alerts.
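Row-level validation might look like the sketch below. It assumes VocabularyMapping is a frozen dataclass with exactly the three fields named above, and that the floor is a plain float. SIMILARITY_FLOOR = 0.8, validate_row, partition_rows and the report's dict shape are hypothetical. Bad rows come back as report entries instead of raising, which matches the rule that the loader never raises on partial failure.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical default for "the configured floor"; the real value is configuration-driven.
SIMILARITY_FLOOR = 0.8


@dataclass(frozen=True)
class VocabularyMapping:
    """Assumed field set; the packaged YAML rows may carry additional columns."""

    source_id: str
    target_id: str
    similarity_score: float


def validate_row(row: dict, floor: float = SIMILARITY_FLOOR) -> tuple[Optional[VocabularyMapping], Optional[str]]:
    """Return (mapping, None) for a valid row, or (None, reason) for a rejected one."""
    if not row.get("source_id"):
        return None, "missing source_id"
    if not row.get("target_id"):
        return None, "missing target_id"
    try:
        score = float(row.get("similarity_score"))
    except (TypeError, ValueError):
        return None, "similarity_score missing or not numeric"
    if score < floor:
        return None, "similarity_score below floor"
    return VocabularyMapping(row["source_id"], row["target_id"], score), None


def partition_rows(rows: list[dict], floor: float = SIMILARITY_FLOOR) -> tuple[list[VocabularyMapping], list[dict]]:
    """Split rows into accepted mappings and a report of rejected rows; bad rows never raise."""
    accepted: list[VocabularyMapping] = []
    report: list[dict] = []
    for index, row in enumerate(rows):
        mapping, reason = validate_row(row, floor)
        if mapping is None:
            report.append({"row": index, "source_id": row.get("source_id"), "reason": reason})
        else:
            accepted.append(mapping)
    return accepted, report
```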

The returned dict[str, type] is the live class map that ModelFacade implementations use to construct DomainEntity instances for the namespace. It is keyed by the canonical FCC class name rather than by the class name used inside the source namespace, so federated queries resolve uniformly across ecosystems. In practice the map is small (tens of entries per provider) and eagerly constructed.
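The following sketch shows why canonical keys matter. It builds entities from two namespaces' class maps using the same lookup key. DomainEntity's fields, NamespaceFacade, build(), and the AtheniumAuthor and MnemosyneContact types are hypothetical stand-ins for the real ModelFacade implementations and sister-project classes.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DomainEntity:
    """Hypothetical minimal base; FCC's real DomainEntity is likely richer."""

    entity_id: str
    attributes: dict[str, Any] = field(default_factory=dict)


# Stand-ins for target types contributed by two different sister projects.
class AtheniumAuthor(DomainEntity):
    pass


class MnemosyneContact(DomainEntity):
    pass


class NamespaceFacade:
    """Hypothetical ModelFacade-like wrapper over one namespace's live class map."""

    def __init__(self, namespace: str, class_map: dict[str, type]) -> None:
        self.namespace = namespace
        self._class_map = class_map  # keyed by canonical FCC class name

    def build(self, canonical_name: str, entity_id: str, **attributes: Any) -> DomainEntity:
        """Construct the namespace's own type for a canonical FCC class name."""
        try:
            cls = self._class_map[canonical_name]
        except KeyError:
            raise LookupError(
                f"{canonical_name!r} is not mapped in namespace {self.namespace!r}"
            ) from None
        return cls(entity_id=entity_id, attributes=attributes)
```

Both facades are keyed by "Person", so a federated query can iterate over them with a single lookup key and still get back each project's own type.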

Steps in detail

  1. Caller to VocabularyMappingLoader: load — The caller passes a target namespace identifier.
  2. Loader to PluginRegistry: discover — The registry returns every plugin registered under PluginType.VOCABULARY_PROVIDERS.
  3. Loader to Plugin: get_namespace (loop) — Each plugin is asked which namespace it contributes; only matches proceed.
  4. Loader to Plugin: get_class_map — Matching plugins return their full mapping as a dict of class-name to target type.
  5. Loader to VocabularyMapping: validate (loop) — Each entry is validated against the VocabularyMapping frozen dataclass contract.
  6. Loader to MappingStore: add — Valid entries are appended to the in-memory mapping store under the namespace key.
  7. Loader to Logger: info(missing_source_ids) — Entries that fail validation are logged at info level with a structured missing_source_ids payload.
  8. Loader to EventBus: publish(vocabulary.mismatch) — A mismatch event fires per failed row so subscribers can aggregate or alert.
  9. Loader to EventBus: publish(vocabulary.loaded) — Once a plugin's map has been fully processed, a single vocabulary.loaded event summarises the namespace.
  10. Loader to Caller: dict[str, type] — The merged class map for the requested namespace is returned.
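Steps 8 and 9 leave aggregation to subscribers. Below is a sketch of one such subscriber. It counts vocabulary.loaded events per namespace for metrics and groups vocabulary.mismatch entries for a periodic digest. The sketch assumes:

- A synchronous bus with subscribe(name, handler) and publish(name, payload) methods.
- Payloads that carry namespace and entry keys.

The EventBus class, those method names, the payload keys and VocabularyInstrumentation are all hypothetical. FCC's messaging layer may expose a different API.

```python
from collections import Counter, defaultdict
from typing import Callable


class EventBus:
    """Hypothetical synchronous pub/sub stand-in; FCC's bus API may differ."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_name: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_name].append(handler)

    def publish(self, event_name: str, payload: dict) -> None:
        for handler in self._handlers[event_name]:
            handler(payload)


class VocabularyInstrumentation:
    """Counts loaded namespaces and aggregates mismatches for a periodic digest."""

    def __init__(self, bus: EventBus) -> None:
        self.loaded: Counter = Counter()  # namespace -> number of vocabulary.loaded events
        self.mismatches: dict[str, list[str]] = defaultdict(list)  # namespace -> rejected entries
        bus.subscribe("vocabulary.loaded", self.on_loaded)
        bus.subscribe("vocabulary.mismatch", self.on_mismatch)

    def on_loaded(self, payload: dict) -> None:
        self.loaded[payload["namespace"]] += 1

    def on_mismatch(self, payload: dict) -> None:
        self.mismatches[payload["namespace"]].append(payload["entry"])

    def digest(self) -> list[str]:
        """One line per namespace with rejected entries, sorted for stable output."""
        return [
            f"{ns}: {len(entries)} mismatched ({', '.join(sorted(entries))})"
            for ns, entries in sorted(self.mismatches.items())
        ]
```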

See also

  • Entry point: src/fcc/objectmodel/vocabulary_loader.py
  • Plugin contract: src/fcc/plugins/base.py (VocabularyProviderPlugin)
  • Related class diagram: ../class-diagrams/object-model.md
  • Related event types: src/fcc/messaging/events.py (EventType.VOCABULARY_LOADED, EventType.VOCABULARY_MISMATCH)