
Chapter 4: Federated Knowledge

Learning Objectives

By the end of this chapter you will be able to:

  1. Explain why knowledge federation is necessary for multi-project ecosystems.
  2. Describe the namespace IRI scheme and how it prevents identifier collisions.
  3. Use the NamespaceRegistry to register and resolve project namespaces.
  4. Federate knowledge graphs across FCC, AOME, CONSTEL, and other projects.
  5. Query federated knowledge graphs using cross-project identifiers.

The figure below shows how per-project namespaces and per-project knowledge graphs are stitched together by an EntityResolver into a single FederatedKnowledgeGraph that supports cross-project queries.

flowchart TB
    subgraph NS["Namespace Registry"]
        FCC_NS["fcc: https://fcc.example.org/ontology/"]
        AOME_NS["aome: https://aome.example.org/ontology/"]
        CONSTEL_NS["constel: https://constel.example.org/ontology/"]
    end

    subgraph Graphs["Project Knowledge Graphs"]
        FCC_KG[(FCC KG)]
        AOME_KG[(AOME KG)]
        CONSTEL_KG[(CONSTEL KG)]
    end

    ER[EntityResolver] --> FCC_KG
    ER --> AOME_KG
    ER --> CONSTEL_KG

    FCC_KG --> FED[(FederatedKnowledgeGraph)]
    AOME_KG --> FED
    CONSTEL_KG --> FED

    FED --> QUERY[Cross-Project Queries]
    FED --> XEDGE[Cross-Namespace Edges]

    style FED fill:#2196F3,color:#fff
    style QUERY fill:#4CAF50,color:#fff

Because namespaces are resolved at query time rather than ingest time, new projects can join the federation without rematerialising existing knowledge graphs.

The Federation Problem

In a single-project FCC deployment, the knowledge graph (Chapter 2) and search index (Chapter 1) are self-contained. All entities use the same namespace, all identifiers are unique, and all queries are local.

In a multi-project ecosystem, this breaks down. AOME has its own knowledge graph with privacy-related entities. CONSTEL has its own metadata graph with cross-project relationships. CTO has its own object model with canonical entity definitions. When a query spans multiple projects ("find all artifacts from FCC sessions that reference AOME-classified personal data"), these knowledge graphs need to be queryable as a single federated graph.

Federation solves this by establishing shared conventions for naming, referencing, and querying entities across project boundaries, without requiring all projects to use the same storage or schema.

Namespace Design

The foundation of federation is the namespace IRI scheme (see ADR-005: Federated KG Namespace). Each project in the ecosystem owns a unique namespace IRI:

Project       Namespace IRI
FCC           https://fcc.example.org/ontology/
AOME          https://aome.example.org/ontology/
CONSTEL       https://constel.example.org/ontology/
CTO           https://cto.example.org/ontology/
Sky-Parlour   https://sky-parlour.example.org/ontology/

Every entity in a project's knowledge graph is prefixed with that project's namespace:

fcc:research_analyst     -- a persona in FCC
aome:privacy_classifier  -- a classifier in AOME
constel:metadata_index   -- a metadata index in CONSTEL

Why IRIs?

IRIs (Internationalized Resource Identifiers) are the standard naming convention for RDF and OWL. They provide:

  1. Global uniqueness. Each project mints identifiers under a namespace IRI it owns, so two projects cannot accidentally use the same identifier.
  2. Dereferenceability. In a web-enabled deployment, the IRI can resolve to the entity's description.
  3. Standardization. Tools and libraries across the RDF ecosystem understand IRIs natively.

The NamespaceRegistry

The NamespaceRegistry manages namespace registration, prefix resolution, and cross-project identifier mapping:

from fcc.knowledge.federation import NamespaceRegistry

registry = NamespaceRegistry()

# Register project namespaces
registry.register("fcc", "https://fcc.example.org/ontology/")
registry.register("aome", "https://aome.example.org/ontology/")
registry.register("constel", "https://constel.example.org/ontology/")

# Resolve a prefixed identifier to a full IRI
full_iri = registry.resolve("fcc:research_analyst")
# "https://fcc.example.org/ontology/research_analyst"

# Compact a full IRI to a prefixed identifier
prefix = registry.compact("https://aome.example.org/ontology/privacy_classifier")
# "aome:privacy_classifier"
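
To make the resolve and compact behaviour concrete, the following is a minimal sketch of how such a registry could work internally. The class name SketchNamespaceRegistry and its internals are assumptions made for illustration; this is not the fcc.knowledge.federation implementation, which may validate or store namespaces differently.

```python
class SketchNamespaceRegistry:
    """Hypothetical stand-in for NamespaceRegistry (illustration only)."""

    def __init__(self) -> None:
        self._prefix_to_iri: dict[str, str] = {}

    def register(self, prefix: str, iri: str) -> None:
        # Refuse to silently rebind a prefix to a different namespace.
        existing = self._prefix_to_iri.get(prefix)
        if existing is not None and existing != iri:
            raise ValueError(f"prefix {prefix!r} is already bound to {existing}")
        self._prefix_to_iri[prefix] = iri

    def resolve(self, prefixed: str) -> str:
        # Split on the first colon only, so local names may contain colons.
        prefix, sep, local = prefixed.partition(":")
        if not sep or prefix not in self._prefix_to_iri:
            raise KeyError(f"unknown prefix in {prefixed!r}")
        return self._prefix_to_iri[prefix] + local

    def compact(self, iri: str) -> str:
        # Prefer the longest matching namespace so nested namespaces compact correctly.
        matches = [(ns, p) for p, ns in self._prefix_to_iri.items() if iri.startswith(ns)]
        if not matches:
            raise KeyError(f"no registered namespace matches {iri!r}")
        namespace, prefix = max(matches, key=lambda match: len(match[0]))
        return f"{prefix}:{iri[len(namespace):]}"


sketch = SketchNamespaceRegistry()
sketch.register("fcc", "https://fcc.example.org/ontology/")
sketch.register("aome", "https://aome.example.org/ontology/")

resolved = sketch.resolve("fcc:research_analyst")
compacted = sketch.compact("https://aome.example.org/ontology/privacy_classifier")
print(resolved)   # https://fcc.example.org/ontology/research_analyst
print(compacted)  # aome:privacy_classifier
```

Rejecting a conflicting re-registration is a design choice of this sketch: a prefix that changes meaning partway through a run would make previously compacted identifiers ambiguous.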

The registry is shared across all federation operations. It is typically initialized at startup from a configuration file:

# config/namespaces.yaml
namespaces:
  fcc: "https://fcc.example.org/ontology/"
  aome: "https://aome.example.org/ontology/"
  constel: "https://constel.example.org/ontology/"
  cto: "https://cto.example.org/ontology/"
  sky_parlour: "https://sky-parlour.example.org/ontology/"
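
One way to turn that configuration into registrations is sketched below. The load_namespaces helper and its validation rules are assumptions, not part of the fcc package. To keep the sketch free of third-party dependencies, it starts from the already-parsed mapping; in practice the file would first be read with a YAML parser (for example PyYAML's yaml.safe_load).

```python
def load_namespaces(config: dict) -> dict[str, str]:
    """Validate a parsed namespaces config and return a prefix -> IRI mapping.

    Hypothetical helper for illustration; the checks are assumptions.
    """
    namespaces = config.get("namespaces")
    if not isinstance(namespaces, dict) or not namespaces:
        raise ValueError("config must contain a non-empty 'namespaces' mapping")

    iri_to_prefix: dict[str, str] = {}
    for prefix, iri in namespaces.items():
        # A namespace IRI should end in '/' or '#' so that appending a
        # local name yields a well-formed entity IRI.
        if not iri.endswith(("/", "#")):
            raise ValueError(f"namespace for {prefix!r} must end with '/' or '#'")
        # Two prefixes for one namespace would make compaction ambiguous.
        if iri in iri_to_prefix:
            raise ValueError(f"{prefix!r} and {iri_to_prefix[iri]!r} share {iri}")
        iri_to_prefix[iri] = prefix
    return dict(namespaces)


# Stand-in for the result of parsing config/namespaces.yaml.
parsed_config = {
    "namespaces": {
        "fcc": "https://fcc.example.org/ontology/",
        "aome": "https://aome.example.org/ontology/",
        "constel": "https://constel.example.org/ontology/",
        "cto": "https://cto.example.org/ontology/",
        "sky_parlour": "https://sky-parlour.example.org/ontology/",
    }
}

loaded = load_namespaces(parsed_config)
print(sorted(loaded))  # ['aome', 'constel', 'cto', 'fcc', 'sky_parlour']
```

Each entry of the returned mapping can then be passed to the registry's register method at startup.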

Federating Knowledge Graphs

Federation operates at three levels:

Level 1: Cross-Project References

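The simplest form of federation. One project's knowledge graph references entities from another project by their namespace-qualified identifiers (written here in prefixed form; each expands to a full IRI through the registry):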

fcc:artifact_001 fcc:classifiedBy aome:privacy_classifier .
fcc:session_42 constel:indexedIn constel:metadata_index .

These cross-references are created during simulation when FCC interacts with other projects via protocol integration (Chapter 5). No special federation infrastructure is required -- just consistent namespace usage.
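
Because cross-references are ordinary triples, they can be found by comparing prefixes. The sketch below assumes triples are held as plain tuples of prefixed identifiers and treats an edge as cross-namespace when its subject and object carry different prefixes; both the representation and the helper names are illustrative assumptions.

```python
Triple = tuple[str, str, str]


def prefix_of(term: str) -> str:
    """Return the namespace prefix of a prefixed identifier such as 'fcc:session_42'."""
    return term.partition(":")[0]


def cross_namespace_edges(triples: list[Triple]) -> list[Triple]:
    """Keep only triples whose subject and object belong to different namespaces."""
    return [t for t in triples if prefix_of(t[0]) != prefix_of(t[2])]


triples: list[Triple] = [
    ("fcc:artifact_001", "fcc:classifiedBy", "aome:privacy_classifier"),
    ("fcc:session_42", "constel:indexedIn", "constel:metadata_index"),
    ("fcc:artifact_001", "fcc:createdBy", "fcc:research_analyst"),  # purely local
]

edges = cross_namespace_edges(triples)
print(len(edges))  # 2
```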

Level 2: Federated Query

A federated query spans multiple knowledge graphs. The query planner routes sub-queries to the appropriate project's graph and combines the results:

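from fcc.knowledge.federation import FederatedQuery

# `registry` is the NamespaceRegistry created earlier; it expands the
# fcc: and aome: prefixes used in the query below.
query = FederatedQuery(registry=registry)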

results = query.execute("""
    SELECT ?artifact ?persona ?classification
    WHERE {
        ?artifact fcc:createdBy ?persona .
        ?artifact aome:classifiedAs ?classification .
        FILTER (?classification = aome:PersonalData)
    }
""")

The query planner:

  1. Parses the query to identify which namespaces are referenced.
  2. Routes the fcc: portions to FCC's knowledge graph.
  3. Routes the aome: portions to AOME's knowledge graph.
  4. Joins the results on shared identifiers.
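
The four steps above can be sketched with in-memory data. This is a toy model, not the FederatedQuery planner: it assumes each triple pattern is routed by the namespace of its predicate, that AOME's graph holds the classification triples about FCC artifacts, and that the FILTER has been folded into a constant object. All function names are hypothetical.

```python
Triple = tuple[str, str, str]
Binding = dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_pattern(pattern: Triple, triples: list[Triple]) -> list[Binding]:
    """Return one variable binding per triple that matches the pattern."""
    results = []
    for triple in triples:
        binding: Binding = {}
        for p_term, t_term in zip(pattern, triple):
            if is_var(p_term):
                # A variable repeated within one pattern must bind consistently.
                if binding.get(p_term, t_term) != t_term:
                    break
                binding[p_term] = t_term
            elif p_term != t_term:
                break
        else:
            results.append(binding)
    return results


def join(left: list[Binding], right: list[Binding]) -> list[Binding]:
    """Combine bindings that agree on every variable they share."""
    joined = []
    for l in left:
        for r in right:
            if all(l[v] == r[v] for v in l.keys() & r.keys()):
                joined.append({**l, **r})
    return joined


def execute(patterns: list[Triple], graphs: dict[str, list[Triple]]) -> list[Binding]:
    results: list[Binding] = [{}]
    for pattern in patterns:
        owner = pattern[1].partition(":")[0]  # steps 1-3: route by predicate namespace
        results = join(results, match_pattern(pattern, graphs[owner]))  # step 4
    return results


graphs = {
    "fcc": [
        ("fcc:artifact_001", "fcc:createdBy", "fcc:research_analyst"),
        ("fcc:artifact_002", "fcc:createdBy", "fcc:research_analyst"),
    ],
    "aome": [
        ("fcc:artifact_001", "aome:classifiedAs", "aome:PersonalData"),
        ("fcc:artifact_002", "aome:classifiedAs", "aome:PublicData"),
    ],
}
patterns = [
    ("?artifact", "fcc:createdBy", "?persona"),
    ("?artifact", "aome:classifiedAs", "aome:PersonalData"),
]

rows = execute(patterns, graphs)
print(rows)  # [{'?artifact': 'fcc:artifact_001', '?persona': 'fcc:research_analyst'}]
```

The join works only because both graphs refer to the artifact by the same identifier, which is exactly what the namespace scheme guarantees.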

Level 3: Merged Graph

For complex analytics, merge multiple knowledge graphs into a single unified graph:

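from fcc.knowledge.federation import GraphMerger

# `registry` is the NamespaceRegistry created earlier.
merger = GraphMerger(registry=registry)

# fcc_ontology, aome_ontology, and constel_ontology are the per-project
# knowledge graphs, loaded elsewhere.
merged = merger.merge([
    fcc_ontology,
    aome_ontology,
    constel_ontology,
])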

# Query the merged graph as a single entity
results = merged.query(subject_type="Artifact", predicate="classifiedAs")

Merging preserves all namespace prefixes, so entities from different projects remain distinguishable. Conflicts (two projects defining the same relationship with different semantics) are detected and reported.
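
The conflict check can be pictured as follows. This sketch is not GraphMerger: it assumes each project graph carries a mapping from predicate to a short textual definition, and it reports a conflict when two projects define the same predicate differently. The ProjectGraph and MergeResult types are hypothetical.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]


@dataclass
class ProjectGraph:
    project: str
    triples: set[Triple]
    # predicate -> definition; a hypothetical way to represent "semantics".
    predicate_definitions: dict[str, str] = field(default_factory=dict)


@dataclass
class MergeResult:
    triples: set[Triple]
    conflicts: list[str]


def merge_graphs(graphs: list[ProjectGraph]) -> MergeResult:
    merged: set[Triple] = set()
    first_definition: dict[str, tuple[str, str]] = {}  # predicate -> (project, definition)
    conflicts: list[str] = []
    for graph in graphs:
        # Identifiers stay prefixed, so the union keeps entities distinguishable.
        merged |= graph.triples
        for predicate, definition in graph.predicate_definitions.items():
            prior = first_definition.setdefault(predicate, (graph.project, definition))
            if prior[1] != definition:
                conflicts.append(
                    f"{predicate}: {prior[0]} says {prior[1]!r}, "
                    f"{graph.project} says {definition!r}"
                )
    return MergeResult(merged, conflicts)


fcc_graph = ProjectGraph(
    "fcc",
    {("fcc:artifact_001", "fcc:createdBy", "fcc:research_analyst")},
    {"shared:status": "lifecycle status"},
)
aome_graph = ProjectGraph(
    "aome",
    {("fcc:artifact_001", "aome:classifiedAs", "aome:PersonalData")},
    {"shared:status": "review status"},
)

result = merge_graphs([fcc_graph, aome_graph])
print(len(result.triples))  # 2
print(result.conflicts[0])
# shared:status: fcc says 'lifecycle status', aome says 'review status'
```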

Shared Vocabulary

Federation works best when projects share vocabulary for common concepts. The FCC ecosystem defines shared vocabulary terms in a dedicated namespace:

shared: https://ecosystem.example.org/shared/

Shared terms include:

  • shared:createdAt -- ISO 8601 timestamp
  • shared:version -- semantic version string
  • shared:status -- lifecycle status (draft, active, archived)
  • shared:owner -- the project that owns the entity
  • shared:license -- the license under which the entity is published

Using shared vocabulary for cross-cutting concepts ensures that federated queries can join on common fields without project-specific translation.
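
As a small illustration, the sketch below filters entities from two projects on shared:status with a single predicate and no per-project mapping. The dictionaries are made-up sample data, and with_status is a hypothetical helper.

```python
# Sample entities keyed by prefixed identifier; the property values are invented.
fcc_entities = {
    "fcc:artifact_001": {"shared:status": "active", "shared:owner": "fcc"},
    "fcc:artifact_002": {"shared:status": "archived", "shared:owner": "fcc"},
}
aome_entities = {
    "aome:privacy_classifier": {"shared:status": "active", "shared:owner": "aome"},
}


def with_status(status: str, *graphs: dict[str, dict[str, str]]) -> list[str]:
    """Return identifiers from any project whose shared:status equals `status`."""
    return sorted(
        entity
        for graph in graphs
        for entity, properties in graph.items()
        if properties.get("shared:status") == status
    )


active = with_status("active", fcc_entities, aome_entities)
print(active)  # ['aome:privacy_classifier', 'fcc:artifact_001']
```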

Federation Patterns

Pattern 1: Hub-and-Spoke

FCC acts as the hub. All projects publish their knowledge graphs to FCC, which merges and indexes them. Queries go through FCC.

Pros: Simple query routing, single point of coordination. Cons: FCC becomes a bottleneck, single point of failure.

Pattern 2: Peer-to-Peer

Each project maintains its own knowledge graph and responds to federated queries directly. A query planner routes sub-queries to the appropriate project.

Pros: No single point of failure, each project controls its own data. Cons: More complex query routing, potential consistency issues.

Pattern 3: CONSTEL as Mediator

CONSTEL acts as the federation mediator. Projects publish metadata summaries to CONSTEL, which maintains a global index of what exists where. Full queries are routed to the owning projects.

Pros: Lightweight metadata sharing, projects retain data ownership. Cons: Requires CONSTEL infrastructure, metadata may be stale.

The FCC ecosystem uses Pattern 3 as the default. CONSTEL's metadata indexing is already built for cross-project coordination, making it a natural choice for federation mediation.
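
A rough sketch of the mediator idea follows. It is not CONSTEL's API: MetadataIndex, publish, owners_of, and route are hypothetical names, and the per-project backends are stubbed with lambdas. The point is only that the mediator stores which project holds which kind of entity, while the data itself stays with the owner.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MetadataIndex:
    """Hypothetical stand-in for a global index of what exists where."""

    owners: dict[str, set[str]] = field(default_factory=dict)

    def publish(self, project: str, entity_types: list[str]) -> None:
        # Projects publish a summary (entity types only), not their data.
        for entity_type in entity_types:
            self.owners.setdefault(entity_type, set()).add(project)

    def owners_of(self, entity_type: str) -> list[str]:
        return sorted(self.owners.get(entity_type, set()))


def route(
    index: MetadataIndex,
    entity_type: str,
    backends: dict[str, Callable[[str], list[str]]],
) -> list[str]:
    """Send the full query only to projects the index lists as owners."""
    results: list[str] = []
    for project in index.owners_of(entity_type):
        results.extend(backends[project](entity_type))
    return results


index = MetadataIndex()
index.publish("fcc", ["Artifact", "Session"])
index.publish("aome", ["Artifact", "Classification"])

# Stub backends; a real deployment would query each project's knowledge graph.
backends = {
    "fcc": lambda entity_type: [f"fcc:{entity_type.lower()}_001"],
    "aome": lambda entity_type: [f"aome:{entity_type.lower()}_001"],
}

found = route(index, "Artifact", backends)
print(found)  # ['aome:artifact_001', 'fcc:artifact_001']
```

The staleness risk noted above shows up here directly: if a project stops holding an entity type but never republishes its summary, route still forwards queries to it.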

Security and Privacy

Federated queries can inadvertently expose sensitive information across project boundaries. The federation layer includes access control:

query = FederatedQuery(
    registry=registry,  # the NamespaceRegistry created earlier
    access_policy={
        "aome": ["read_public"],  # Only access public AOME data
        "fcc": ["read_all"],      # Full access to FCC data
    },
)

AOME's privacy classifications are particularly important here. If an artifact is classified as containing personal data, federated queries from other projects may be restricted to metadata-only access (the artifact exists and has these properties, but you cannot read its content).
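
Metadata-only access can be sketched as a redaction step applied to each result. Everything here is an assumption for illustration: the apply_access_policy function, the choice of "content" as the only non-metadata field, and the reading of read_public as "strip content from anything classified as personal data". The federation layer's actual enforcement may differ.

```python
from typing import Optional

PERSONAL_DATA = "aome:PersonalData"
CONTENT_FIELDS = {"content"}  # hypothetical: fields treated as content, not metadata


def apply_access_policy(
    entity: dict[str, str],
    owner: str,
    access_policy: dict[str, list[str]],
) -> Optional[dict[str, str]]:
    """Return the view of `entity` that the policy allows, or None if hidden."""
    permissions = access_policy.get(owner, [])
    if "read_all" in permissions:
        return dict(entity)
    if "read_public" in permissions:
        if entity.get("aome:classifiedAs") == PERSONAL_DATA:
            # Metadata-only: the entity is visible, its content is not.
            return {k: v for k, v in entity.items() if k not in CONTENT_FIELDS}
        return dict(entity)
    return None  # no permission listed for this project


policy = {"aome": ["read_public"], "fcc": ["read_all"]}
record = {
    "id": "aome:record_7",
    "aome:classifiedAs": PERSONAL_DATA,
    "shared:status": "active",
    "content": "name and address of a data subject",
}

view = apply_access_policy(record, "aome", policy)
print(sorted(view))  # ['aome:classifiedAs', 'id', 'shared:status']
```

Returning None for projects absent from the policy makes the sketch deny by default, which is the safer choice when a new project joins the federation before its policy is written.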

Key Takeaways

  • Federation enables cross-project knowledge graph queries without centralized storage.
  • Namespace IRIs give every entity a globally unique identifier, which prevents collisions between projects.
  • The NamespaceRegistry manages registration, resolution, and compaction of namespaces.
  • Three federation levels: cross-project references, federated query, and merged graph.
  • The CONSTEL-mediated pattern (Pattern 3, with CONSTEL as metadata mediator) is the default for the FCC ecosystem.
  • Access control ensures sensitive data is not inadvertently exposed across project boundaries.
