Chapter 17: Knowledge Federation

Knowledge federation extends the FCC framework with semantic search, knowledge graph construction, and retrieval-augmented generation (RAG). Rather than treating personas, workflows, and governance artifacts as isolated data structures, knowledge federation connects them through a unified semantic layer that supports cross-project discovery, vocabulary alignment, and context-aware retrieval.

This chapter covers four pillars of the knowledge federation subsystem:

Semantic Search — vector-based similarity search over personas, artifacts, and documents.
Knowledge Graphs — typed node-and-edge graphs representing FCC concepts and their relationships.
Serialization Formats — exporting knowledge graphs to Turtle, JSON-LD, and SKOS for interoperability.
RAG Pipelines — chunking, retrieval, and generation for persona-aware question answering.

The flowchart below shows how persona YAML, workflow JSON, and external documents flow through the knowledge-graph, search-index, and RAG pipelines to produce persona-aware answers.

flowchart LR
    subgraph Input["Source Data"]
        YAML[Persona YAML]
        WF[Workflow JSON]
        DOCS[Documents]
    end

    subgraph KG["Knowledge Graph"]
        BUILD[build_persona_graph]
        GRAPH[(KnowledgeGraph)]
        SER[Serializers]
    end

    subgraph Search["Semantic Search"]
        EMB[EmbeddingProvider]
        IDX[(SearchIndex)]
        PSI[PersonaSearchIndex]
    end

    subgraph RAG["RAG Pipeline"]
        CHUNK[DocumentChunker]
        RET[SemanticRetriever]
        GEN[RAGPipeline]
    end

    YAML --> BUILD --> GRAPH
    GRAPH --> SER
    SER -->|Turtle / JSON-LD / SKOS| EXT[External Tools]

    YAML --> EMB --> IDX
    IDX --> PSI
    DOCS --> CHUNK --> IDX
    IDX --> RET --> GEN
    GEN -->|Persona-aware answers| OUT[Output]

    style GRAPH fill:#2196F3,color:#fff
    style IDX fill:#4CAF50,color:#fff
    style GEN fill:#9C27B0,color:#fff

The three pipelines share data but not control flow, so each can be rebuilt or swapped independently — for example, replacing the embedding provider without rebuilding the graph.

Semantic Search Fundamentals

Semantic search replaces keyword matching with vector similarity. Instead of searching for exact terms like "governance" or "privacy", semantic search encodes the meaning of a query into a high-dimensional vector and finds documents whose vectors are closest in that space.
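
To make "closest in that space" concrete, the sketch below ranks two documents by cosine similarity, one common closeness measure. The three-dimensional vectors and document names are invented for illustration, and the source does not say which metric FCC actually uses.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (hypothetical values, not real model output).
query = [0.9, 0.1, 0.0]
documents = {
    "governance-policy": [0.8, 0.2, 0.1],
    "ui-style-guide": [0.0, 0.1, 0.9],
}

# Rank document IDs from most to least similar to the query.
ranked = sorted(
    documents,
    key=lambda doc_id: cosine_similarity(query, documents[doc_id]),
    reverse=True,
)
print(ranked)
```

The governance document ranks first because its vector points in nearly the same direction as the query, even though no keyword comparison takes place.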

EmbeddingProvider Interface

The EmbeddingProvider protocol defines how text is converted to vectors:

from fcc.search import EmbeddingProvider, MockEmbeddingProvider

# The mock provider generates deterministic vectors for testing
provider = MockEmbeddingProvider(dimension=128)
vector = provider.embed("Responsible AI governance principles")

print(f"Dimension: {len(vector)}")  # 128
print(f"First 5 values: {vector[:5]}")

Any embedding backend (OpenAI, Sentence Transformers, local models) can implement EmbeddingProvider. The MockEmbeddingProvider is used throughout tests and notebooks to avoid API key dependencies.
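
As a rough illustration of what implementing the interface could involve, the sketch below defines a protocol with a single `embed` method and a hash-based provider that satisfies it. Only the `embed(text)` call appears in the source; the names `EmbeddingProviderSketch` and `HashEmbeddingProvider`, and the hashing scheme itself, are assumptions made for this example and are not FCC code.

```python
import hashlib
from typing import Protocol

class EmbeddingProviderSketch(Protocol):
    """Assumed shape of the interface: text in, fixed-length vector out."""
    def embed(self, text: str) -> list[float]: ...

class HashEmbeddingProvider:
    """Deterministic stand-in: the same text always yields the same vector."""

    def __init__(self, dimension: int = 8) -> None:
        self.dimension = dimension

    def embed(self, text: str) -> list[float]:
        values: list[float] = []
        counter = 0
        # Hash repeatedly until enough bytes exist to fill the vector.
        while len(values) < self.dimension:
            digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
            # Map each byte (0..255) onto the range [-1.0, 1.0].
            values.extend(byte / 127.5 - 1.0 for byte in digest)
            counter += 1
        return values[: self.dimension]

provider: EmbeddingProviderSketch = HashEmbeddingProvider(dimension=8)
vector = provider.embed("Responsible AI governance principles")
print(len(vector))
```

Hash-derived vectors are reproducible but carry no meaning, so similar sentences will not land near each other. That makes this kind of provider useful for exercising plumbing in tests, not for judging search quality.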

SearchIndex

The SearchIndex stores documents and their vectors for efficient retrieval:

Method	Description
`add(doc_id, text, metadata)`	Embed and store a document
`search(query, top_k)`	Find the top-k most similar documents
`save(path)`	Persist the index to disk
`load(path)`	Restore a previously saved index
`remove(doc_id)`	Remove a document from the index
`stats()`	Return index statistics (doc count, dimension)

from fcc.search import SearchIndex, MockEmbeddingProvider

provider = MockEmbeddingProvider(dimension=128)
index = SearchIndex(provider)

# Add documents
index.add("doc-1", "FCC Find phase explores the problem space", {"phase": "Find"})
index.add("doc-2", "FCC Create phase synthesizes solutions", {"phase": "Create"})
index.add("doc-3", "FCC Critique phase evaluates deliverables", {"phase": "Critique"})

# Search
results = index.search("exploring problems and research", top_k=2)
for result in results:
    print(f"  {result.doc_id}: {result.score:.3f} - {result.text[:50]}")
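
The mechanics behind `add`, `remove`, and `search` can be sketched with the standard library alone. The `TinyIndex` class below is hypothetical and uses lowercase word counts as a stand-in embedding, so it shows only how an index stores, scores, and selects the top k documents. A real provider would supply dense vectors that capture meaning.

```python
import heapq
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy sparse embedding: lowercase word counts (an assumption for this sketch)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

class TinyIndex:
    def __init__(self) -> None:
        # doc_id -> (original text, vector, metadata)
        self._docs: dict[str, tuple[str, Counter, dict]] = {}

    def add(self, doc_id: str, text: str, metadata: dict) -> None:
        self._docs[doc_id] = (text, embed(text), metadata)

    def remove(self, doc_id: str) -> None:
        self._docs.pop(doc_id, None)

    def search(self, query: str, top_k: int = 3) -> list[tuple[str, float]]:
        query_vec = embed(query)
        scored = ((doc_id, cosine(query_vec, vec)) for doc_id, (_, vec, _) in self._docs.items())
        # nlargest avoids sorting the whole collection when top_k is small.
        return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])

index = TinyIndex()
index.add("doc-1", "FCC Find phase explores the problem space", {"phase": "Find"})
index.add("doc-2", "FCC Create phase synthesizes solutions", {"phase": "Create"})
index.add("doc-3", "FCC Critique phase evaluates deliverables", {"phase": "Critique"})

results = index.search("find the problem", top_k=2)
print(results[0][0])
```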

PersonaSearchIndex

The PersonaSearchIndex wraps SearchIndex to provide persona-specific search capabilities. It automatically indexes persona roles, responsibilities, and skills:

from fcc.search import PersonaSearchIndex, MockEmbeddingProvider
from fcc.personas.registry import PersonaRegistry

provider = MockEmbeddingProvider(dimension=128)
registry = PersonaRegistry.from_data_dir()
persona_index = PersonaSearchIndex(provider, registry)

# Search for governance-related personas
results = persona_index.search("governance compliance audit", top_k=5)
for r in results:
    print(f"  {r.persona_id}: {r.name} (score: {r.score:.3f})")

Knowledge Graph Construction

A knowledge graph represents FCC concepts as typed nodes connected by typed edges. This structure enables graph queries, path finding, and semantic reasoning over the entire framework.

Core Types

Type	Description
`KnowledgeNode`	A node with `id`, `label`, `node_type`, and `properties`
`KnowledgeEdge`	A directed edge with `source`, `target`, `edge_type`, and `weight`
`KnowledgeGraph`	Container for nodes and edges with query methods
`NodeType`	Enum: `PERSONA`, `WORKFLOW`, `ARTIFACT`, `CONCEPT`, `PROJECT`
`EdgeType`	Enum: `COLLABORATES`, `PRODUCES`, `CONSUMES`, `BELONGS_TO`, `RELATED`

Building a Graph Manually

from fcc.knowledge import (
    KnowledgeGraph, KnowledgeNode, KnowledgeEdge,
    NodeType, EdgeType,
)

graph = KnowledgeGraph()

# Add persona nodes
graph.add_node(KnowledgeNode(
    id="persona:RC",
    label="Research Curator",
    node_type=NodeType.PERSONA,
    properties={"phase": "Find", "category": "core"},
))
graph.add_node(KnowledgeNode(
    id="persona:SA",
    label="Solution Architect",
    node_type=NodeType.PERSONA,
    properties={"phase": "Create", "category": "core"},
))

# Add collaboration edge
graph.add_edge(KnowledgeEdge(
    source="persona:RC",
    target="persona:SA",
    edge_type=EdgeType.COLLABORATES,
    weight=0.9,
))

print(graph.stats())
# GraphStats(nodes=2, edges=1, node_types={'PERSONA': 2})

Building from the PersonaRegistry

The build_persona_graph function automatically constructs a knowledge graph from a PersonaRegistry, creating nodes for each persona and edges for collaboration links, category membership, and phase groupings:

from fcc.knowledge import NodeType, build_persona_graph
from fcc.personas.registry import PersonaRegistry

registry = PersonaRegistry.from_data_dir()
graph = build_persona_graph(registry)

print(f"Nodes: {graph.node_count}")
print(f"Edges: {graph.edge_count}")

# Query by type
personas = graph.nodes_by_type(NodeType.PERSONA)
print(f"Persona nodes: {len(personas)}")

Graph Queries

The KnowledgeGraph supports several query patterns:

Query	Method	Description
By node type	`nodes_by_type(NodeType)`	All nodes of a given type
By edge type	`edges_by_type(EdgeType)`	All edges of a given type
Neighbors	`neighbors(node_id)`	Directly connected nodes
Subgraph	`subgraph(node_ids)`	Extract a subgraph
Path	`shortest_path(src, dst)`	Find shortest path between nodes
Merge	`merge(other_graph)`	Combine two graphs
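
For readers who want to see what the neighbor and path queries amount to, here is a standard-library sketch over a plain adjacency list. It treats edges as directed and uses breadth-first search, which finds the path with the fewest edges; whether `KnowledgeGraph` follows edges in both directions or takes weights into account is not stated in the source. The `persona:QA`, `persona:PM`, and `artifact:report` IDs are invented for the example.

```python
from collections import deque
from typing import Optional

# Directed edges as (source, target) pairs; IDs are illustrative.
edges = [
    ("persona:RC", "persona:SA"),
    ("persona:SA", "persona:QA"),
    ("persona:RC", "persona:PM"),
    ("persona:PM", "persona:QA"),
    ("persona:QA", "artifact:report"),
]

adjacency: dict[str, list[str]] = {}
for source, target in edges:
    adjacency.setdefault(source, []).append(target)
    adjacency.setdefault(target, [])  # make sure sink nodes are known too

def neighbors(node_id: str) -> list[str]:
    """Nodes reachable over one outgoing edge."""
    return adjacency.get(node_id, [])

def shortest_path(src: str, dst: str) -> Optional[list[str]]:
    """Breadth-first search; returns the fewest-edges path or None."""
    if src not in adjacency or dst not in adjacency:
        return None
    previous: dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        current = queue.popleft()
        if current == dst:
            # Walk the predecessor links back to the start.
            path: list[str] = []
            node: Optional[str] = current
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in adjacency[current]:
            if nxt not in previous:
                previous[nxt] = current
                queue.append(nxt)
    return None

print(shortest_path("persona:RC", "artifact:report"))
```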

Serialization Formats

Knowledge graphs need to be exported for use by external tools, triple stores, and ontology editors. FCC supports three standard formats:

Turtle (TTL)

Turtle is a compact, human-readable RDF serialization format and a common choice for data that people read or edit by hand. Each statement is a subject-predicate-object triple:

from fcc.knowledge import serialize_turtle

ttl_output = serialize_turtle(graph, base_uri="https://fcc.example.org/")
print(ttl_output)

Output:

@prefix fcc: <https://fcc.example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

fcc:persona-RC a fcc:Persona ;
    rdfs:label "Research Curator" ;
    fcc:phase "Find" ;
    fcc:collaboratesWith fcc:persona-SA .
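
The structure of that output (prefix declarations, then one subject followed by predicate-object pairs separated by `;` and closed with `.`) can be reproduced with a few lines of string formatting. The helper below is a hand-rolled sketch for a single node shape, not the logic inside `serialize_turtle`; the function names and parameters are assumptions.

```python
PREFIXES = {
    "fcc": "https://fcc.example.org/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes, and newlines for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def node_to_turtle(local_name: str, rdf_type: str, label: str,
                   phase: str, collaborators: list[str]) -> str:
    # Each entry is one predicate-object pair for the same subject.
    predicates = [
        f"a fcc:{rdf_type}",
        f'rdfs:label "{escape_literal(label)}"',
        f'fcc:phase "{escape_literal(phase)}"',
    ]
    if collaborators:
        objects = ", ".join(f"fcc:{name}" for name in collaborators)
        predicates.append(f"fcc:collaboratesWith {objects}")
    # ';' repeats the subject, '.' ends the statement group.
    return f"fcc:{local_name} " + " ;\n    ".join(predicates) + " ."

header = "\n".join(f"@prefix {prefix}: <{uri}> ." for prefix, uri in PREFIXES.items())
body = node_to_turtle("persona-RC", "Persona", "Research Curator", "Find", ["persona-SA"])
turtle = header + "\n\n" + body
print(turtle)
```

For anything beyond a demonstration, a dedicated RDF library is the safer route, since it handles IRI escaping, datatypes, and language tags that this sketch ignores.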

JSON-LD

JSON-LD is a JSON-based RDF serialization, well-suited for web APIs and JavaScript consumers:

from fcc.knowledge import serialize_jsonld

jsonld_output = serialize_jsonld(graph, context_uri="https://fcc.example.org/context")
print(jsonld_output)

The output includes a @context block that maps short property names to full IRIs, making the JSON human-readable while remaining RDF-compatible.
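
To show how a context maps short names to IRIs, the sketch below builds a small JSON-LD document by hand and expands a few terms. The inline context, the property names, and the `expand_term` helper are assumptions for illustration; `serialize_jsonld` takes a `context_uri` instead, and its exact output is not shown in the source. The helper covers only plain term definitions and prefixed names, a small slice of the full JSON-LD expansion algorithm.

```python
import json

document = {
    "@context": {
        "fcc": "https://fcc.example.org/",
        "label": "http://www.w3.org/2000/01/rdf-schema#label",
        "phase": "fcc:phase",
        # "@type": "@id" marks the value as an IRI reference, not a string.
        "collaboratesWith": {"@id": "fcc:collaboratesWith", "@type": "@id"},
    },
    "@id": "fcc:persona-RC",
    "@type": "fcc:Persona",
    "label": "Research Curator",
    "phase": "Find",
    "collaboratesWith": "fcc:persona-SA",
}

def expand_term(term: str, context: dict) -> str:
    """Resolve a short property name to a full IRI (simplified)."""
    definition = context.get(term, term)
    iri = definition["@id"] if isinstance(definition, dict) else definition
    prefix, sep, suffix = iri.partition(":")
    # Expand 'prefix:suffix' only when the prefix is declared in the context;
    # absolute IRIs such as 'http://...' are returned unchanged.
    if sep and isinstance(context.get(prefix), str) and not suffix.startswith("//"):
        return context[prefix] + suffix
    return iri

print(json.dumps(document, indent=2))
print(expand_term("phase", document["@context"]))
```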

SKOS (Simple Knowledge Organization System)

SKOS is used for taxonomies and classification schemes. FCC exports persona categories and phases as SKOS concept schemes:

from fcc.knowledge import serialize_skos

skos_output = serialize_skos(graph, scheme_uri="https://fcc.example.org/personas")
print(skos_output)

SKOS export produces:

SKOS Element	FCC Mapping
`skos:ConceptScheme`	Persona category or phase grouping
`skos:Concept`	Individual persona
`skos:broader`	Category membership
`skos:related`	Collaboration links
`skos:prefLabel`	Persona name

RAG Pipeline Architecture

Retrieval-Augmented Generation (RAG) combines search with LLM generation. Instead of asking an LLM to answer from its training data alone, RAG retrieves relevant context from a knowledge base and includes it in the prompt.

Pipeline Stages

The FCC RAG pipeline moves documents through four stages:

Documents → Chunker → SearchIndex → Retriever → Generator

Stage	Module	Responsibility
Chunking	`DocumentChunker`	Split documents into searchable chunks
Indexing	`SearchIndex`	Embed and store chunks for similarity search
Retrieval	`SemanticRetriever`	Find relevant chunks for a given query
Generation	`RAGPipeline`	Compose a prompt with retrieved context and generate an answer

Chunking Strategies

The DocumentChunker supports multiple strategies depending on the content type:

Strategy	Best For	How It Works
`FIXED_SIZE`	Plain text	Split into fixed-character windows with overlap
`PARAGRAPH`	Markdown, prose	Split on blank lines
`CODE`	Python, YAML	Split on function/class boundaries
`SEMANTIC`	Mixed content	Split on topic boundaries using embeddings

from fcc.rag import DocumentChunker, ChunkingStrategy

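# python_source and markdown_text are assumed to be strings loaded earlier,
# for example the contents of workflow.py and guidebook-ch17.md
chunker = DocumentChunker()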

# Chunk Python code
code_chunks = chunker.chunk(
    python_source,
    strategy=ChunkingStrategy.CODE,
    metadata={"source": "workflow.py"},
)

# Chunk markdown documentation
doc_chunks = chunker.chunk(
    markdown_text,
    strategy=ChunkingStrategy.PARAGRAPH,
    metadata={"source": "guidebook-ch17.md"},
)
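
The two simplest strategies in the table can be approximated in a few lines. The functions below are a sketch of fixed-size windows with overlap and blank-line paragraph splitting; their names and parameters are invented here and may not match how `DocumentChunker` implements `FIXED_SIZE` and `PARAGRAPH`.

```python
import re

def chunk_fixed_size(text: str, size: int, overlap: int) -> list[str]:
    """Slide a window of `size` characters, re-reading `overlap` characters each step."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks

def chunk_paragraphs(text: str) -> list[str]:
    """Split on one or more blank lines and drop empty pieces."""
    parts = re.split(r"\n\s*\n", text)
    return [part.strip() for part in parts if part.strip()]

print(chunk_fixed_size("abcdefghij", size=4, overlap=2))
print(chunk_paragraphs("First.\n\nSecond line\ncontinues.\n\n\nThird."))
```

Overlap trades index size for recall: a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of storing some text twice.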

SemanticRetriever

The SemanticRetriever wraps a SearchIndex to provide retrieval with filtering and re-ranking:

from fcc.rag import SemanticRetriever

# `index` is a populated SearchIndex, such as the one built in the SearchIndex section
retriever = SemanticRetriever(index, top_k=5, min_score=0.3)
chunks = retriever.retrieve("How does FCC handle governance?")

for chunk in chunks:
    print(f"  [{chunk.score:.2f}] {chunk.text[:80]}...")
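
One plausible reading of how `top_k` and `min_score` interact is: discard candidates below the threshold, then keep the best k of what remains. The sketch below implements that reading with a hypothetical `ScoredChunk` type and made-up scores; the order in which `SemanticRetriever` applies the two limits is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScoredChunk:
    text: str
    score: float

def retrieve(candidates: list[ScoredChunk], top_k: int, min_score: float) -> list[ScoredChunk]:
    """Drop weak matches first, then return the best `top_k` in descending score order."""
    kept = [chunk for chunk in candidates if chunk.score >= min_score]
    kept.sort(key=lambda chunk: chunk.score, reverse=True)
    return kept[:top_k]

# Hypothetical scores, as if returned by a similarity search.
candidates = [
    ScoredChunk("Governance constraints are defined per persona.", 0.82),
    ScoredChunk("The Create phase synthesizes solutions.", 0.25),
    ScoredChunk("Critique evaluates deliverables against policy.", 0.61),
    ScoredChunk("Audit trails record governance decisions.", 0.47),
]

selected = retrieve(candidates, top_k=2, min_score=0.3)
print([chunk.score for chunk in selected])
```

With this ordering, a query that matches nothing well returns fewer than `top_k` chunks, or none at all, instead of padding the prompt with irrelevant text.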

RAGPipeline

The RAGPipeline orchestrates the full retrieve-then-generate workflow:

from fcc.rag import RAGPipeline

pipeline = RAGPipeline(
    retriever=retriever,
    persona_context="Research Curator",  # optional persona framing
)

answer = pipeline.query("What are the key governance constraints?")
print(answer.text)
print(f"Sources: {[s.doc_id for s in answer.sources]}")

When a persona_context is provided, the pipeline frames the generation prompt using that persona's R.I.S.C.E.A.R. specification, producing answers that reflect the persona's style, constraints, and expertise.
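
The retrieve-then-generate step comes down to assembling a prompt from the retrieved chunks before calling a model. The sketch below shows one way to do that; `build_prompt`, the instruction wording, and the sample chunks are all invented for illustration. The persona framing is reduced to a single line here, whereas the pipeline described above draws on the full R.I.S.C.E.A.R. specification.

```python
from typing import Optional

def build_prompt(question: str, chunks: list[tuple[str, str]],
                 persona_context: Optional[str] = None) -> str:
    """Compose a prompt from (doc_id, text) chunks and an optional persona name."""
    lines: list[str] = []
    if persona_context:
        lines.append(f"Answer as the {persona_context} persona.")
    lines.append("Use only the numbered context below and cite sources by number.")
    lines.append("")
    # Number each chunk so the answer can point back to its sources.
    for number, (doc_id, text) in enumerate(chunks, start=1):
        lines.append(f"[{number}] ({doc_id}) {text}")
    lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)

# Hypothetical retrieved chunks.
chunks = [
    ("doc-3", "FCC Critique phase evaluates deliverables"),
    ("doc-7", "Governance constraints are reviewed before release"),
]

prompt = build_prompt(
    "What are the key governance constraints?",
    chunks,
    persona_context="Research Curator",
)
print(prompt)
```

Keeping the document IDs in the prompt is what makes it possible to report sources alongside the generated answer.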

Federated Knowledge Graph

In a multi-project ecosystem, each project may maintain its own knowledge graph. The FederatedKnowledgeGraph provides a unified view across projects without requiring data centralization.

NamespaceRegistry

Each project registers a namespace to prevent ID collisions:

from fcc.federation import NamespaceRegistry

ns = NamespaceRegistry()
ns.register("fcc", "https://fcc.example.org/")
ns.register("paom", "https://paom.example.org/")
ns.register("constel", "https://constel.example.org/")

# Resolve qualified names
full_uri = ns.resolve("fcc:persona-RC")
# "https://fcc.example.org/persona-RC"
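
Prefix resolution of this kind is essentially a dictionary lookup followed by string concatenation. The class below is a minimal stand-in written for this chapter, not the `NamespaceRegistry` source. Its error handling (rejecting a conflicting re-registration, raising `KeyError` for unknown prefixes) is an assumption.

```python
class TinyNamespaceRegistry:
    """Maps short prefixes to base URIs and expands 'prefix:local' names."""

    def __init__(self) -> None:
        self._prefixes: dict[str, str] = {}

    def register(self, prefix: str, base_uri: str) -> None:
        existing = self._prefixes.get(prefix)
        if existing is not None and existing != base_uri:
            # Silently rebinding a prefix would change the meaning of existing IDs.
            raise ValueError(f"prefix {prefix!r} is already bound to {existing!r}")
        self._prefixes[prefix] = base_uri

    def resolve(self, qualified_name: str) -> str:
        prefix, sep, local = qualified_name.partition(":")
        if not sep or prefix not in self._prefixes:
            raise KeyError(f"unknown namespace in {qualified_name!r}")
        return self._prefixes[prefix] + local

registry = TinyNamespaceRegistry()
registry.register("fcc", "https://fcc.example.org/")
registry.register("paom", "https://paom.example.org/")

print(registry.resolve("fcc:persona-RC"))
```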

EntityResolver

The EntityResolver maps entities across namespaces using vocabulary mappings:

Method	Description
`resolve(entity_id, target_ns)`	Find the equivalent entity in another namespace
`add_mapping(source, target, confidence)`	Register a cross-namespace mapping
`mappings_for(entity_id)`	List all known mappings for an entity

from fcc.federation import EntityResolver

resolver = EntityResolver(namespace_registry=ns)
resolver.add_mapping("fcc:persona-RC", "paom:agent-research", confidence=0.95)

# Resolve across namespaces
paom_id = resolver.resolve("fcc:persona-RC", target_ns="paom")
print(paom_id)  # "paom:agent-research"
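
When several mappings point into the same target namespace, a resolver needs a rule for choosing one. The sketch below picks the mapping with the highest confidence and logs a warning when nothing matches. The `TinyEntityResolver` class, the second mapping (`paom:agent-librarian`), one-directional lookup, and the `None` return value are all assumptions for this example.

```python
import logging
from typing import Optional

logger = logging.getLogger("resolver_sketch")

class TinyEntityResolver:
    def __init__(self) -> None:
        # source entity -> list of (target entity, confidence)
        self._mappings: dict[str, list[tuple[str, float]]] = {}

    def add_mapping(self, source: str, target: str, confidence: float) -> None:
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be within [0.0, 1.0]")
        self._mappings.setdefault(source, []).append((target, confidence))

    def mappings_for(self, entity_id: str) -> list[tuple[str, float]]:
        return list(self._mappings.get(entity_id, []))

    def resolve(self, entity_id: str, target_ns: str) -> Optional[str]:
        prefix = target_ns + ":"
        candidates = [m for m in self.mappings_for(entity_id) if m[0].startswith(prefix)]
        if not candidates:
            logger.warning("no mapping for %s in namespace %s", entity_id, target_ns)
            return None
        # Prefer the mapping registered with the highest confidence.
        return max(candidates, key=lambda m: m[1])[0]

resolver = TinyEntityResolver()
resolver.add_mapping("fcc:persona-RC", "paom:agent-research", confidence=0.95)
resolver.add_mapping("fcc:persona-RC", "paom:agent-librarian", confidence=0.40)

print(resolver.resolve("fcc:persona-RC", target_ns="paom"))
```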

FederatedKnowledgeGraph

The FederatedKnowledgeGraph merges multiple project graphs while preserving namespace boundaries:

from fcc.federation import FederatedKnowledgeGraph
from fcc.knowledge import NodeType

federated = FederatedKnowledgeGraph(
    namespace_registry=ns,
    entity_resolver=resolver,
)

# fcc_graph and paom_graph are KnowledgeGraph instances built separately for each project
federated.add_graph("fcc", fcc_graph)
federated.add_graph("paom", paom_graph)

# Query across all namespaces
all_personas = federated.query_nodes(node_type=NodeType.PERSONA)
print(f"Total personas across projects: {len(all_personas)}")

# Find cross-project relationships
cross_edges = federated.cross_namespace_edges()
print(f"Cross-project edges: {len(cross_edges)}")
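
If node IDs are namespace-qualified, finding cross-project relationships can be as simple as comparing the prefixes at each end of an edge. The sketch below applies that test to a hand-written edge list; how `cross_namespace_edges()` actually identifies such edges is not described in the source, so treat this as one possible definition.

```python
def namespace_of(node_id: str) -> str:
    """Return the prefix before the first ':' in a qualified ID."""
    return node_id.split(":", 1)[0]

# (source, target) pairs using namespace-qualified IDs; values are illustrative.
edges = [
    ("fcc:persona-RC", "fcc:persona-SA"),
    ("fcc:persona-RC", "paom:agent-research"),
    ("paom:agent-research", "paom:agent-planner"),
]

# An edge crosses projects when its two endpoints live in different namespaces.
cross_edges = [
    (source, target)
    for source, target in edges
    if namespace_of(source) != namespace_of(target)
]
print(cross_edges)
```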

Conflict Resolution

When merging graphs from different projects, conflicts can arise:

Conflict Type	Resolution Strategy
Duplicate node IDs	Namespace-qualified IDs prevent collisions
Contradictory properties	Latest-write-wins or confidence-based selection
Missing mappings	`EntityResolver` logs unresolved references
Schema mismatch	`VocabularyMapping` normalizes property names
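
The two strategies listed for contradictory properties can give different answers for the same input, which the sketch below demonstrates. The `PropertyClaim` type, its fields, and the sample values are hypothetical; the source does not specify how FCC records write order or confidence.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PropertyClaim:
    value: str
    written_at: int    # logical timestamp; larger means more recent
    confidence: float  # 0.0 to 1.0, as supplied by the contributing project

def latest_write_wins(claims: list[PropertyClaim]) -> str:
    """Keep whichever value was written most recently."""
    return max(claims, key=lambda claim: claim.written_at).value

def highest_confidence(claims: list[PropertyClaim]) -> str:
    """Keep whichever value its source was most sure about."""
    return max(claims, key=lambda claim: claim.confidence).value

# Two projects disagree about the 'phase' property of the same persona.
claims = [
    PropertyClaim(value="Find", written_at=1, confidence=0.9),
    PropertyClaim(value="Create", written_at=2, confidence=0.6),
]

print(latest_write_wins(claims))
print(highest_confidence(claims))
```

Latest-write-wins is simple and predictable but lets a careless late update overwrite a well-established value. Confidence-based selection guards against that, provided the confidence scores are comparable across projects.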

Practical Exercises

Exercise 1: Build a Semantic Search Index

Using Notebook 15, create a PersonaSearchIndex and search for personas by capability description rather than exact ID lookup.

Exercise 2: Construct a Knowledge Graph

Using Notebook 16, build a knowledge graph from the full persona registry and export it to Turtle format. Verify that collaboration edges are correctly represented.

Exercise 3: RAG over FCC Documentation

Using Notebook 17, chunk the FCC guidebook chapters, build a retrieval index, and query it with persona-framed questions. Compare answers generated with and without persona context.

Exercise 4: Federate Two Project Graphs

Build separate knowledge graphs for FCC and a mock PAOM project. Use FederatedKnowledgeGraph to merge them and query for cross-project personas. Verify that entity resolution correctly maps equivalent concepts.

Summary

Knowledge federation transforms FCC from a standalone framework into a semantic hub that connects personas, workflows, and governance artifacts across projects. The four pillars — semantic search, knowledge graphs, serialization, and RAG — provide increasing levels of intelligence:

Pillar	Capability	Use Case
Semantic Search	Vector similarity	"Find personas related to privacy"
Knowledge Graphs	Typed relationships	"Show all collaborators of the Research Curator"
Serialization	Interoperability	"Export persona taxonomy to a triple store"
RAG Pipeline	Context-aware generation	"Answer governance questions using FCC docs"

The federated layer adds cross-project capabilities through namespace management, entity resolution, and graph merging.

Next Steps

Explore Notebook 15 for hands-on semantic search
Explore Notebook 16 for knowledge graph construction
Explore Notebook 17 for RAG pipeline development
Read Chapter 18 for documentation intelligence using AST analysis