Chapter 17: Knowledge Federation

Knowledge federation extends the FCC framework with semantic search, knowledge graph construction, and retrieval-augmented generation (RAG). Rather than treating personas, workflows, and governance artifacts as isolated data structures, knowledge federation connects them through a unified semantic layer that supports cross-project discovery, vocabulary alignment, and context-aware retrieval.

This chapter covers four pillars of the knowledge federation subsystem:

  1. Semantic Search — vector-based similarity search over personas, artifacts, and documents.
  2. Knowledge Graphs — typed node-and-edge graphs representing FCC concepts and their relationships.
  3. Serialization Formats — exporting knowledge graphs to Turtle, JSON-LD, and SKOS for interoperability.
  4. RAG Pipelines — chunking, retrieval, and generation for persona-aware question answering.

The flowchart below shows how persona YAML, workflow JSON, and external documents flow through the knowledge-graph, search-index, and RAG pipelines to produce persona-aware answers.

flowchart LR
    subgraph Input["Source Data"]
        YAML[Persona YAML]
        WF[Workflow JSON]
        DOCS[Documents]
    end

    subgraph KG["Knowledge Graph"]
        BUILD[build_persona_graph]
        GRAPH[(KnowledgeGraph)]
        SER[Serializers]
    end

    subgraph Search["Semantic Search"]
        EMB[EmbeddingProvider]
        IDX[(SearchIndex)]
        PSI[PersonaSearchIndex]
    end

    subgraph RAG["RAG Pipeline"]
        CHUNK[DocumentChunker]
        RET[SemanticRetriever]
        GEN[RAGPipeline]
    end

    YAML --> BUILD --> GRAPH
    GRAPH --> SER
    SER -->|Turtle / JSON-LD / SKOS| EXT[External Tools]

    YAML --> EMB --> IDX
    IDX --> PSI
    DOCS --> CHUNK --> IDX
    IDX --> RET --> GEN
    GEN -->|Persona-aware answers| OUT[Output]

    style GRAPH fill:#2196F3,color:#fff
    style IDX fill:#4CAF50,color:#fff
    style GEN fill:#9C27B0,color:#fff

The three pipelines share data but not control flow, so each can be rebuilt or swapped independently — for example, replacing the embedding provider without rebuilding the graph.

Semantic Search Fundamentals

Semantic search replaces keyword matching with vector similarity. Instead of searching for exact terms like "governance" or "privacy", semantic search encodes the meaning of a query into a high-dimensional vector and finds documents whose vectors are closest in that space.
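To make "closest in that space" concrete, the following minimal sketch ranks a few toy vectors by cosine similarity. The three-dimensional vectors and document names are hand-picked for illustration, not output from a real embedding model, and cosine similarity is assumed here as a common choice; this chapter does not state which metric FCC uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy "embeddings": hand-picked values, not model output.
doc_vectors = {
    "governance-policy": [0.9, 0.1, 0.0],
    "privacy-review": [0.7, 0.6, 0.1],
    "ui-style-guide": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.2, 0.0]

# Rank document IDs from most to least similar to the query.
ranked = sorted(
    doc_vectors,
    key=lambda doc_id: cosine_similarity(query_vector, doc_vectors[doc_id]),
    reverse=True,
)
print(ranked)
```

The query never has to share a literal term with a document; only the direction of the vectors matters.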

EmbeddingProvider Interface

The EmbeddingProvider protocol defines how text is converted to vectors:

from fcc.search import EmbeddingProvider, MockEmbeddingProvider

# The mock provider generates deterministic vectors for testing
provider = MockEmbeddingProvider(dimension=128)
vector = provider.embed("Responsible AI governance principles")

print(f"Dimension: {len(vector)}")  # 128
print(f"First 5 values: {vector[:5]}")

Any embedding backend (OpenAI, Sentence Transformers, local models) can implement EmbeddingProvider. The MockEmbeddingProvider is used throughout tests and notebooks to avoid API key dependencies.
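The exact shape of the EmbeddingProvider protocol is not shown in this chapter. As a sketch, assuming it requires only an embed(text) method that returns a list of floats, a deterministic hash-based provider could look like the following. EmbeddingProviderSketch and HashEmbeddingProvider are hypothetical names, and this is not how MockEmbeddingProvider is actually implemented.

```python
import hashlib
from typing import List, Protocol

class EmbeddingProviderSketch(Protocol):
    """Assumed protocol shape: one method mapping text to a vector."""
    def embed(self, text: str) -> List[float]: ...

class HashEmbeddingProvider:
    """Hypothetical deterministic provider that derives values from SHA-256 digests."""

    def __init__(self, dimension: int = 8) -> None:
        self.dimension = dimension

    def embed(self, text: str) -> List[float]:
        values: List[float] = []
        counter = 0
        # Hash the text with a counter until enough bytes are available.
        while len(values) < self.dimension:
            digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
            values.extend(byte / 255.0 for byte in digest)
            counter += 1
        return values[: self.dimension]

provider: EmbeddingProviderSketch = HashEmbeddingProvider(dimension=8)
first = provider.embed("Responsible AI governance principles")
second = provider.embed("Responsible AI governance principles")
print(len(first), first == second)
```

Vectors produced this way are repeatable but carry no semantic meaning, so they are useful only for exercising index and pipeline plumbing in tests.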

SearchIndex

The SearchIndex stores documents and their vectors for efficient retrieval:

| Method | Description |
| --- | --- |
| add(doc_id, text, metadata) | Embed and store a document |
| search(query, top_k) | Find the top-k most similar documents |
| save(path) | Persist the index to disk |
| load(path) | Restore a previously saved index |
| remove(doc_id) | Remove a document from the index |
| stats() | Return index statistics (doc count, dimension) |

from fcc.search import SearchIndex, MockEmbeddingProvider

provider = MockEmbeddingProvider(dimension=128)
index = SearchIndex(provider)

# Add documents
index.add("doc-1", "FCC Find phase explores the problem space", {"phase": "Find"})
index.add("doc-2", "FCC Create phase synthesizes solutions", {"phase": "Create"})
index.add("doc-3", "FCC Critique phase evaluates deliverables", {"phase": "Critique"})

# Search
results = index.search("exploring problems and research", top_k=2)
for result in results:
    print(f"  {result.doc_id}: {result.score:.3f} - {result.text[:50]}")

PersonaSearchIndex

The PersonaSearchIndex wraps SearchIndex to provide persona-specific search capabilities. It automatically indexes persona roles, responsibilities, and skills:

from fcc.search import PersonaSearchIndex, MockEmbeddingProvider
from fcc.personas.registry import PersonaRegistry

provider = MockEmbeddingProvider(dimension=128)
registry = PersonaRegistry.from_data_dir()
persona_index = PersonaSearchIndex(provider, registry)

# Search for governance-related personas
results = persona_index.search("governance compliance audit", top_k=5)
for r in results:
    print(f"  {r.persona_id}: {r.name} (score: {r.score:.3f})")

Knowledge Graph Construction

A knowledge graph represents FCC concepts as typed nodes connected by typed edges. This structure enables graph queries, path finding, and semantic reasoning over the entire framework.

Core Types

| Type | Description |
| --- | --- |
| KnowledgeNode | A node with id, label, node_type, and properties |
| KnowledgeEdge | A directed edge with source, target, edge_type, and weight |
| KnowledgeGraph | Container for nodes and edges with query methods |
| NodeType | Enum: PERSONA, WORKFLOW, ARTIFACT, CONCEPT, PROJECT |
| EdgeType | Enum: COLLABORATES, PRODUCES, CONSUMES, BELONGS_TO, RELATED |

Building a Graph Manually

from fcc.knowledge import (
    KnowledgeGraph, KnowledgeNode, KnowledgeEdge,
    NodeType, EdgeType,
)

graph = KnowledgeGraph()

# Add persona nodes
graph.add_node(KnowledgeNode(
    id="persona:RC",
    label="Research Curator",
    node_type=NodeType.PERSONA,
    properties={"phase": "Find", "category": "core"},
))
graph.add_node(KnowledgeNode(
    id="persona:SA",
    label="Solution Architect",
    node_type=NodeType.PERSONA,
    properties={"phase": "Create", "category": "core"},
))

# Add collaboration edge
graph.add_edge(KnowledgeEdge(
    source="persona:RC",
    target="persona:SA",
    edge_type=EdgeType.COLLABORATES,
    weight=0.9,
))

print(graph.stats())
# GraphStats(nodes=2, edges=1, node_types={'PERSONA': 2})

Building from the PersonaRegistry

The build_persona_graph function automatically constructs a knowledge graph from a PersonaRegistry, creating nodes for each persona and edges for collaboration links, category membership, and phase groupings:

from fcc.knowledge import build_persona_graph, NodeType
from fcc.personas.registry import PersonaRegistry

registry = PersonaRegistry.from_data_dir()
graph = build_persona_graph(registry)

print(f"Nodes: {graph.node_count}")
print(f"Edges: {graph.edge_count}")

# Query by type
personas = graph.nodes_by_type(NodeType.PERSONA)
print(f"Persona nodes: {len(personas)}")

Graph Queries

The KnowledgeGraph supports several query patterns:

| Query | Method | Description |
| --- | --- | --- |
| By node type | nodes_by_type(NodeType) | All nodes of a given type |
| By edge type | edges_by_type(EdgeType) | All edges of a given type |
| Neighbors | neighbors(node_id) | Directly connected nodes |
| Subgraph | subgraph(node_ids) | Extract a subgraph |
| Path | shortest_path(src, dst) | Find shortest path between nodes |
| Merge | merge(other_graph) | Combine two graphs |
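How shortest_path is implemented is not described here. For an unweighted directed graph, breadth-first search is one standard approach. The sketch below runs it over a plain adjacency dictionary rather than a KnowledgeGraph, and every node ID other than persona:RC and persona:SA is made up for the example.

```python
from collections import deque

def shortest_path(adjacency, src, dst):
    """Return the node IDs on a shortest path from src to dst, or None if unreachable."""
    if src == dst:
        return [src]
    previous = {src: None}  # predecessor map; also serves as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == dst:
                # Walk the predecessor chain back to src, then reverse it.
                path = [dst]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None

# Directed edges, keyed by source node ID.
adjacency = {
    "persona:RC": ["persona:SA", "concept:research"],
    "persona:SA": ["artifact:design-doc"],
    "concept:research": ["artifact:design-doc"],
    "artifact:design-doc": ["persona:QA"],
}

path = shortest_path(adjacency, "persona:RC", "persona:QA")
print(path)
```

Breadth-first search ignores edge weights; if the weight field on KnowledgeEdge should influence path choice, a weighted algorithm such as Dijkstra's would be needed instead.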

Serialization Formats

Knowledge graphs need to be exported for use by external tools, triple stores, and ontology editors. FCC supports three standard targets: the Turtle and JSON-LD serializations of RDF, and the SKOS vocabulary for taxonomies.

Turtle (TTL)

Turtle is a compact, human-readable RDF serialization and one of the most widely used. Each statement is a subject-predicate-object triple:

from fcc.knowledge import serialize_turtle

ttl_output = serialize_turtle(graph, base_uri="https://fcc.example.org/")
print(ttl_output)

Output:

@prefix fcc: <https://fcc.example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

fcc:persona-RC a fcc:Persona ;
    rdfs:label "Research Curator" ;
    fcc:phase "Find" ;
    fcc:collaboratesWith fcc:persona-SA .
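To show how one node and its outgoing edge map onto triples like those above, here is a minimal sketch that emits Turtle from plain tuples. It does not use serialize_turtle, handles only prefixed names and pre-quoted string literals without any escaping, and the to_turtle helper is hypothetical.

```python
def to_turtle(prefixes, subject, predicate_objects):
    """Render one subject with its predicate-object pairs as Turtle text."""
    lines = [f"@prefix {name}: <{uri}> ." for name, uri in prefixes.items()]
    lines.append("")
    body = [f"{predicate} {obj}" for predicate, obj in predicate_objects]
    # ';' separates predicates that share a subject; '.' ends the group.
    lines.append(f"{subject} " + " ;\n    ".join(body) + " .")
    return "\n".join(lines)

prefixes = {
    "fcc": "https://fcc.example.org/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}
turtle_text = to_turtle(
    prefixes,
    "fcc:persona-RC",
    [
        ("a", "fcc:Persona"),                  # 'a' is Turtle shorthand for rdf:type
        ("rdfs:label", '"Research Curator"'),  # string literal
        ("fcc:phase", '"Find"'),
        ("fcc:collaboratesWith", "fcc:persona-SA"),  # edge to another node
    ],
)
print(turtle_text)
```

A production serializer also has to escape literals and validate IRIs, which is why an established RDF library is usually preferable to string formatting.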

JSON-LD

JSON-LD is a JSON-based RDF serialization, well suited to web APIs and JavaScript consumers:

from fcc.knowledge import serialize_jsonld

jsonld_output = serialize_jsonld(graph, context_uri="https://fcc.example.org/context")
print(jsonld_output)

The output includes a @context block that maps short property names to full IRIs, making the JSON human-readable while remaining RDF-compatible.
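The exact output of serialize_jsonld is not reproduced here. As an illustration of how a @context maps short names to IRIs, the sketch below builds a small JSON-LD document by hand with the standard json module. The property names and fcc: vocabulary terms are assumptions modeled on the Turtle example above.

```python
import json

document = {
    "@context": {
        "fcc": "https://fcc.example.org/",
        "label": "http://www.w3.org/2000/01/rdf-schema#label",
        "phase": "fcc:phase",
        # "@type": "@id" marks the value as an IRI reference, not a string literal.
        "collaboratesWith": {"@id": "fcc:collaboratesWith", "@type": "@id"},
    },
    "@id": "fcc:persona-RC",
    "@type": "fcc:Persona",
    "label": "Research Curator",
    "phase": "Find",
    "collaboratesWith": "fcc:persona-SA",
}

jsonld_text = json.dumps(document, indent=2)
print(jsonld_text)
```

A consumer that ignores the @context still sees ordinary JSON keys such as label and phase, while an RDF-aware consumer can expand them to full IRIs.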

SKOS (Simple Knowledge Organization System)

SKOS is used for taxonomies and classification schemes. FCC exports persona categories and phases as SKOS concept schemes:

from fcc.knowledge import serialize_skos

skos_output = serialize_skos(graph, scheme_uri="https://fcc.example.org/personas")
print(skos_output)

SKOS export produces:

| SKOS Element | FCC Mapping |
| --- | --- |
| skos:ConceptScheme | Persona category or phase grouping |
| skos:Concept | Individual persona |
| skos:broader | Category membership |
| skos:related | Collaboration links |
| skos:prefLabel | Persona name |

RAG Pipeline Architecture

Retrieval-Augmented Generation (RAG) combines search with LLM generation. Instead of asking an LLM to answer from its training data alone, RAG retrieves relevant context from a knowledge base and includes it in the prompt.

Pipeline Stages

The FCC RAG pipeline has four stages:

Documents → Chunker → SearchIndex → Retriever → Generator

| Stage | Module | Responsibility |
| --- | --- | --- |
| Chunking | DocumentChunker | Split documents into searchable chunks |
| Indexing | SearchIndex | Embed and store chunks for similarity search |
| Retrieval | SemanticRetriever | Find relevant chunks for a given query |
| Generation | RAGPipeline | Compose a prompt with retrieved context and generate an answer |

Chunking Strategies

The DocumentChunker supports multiple strategies depending on the content type:

| Strategy | Best For | How It Works |
| --- | --- | --- |
| FIXED_SIZE | Plain text | Split into fixed-character windows with overlap |
| PARAGRAPH | Markdown, prose | Split on blank lines |
| CODE | Python, YAML | Split on function/class boundaries |
| SEMANTIC | Mixed content | Split on topic boundaries using embeddings |

from fcc.rag import DocumentChunker, ChunkingStrategy

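# Placeholder inputs for illustration; in practice, read these from files
python_source = "def run():\n    return 'ok'\n"
markdown_text = "First paragraph.\n\nSecond paragraph.\n"

chunker = DocumentChunker()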

# Chunk Python code
code_chunks = chunker.chunk(
    python_source,
    strategy=ChunkingStrategy.CODE,
    metadata={"source": "workflow.py"},
)

# Chunk markdown documentation
doc_chunks = chunker.chunk(
    markdown_text,
    strategy=ChunkingStrategy.PARAGRAPH,
    metadata={"source": "guidebook-ch17.md"},
)
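As a rough illustration of the FIXED_SIZE and PARAGRAPH strategies in the table above, the following sketch implements both with plain string operations. The function names, window size, and overlap are assumptions; DocumentChunker's real parameters are not documented in this chapter.

```python
import re

def chunk_fixed_size(text, size=40, overlap=10):
    """Split text into windows of `size` characters that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

def chunk_paragraphs(text):
    """Split on one or more blank lines and drop empty chunks."""
    return [part.strip() for part in re.split(r"\n\s*\n", text) if part.strip()]

sample = "abcdefghijklmnopqrstuvwxyz"
fixed = chunk_fixed_size(sample, size=10, overlap=4)
paragraphs = chunk_paragraphs("First paragraph.\n\nSecond paragraph.\n\n\nThird.")
print(fixed)
print(paragraphs)
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of indexing some text twice.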

SemanticRetriever

The SemanticRetriever wraps a SearchIndex to provide retrieval with filtering and re-ranking:

from fcc.rag import SemanticRetriever

# index is the SearchIndex built in the SearchIndex section above
retriever = SemanticRetriever(index, top_k=5, min_score=0.3)
chunks = retriever.retrieve("How does FCC handle governance?")

for chunk in chunks:
    print(f"  [{chunk.score:.2f}] {chunk.text[:80]}...")
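The top_k and min_score parameters suggest a filter-then-truncate step over scored results. A minimal sketch of that step follows, assuming results arrive as (doc_id, score) pairs and that min_score is an inclusive lower bound. Both are assumptions; SemanticRetriever's internals and its re-ranking logic are not shown in this chapter.

```python
def filter_results(scored, top_k=5, min_score=0.3):
    """Keep results scoring at least min_score, best first, at most top_k of them."""
    kept = [(doc_id, score) for doc_id, score in scored if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]

# Hypothetical scored chunks, as a search step might return them.
scored = [
    ("chunk-a", 0.82),
    ("chunk-b", 0.12),
    ("chunk-c", 0.45),
    ("chunk-d", 0.30),
    ("chunk-e", 0.91),
]
selected = filter_results(scored, top_k=3, min_score=0.3)
print(selected)
```

Applying the score threshold before truncation means a query with few relevant chunks can return fewer than top_k results, which keeps weak matches out of the prompt.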

RAGPipeline

The RAGPipeline orchestrates the full retrieve-then-generate workflow:

from fcc.rag import RAGPipeline

pipeline = RAGPipeline(
    retriever=retriever,
    persona_context="Research Curator",  # optional persona framing
)

answer = pipeline.query("What are the key governance constraints?")
print(answer.text)
print(f"Sources: {[s.doc_id for s in answer.sources]}")

When a persona_context is provided, the pipeline frames the generation prompt using that persona's R.I.S.C.E.A.R. specification, producing answers that reflect the persona's style, constraints, and expertise.
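How RAGPipeline assembles its prompt is not shown. One plausible shape is sketched below, assuming retrieved chunks are numbered and placed before the question, with an optional persona line at the top. The build_prompt helper and the template wording are hypothetical, and no LLM call is made.

```python
def build_prompt(question, chunks, persona=None):
    """Compose a prompt from retrieved (doc_id, text) chunks and an optional persona name."""
    lines = []
    if persona:
        lines.append(f"You are answering as the {persona} persona.")
    lines.append("Use only the context below. Cite sources by their bracketed number.")
    lines.append("")
    # Number each chunk so the answer can refer back to its sources.
    for number, (doc_id, text) in enumerate(chunks, start=1):
        lines.append(f"[{number}] ({doc_id}) {text}")
    lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)

chunks = [
    ("doc-1", "FCC Find phase explores the problem space"),
    ("doc-3", "FCC Critique phase evaluates deliverables"),
]
prompt = build_prompt(
    "What are the key governance constraints?",
    chunks,
    persona="Research Curator",
)
print(prompt)
```

Carrying the doc_id alongside each chunk is what makes it possible to report sources next to the generated answer.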

Federated Knowledge Graph

In a multi-project ecosystem, each project may maintain its own knowledge graph. The FederatedKnowledgeGraph provides a unified view across projects without requiring data centralization.

NamespaceRegistry

Each project registers a namespace to prevent ID collisions:

from fcc.federation import NamespaceRegistry

ns = NamespaceRegistry()
ns.register("fcc", "https://fcc.example.org/")
ns.register("paom", "https://paom.example.org/")
ns.register("constel", "https://constel.example.org/")

# Resolve qualified names
full_uri = ns.resolve("fcc:persona-RC")
# "https://fcc.example.org/persona-RC"
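Resolving a qualified name amounts to splitting on the first colon and prepending the registered base URI. A minimal sketch of that behavior follows, using a hypothetical NamespaceTable class rather than NamespaceRegistry itself. Rejecting a conflicting re-registration and raising KeyError for unknown prefixes are assumptions.

```python
class NamespaceTable:
    """Hypothetical stand-in for a prefix-to-URI registry."""

    def __init__(self):
        self._prefixes = {}

    def register(self, prefix, base_uri):
        if prefix in self._prefixes and self._prefixes[prefix] != base_uri:
            raise ValueError(f"prefix {prefix!r} is already bound to a different URI")
        self._prefixes[prefix] = base_uri

    def resolve(self, qualified_name):
        # Split on the first colon only, so local names may contain colons.
        prefix, separator, local_name = qualified_name.partition(":")
        if not separator:
            raise ValueError(f"{qualified_name!r} has no namespace prefix")
        return self._prefixes[prefix] + local_name  # KeyError if prefix is unknown

table = NamespaceTable()
table.register("fcc", "https://fcc.example.org/")
table.register("paom", "https://paom.example.org/")

full_uri = table.resolve("fcc:persona-RC")
print(full_uri)
```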

EntityResolver

The EntityResolver maps entities across namespaces using vocabulary mappings:

| Method | Description |
| --- | --- |
| resolve(entity_id, target_ns) | Find the equivalent entity in another namespace |
| add_mapping(source, target, confidence) | Register a cross-namespace mapping |
| mappings_for(entity_id) | List all known mappings for an entity |

from fcc.federation import EntityResolver

resolver = EntityResolver(namespace_registry=ns)
resolver.add_mapping("fcc:persona-RC", "paom:agent-research", confidence=0.95)

# Resolve across namespaces
paom_id = resolver.resolve("fcc:persona-RC", target_ns="paom")
print(paom_id)  # "paom:agent-research"
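The confidence argument implies that several candidate mappings may exist for one entity. The sketch below shows one way to choose among them, assuming the highest-confidence mapping in the target namespace wins and that an unmapped entity yields None. MappingTable is a hypothetical class, not EntityResolver's implementation, and the IDs paom:agent-librarian and constel:curator are invented for the example.

```python
class MappingTable:
    """Hypothetical cross-namespace mapping store."""

    def __init__(self):
        self._mappings = {}  # source id -> list of (target id, confidence)

    def add_mapping(self, source, target, confidence):
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        self._mappings.setdefault(source, []).append((target, confidence))

    def resolve(self, entity_id, target_ns):
        # Keep only candidates whose prefix matches the requested namespace.
        candidates = [
            (target, confidence)
            for target, confidence in self._mappings.get(entity_id, [])
            if target.split(":", 1)[0] == target_ns
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda pair: pair[1])[0]

mappings = MappingTable()
mappings.add_mapping("fcc:persona-RC", "paom:agent-research", confidence=0.95)
mappings.add_mapping("fcc:persona-RC", "paom:agent-librarian", confidence=0.60)
mappings.add_mapping("fcc:persona-RC", "constel:curator", confidence=0.80)

print(mappings.resolve("fcc:persona-RC", target_ns="paom"))
print(mappings.resolve("fcc:persona-SA", target_ns="paom"))
```

Returning None for an unmapped entity leaves it to the caller to decide whether a missing mapping is an error or simply something to log.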

FederatedKnowledgeGraph

The FederatedKnowledgeGraph merges multiple project graphs while preserving namespace boundaries:

from fcc.federation import FederatedKnowledgeGraph
from fcc.knowledge import NodeType

federated = FederatedKnowledgeGraph(
    namespace_registry=ns,
    entity_resolver=resolver,
)

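# fcc_graph and paom_graph are KnowledgeGraph instances built beforehand,
# for example with build_persona_graph
federated.add_graph("fcc", fcc_graph)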
federated.add_graph("paom", paom_graph)

# Query across all namespaces
all_personas = federated.query_nodes(node_type=NodeType.PERSONA)
print(f"Total personas across projects: {len(all_personas)}")

# Find cross-project relationships
cross_edges = federated.cross_namespace_edges()
print(f"Cross-project edges: {len(cross_edges)}")

Conflict Resolution

When merging graphs from different projects, conflicts can arise:

| Conflict Type | Resolution Strategy |
| --- | --- |
| Duplicate node IDs | Namespace-qualified IDs prevent collisions |
| Contradictory properties | Latest-write-wins or confidence-based selection |
| Missing mappings | EntityResolver logs unresolved references |
| Schema mismatch | VocabularyMapping normalizes property names |
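The two strategies for contradictory properties can be sketched as small merge functions. The record layout below, with a value, an updated_at timestamp, and a confidence score per property, is an assumption made for illustration; the chapter does not specify how FederatedKnowledgeGraph stores this metadata.

```python
def latest_write_wins(a, b):
    """Pick the property record with the newer timestamp (ties keep `a`)."""
    return b if b["updated_at"] > a["updated_at"] else a

def highest_confidence(a, b):
    """Pick the property record with the higher confidence (ties keep `a`)."""
    return b if b["confidence"] > a["confidence"] else a

def merge_properties(left, right, choose):
    """Merge two property dicts, calling `choose` only where both define a key."""
    merged = dict(left)
    for key, record in right.items():
        merged[key] = choose(merged[key], record) if key in merged else record
    return merged

# ISO 8601 date strings compare correctly as plain strings.
fcc_props = {
    "phase": {"value": "Find", "updated_at": "2024-01-10", "confidence": 0.9},
    "category": {"value": "core", "updated_at": "2024-01-10", "confidence": 0.9},
}
paom_props = {
    "phase": {"value": "Discover", "updated_at": "2024-03-02", "confidence": 0.6},
    "owner": {"value": "research-team", "updated_at": "2024-03-02", "confidence": 0.7},
}

by_time = merge_properties(fcc_props, paom_props, latest_write_wins)
by_confidence = merge_properties(fcc_props, paom_props, highest_confidence)
print(by_time["phase"]["value"], by_confidence["phase"]["value"])
```

The same conflict resolves differently under each strategy, so the choice should be made deliberately per property rather than left implicit.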

Practical Exercises

Exercise 1: Build a Semantic Search Index

Using Notebook 15, create a PersonaSearchIndex and search for personas by capability description rather than exact ID lookup.

Exercise 2: Construct a Knowledge Graph

Using Notebook 16, build a knowledge graph from the full persona registry and export it to Turtle format. Verify that collaboration edges are correctly represented.

Exercise 3: RAG over FCC Documentation

Using Notebook 17, chunk the FCC guidebook chapters, build a retrieval index, and query it with persona-framed questions. Compare answers generated with and without persona context.

Exercise 4: Federate Two Project Graphs

Build separate knowledge graphs for FCC and a mock PAOM project. Use FederatedKnowledgeGraph to merge them and query for cross-project personas. Verify that entity resolution correctly maps equivalent concepts.

Summary

Knowledge federation transforms FCC from a standalone framework into a semantic hub that connects personas, workflows, and governance artifacts across projects. Each of the four pillars (semantic search, knowledge graphs, serialization, and RAG) contributes a distinct capability:

| Pillar | Capability | Use Case |
| --- | --- | --- |
| Semantic Search | Vector similarity | "Find personas related to privacy" |
| Knowledge Graphs | Typed relationships | "Show all collaborators of the Research Curator" |
| Serialization | Interoperability | "Export persona taxonomy to a triple store" |
| RAG Pipeline | Context-aware generation | "Answer governance questions using FCC docs" |

The federated layer adds cross-project capabilities through namespace management, entity resolution, and graph merging.

Next Steps

  • Explore Notebook 15 for hands-on semantic search
  • Explore Notebook 16 for knowledge graph construction
  • Explore Notebook 17 for RAG pipeline development
  • Read Chapter 18 for documentation intelligence using AST analysis