Chapter 17: Knowledge Federation
Knowledge federation extends the FCC framework with semantic search, knowledge graph construction, and retrieval-augmented generation (RAG). Rather than treating personas, workflows, and governance artifacts as isolated data structures, knowledge federation connects them through a unified semantic layer that supports cross-project discovery, vocabulary alignment, and context-aware retrieval.
This chapter covers four pillars of the knowledge federation subsystem:
- Semantic Search — vector-based similarity search over personas, artifacts, and documents.
- Knowledge Graphs — typed node-and-edge graphs representing FCC concepts and their relationships.
- Serialization Formats — exporting knowledge graphs to Turtle, JSON-LD, and SKOS for interoperability.
- RAG Pipelines — chunking, retrieval, and generation for persona-aware question answering.
The flowchart below shows how persona YAML, workflow JSON, and external documents flow through the knowledge-graph, search-index, and RAG pipelines to produce persona-aware answers.
flowchart LR
subgraph Input["Source Data"]
YAML[Persona YAML]
WF[Workflow JSON]
DOCS[Documents]
end
subgraph KG["Knowledge Graph"]
BUILD[build_persona_graph]
GRAPH[(KnowledgeGraph)]
SER[Serializers]
end
subgraph Search["Semantic Search"]
EMB[EmbeddingProvider]
IDX[(SearchIndex)]
PSI[PersonaSearchIndex]
end
subgraph RAG["RAG Pipeline"]
CHUNK[DocumentChunker]
RET[SemanticRetriever]
GEN[RAGPipeline]
end
YAML --> BUILD --> GRAPH
GRAPH --> SER
SER -->|Turtle / JSON-LD / SKOS| EXT[External Tools]
YAML --> EMB --> IDX
IDX --> PSI
DOCS --> CHUNK --> IDX
IDX --> RET --> GEN
GEN -->|Persona-aware answers| OUT[Output]
style GRAPH fill:#2196F3,color:#fff
style IDX fill:#4CAF50,color:#fff
style GEN fill:#9C27B0,color:#fff
The three pipelines share data but not control flow, so each can be rebuilt or swapped independently — for example, replacing the embedding provider without rebuilding the graph.
Semantic Search Fundamentals
Semantic search replaces keyword matching with vector similarity. Instead of searching for exact terms like "governance" or "privacy", semantic search encodes the meaning of a query into a high-dimensional vector and finds documents whose vectors are closest in that space.
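To make "closest in that space" concrete, here is a minimal sketch using cosine similarity, one common closeness measure. The chapter does not state which metric SearchIndex uses, so the metric, the function name cosine_similarity, and the hand-written three-dimensional vectors are all illustrative assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention used here: a zero vector matches nothing
    return dot / (norm_a * norm_b)


# Toy "embeddings", written by hand purely for illustration
query = [1.0, 0.0, 1.0]
documents = {
    "doc-governance": [0.9, 0.1, 0.8],
    "doc-cooking": [0.0, 1.0, 0.1],
}

# Rank document ids from most to least similar to the query
ranked = sorted(
    documents,
    key=lambda doc_id: cosine_similarity(query, documents[doc_id]),
    reverse=True,
)
print(ranked)
```

Real embeddings have hundreds of dimensions, but the ranking step works the same way.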
EmbeddingProvider Interface
The EmbeddingProvider protocol defines how text is converted to vectors:
from fcc.search import EmbeddingProvider, MockEmbeddingProvider
# The mock provider generates deterministic vectors for testing
provider = MockEmbeddingProvider(dimension=128)
vector = provider.embed("Responsible AI governance principles")
print(f"Dimension: {len(vector)}") # 128
print(f"First 5 values: {vector[:5]}")
Any embedding backend (OpenAI, Sentence Transformers, local models) can
implement EmbeddingProvider. The MockEmbeddingProvider is used
throughout tests and notebooks to avoid API key dependencies.
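As a rough illustration of what a custom backend might look like, the sketch below defines a deterministic, hash-based provider with an embed(text) method. It does not import fcc. The names EmbeddingProviderLike and HashingEmbeddingProvider are hypothetical, and the real protocol may require more than the single embed method shown in this chapter.

```python
import hashlib
import math
from typing import Protocol


class EmbeddingProviderLike(Protocol):
    """Hypothetical stand-in for the EmbeddingProvider protocol."""

    def embed(self, text: str) -> list[float]: ...


class HashingEmbeddingProvider:
    """Deterministic bag-of-words embedding; needs no model or API key."""

    def __init__(self, dimension: int = 16) -> None:
        self.dimension = dimension

    def embed(self, text: str) -> list[float]:
        vector = [0.0] * self.dimension
        for token in text.lower().split():
            # Hash each token into one of `dimension` buckets
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % self.dimension
            vector[bucket] += 1.0
        # Normalize to unit length so dot products behave like cosine scores
        norm = math.sqrt(sum(v * v for v in vector))
        return [v / norm for v in vector] if norm else vector


provider: EmbeddingProviderLike = HashingEmbeddingProvider(dimension=16)
first = provider.embed("Responsible AI governance principles")
second = provider.embed("Responsible AI governance principles")
print(len(first), first == second)
```

Because the vectors depend only on the input text, repeated calls return identical results, which is the property that makes a mock provider useful in tests.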
SearchIndex
The SearchIndex stores documents and their vectors for efficient
retrieval:
| Method | Description |
|---|---|
| add(doc_id, text, metadata) | Embed and store a document |
| search(query, top_k) | Find the top-k most similar documents |
| save(path) | Persist the index to disk |
| load(path) | Restore a previously saved index |
| remove(doc_id) | Remove a document from the index |
| stats() | Return index statistics (doc count, dimension) |
from fcc.search import SearchIndex, MockEmbeddingProvider
provider = MockEmbeddingProvider(dimension=128)
index = SearchIndex(provider)
# Add documents
index.add("doc-1", "FCC Find phase explores the problem space", {"phase": "Find"})
index.add("doc-2", "FCC Create phase synthesizes solutions", {"phase": "Create"})
index.add("doc-3", "FCC Critique phase evaluates deliverables", {"phase": "Critique"})
# Search
results = index.search("exploring problems and research", top_k=2)
for result in results:
    print(f" {result.doc_id}: {result.score:.3f} - {result.text[:50]}")
PersonaSearchIndex
The PersonaSearchIndex wraps SearchIndex to provide persona-specific
search capabilities. It automatically indexes persona roles,
responsibilities, and skills:
from fcc.search import PersonaSearchIndex, MockEmbeddingProvider
from fcc.personas.registry import PersonaRegistry
provider = MockEmbeddingProvider(dimension=128)
registry = PersonaRegistry.from_data_dir()
persona_index = PersonaSearchIndex(provider, registry)
# Search for governance-related personas
results = persona_index.search("governance compliance audit", top_k=5)
for r in results:
    print(f" {r.persona_id}: {r.name} (score: {r.score:.3f})")
Knowledge Graph Construction
A knowledge graph represents FCC concepts as typed nodes connected by typed edges. This structure enables graph queries, path finding, and semantic reasoning over the entire framework.
Core Types
| Type | Description |
|---|---|
| KnowledgeNode | A node with id, label, node_type, and properties |
| KnowledgeEdge | A directed edge with source, target, edge_type, and weight |
| KnowledgeGraph | Container for nodes and edges with query methods |
| NodeType | Enum: PERSONA, WORKFLOW, ARTIFACT, CONCEPT, PROJECT |
| EdgeType | Enum: COLLABORATES, PRODUCES, CONSUMES, BELONGS_TO, RELATED |
Building a Graph Manually
from fcc.knowledge import (
    KnowledgeGraph, KnowledgeNode, KnowledgeEdge,
    NodeType, EdgeType,
)
graph = KnowledgeGraph()
# Add persona nodes
graph.add_node(KnowledgeNode(
    id="persona:RC",
    label="Research Curator",
    node_type=NodeType.PERSONA,
    properties={"phase": "Find", "category": "core"},
))
graph.add_node(KnowledgeNode(
    id="persona:SA",
    label="Solution Architect",
    node_type=NodeType.PERSONA,
    properties={"phase": "Create", "category": "core"},
))
# Add collaboration edge
graph.add_edge(KnowledgeEdge(
    source="persona:RC",
    target="persona:SA",
    edge_type=EdgeType.COLLABORATES,
    weight=0.9,
))
print(graph.stats())
# GraphStats(nodes=2, edges=1, node_types={'PERSONA': 2})
Building from the PersonaRegistry
The build_persona_graph function automatically constructs a knowledge
graph from a PersonaRegistry, creating nodes for each persona and edges
for collaboration links, category membership, and phase groupings:
from fcc.knowledge import NodeType, build_persona_graph
from fcc.personas.registry import PersonaRegistry
registry = PersonaRegistry.from_data_dir()
graph = build_persona_graph(registry)
print(f"Nodes: {graph.node_count}")
print(f"Edges: {graph.edge_count}")
# Query by type
personas = graph.nodes_by_type(NodeType.PERSONA)
print(f"Persona nodes: {len(personas)}")
Graph Queries
The KnowledgeGraph supports several query patterns:
| Query | Method | Description |
|---|---|---|
| By node type | nodes_by_type(NodeType) | All nodes of a given type |
| By edge type | edges_by_type(EdgeType) | All edges of a given type |
| Neighbors | neighbors(node_id) | Directly connected nodes |
| Subgraph | subgraph(node_ids) | Extract a subgraph |
| Path | shortest_path(src, dst) | Find shortest path between nodes |
| Merge | merge(other_graph) | Combine two graphs |
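For intuition about the path query, the sketch below finds a shortest path by breadth-first search over a plain adjacency dictionary. It is not the KnowledgeGraph implementation: the chapter does not say whether shortest_path follows edge direction or uses edge weights, so this version assumes directed, unweighted edges. The node persona:QA is hypothetical.

```python
from collections import deque
from typing import Optional


def shortest_path(
    adjacency: dict[str, list[str]], src: str, dst: str
) -> Optional[list[str]]:
    """Return the node ids on a fewest-hops path, or None if unreachable."""
    if src == dst:
        return [src]
    previous: dict[str, str] = {}  # node -> node we arrived from
    visited = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor in visited:
                continue
            visited.add(neighbor)
            previous[neighbor] = node
            if neighbor == dst:
                # Walk the back-pointers from dst to src, then reverse
                path = [dst]
                while path[-1] != src:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Directed edges: RC -> SA -> QA (persona:QA is a made-up node)
adjacency = {
    "persona:RC": ["persona:SA"],
    "persona:SA": ["persona:QA"],
    "persona:QA": [],
}
path = shortest_path(adjacency, "persona:RC", "persona:QA")
print(path)
```

Breadth-first search visits nodes in order of hop count, so the first time it reaches the destination it has found a path with the fewest edges.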
Serialization Formats
Knowledge graphs need to be exported for use by external tools, triple stores, and ontology editors. FCC supports three standard formats:
Turtle (TTL)
Turtle is a compact, human-readable, and widely used RDF serialization format. Each statement is a subject-predicate-object triple, and a semicolon lets several predicate-object pairs share one subject:
from fcc.knowledge import serialize_turtle
ttl_output = serialize_turtle(graph, base_uri="https://fcc.example.org/")
print(ttl_output)
Output:
@prefix fcc: <https://fcc.example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

fcc:persona-RC a fcc:Persona ;
    rdfs:label "Research Curator" ;
    fcc:phase "Find" ;
    fcc:collaboratesWith fcc:persona-SA .
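The way triples are grouped by subject in the output above can be reproduced with a few lines of plain Python. This is only a sketch of the text layout, not how serialize_turtle works internally. The helper triples_to_turtle is hypothetical, and it assumes every term is already a valid Turtle token (a prefixed name or a quoted literal), so it performs no escaping.

```python
from collections import defaultdict


def triples_to_turtle(
    triples: list[tuple[str, str, str]], prefixes: dict[str, str]
) -> str:
    """Render (subject, predicate, object) triples as Turtle text."""
    lines = [f"@prefix {name}: <{uri}> ." for name, uri in prefixes.items()]
    grouped: dict[str, list[str]] = defaultdict(list)
    for subject, predicate, obj in triples:
        grouped[subject].append(f"{predicate} {obj}")
    for subject, statements in grouped.items():
        lines.append("")
        # ';' continues with the same subject, '.' ends the statement group
        lines.append(f"{subject} " + " ;\n    ".join(statements) + " .")
    return "\n".join(lines)


prefixes = {
    "fcc": "https://fcc.example.org/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}
triples = [
    ("fcc:persona-RC", "a", "fcc:Persona"),
    ("fcc:persona-RC", "rdfs:label", '"Research Curator"'),
    ("fcc:persona-RC", "fcc:phase", '"Find"'),
    ("fcc:persona-RC", "fcc:collaboratesWith", "fcc:persona-SA"),
]
turtle = triples_to_turtle(triples, prefixes)
print(turtle)
```

For production use, a dedicated RDF library is the safer choice because it handles literal escaping, datatypes, and IRI validation.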
JSON-LD
JSON-LD is a JSON-based RDF serialization, well-suited to web APIs and JavaScript consumers:
from fcc.knowledge import serialize_jsonld
jsonld_output = serialize_jsonld(graph, context_uri="https://fcc.example.org/context")
print(jsonld_output)
The output includes a @context block that maps short property names to
full IRIs, making the JSON human-readable while remaining RDF-compatible.
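The sketch below shows what such a @context can look like and how a short property name expands to a full IRI. The document is hand-written for illustration and is not the actual output of serialize_jsonld. The helper expand_term is hypothetical and handles only the simple prefix case used here, not the full JSON-LD expansion algorithm.

```python
import json

document = {
    "@context": {
        "fcc": "https://fcc.example.org/",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "label": "rdfs:label",
        "phase": "fcc:phase",
        # "@type": "@id" marks the value as an IRI rather than a string
        "collaboratesWith": {"@id": "fcc:collaboratesWith", "@type": "@id"},
    },
    "@id": "fcc:persona-RC",
    "@type": "fcc:Persona",
    "label": "Research Curator",
    "phase": "Find",
    "collaboratesWith": "fcc:persona-SA",
}


def expand_term(term: str, context: dict) -> str:
    """Expand a short term to a full IRI using prefix definitions only."""
    definition = context[term]
    iri = definition["@id"] if isinstance(definition, dict) else definition
    prefix, sep, local = iri.partition(":")
    # "https://..." also contains ':', so skip values that are already IRIs
    if sep and prefix in context and not local.startswith("//"):
        return context[prefix] + local
    return iri


print(json.dumps(document, indent=2))
print(expand_term("label", document["@context"]))
```

A consumer that ignores @context still sees ordinary JSON keys such as label and phase, while an RDF-aware consumer can recover the full IRIs.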
SKOS (Simple Knowledge Organization System)
SKOS is used for taxonomies and classification schemes. FCC exports persona categories and phases as SKOS concept schemes:
from fcc.knowledge import serialize_skos
skos_output = serialize_skos(graph, scheme_uri="https://fcc.example.org/personas")
print(skos_output)
SKOS export produces:
| SKOS Element | FCC Mapping |
|---|---|
| skos:ConceptScheme | Persona category or phase grouping |
| skos:Concept | Individual persona |
| skos:broader | Category membership |
| skos:related | Collaboration links |
| skos:prefLabel | Persona name |
RAG Pipeline Architecture
Retrieval-Augmented Generation (RAG) combines search with LLM generation. Instead of asking an LLM to answer from its training data alone, RAG retrieves relevant context from a knowledge base and includes it in the prompt.
Pipeline Stages
The FCC RAG pipeline has four stages:
| Stage | Module | Responsibility |
|---|---|---|
| Chunking | DocumentChunker | Split documents into searchable chunks |
| Indexing | SearchIndex | Embed and store chunks for similarity search |
| Retrieval | SemanticRetriever | Find relevant chunks for a given query |
| Generation | RAGPipeline | Compose a prompt with retrieved context and generate an answer |
Chunking Strategies
The DocumentChunker supports multiple strategies depending on the
content type:
| Strategy | Best For | How It Works |
|---|---|---|
| FIXED_SIZE | Plain text | Split into fixed-character windows with overlap |
| PARAGRAPH | Markdown, prose | Split on blank lines |
| CODE | Python, YAML | Split on function/class boundaries |
| SEMANTIC | Mixed content | Split on topic boundaries using embeddings |
from fcc.rag import DocumentChunker, ChunkingStrategy
chunker = DocumentChunker()
# python_source and markdown_text are strings loaded elsewhere (not shown)
# Chunk Python code
code_chunks = chunker.chunk(
    python_source,
    strategy=ChunkingStrategy.CODE,
    metadata={"source": "workflow.py"},
)
# Chunk markdown documentation
doc_chunks = chunker.chunk(
    markdown_text,
    strategy=ChunkingStrategy.PARAGRAPH,
    metadata={"source": "guidebook-ch17.md"},
)
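To show what the two simplest strategies do to a string, here is a standalone sketch of fixed-size chunking with overlap and of paragraph chunking. The function names and parameters (size, overlap) are assumptions made for illustration. DocumentChunker may use different defaults and boundary rules.

```python
import re


def chunk_fixed_size(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` characters sharing `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


def chunk_paragraphs(text: str) -> list[str]:
    """Split on one or more blank lines and drop empty pieces."""
    pieces = re.split(r"\n\s*\n", text)
    return [piece.strip() for piece in pieces if piece.strip()]


windows = chunk_fixed_size("abcdefghij", size=4, overlap=2)
paragraphs = chunk_paragraphs("First paragraph.\n\nSecond paragraph.\n\n\nThird.")
print(windows)
print(paragraphs)
```

Overlap trades index size for recall: a sentence that straddles a window boundary still appears whole in at least one chunk when the overlap is longer than the sentence.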
SemanticRetriever
The SemanticRetriever wraps a SearchIndex to provide retrieval with
filtering and re-ranking:
from fcc.rag import SemanticRetriever
retriever = SemanticRetriever(index, top_k=5, min_score=0.3)
chunks = retriever.retrieve("How does FCC handle governance?")
for chunk in chunks:
    print(f" [{chunk.score:.2f}] {chunk.text[:80]}...")
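The effect of top_k and min_score can be pictured as a filter followed by a sort and a cut. The sketch below is a plausible reading of those two parameters, not the SemanticRetriever source. The ScoredChunk type is hypothetical, and treating min_score as an inclusive threshold is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredChunk:
    """Hypothetical result record: a chunk plus its similarity score."""

    doc_id: str
    text: str
    score: float


def retrieve(
    candidates: list[ScoredChunk], top_k: int, min_score: float
) -> list[ScoredChunk]:
    """Drop low-scoring chunks, then keep the best top_k."""
    kept = [chunk for chunk in candidates if chunk.score >= min_score]
    kept.sort(key=lambda chunk: chunk.score, reverse=True)
    return kept[:top_k]


candidates = [
    ScoredChunk("doc-2", "FCC Create phase synthesizes solutions", 0.25),
    ScoredChunk("doc-3", "FCC Critique phase evaluates deliverables", 0.47),
    ScoredChunk("doc-1", "FCC Find phase explores the problem space", 0.82),
]
selected = retrieve(candidates, top_k=2, min_score=0.3)
print([chunk.doc_id for chunk in selected])
```

Filtering before truncating means a query with no good matches returns fewer than top_k chunks rather than padding the prompt with irrelevant text.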
RAGPipeline
The RAGPipeline orchestrates the full retrieve-then-generate workflow:
from fcc.rag import RAGPipeline
pipeline = RAGPipeline(
    retriever=retriever,
    persona_context="Research Curator",  # optional persona framing
)
answer = pipeline.query("What are the key governance constraints?")
print(answer.text)
print(f"Sources: {[s.doc_id for s in answer.sources]}")
When a persona_context is provided, the pipeline frames the generation
prompt using that persona's R.I.S.C.E.A.R. specification, producing
answers that reflect the persona's style, constraints, and expertise.
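A simplified view of the prompt-composition step is sketched below. The function build_prompt and the prompt wording are hypothetical. In particular, this sketch frames the prompt with the persona's name only, whereas the pipeline described above draws on the persona's full specification.

```python
from typing import Optional


def build_prompt(
    question: str,
    context_chunks: list[tuple[str, str]],
    persona: Optional[str] = None,
) -> str:
    """Assemble a generation prompt from (doc_id, text) context chunks."""
    parts = []
    if persona is not None:
        # Persona framing goes first so it conditions everything after it
        parts.append(f"You are answering as the {persona}.")
    parts.append("Answer using only the context below and cite sources by id.")
    parts.append("")
    for doc_id, text in context_chunks:
        parts.append(f"[{doc_id}] {text}")
    parts.append("")
    parts.append(f"Question: {question}")
    return "\n".join(parts)


chunks = [
    ("doc-1", "FCC Find phase explores the problem space"),
    ("doc-3", "FCC Critique phase evaluates deliverables"),
]
framed = build_prompt(
    "What are the key governance constraints?", chunks, persona="Research Curator"
)
unframed = build_prompt("What are the key governance constraints?", chunks)
print(framed)
```

Keeping the document ids in the prompt is what allows the answer to report its sources afterwards.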
Federated Knowledge Graph
In a multi-project ecosystem, each project may maintain its own knowledge
graph. The FederatedKnowledgeGraph provides a unified view across
projects without requiring data centralization.
NamespaceRegistry
Each project registers a namespace to prevent ID collisions:
from fcc.federation import NamespaceRegistry
ns = NamespaceRegistry()
ns.register("fcc", "https://fcc.example.org/")
ns.register("paom", "https://paom.example.org/")
ns.register("constel", "https://constel.example.org/")
# Resolve qualified names
full_uri = ns.resolve("fcc:persona-RC")
# "https://fcc.example.org/persona-RC"
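Resolution of a qualified name amounts to a prefix lookup plus string concatenation. The standalone class below illustrates that idea. SimpleNamespaceRegistry is a hypothetical stand-in, and its error handling (rejecting a conflicting re-registration, raising on an unknown prefix) is an assumption rather than documented NamespaceRegistry behavior.

```python
class SimpleNamespaceRegistry:
    """Hypothetical stand-in that maps prefixes to base URIs."""

    def __init__(self) -> None:
        self._prefixes: dict[str, str] = {}

    def register(self, prefix: str, base_uri: str) -> None:
        existing = self._prefixes.get(prefix)
        if existing is not None and existing != base_uri:
            raise ValueError(f"prefix {prefix!r} is already bound to {existing}")
        self._prefixes[prefix] = base_uri

    def resolve(self, qualified_name: str) -> str:
        # Split only on the first ':' so local names may contain colons
        prefix, sep, local = qualified_name.partition(":")
        if not sep:
            raise ValueError(f"{qualified_name!r} has no namespace prefix")
        if prefix not in self._prefixes:
            raise KeyError(f"unknown namespace prefix {prefix!r}")
        return self._prefixes[prefix] + local


registry = SimpleNamespaceRegistry()
registry.register("fcc", "https://fcc.example.org/")
registry.register("paom", "https://paom.example.org/")
full_uri = registry.resolve("fcc:persona-RC")
print(full_uri)
```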
EntityResolver
The EntityResolver maps entities across namespaces using vocabulary
mappings:
| Method | Description |
|---|---|
| resolve(entity_id, target_ns) | Find the equivalent entity in another namespace |
| add_mapping(source, target, confidence) | Register a cross-namespace mapping |
| mappings_for(entity_id) | List all known mappings for an entity |
from fcc.federation import EntityResolver
resolver = EntityResolver(namespace_registry=ns)
resolver.add_mapping("fcc:persona-RC", "paom:agent-research", confidence=0.95)
# Resolve across namespaces
paom_id = resolver.resolve("fcc:persona-RC", target_ns="paom")
print(paom_id) # "paom:agent-research"
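One way to picture the resolver is as a table of weighted mappings keyed by source entity. The sketch below assumes that when several mappings point into the same target namespace, the one with the highest confidence wins. That tie-breaking rule, the Mapping record, and SimpleEntityResolver are illustrative assumptions, and the second mapping target is made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    source: str
    target: str
    confidence: float


class SimpleEntityResolver:
    """Hypothetical stand-in that stores cross-namespace mappings."""

    def __init__(self) -> None:
        self._mappings: dict[str, list[Mapping]] = {}

    def add_mapping(self, source: str, target: str, confidence: float) -> None:
        self._mappings.setdefault(source, []).append(
            Mapping(source, target, confidence)
        )

    def mappings_for(self, entity_id: str) -> list[Mapping]:
        return list(self._mappings.get(entity_id, []))

    def resolve(self, entity_id: str, target_ns: str) -> Optional[str]:
        # Keep only mappings whose target lives in the requested namespace
        candidates = [
            m for m in self.mappings_for(entity_id)
            if m.target.startswith(target_ns + ":")
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda m: m.confidence).target


resolver = SimpleEntityResolver()
resolver.add_mapping("fcc:persona-RC", "paom:agent-librarian", confidence=0.60)
resolver.add_mapping("fcc:persona-RC", "paom:agent-research", confidence=0.95)
best = resolver.resolve("fcc:persona-RC", target_ns="paom")
print(best)
```

Returning None for an unmapped entity leaves it to the caller to decide whether a missing mapping is an error or simply something to log.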
FederatedKnowledgeGraph
The FederatedKnowledgeGraph merges multiple project graphs while
preserving namespace boundaries:
from fcc.federation import FederatedKnowledgeGraph
federated = FederatedKnowledgeGraph(
    namespace_registry=ns,
    entity_resolver=resolver,
)
federated.add_graph("fcc", fcc_graph)
federated.add_graph("paom", paom_graph)
# Query across all namespaces
all_personas = federated.query_nodes(node_type=NodeType.PERSONA)
print(f"Total personas across projects: {len(all_personas)}")
# Find cross-project relationships
cross_edges = federated.cross_namespace_edges()
print(f"Cross-project edges: {len(cross_edges)}")
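Conceptually, a cross-namespace edge is one whose endpoints carry different namespace prefixes. The sketch below expresses that test over plain (source, target) pairs. It assumes namespace-qualified ids of the form prefix:local-name, and its helper names are hypothetical. How FederatedKnowledgeGraph actually detects such edges is not specified in this chapter.

```python
def namespace_of(node_id: str) -> str:
    """Return the prefix before the first ':' of a qualified id."""
    prefix, sep, _ = node_id.partition(":")
    if not sep:
        raise ValueError(f"{node_id!r} is not namespace-qualified")
    return prefix


def cross_namespace_edges(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only edges whose endpoints belong to different namespaces."""
    return [
        (source, target)
        for source, target in edges
        if namespace_of(source) != namespace_of(target)
    ]


edges = [
    ("fcc:persona-RC", "fcc:persona-SA"),       # stays inside fcc
    ("fcc:persona-RC", "paom:agent-research"),  # crosses fcc -> paom
]
crossing = cross_namespace_edges(edges)
print(crossing)
```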
Conflict Resolution
When merging graphs from different projects, conflicts can arise:
| Conflict Type | Resolution Strategy |
|---|---|
| Duplicate node IDs | Namespace-qualified IDs prevent collisions |
| Contradictory properties | Latest-write-wins or confidence-based selection |
| Missing mappings | EntityResolver logs unresolved references |
| Schema mismatch | VocabularyMapping normalizes property names |
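The two strategies named for contradictory properties can disagree, which the following sketch makes visible. The PropertyClaim record and both function names are hypothetical, and the use of integer timestamps is an assumption chosen to keep the example small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyClaim:
    """One project's claim about a property value."""

    value: str
    timestamp: int     # larger means written later
    confidence: float  # 0.0 to 1.0


def latest_write_wins(claims: list[PropertyClaim]) -> str:
    return max(claims, key=lambda claim: claim.timestamp).value


def highest_confidence_wins(claims: list[PropertyClaim]) -> str:
    return max(claims, key=lambda claim: claim.confidence).value


# Two projects disagree about the phase of the same persona
claims = [
    PropertyClaim(value="Find", timestamp=100, confidence=0.95),
    PropertyClaim(value="Create", timestamp=200, confidence=0.60),
]
by_time = latest_write_wins(claims)
by_confidence = highest_confidence_wins(claims)
print(by_time, by_confidence)
```

Latest-write-wins favors freshness and needs trustworthy timestamps, while confidence-based selection favors the better-vetted source even when it is older.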
Practical Exercises
Exercise 1: Build a Semantic Search Index
Using Notebook 15, create a
PersonaSearchIndex and search for personas by capability description
rather than exact ID lookup.
Exercise 2: Construct a Knowledge Graph
Using Notebook 16, build a knowledge graph from the full persona registry and export it to Turtle format. Verify that collaboration edges are correctly represented.
Exercise 3: RAG over FCC Documentation
Using Notebook 17, chunk the FCC guidebook chapters, build a retrieval index, and query it with persona-framed questions. Compare answers generated with and without persona context.
Exercise 4: Federate Two Project Graphs
Build separate knowledge graphs for FCC and a mock PAOM project. Use
FederatedKnowledgeGraph to merge them and query for cross-project
personas. Verify that entity resolution correctly maps equivalent
concepts.
Summary
Knowledge federation transforms FCC from a standalone framework into a semantic hub that connects personas, workflows, and governance artifacts across projects. The four pillars — semantic search, knowledge graphs, serialization, and RAG — provide increasing levels of intelligence:
| Pillar | Capability | Use Case |
|---|---|---|
| Semantic Search | Vector similarity | "Find personas related to privacy" |
| Knowledge Graphs | Typed relationships | "Show all collaborators of the Research Curator" |
| Serialization | Interoperability | "Export persona taxonomy to a triple store" |
| RAG Pipeline | Context-aware generation | "Answer governance questions using FCC docs" |
The federated layer adds cross-project capabilities through namespace management, entity resolution, and graph merging.
Next Steps
- Explore Notebook 15 for hands-on semantic search
- Explore Notebook 16 for knowledge graph construction
- Explore Notebook 17 for RAG pipeline development
- Read Chapter 18 for documentation intelligence using AST analysis