Shared Knowledge Base¶
As your team accumulates artifacts -- design docs, ADRs, runbooks, persona configurations, tutorials, and workflow traces -- the ability to find and contextualize prior knowledge becomes the rate-limiting factor for new work. This guide explains how to build a team-level knowledge base using FCC's semantic search, knowledge graph, RAG pipeline, and federation modules.
A well-designed shared knowledge base turns FCC from a per-workflow tool into a team memory substrate: every workflow run enriches it, every new query benefits from it, and every new team member onboards faster because of it.
What a Shared Knowledge Base Does¶
A team-level knowledge base provides four capabilities:
- Semantic search -- natural-language queries over personas, actions, workflow traces, and docs
- Knowledge graphs -- explicit relationships between personas, workflows, artifacts, and decisions
- RAG-augmented queries -- retrieval-augmented generation for persona prompts
- Federation -- cross-team knowledge resolution while preserving team namespaces
You do not need all four on day one. Start with semantic search, add RAG in month two, add a knowledge graph in month three, and consider federation only when you have 3+ teams.
Architecture Overview¶
```mermaid
flowchart TD
    subgraph Sources[Knowledge Sources]
        ADR[ADRs]
        Doc[Docs]
        RB[Runbooks]
        PT[Persona Configs]
        WT[Workflow Traces]
        Chat[Chat Archives]
    end
    subgraph Ingest[Ingestion Layer]
        Chunker[DocumentChunker<br/>6 strategies]
        Embed[EmbeddingProvider<br/>pluggable]
    end
    subgraph Storage[Storage Layer]
        SI[SearchIndex]
        PSI[PersonaSearchIndex]
        ASI[ActionSearchIndex]
        KG[KnowledgeGraph<br/>9 node/edge types]
    end
    subgraph Query[Query Layer]
        Retrieve[SemanticRetriever]
        RAG[RAGPipeline<br/>persona-aware]
        KGQ[KG Query API]
    end
    subgraph Apps[Applications]
        Persona[Persona Prompts]
        Search[Search UI]
        Onboard[Onboarding]
        Audit[Audit Trail]
    end

    Sources --> Chunker
    Chunker --> Embed
    Embed --> SI
    Embed --> PSI
    Embed --> ASI
    Sources --> KG
    SI --> Retrieve
    PSI --> Retrieve
    Retrieve --> RAG
    KG --> KGQ
    RAG --> Persona
    Retrieve --> Search
    RAG --> Onboard
    KG --> Audit

    classDef source fill:#e1f5ff,stroke:#0277bd,stroke-width:2px;
    classDef ingest fill:#fff3e0,stroke:#e65100,stroke-width:2px;
    classDef storage fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px;
    classDef query fill:#fce4ec,stroke:#880e4f,stroke-width:2px;
    classDef app fill:#e8eaf6,stroke:#283593,stroke-width:2px;
    class ADR,Doc,RB,PT,WT,Chat source;
    class Chunker,Embed ingest;
    class SI,PSI,ASI,KG storage;
    class Retrieve,RAG,KGQ query;
    class Persona,Search,Onboard,Audit app;
```
Building a Team Knowledge Base with FCC¶
Step 1: Inventory Your Knowledge Sources¶
Before indexing anything, list the knowledge your team generates and consumes:
| Source | Typical Volume | Update Frequency | Priority |
|---|---|---|---|
| ADRs | 20-200 | Weekly | High |
| Design docs | 50-500 | Daily | High |
| Runbooks | 10-50 | Monthly | High |
| Persona configs | 5-60 | Per change | Medium |
| Workflow traces | 100-10000+ | Per run | Medium |
| Meeting notes | 50-500 | Weekly | Low |
| Chat archives | 1000+ | Hourly | Low |
Start with the high-priority sources. Add others as you prove value.
Step 2: Pick a Chunking Strategy¶
The DocumentChunker offers six strategies:
| Strategy | When to Use | Chunk Size |
|---|---|---|
| `fixed_size` | Uniform short documents | 512 tokens |
| `sentence` | Natural prose, docs | Variable |
| `paragraph` | Long-form articles | Variable |
| `markdown_header` | Structured docs with headings | Variable |
| `code_aware` | Source code, snippets | Variable |
| `semantic` | Mixed content, highest quality | Variable |
Chunking strategy by source

- ADRs and design docs -> `markdown_header`
- Runbooks -> `paragraph`
- Code snippets -> `code_aware`
- Workflow traces -> `fixed_size` (JSON structure is already segmented)
- Chat archives -> `semantic`
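The source-to-strategy mapping above can be captured in a small helper. This is an illustrative sketch, not an FCC API; the `strategy_for` function and its source-type keys are hypothetical names:

```python
# Hypothetical helper: map a source type to the chunking strategy
# recommended in this guide. The strategy names mirror the table above;
# the mapping itself is a team convention, not part of FCC.
STRATEGY_BY_SOURCE = {
    "adr": "markdown_header",
    "design_doc": "markdown_header",
    "runbook": "paragraph",
    "code_snippet": "code_aware",
    "workflow_trace": "fixed_size",  # JSON traces are already segmented
    "chat_archive": "semantic",
}

def strategy_for(source_type: str) -> str:
    """Return the recommended chunking strategy, defaulting to 'semantic'."""
    return STRATEGY_BY_SOURCE.get(source_type, "semantic")
```

Defaulting unknown sources to `semantic` is the safe fallback, since it handles mixed content at the cost of extra processing.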
Step 3: Configure an Embedding Provider¶
FCC provides an EmbeddingProvider protocol. For local development and CI, use the MockEmbeddingProvider (384 dimensions, deterministic). For production, plug in a real provider:
```python
from fcc.api import PersonaSearchIndex
from fcc.search.embeddings import EmbeddingProvider

class SentenceTransformerProvider(EmbeddingProvider):
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        from sentence_transformers import SentenceTransformer
        self.model = SentenceTransformer(model_name)
        self.dimension = 384  # output dimension of all-MiniLM-L6-v2

    def embed(self, texts: list[str]) -> list[list[float]]:
        return self.model.encode(texts).tolist()

provider = SentenceTransformerProvider()
index = PersonaSearchIndex(embedding_provider=provider)
```
Provider stability
Switching providers invalidates your index. Standardize on one provider at the team level and version it. When you upgrade, rebuild the index end-to-end.
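One way to enforce the standardize-and-version rule is to record the provider identity next to the index and fail fast on mismatch. The manifest format and function names below are a sketch, not an FCC feature:

```python
# Sketch: pin the embedding provider identity alongside the index so that
# a mismatched provider fails loudly instead of silently returning
# low-quality matches. The manifest shape here is illustrative.
import json
from pathlib import Path

def write_manifest(path: Path, provider: str, model: str, dimension: int) -> None:
    """Record which provider/model/dimension built the index."""
    path.write_text(json.dumps(
        {"provider": provider, "model": model, "dimension": dimension}
    ))

def check_manifest(path: Path, provider: str, model: str, dimension: int) -> None:
    """Raise if the runtime provider differs from the one that built the index."""
    manifest = json.loads(path.read_text())
    expected = {"provider": provider, "model": model, "dimension": dimension}
    if manifest != expected:
        raise RuntimeError(
            f"Embedding provider mismatch: index built with {manifest}, "
            f"runtime configured with {expected}. Rebuild the index end-to-end."
        )
```

Run `check_manifest` at service startup so a provider upgrade that skipped the rebuild is caught before any query is served.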
Step 4: Build Indices¶
Start with three indices that cover 80% of team queries:
```python
from fcc.api import (
    ActionSearchIndex,
    DocumentChunker,
    PersonaRegistry,
    PersonaSearchIndex,
    SearchIndex,
)

# 1. Persona index -- "which persona handles X?"
registry = PersonaRegistry.load_default()
persona_index = PersonaSearchIndex(embedding_provider=provider)
persona_index.index_registry(registry)

# 2. Team docs index -- "what did we decide about Y?"
chunker = DocumentChunker(strategy="markdown_header")
docs_index = SearchIndex(embedding_provider=provider)
for doc_path in team_docs_paths:  # paths to your high-priority team docs
    chunks = chunker.chunk_file(doc_path)
    docs_index.add_documents(chunks, source=doc_path)

# 3. Action index -- "how do we perform Z?"
action_index = ActionSearchIndex(embedding_provider=provider)
action_index.index_from_registry(registry)
```
Step 5: Wrap in a RAG Pipeline¶
The RAGPipeline adds persona-aware retrieval to any prompt:
```python
from fcc.api import RAGPipeline

rag = RAGPipeline(
    search_index=docs_index,
    persona_registry=registry,
    top_k=5,
)

# Persona-aware query
result = rag.query(
    question="What is our incident severity taxonomy?",
    persona_id="IC",  # bias retrieval toward IC-relevant docs
)
```
The pipeline automatically retrieves context relevant to the invoking persona, inserts it into the prompt, and returns both the answer and the source citations.
PersonaSearchIndex for Team Queries¶
The PersonaSearchIndex enables queries like:
- "Which persona should handle a protocol compliance question?"
- "What personas orchestrate research workflows?"
- "Show me personas with a security specialization"
```python
results = persona_index.search(
    query="protocol compliance validation",
    top_k=3,
    filters={"category": "protocol_engineering"},
)
for result in results:
    print(f"{result.persona_id}: {result.score:.3f} - {result.role_title}")
```
Team-Specific Persona Queries¶
Extend the index with team-local metadata:
```python
persona_index.add_metadata("BC", {
    "owners": ["alice", "bob"],
    "last_customized": "2026-03-15",
    "team": "payments-alpha",
})

# Query by team
team_personas = persona_index.search(
    query="design patterns",
    filters={"team": "payments-alpha"},
)
```
This lets each team see its own persona customizations first, while preserving access to the shared registry.
Team Knowledge Graphs¶
The KnowledgeGraph captures explicit relationships:
| Node Types | Edge Types |
|---|---|
| Persona | champions, orchestrates, collaborates_with |
| Workflow | invokes, produces, consumes |
| Action | part_of, precedes, follows |
| Artifact | authored_by, reviewed_by, referenced_by |
| Decision | decided_by, affects, supersedes |
Building a Team KG¶
```python
from fcc.api import KnowledgeGraph
from fcc.knowledge.builders import build_full_fcc_graph

# Start from the shipped FCC graph
kg = build_full_fcc_graph(registry)

# Add team-specific nodes and edges
kg.add_node("adr-042", type="Decision", label="Use PostgreSQL for events")
kg.add_edge("adr-042", "decided_by", "BC")
kg.add_edge("adr-042", "affects", "event-bus-workflow")

# Query: what decisions affect a workflow?
decisions = kg.query(
    node_type="Decision",
    edge="affects",
    target="event-bus-workflow",
)
```
Serialization for Sharing¶
Export team KGs in standard formats for cross-team consumption:
```python
# OWL for ontology tooling
kg.serialize_owl("team-kg.owl")

# JSON-LD for web interoperability
kg.serialize_jsonld("team-kg.jsonld")

# SKOS for concept mapping
kg.serialize_skos("team-kg.ttl")
```
RAG Patterns for Team Documentation¶
Pattern 1: Onboarding Q&A¶
```python
onboarding_rag = RAGPipeline(
    search_index=docs_index,
    persona_registry=registry,
    top_k=8,
    include_sources=True,
)

new_hire_questions = [
    "How do we deploy to staging?",
    "What is our code review process?",
    "Who owns the payments service?",
    "What ADR covers our error handling approach?",
]

for q in new_hire_questions:
    result = onboarding_rag.query(q, persona_id="DE")
    print(f"Q: {q}\nA: {result.answer}\nSources: {result.sources}\n")
```
Pattern 2: Decision Archaeology¶
"Why did we decide X?" is the highest-value query for long-lived teams.
```python
decision_rag = RAGPipeline(
    search_index=adr_index,  # ADRs only
    persona_registry=registry,
    top_k=3,
)

result = decision_rag.query(
    "Why did we choose PostgreSQL over DynamoDB?",
    persona_id="BC",
)
```
Pattern 3: Runbook Retrieval During Incidents¶
```python
incident_rag = RAGPipeline(
    search_index=runbook_index,
    persona_registry=registry,
    top_k=2,          # focused, not exhaustive
    max_tokens=1500,  # short, actionable
)

result = incident_rag.query(
    "How do I restart the payments worker?",
    persona_id="IC",
)
```
RAG hygiene for runbooks
Runbook RAG answers must be short, actionable, and cite sources. Incident responders do not have time to read five paragraphs. Set max_tokens low and always include source URLs.
Federation Across Teams¶
When multiple teams have their own knowledge bases, the FederatedKnowledgeGraph resolves cross-team queries without flattening team namespaces.
```python
from fcc.knowledge.federation import FederatedKnowledgeGraph
from fcc.api import NamespaceRegistry

# Register team namespaces
ns = NamespaceRegistry()
ns.register("alpha", "payments", "https://payments.team/")
ns.register("beta", "search", "https://search.team/")

# Federate each team's KG
fed_kg = FederatedKnowledgeGraph(namespaces=ns)
fed_kg.add_graph("alpha", alpha_kg)
fed_kg.add_graph("beta", beta_kg)

# Cross-team query
results = fed_kg.query(
    node_type="Decision",
    predicate="affects",
    target="event-bus-workflow",
)
# Returns decisions from BOTH teams, tagged with the source namespace
```
Federation Architecture¶
```mermaid
flowchart TD
    subgraph TeamA[Team Alpha - Payments]
        KGA[Team Alpha KG]
        DocA[Team Alpha Docs]
    end
    subgraph TeamB[Team Beta - Search]
        KGB[Team Beta KG]
        DocB[Team Beta Docs]
    end
    subgraph TeamC[Team Gamma - Analytics]
        KGC[Team Gamma KG]
        DocC[Team Gamma Docs]
    end
    subgraph Federation[Federation Layer]
        NS[Namespace Registry]
        Resolver[Entity Resolver]
        FedKG[Federated KG]
    end
    subgraph Queries[Cross-Team Queries]
        Q1[Who owns X?]
        Q2[Which team decided Y?]
        Q3[What ADRs touch service Z?]
    end

    TeamA --> Federation
    TeamB --> Federation
    TeamC --> Federation
    Federation --> Queries

    classDef team fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px;
    classDef fed fill:#fff3e0,stroke:#e65100,stroke-width:2px;
    classDef query fill:#e1f5ff,stroke:#0277bd,stroke-width:2px;
    class KGA,KGB,KGC,DocA,DocB,DocC team;
    class NS,Resolver,FedKG fed;
    class Q1,Q2,Q3 query;
```
Privacy and Access Control¶
A team knowledge base concentrates sensitive information. Before enabling semantic search or RAG, address three privacy concerns.
Concern 1: PII in Indexed Content¶
Chat archives, incident post-mortems, and customer-facing docs often contain PII. Mitigate by:
- Running PII detection before indexing (names, emails, identifiers)
- Excluding source directories with known-sensitive content
- Redacting before chunking, not after retrieval
```python
from fcc.api import DocumentChunker

chunker = DocumentChunker(
    strategy="markdown_header",
    preprocess=[redact_pii, strip_credentials],  # custom filters
)
```
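A preprocess filter of the kind passed above is just a `str -> str` function. The sketch below shows a minimal `redact_pii` built on regexes; real PII detection should use a dedicated library, and the patterns here are illustrative only:

```python
# Minimal sketch of a redact_pii preprocess filter (str -> str).
# Regex-based email/phone masking only -- a real deployment should use a
# dedicated PII-detection library with named-entity support.
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like sequences with placeholders."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    text = PHONE_RE.sub("[REDACTED_PHONE]", text)
    return text
```

Because filters run before chunking, the redacted placeholders are what gets embedded, so the raw values never reach the index or the embedding provider.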
Concern 2: Cross-Team Leakage¶
A federated KG may surface a team's internal decisions to other teams. Mitigate by:
- Tagging nodes/edges with visibility levels (public, team, restricted)
- Filtering queries by caller's namespace
- Auditing cross-team queries via event bus
```python
fed_kg.add_node(
    "adr-042",
    type="Decision",
    visibility="team",  # only Team Alpha can retrieve
    owner_team="alpha",
)
```
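Query-time filtering against those visibility tags can be sketched as a plain function over result nodes. The dict shape (`visibility`, `owner_team`) follows the tagging convention above; the filter itself is illustrative, not an FCC API:

```python
# Sketch: drop nodes the calling team may not see. Assumes nodes are
# dicts carrying the visibility/owner_team tags described above.
def visible_to(nodes: list[dict], caller_team: str) -> list[dict]:
    """Return only the nodes visible to caller_team."""
    allowed = []
    for node in nodes:
        vis = node.get("visibility", "public")
        if vis == "public":
            allowed.append(node)
        elif vis == "team" and node.get("owner_team") == caller_team:
            allowed.append(node)
        # "restricted" nodes are never returned by this filter
    return allowed
```

Applying the filter inside the federation layer, rather than in each client, keeps the policy in one auditable place.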
Concern 3: Embedding Provider Data Exfiltration¶
If you use a hosted embedding provider (OpenAI, Cohere, etc.), every indexed document is sent to that provider. Mitigate by:
- Using local/self-hosted embedding models for sensitive content
- Pre-filtering text before sending to hosted providers
- Reviewing provider data-handling terms quarterly
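The first mitigation can be made mechanical by routing documents to a local model whenever they carry a sensitivity tag. The tag names and `pick_provider` function below are hypothetical; both providers are assumed to implement the `EmbeddingProvider` protocol:

```python
# Sketch: choose an embedding provider per document based on sensitivity
# tags. Tag names and routing policy are illustrative team conventions.
SENSITIVE_TAGS = {"pii", "customer_data", "security", "restricted"}

def pick_provider(doc_tags: set[str], local_provider, hosted_provider):
    """Route tagged-sensitive documents to the local model."""
    if doc_tags & SENSITIVE_TAGS:
        return local_provider
    return hosted_provider
```

Note that mixing providers splits your index by dimension and embedding space, so sensitive and non-sensitive content should live in separate indices.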
Access control is not confidentiality
Access controls in your index layer prevent accidental leakage between authorized users. They do not protect against a compromised embedding provider or a breach of your vector store. Treat index contents as sensitive.
Operational Concerns¶
Index Freshness¶
Stale indices drive bad answers. Freshness strategies:
| Strategy | Freshness | Cost |
|---|---|---|
| Nightly rebuild | 24h lag | Low |
| Event-triggered reindex | Near-real-time | Medium |
| Hybrid (full nightly + event delta) | <15 min | Medium-high |
| Streaming updates | <1 min | High |
Most teams start with nightly rebuilds and add event-triggered delta updates within 3 months.
Index Size and Cost¶
Plan for growth. A team generating 20 ADRs, 100 design docs, and 500 workflow traces per quarter accumulates about 2,500 documents per year, or roughly 50K embedded chunks at ~20 chunks per document. Budget storage and query costs accordingly.
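The ~50K figure is back-of-envelope arithmetic. The per-quarter volumes come from the inventory table; the ~20 chunks-per-document ratio is an assumption you should replace with your own measurements:

```python
# Back-of-envelope index growth estimate. Volumes match the inventory
# table; CHUNKS_PER_DOC is an assumed average -- measure your own.
PER_QUARTER = {"adrs": 20, "design_docs": 100, "workflow_traces": 500}
CHUNKS_PER_DOC = 20

docs_per_year = 4 * sum(PER_QUARTER.values())      # 2,480 documents
chunks_per_year = docs_per_year * CHUNKS_PER_DOC   # 49,600 chunks, ~50K
```

Multiply `chunks_per_year` by your embedding dimension and bytes-per-float to get a raw vector storage estimate.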
Monitoring¶
Instrument your index with FCC observability:
- Query latency p50/p95/p99
- Retrieval precision (spot-check 10 queries weekly)
- Index freshness gap
- Failed embedding requests
Next Steps¶
- Multi-Team Governance -- governance for federated KGs
- Team Scaling Guide -- when to add KG/RAG by stage
- RAG Pipeline Documentation -- deeper dive on the RAG module
- Knowledge Graph Documentation -- KG nodes, edges, serializers
- Federation Documentation -- cross-team entity resolution
- Semantic Search Prompts -- sample queries