Shared Knowledge Base

As your team accumulates artifacts -- design docs, ADRs, runbooks, persona configurations, tutorials, and workflow traces -- the ability to find and contextualize prior knowledge becomes the rate-limiting factor for new work. This guide explains how to build a team-level knowledge base using FCC's semantic search, knowledge graph, RAG pipeline, and federation modules.

A well-designed shared knowledge base turns FCC from a per-workflow tool into a team memory substrate: every workflow run enriches it, every new query benefits from it, and every new team member onboards faster because of it.


What a Shared Knowledge Base Does

A team-level knowledge base provides four capabilities:

  1. Semantic search -- natural-language queries over personas, actions, workflow traces, and docs
  2. Knowledge graphs -- explicit relationships between personas, workflows, artifacts, and decisions
  3. RAG-augmented queries -- retrieval-augmented generation for persona prompts
  4. Federation -- cross-team knowledge resolution while preserving team namespaces

You do not need all four on day one. Start with semantic search, add RAG in month two, add a knowledge graph in month three, and consider federation only when you have 3+ teams.


Architecture Overview

flowchart TD
    subgraph Sources[Knowledge Sources]
        ADR[ADRs]
        Doc[Docs]
        RB[Runbooks]
        PT[Persona Configs]
        WT[Workflow Traces]
        Chat[Chat Archives]
    end

    subgraph Ingest[Ingestion Layer]
        Chunker[DocumentChunker<br/>6 strategies]
        Embed[EmbeddingProvider<br/>pluggable]
    end

    subgraph Storage[Storage Layer]
        SI[SearchIndex]
        PSI[PersonaSearchIndex]
        ASI[ActionSearchIndex]
        KG[KnowledgeGraph<br/>9 node/edge types]
    end

    subgraph Query[Query Layer]
        Retrieve[SemanticRetriever]
        RAG[RAGPipeline<br/>persona-aware]
        KGQ[KG Query API]
    end

    subgraph Apps[Applications]
        Persona[Persona Prompts]
        Search[Search UI]
        Onboard[Onboarding]
        Audit[Audit Trail]
    end

    Sources --> Chunker
    Chunker --> Embed
    Embed --> SI
    Embed --> PSI
    Embed --> ASI
    Sources --> KG
    SI --> Retrieve
    PSI --> Retrieve
    Retrieve --> RAG
    KG --> KGQ
    RAG --> Persona
    Retrieve --> Search
    RAG --> Onboard
    KG --> Audit

    classDef source fill:#e1f5ff,stroke:#0277bd,stroke-width:2px;
    classDef ingest fill:#fff3e0,stroke:#e65100,stroke-width:2px;
    classDef storage fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px;
    classDef query fill:#fce4ec,stroke:#880e4f,stroke-width:2px;
    classDef app fill:#e8eaf6,stroke:#283593,stroke-width:2px;
    class ADR,Doc,RB,PT,WT,Chat source;
    class Chunker,Embed ingest;
    class SI,PSI,ASI,KG storage;
    class Retrieve,RAG,KGQ query;
    class Persona,Search,Onboard,Audit app;

Building a Team Knowledge Base with FCC

Step 1: Inventory Your Knowledge Sources

Before indexing anything, list the knowledge your team generates and consumes:

Source          | Typical Volume | Update Frequency | Priority
ADRs            | 20-200         | Weekly           | High
Design docs     | 50-500         | Daily            | High
Runbooks        | 10-50          | Monthly          | High
Persona configs | 5-60           | Per change       | Medium
Workflow traces | 100-10,000+    | Per run          | Medium
Meeting notes   | 50-500         | Weekly           | Low
Chat archives   | 1,000+         | Hourly           | Low

Start with the high-priority sources. Add others as you prove value.
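The inventory above can be kept as plain data so your indexing scripts can select the first wave programmatically. A minimal sketch; the source names and fields are illustrative, not an FCC schema:

```python
# The inventory table as data; names and priorities mirror the table above.
SOURCES = [
    {"name": "adrs", "priority": "high"},
    {"name": "design_docs", "priority": "high"},
    {"name": "runbooks", "priority": "high"},
    {"name": "persona_configs", "priority": "medium"},
    {"name": "workflow_traces", "priority": "medium"},
    {"name": "meeting_notes", "priority": "low"},
    {"name": "chat_archives", "priority": "low"},
]

def first_wave(sources: list[dict]) -> list[str]:
    """Return the high-priority sources to index first."""
    return [s["name"] for s in sources if s["priority"] == "high"]
```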

Step 2: Pick a Chunking Strategy

The DocumentChunker offers six strategies:

Strategy        | When to Use                    | Chunk Size
fixed_size      | Uniform short documents        | 512 tokens
sentence        | Natural prose, docs            | Variable
paragraph       | Long-form articles             | Variable
markdown_header | Structured docs with headings  | Variable
code_aware      | Source code, snippets          | Variable
semantic        | Mixed content, highest quality | Variable

Chunking strategy by source

  • ADRs and design docs -> markdown_header
  • Runbooks -> paragraph
  • Code snippets -> code_aware
  • Workflow traces -> fixed_size (JSON structure is already segmented)
  • Chat archives -> semantic

Step 3: Configure an Embedding Provider

FCC provides an EmbeddingProvider protocol. For local development and CI, use the MockEmbeddingProvider (384 dimensions, deterministic). For production, plug in a real provider:

from fcc.api import PersonaSearchIndex
from fcc.search.embeddings import EmbeddingProvider

class SentenceTransformerProvider(EmbeddingProvider):
    def __init__(self, model_name="all-MiniLM-L6-v2"):
        from sentence_transformers import SentenceTransformer
        self.model = SentenceTransformer(model_name)
        # Derive the dimension from the model rather than hard-coding it,
        # so changing model_name cannot silently mismatch the index.
        self.dimension = self.model.get_sentence_embedding_dimension()

    def embed(self, texts: list[str]) -> list[list[float]]:
        return self.model.encode(texts).tolist()

provider = SentenceTransformerProvider()
index = PersonaSearchIndex(embedding_provider=provider)

Provider stability

Switching providers invalidates your index. Standardize on one provider at the team level and version it. When you upgrade, rebuild the index end-to-end.
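One way to enforce this is to store a provider fingerprint alongside the index and refuse to reuse an index built under a different provider. A sketch under the assumption that you persist index metadata yourself; FCC does not prescribe this mechanism:

```python
import hashlib
import json

def provider_fingerprint(name: str, model: str, dimension: int) -> str:
    """Stable fingerprint stored with the index; if it changes, rebuild
    the index end-to-end instead of reusing it."""
    payload = json.dumps(
        {"name": name, "model": model, "dim": dimension}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:16]

def check_compatible(index_meta: dict, name: str, model: str, dimension: int) -> bool:
    # Compare the stored fingerprint against the provider you are about to use.
    return index_meta.get("provider_fp") == provider_fingerprint(name, model, dimension)
```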

Step 4: Build Indices

Start with three indices that cover 80% of team queries:

from fcc.api import PersonaRegistry, PersonaSearchIndex, SearchIndex
from fcc.api import DocumentChunker

# 1. Persona index - "which persona handles X?"
registry = PersonaRegistry.load_default()
persona_index = PersonaSearchIndex(embedding_provider=provider)
persona_index.index_registry(registry)

# 2. Team docs index - "what did we decide about Y?"
chunker = DocumentChunker(strategy="markdown_header")
docs_index = SearchIndex(embedding_provider=provider)
# team_docs_paths: iterable of paths to your team's high-priority docs
for doc_path in team_docs_paths:
    chunks = chunker.chunk_file(doc_path)
    docs_index.add_documents(chunks, source=doc_path)

# 3. Action index - "how do we perform Z?"
from fcc.api import ActionSearchIndex
action_index = ActionSearchIndex(embedding_provider=provider)
action_index.index_from_registry(registry)

Step 5: Wrap in a RAG Pipeline

The RAGPipeline adds persona-aware retrieval to any prompt:

from fcc.api import RAGPipeline

rag = RAGPipeline(
    search_index=docs_index,
    persona_registry=registry,
    top_k=5,
)

# Persona-aware query
result = rag.query(
    question="What is our incident severity taxonomy?",
    persona_id="IC",  # bias retrieval toward IC-relevant docs
)

The pipeline automatically retrieves context relevant to the invoking persona, inserts it into the prompt, and returns both the answer and the source citations.


PersonaSearchIndex for Team Queries

The PersonaSearchIndex enables queries like:

  • "Which persona should handle a protocol compliance question?"
  • "What personas orchestrate research workflows?"
  • "Show me personas with a security specialization"

results = persona_index.search(
    query="protocol compliance validation",
    top_k=3,
    filters={"category": "protocol_engineering"}
)

for result in results:
    print(f"{result.persona_id}: {result.score:.3f} - {result.role_title}")

Team-Specific Persona Queries

Extend the index with team-local metadata:

persona_index.add_metadata("BC", {
    "owners": ["alice", "bob"],
    "last_customized": "2026-03-15",
    "team": "payments-alpha",
})

# Query by team
team_personas = persona_index.search(
    query="design patterns",
    filters={"team": "payments-alpha"}
)

This lets each team see its own persona customizations first, while preserving access to the shared registry.


Team Knowledge Graphs

The KnowledgeGraph captures explicit relationships:

Node Type | Typical Edge Types
Persona   | champions, orchestrates, collaborates_with
Workflow  | invokes, produces, consumes
Action    | part_of, precedes, follows
Artifact  | authored_by, reviewed_by, referenced_by
Decision  | decided_by, affects, supersedes

Building a Team KG

from fcc.api import KnowledgeGraph
from fcc.knowledge.builders import build_full_fcc_graph

# Start from the shipped FCC graph
kg = build_full_fcc_graph(registry)

# Add team-specific nodes and edges
kg.add_node("adr-042", type="Decision", label="Use PostgreSQL for events")
kg.add_edge("adr-042", "decided_by", "BC")
kg.add_edge("adr-042", "affects", "event-bus-workflow")

# Query: what decisions affect a workflow?
decisions = kg.query(
    node_type="Decision",
    edge="affects",
    target="event-bus-workflow"
)

Serialization for Sharing

Export team KGs in standard formats for cross-team consumption:

# OWL for ontology tooling
kg.serialize_owl("team-kg.owl")

# JSON-LD for web interoperability
kg.serialize_jsonld("team-kg.jsonld")

# SKOS for concept mapping
kg.serialize_skos("team-kg.ttl")

RAG Patterns for Team Documentation

Pattern 1: Onboarding Q&A

onboarding_rag = RAGPipeline(
    search_index=docs_index,
    persona_registry=registry,
    top_k=8,
    include_sources=True,
)

new_hire_questions = [
    "How do we deploy to staging?",
    "What is our code review process?",
    "Who owns the payments service?",
    "What ADR covers our error handling approach?",
]

for q in new_hire_questions:
    result = onboarding_rag.query(q, persona_id="DE")
    print(f"Q: {q}\nA: {result.answer}\nSources: {result.sources}\n")

Pattern 2: Decision Archaeology

"Why did we decide X?" is the highest-value query for long-lived teams.

decision_rag = RAGPipeline(
    search_index=adr_index,  # ADRs only
    persona_registry=registry,
    top_k=3,
)

result = decision_rag.query(
    "Why did we choose PostgreSQL over DynamoDB?",
    persona_id="BC",
)

Pattern 3: Runbook Retrieval During Incidents

incident_rag = RAGPipeline(
    search_index=runbook_index,
    persona_registry=registry,
    top_k=2,  # focused, not exhaustive
    max_tokens=1500,  # short, actionable
)

result = incident_rag.query(
    "How do I restart the payments worker?",
    persona_id="IC",
)

RAG hygiene for runbooks

Runbook RAG answers must be short, actionable, and cite sources. Incident responders do not have time to read five paragraphs. Set max_tokens low and always include source URLs.
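A small formatter makes the "short, actionable, cited" rule concrete. This is a hypothetical helper, not an FCC API; the answer/sources shape mirrors the RAG result described earlier:

```python
def format_incident_answer(answer: str, sources: list[str],
                           max_chars: int = 600) -> str:
    """Cap the answer at a readable length and always append citations."""
    body = answer if len(answer) <= max_chars else answer[:max_chars] + "..."
    cites = "\n".join(f"  [{i}] {src}" for i, src in enumerate(sources, 1))
    return f"{body}\n\nSources:\n{cites}" if sources else body
```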


Federation Across Teams

When multiple teams have their own knowledge bases, the FederatedKnowledgeGraph resolves cross-team queries without flattening team namespaces.

from fcc.knowledge.federation import FederatedKnowledgeGraph
from fcc.api import NamespaceRegistry

# Register team namespaces
ns = NamespaceRegistry()
ns.register("alpha", "payments", "https://payments.team/")
ns.register("beta", "search", "https://search.team/")

# Federate each team's KG
fed_kg = FederatedKnowledgeGraph(namespaces=ns)
fed_kg.add_graph("alpha", alpha_kg)
fed_kg.add_graph("beta", beta_kg)

# Cross-team query
results = fed_kg.query(
    node_type="Decision",
    predicate="affects",
    target="event-bus-workflow"
)
# Returns decisions from BOTH teams, with source namespace

Federation Architecture

flowchart TD
    subgraph TeamA[Team Alpha - Payments]
        KGA[Team Alpha KG]
        DocA[Team Alpha Docs]
    end

    subgraph TeamB[Team Beta - Search]
        KGB[Team Beta KG]
        DocB[Team Beta Docs]
    end

    subgraph TeamC[Team Gamma - Analytics]
        KGC[Team Gamma KG]
        DocC[Team Gamma Docs]
    end

    subgraph Federation[Federation Layer]
        NS[Namespace Registry]
        Resolver[Entity Resolver]
        FedKG[Federated KG]
    end

    subgraph Queries[Cross-Team Queries]
        Q1[Who owns X?]
        Q2[Which team decided Y?]
        Q3[What ADRs touch service Z?]
    end

    TeamA --> Federation
    TeamB --> Federation
    TeamC --> Federation
    Federation --> Queries

    classDef team fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px;
    classDef fed fill:#fff3e0,stroke:#e65100,stroke-width:2px;
    classDef query fill:#e1f5ff,stroke:#0277bd,stroke-width:2px;
    class KGA,KGB,KGC,DocA,DocB,DocC team;
    class NS,Resolver,FedKG fed;
    class Q1,Q2,Q3 query;

Privacy and Access Control

A team knowledge base concentrates sensitive information. Before enabling semantic search or RAG, address three privacy concerns.

Concern 1: PII in Indexed Content

Chat archives, incident post-mortems, and customer-facing docs often contain PII. Mitigate by:

  • Running PII detection before indexing (names, emails, identifiers)
  • Excluding source directories with known-sensitive content
  • Redacting before chunking, not after retrieval

from fcc.api import DocumentChunker

chunker = DocumentChunker(
    strategy="markdown_header",
    preprocess=[redact_pii, strip_credentials],  # custom filters
)

Concern 2: Cross-Team Leakage

A federated KG may surface a team's internal decisions to other teams. Mitigate by:

  • Tagging nodes/edges with visibility levels (public, team, restricted)
  • Filtering queries by caller's namespace
  • Auditing cross-team queries via event bus

fed_kg.add_node(
    "adr-042",
    type="Decision",
    visibility="team",  # only Team Alpha can retrieve
    owner_team="alpha",
)

Concern 3: Embedding Provider Data Exfiltration

If you use a hosted embedding provider (OpenAI, Cohere, etc.), every indexed document is sent to that provider. Mitigate by:

  • Using local/self-hosted embedding models for sensitive content
  • Pre-filtering text before sending to hosted providers
  • Reviewing provider data-handling terms quarterly
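The first mitigation can be a simple routing rule at ingest time. A sketch: `sensitivity` is a hypothetical metadata field your pipeline would need to populate, and the rule fails closed, keeping anything unlabeled on the local model:

```python
# Sensitivity labels that are acceptable to send to a hosted provider.
SAFE_FOR_HOSTED = {"public", "internal"}

def embedding_backend(doc: dict) -> str:
    """Route a document to an embedding backend by sensitivity label.
    Fail closed: unlabeled or sensitive content stays on the local model."""
    if doc.get("sensitivity") in SAFE_FOR_HOSTED:
        return "hosted"   # third-party embedding API is acceptable here
    return "local"        # self-hosted model; text never leaves your infra
```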

Access control is not confidentiality

Access controls in your index layer prevent accidental leakage between authorized users. They do not protect against a compromised embedding provider or a breach of your vector store. Treat index contents as sensitive.


Operational Concerns

Index Freshness

Stale indices drive bad answers. Freshness strategies:

Strategy                            | Freshness      | Cost
Nightly rebuild                     | 24h lag        | Low
Event-triggered reindex             | Near-real-time | Medium
Hybrid (full nightly + event delta) | <15 min        | Medium-high
Streaming updates                   | <1 min         | High

Most teams start with nightly rebuilds and add event-triggered delta updates within 3 months.
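The hybrid strategy from the table can be sketched as a small policy object: document-change events queue delta reindexes, and a full rebuild fires once the nightly interval elapses. The class and method names are illustrative, not part of FCC:

```python
from datetime import datetime, timedelta

class HybridReindexPolicy:
    """Hybrid freshness: event-triggered deltas plus a nightly full rebuild."""

    def __init__(self, full_interval: timedelta = timedelta(hours=24)):
        self.full_interval = full_interval
        self.last_full = datetime.min
        self.pending: set[str] = set()   # doc ids awaiting a delta reindex

    def on_document_changed(self, doc_id: str) -> None:
        # Called from your event bus subscriber on each doc change.
        self.pending.add(doc_id)

    def next_action(self, now: datetime) -> str:
        if now - self.last_full >= self.full_interval:
            return "full_rebuild"
        return "delta_reindex" if self.pending else "noop"
```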

Index Size and Cost

Plan for growth. A team generating 20 ADRs, 100 design docs, and 500 workflow traces per quarter accumulates ~50K embedded chunks per year. Budget storage and query costs accordingly.
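The back-of-envelope arithmetic behind that estimate, assuming roughly 20 chunks per document (the actual ratio depends on your chunking strategy):

```python
# Sizing estimate for the example above.
adrs, design_docs, traces = 20, 100, 500   # per quarter, from the text
chunks_per_doc = 20                        # assumption; varies by strategy
docs_per_year = (adrs + design_docs + traces) * 4
chunks_per_year = docs_per_year * chunks_per_doc
print(chunks_per_year)  # 49600, i.e. ~50K embedded chunks per year
```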

Monitoring

Instrument your index with FCC observability:

  • Query latency p50/p95/p99
  • Retrieval precision (spot-check 10 queries weekly)
  • Index freshness gap
  • Failed embedding requests
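For the latency metric, a nearest-rank percentile over recent query timings is enough to start; no metrics stack is assumed here, just a buffer of samples you collect yourself:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile, sufficient for p50/p95/p99 latency tracking."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]
```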

Next Steps