Shared Knowledge Base¶
As your team accumulates artifacts -- design docs, ADRs, runbooks, persona configurations, tutorials, and workflow traces -- the ability to find and contextualize prior knowledge becomes the rate-limiting factor for new work. This guide explains how to build a team-level knowledge base using FCC's semantic search, knowledge graph, RAG pipeline, and federation modules.
A well-designed shared knowledge base turns FCC from a per-workflow tool into a team memory substrate: every workflow run enriches it, every new query benefits from it, and every new team member onboards faster because of it.
What a Shared Knowledge Base Does¶
A team-level knowledge base provides four capabilities:
- Semantic search -- natural-language queries over personas, actions, workflow traces, and docs
- Knowledge graphs -- explicit relationships between personas, workflows, artifacts, and decisions
- RAG-augmented queries -- retrieval-augmented generation for persona prompts
- Federation -- cross-team knowledge resolution while preserving team namespaces
You do not need all four on day one. Start with semantic search, add RAG in month two, add a knowledge graph in month three, and consider federation only when you have 3+ teams.
Architecture Overview¶
```mermaid
flowchart TD
    subgraph Sources[Knowledge Sources]
        ADR[ADRs]
        Doc[Docs]
        RB[Runbooks]
        PT[Persona Configs]
        WT[Workflow Traces]
        Chat[Chat Archives]
    end
    subgraph Ingest[Ingestion Layer]
        Chunker[DocumentChunker<br/>6 strategies]
        Embed[EmbeddingProvider<br/>pluggable]
    end
    subgraph Storage[Storage Layer]
        SI[SearchIndex]
        PSI[PersonaSearchIndex]
        ASI[ActionSearchIndex]
        KG[KnowledgeGraph<br/>9 node/edge types]
    end
    subgraph Query[Query Layer]
        Retrieve[SemanticRetriever]
        RAG[RAGPipeline<br/>persona-aware]
        KGQ[KG Query API]
    end
    subgraph Apps[Applications]
        Persona[Persona Prompts]
        Search[Search UI]
        Onboard[Onboarding]
        Audit[Audit Trail]
    end

    Sources --> Chunker
    Chunker --> Embed
    Embed --> SI
    Embed --> PSI
    Embed --> ASI
    Sources --> KG
    SI --> Retrieve
    PSI --> Retrieve
    Retrieve --> RAG
    KG --> KGQ
    RAG --> Persona
    Retrieve --> Search
    RAG --> Onboard
    KG --> Audit

    classDef source fill:#e1f5ff,stroke:#0277bd,stroke-width:2px;
    classDef ingest fill:#fff3e0,stroke:#e65100,stroke-width:2px;
    classDef storage fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px;
    classDef query fill:#fce4ec,stroke:#880e4f,stroke-width:2px;
    classDef app fill:#e8eaf6,stroke:#283593,stroke-width:2px;
    class ADR,Doc,RB,PT,WT,Chat source;
    class Chunker,Embed ingest;
    class SI,PSI,ASI,KG storage;
    class Retrieve,RAG,KGQ query;
    class Persona,Search,Onboard,Audit app;
```
Building a Team Knowledge Base with FCC¶
Step 1: Inventory Your Knowledge Sources¶
Before indexing anything, list the knowledge your team generates and consumes:
| Source | Typical Volume | Update Frequency | Priority |
|---|---|---|---|
| ADRs | 20-200 | Weekly | High |
| Design docs | 50-500 | Daily | High |
| Runbooks | 10-50 | Monthly | High |
| Persona configs | 5-60 | Per change | Medium |
| Workflow traces | 100-10000+ | Per run | Medium |
| Meeting notes | 50-500 | Weekly | Low |
| Chat archives | 1000+ | Hourly | Low |
Start with the high-priority sources. Add others as you prove value.
Step 2: Pick a Chunking Strategy¶
The DocumentChunker offers six strategies:
| Strategy | When to Use | Chunk Size |
|---|---|---|
| `fixed_size` | Uniform short documents | 512 tokens |
| `sentence` | Natural prose, docs | Variable |
| `paragraph` | Long-form articles | Variable |
| `markdown_header` | Structured docs with headings | Variable |
| `code_aware` | Source code, snippets | Variable |
| `semantic` | Mixed content, highest quality | Variable |
Chunking strategy by source

- ADRs and design docs -> `markdown_header`
- Runbooks -> `paragraph`
- Code snippets -> `code_aware`
- Workflow traces -> `fixed_size` (JSON structure is already segmented)
- Chat archives -> `semantic`
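The source-to-strategy mapping above can be captured in a small helper. This is an illustrative sketch, not an FCC API; the `strategy_for` function and its source-type keys are hypothetical names:

```python
# Hypothetical helper: map a source type to the chunking strategy
# recommended in this guide. The strategy names mirror the table above;
# the mapping itself is a team convention, not part of FCC.
STRATEGY_BY_SOURCE = {
    "adr": "markdown_header",
    "design_doc": "markdown_header",
    "runbook": "paragraph",
    "code_snippet": "code_aware",
    "workflow_trace": "fixed_size",  # JSON traces are already segmented
    "chat_archive": "semantic",
}

def strategy_for(source_type: str) -> str:
    """Return the recommended chunking strategy, defaulting to 'semantic'."""
    return STRATEGY_BY_SOURCE.get(source_type, "semantic")
```

Defaulting unknown sources to `semantic` is the safe fallback, since it handles mixed content at the cost of extra processing.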
Step 3: Configure an Embedding Provider¶
FCC provides an EmbeddingProvider protocol. For local development and CI, use the MockEmbeddingProvider (384 dimensions, deterministic). For production, plug in a real provider:
```python
from fcc.api import PersonaSearchIndex
from fcc.search.embeddings import EmbeddingProvider

class SentenceTransformerProvider(EmbeddingProvider):
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        from sentence_transformers import SentenceTransformer
        self.model = SentenceTransformer(model_name)
        self.dimension = 384  # output dimension of all-MiniLM-L6-v2

    def embed(self, texts: list[str]) -> list[list[float]]:
        return self.model.encode(texts).tolist()

provider = SentenceTransformerProvider()
index = PersonaSearchIndex(embedding_provider=provider)
```
Provider stability
Switching providers invalidates your index. Standardize on one provider at the team level and version it. When you upgrade, rebuild the index end-to-end.
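One way to enforce the standardize-and-version rule is to record the provider identity next to the index and fail fast on mismatch. The manifest format and function names below are a sketch, not an FCC feature:

```python
# Sketch: pin the embedding provider identity alongside the index so that
# a mismatched provider fails loudly instead of silently returning
# low-quality matches. The manifest shape here is illustrative.
import json
from pathlib import Path

def write_manifest(path: Path, provider: str, model: str, dimension: int) -> None:
    """Record which provider/model/dimension built the index."""
    path.write_text(json.dumps(
        {"provider": provider, "model": model, "dimension": dimension}
    ))

def check_manifest(path: Path, provider: str, model: str, dimension: int) -> None:
    """Raise if the runtime provider differs from the one that built the index."""
    manifest = json.loads(path.read_text())
    expected = {"provider": provider, "model": model, "dimension": dimension}
    if manifest != expected:
        raise RuntimeError(
            f"Embedding provider mismatch: index built with {manifest}, "
            f"runtime configured with {expected}. Rebuild the index end-to-end."
        )
```

Run `check_manifest` at service startup so a provider upgrade that skipped the rebuild is caught before any query is served.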
Step 4: Build Indices¶
Start with three indices that cover 80% of team queries:
```python
from fcc.api import (
    ActionSearchIndex,
    DocumentChunker,
    PersonaRegistry,
    PersonaSearchIndex,
    SearchIndex,
)

# 1. Persona index -- "which persona handles X?"
registry = PersonaRegistry.load_default()
persona_index = PersonaSearchIndex(embedding_provider=provider)
persona_index.index_registry(registry)

# 2. Team docs index -- "what did we decide about Y?"
chunker = DocumentChunker(strategy="markdown_header")
docs_index = SearchIndex(embedding_provider=provider)
for doc_path in team_docs_paths:  # paths to your high-priority team docs
    chunks = chunker.chunk_file(doc_path)
    docs_index.add_documents(chunks, source=doc_path)

# 3. Action index -- "how do we perform Z?"
action_index = ActionSearchIndex(embedding_provider=provider)
action_index.index_from_registry(registry)
```
Step 5: Wrap in a RAG Pipeline¶
The RAGPipeline adds persona-aware retrieval to any prompt:
```python
from fcc.api import RAGPipeline

rag = RAGPipeline(
    search_index=docs_index,
    persona_registry=registry,
    top_k=5,
)

# Persona-aware query
result = rag.query(
    question="What is our incident severity taxonomy?",
    persona_id="IC",  # bias retrieval toward IC-relevant docs
)
```
The pipeline automatically retrieves context relevant to the invoking persona, inserts it into the prompt, and returns both the answer and the source citations.
PersonaSearchIndex for Team Queries¶
The PersonaSearchIndex enables queries like:
- "Which persona should handle a protocol compliance question?"
- "What personas orchestrate research workflows?"
- "Show me personas with a security specialization"
```python
results = persona_index.search(
    query="protocol compliance validation",
    top_k=3,
    filters={"category": "protocol_engineering"},
)
for result in results:
    print(f"{result.persona_id}: {result.score:.3f} - {result.role_title}")
```
Team-Specific Persona Queries¶
Extend the index with team-local metadata:
```python
persona_index.add_metadata("BC", {
    "owners": ["alice", "bob"],
    "last_customized": "2026-03-15",
    "team": "payments-alpha",
})

# Query by team
team_personas = persona_index.search(
    query="design patterns",
    filters={"team": "payments-alpha"},
)
```
This lets each team see its own persona customizations first, while preserving access to the shared registry.
Team Knowledge Graphs¶
The KnowledgeGraph captures explicit relationships:
| Node Types | Edge Types |
|---|---|
| Persona | champions, orchestrates, collaborates_with |
| Workflow | invokes, produces, consumes |
| Action | part_of, precedes, follows |
| Artifact | authored_by, reviewed_by, referenced_by |
| Decision | decided_by, affects, supersedes |
Building a Team KG¶
```python
from fcc.api import KnowledgeGraph
from fcc.knowledge.builders import build_full_fcc_graph

# Start from the shipped FCC graph
kg = build_full_fcc_graph(registry)

# Add team-specific nodes and edges
kg.add_node("adr-042", type="Decision", label="Use PostgreSQL for events")
kg.add_edge("adr-042", "decided_by", "BC")
kg.add_edge("adr-042", "affects", "event-bus-workflow")

# Query: what decisions affect a workflow?
decisions = kg.query(
    node_type="Decision",
    edge="affects",
    target="event-bus-workflow",
)
```
Serialization for Sharing¶
Export team KGs in standard formats for cross-team consumption:
```python
# OWL for ontology tooling
kg.serialize_owl("team-kg.owl")

# JSON-LD for web interoperability
kg.serialize_jsonld("team-kg.jsonld")

# SKOS for concept mapping
kg.serialize_skos("team-kg.ttl")
```
RAG Patterns for Team Documentation¶
Pattern 1: Onboarding Q&A¶
```python
onboarding_rag = RAGPipeline(
    search_index=docs_index,
    persona_registry=registry,
    top_k=8,
    include_sources=True,
)

new_hire_questions = [
    "How do we deploy to staging?",
    "What is our code review process?",
    "Who owns the payments service?",
    "What ADR covers our error handling approach?",
]

for q in new_hire_questions:
    result = onboarding_rag.query(q, persona_id="DE")
    print(f"Q: {q}\nA: {result.answer}\nSources: {result.sources}\n")
```
Pattern 2: Decision Archaeology¶
"Why did we decide X?" is the highest-value query for long-lived teams.
```python
decision_rag = RAGPipeline(
    search_index=adr_index,  # ADRs only
    persona_registry=registry,
    top_k=3,
)

result = decision_rag.query(
    "Why did we choose PostgreSQL over DynamoDB?",
    persona_id="BC",
)
```
Pattern 3: Runbook Retrieval During Incidents¶
```python
incident_rag = RAGPipeline(
    search_index=runbook_index,
    persona_registry=registry,
    top_k=2,          # focused, not exhaustive
    max_tokens=1500,  # short, actionable
)

result = incident_rag.query(
    "How do I restart the payments worker?",
    persona_id="IC",
)
```
RAG hygiene for runbooks
Runbook RAG answers must be short, actionable, and cite sources. Incident responders do not have time to read five paragraphs. Set max_tokens low and always include source URLs.
Federation Across Teams¶
When multiple teams have their own knowledge bases, the FederatedKnowledgeGraph resolves cross-team queries without flattening team namespaces.
```python
from fcc.knowledge.federation import FederatedKnowledgeGraph
from fcc.api import NamespaceRegistry

# Register team namespaces
ns = NamespaceRegistry()
ns.register("alpha", "payments", "https://payments.team/")
ns.register("beta", "search", "https://search.team/")

# Federate each team's KG
fed_kg = FederatedKnowledgeGraph(namespaces=ns)
fed_kg.add_graph("alpha", alpha_kg)
fed_kg.add_graph("beta", beta_kg)

# Cross-team query
results = fed_kg.query(
    node_type="Decision",
    predicate="affects",
    target="event-bus-workflow",
)
# Returns decisions from BOTH teams, tagged with the source namespace
```
Federation Architecture¶
```mermaid
flowchart TD
    subgraph TeamA[Team Alpha - Payments]
        KGA[Team Alpha KG]
        DocA[Team Alpha Docs]
    end
    subgraph TeamB[Team Beta - Search]
        KGB[Team Beta KG]
        DocB[Team Beta Docs]
    end
    subgraph TeamC[Team Gamma - Analytics]
        KGC[Team Gamma KG]
        DocC[Team Gamma Docs]
    end
    subgraph Federation[Federation Layer]
        NS[Namespace Registry]
        Resolver[Entity Resolver]
        FedKG[Federated KG]
    end
    subgraph Queries[Cross-Team Queries]
        Q1[Who owns X?]
        Q2[Which team decided Y?]
        Q3[What ADRs touch service Z?]
    end

    TeamA --> Federation
    TeamB --> Federation
    TeamC --> Federation
    Federation --> Queries

    classDef team fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px;
    classDef fed fill:#fff3e0,stroke:#e65100,stroke-width:2px;
    classDef query fill:#e1f5ff,stroke:#0277bd,stroke-width:2px;
    class KGA,KGB,KGC,DocA,DocB,DocC team;
    class NS,Resolver,FedKG fed;
    class Q1,Q2,Q3 query;
```
Privacy and Access Control¶
A team knowledge base concentrates sensitive information. Before enabling semantic search or RAG, address three privacy concerns.
Concern 1: PII in Indexed Content¶
Chat archives, incident post-mortems, and customer-facing docs often contain PII. Mitigate by:
- Running PII detection before indexing (names, emails, identifiers)
- Excluding source directories with known-sensitive content
- Redacting before chunking, not after retrieval
```python
from fcc.api import DocumentChunker

chunker = DocumentChunker(
    strategy="markdown_header",
    preprocess=[redact_pii, strip_credentials],  # custom filters
)
```
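A preprocess filter of the kind passed above is just a `str -> str` function. The sketch below shows a minimal `redact_pii` built on regexes; real PII detection should use a dedicated library, and the patterns here are illustrative only:

```python
# Minimal sketch of a redact_pii preprocess filter (str -> str).
# Regex-based email/phone masking only -- a real deployment should use a
# dedicated PII-detection library with named-entity support.
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like sequences with placeholders."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    text = PHONE_RE.sub("[REDACTED_PHONE]", text)
    return text
```

Because filters run before chunking, the redacted placeholders are what gets embedded, so the raw values never reach the index or the embedding provider.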
Concern 2: Cross-Team Leakage¶
A federated KG may surface a team's internal decisions to other teams. Mitigate by:
- Tagging nodes/edges with visibility levels (public, team, restricted)
- Filtering queries by caller's namespace
- Auditing cross-team queries via event bus
```python
fed_kg.add_node(
    "adr-042",
    type="Decision",
    visibility="team",  # only Team Alpha can retrieve
    owner_team="alpha",
)
```
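Query-time filtering against those visibility tags can be sketched as a plain function over result nodes. The dict shape (`visibility`, `owner_team`) follows the tagging convention above; the filter itself is illustrative, not an FCC API:

```python
# Sketch: drop nodes the calling team may not see. Assumes nodes are
# dicts carrying the visibility/owner_team tags described above.
def visible_to(nodes: list[dict], caller_team: str) -> list[dict]:
    """Return only the nodes visible to caller_team."""
    allowed = []
    for node in nodes:
        vis = node.get("visibility", "public")
        if vis == "public":
            allowed.append(node)
        elif vis == "team" and node.get("owner_team") == caller_team:
            allowed.append(node)
        # "restricted" nodes are never returned by this filter
    return allowed
```

Applying the filter inside the federation layer, rather than in each client, keeps the policy in one auditable place.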
Concern 3: Embedding Provider Data Exfiltration¶
If you use a hosted embedding provider (OpenAI, Cohere, etc.), every indexed document is sent to that provider. Mitigate by:
- Using local/self-hosted embedding models for sensitive content
- Pre-filtering text before sending to hosted providers
- Reviewing provider data-handling terms quarterly
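The first mitigation can be made mechanical by routing documents to a local model whenever they carry a sensitivity tag. The tag names and `pick_provider` function below are hypothetical; both providers are assumed to implement the `EmbeddingProvider` protocol:

```python
# Sketch: choose an embedding provider per document based on sensitivity
# tags. Tag names and routing policy are illustrative team conventions.
SENSITIVE_TAGS = {"pii", "customer_data", "security", "restricted"}

def pick_provider(doc_tags: set[str], local_provider, hosted_provider):
    """Route tagged-sensitive documents to the local model."""
    if doc_tags & SENSITIVE_TAGS:
        return local_provider
    return hosted_provider
```

Note that mixing providers splits your index by dimension and embedding space, so sensitive and non-sensitive content should live in separate indices.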
Access control is not confidentiality
Access controls in your index layer prevent accidental leakage between authorized users. They do not protect against a compromised embedding provider or a breach of your vector store. Treat index contents as sensitive.
Operational Concerns¶
Index Freshness¶
Stale indices drive bad answers. Freshness strategies:
| Strategy | Freshness | Cost |
|---|---|---|
| Nightly rebuild | 24h lag | Low |
| Event-triggered reindex | Near-real-time | Medium |
| Hybrid (full nightly + event delta) | <15 min | Medium-high |
| Streaming updates | <1 min | High |
Most teams start with nightly rebuilds and add event-triggered delta updates within 3 months.
Index Size and Cost¶
Plan for growth. A team generating 20 ADRs, 100 design docs, and 500 workflow traces per quarter accumulates about 2,500 documents per year, or roughly 50K embedded chunks at ~20 chunks per document. Budget storage and query costs accordingly.
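The ~50K figure is back-of-envelope arithmetic. The per-quarter volumes come from the inventory table; the ~20 chunks-per-document ratio is an assumption you should replace with your own measurements:

```python
# Back-of-envelope index growth estimate. Volumes match the inventory
# table; CHUNKS_PER_DOC is an assumed average -- measure your own.
PER_QUARTER = {"adrs": 20, "design_docs": 100, "workflow_traces": 500}
CHUNKS_PER_DOC = 20

docs_per_year = 4 * sum(PER_QUARTER.values())      # 2,480 documents
chunks_per_year = docs_per_year * CHUNKS_PER_DOC   # 49,600 chunks, ~50K
```

Multiply `chunks_per_year` by your embedding dimension and bytes-per-float to get a raw vector storage estimate.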
Monitoring¶
Instrument your index with FCC observability:
- Query latency p50/p95/p99
- Retrieval precision (spot-check 10 queries weekly)
- Index freshness gap
- Failed embedding requests
Next Steps¶
- Multi-Team Governance -- governance for federated KGs
- Team Scaling Guide -- when to add KG/RAG by stage
- RAG Pipeline Documentation -- deeper dive on the RAG module
- Knowledge Graph Documentation -- KG nodes, edges, serializers
- Federation Documentation -- cross-team entity resolution
- Semantic Search Prompts -- sample queries