
Chapter 3: RAG Pipelines

Learning Objectives

By the end of this chapter you will be able to:

  1. Explain the Retrieval-Augmented Generation pattern and why it matters for FCC.
  2. Design a chunking strategy for FCC artifacts.
  3. Build a retrieval pipeline that combines semantic search with metadata filtering.
  4. Integrate RAG into FCC simulation to ground persona outputs in prior artifacts.
  5. Evaluate RAG quality using FCC's scoring engine.

The figure below shows the end-to-end RAG pipeline: chunker, embedding provider, search index, semantic retriever, persona-aware RAG stage, and grounded answer with source citations.

flowchart LR
    DOCS[Documents] --> CH[DocumentChunker]
    CH -->|Fixed / Paragraph / Code / Semantic| CHUNKS[Chunks]
    CHUNKS --> EMB[EmbeddingProvider]
    EMB --> IDX[(SearchIndex)]

    Q[User Query] --> EMB2[EmbeddingProvider]
    EMB2 --> RET[SemanticRetriever]
    IDX --> RET
    RET -->|Top-K chunks| RAG[RAGPipeline]
    PERSONA[Persona Context<br/>R.I.S.C.E.A.R.] -.-> RAG
    RAG --> ANS[Grounded Answer<br/>+ Source Citations]

    style IDX fill:#2196F3,color:#fff
    style RAG fill:#9C27B0,color:#fff
    style ANS fill:#4CAF50,color:#fff

Persona context is a first-class input rather than a post-hoc filter, so two personas querying the same corpus can produce different grounded answers without re-indexing.

What Is RAG?

Retrieval-Augmented Generation (RAG) is a pattern that combines information retrieval with language model generation. Instead of relying solely on the model's training data, RAG retrieves relevant documents from a corpus and includes them in the prompt. The model generates its response grounded in the retrieved documents, reducing hallucination and improving factual accuracy.

For FCC, RAG is transformative. Consider a Research Analyst persona in the Find phase. Without RAG, the persona relies entirely on the LLM's training data -- which may be outdated, incomplete, or irrelevant to your specific domain. With RAG, the persona retrieves findings from previous simulations, domain-specific documents, and organizational knowledge, then grounds its analysis in that retrieved context.

RAG turns FCC from a stateless workflow engine into a knowledge-grounded one. Each simulation builds on the accumulated knowledge of all previous simulations.

The RAG Pipeline

A RAG pipeline has four stages:

Stage 1: Chunking

Raw artifacts are split into chunks -- self-contained text segments that are small enough to embed efficiently but large enough to retain meaning. Chunking strategy depends on the artifact type:

| Artifact Type     | Chunking Strategy  | Typical Chunk Size |
|-------------------|--------------------|--------------------|
| Research findings | By section heading | 500--1000 tokens   |
| Design documents  | By subsection      | 500--800 tokens    |
| Code files        | By function/class  | Varies             |
| Reviews           | By criterion       | 200--500 tokens    |
| Meeting notes     | By topic           | 300--600 tokens    |

from fcc.knowledge.chunking import Chunker, ChunkingStrategy

chunker = Chunker(strategy=ChunkingStrategy.BY_SECTION)

chunks = chunker.chunk(artifact_text, metadata={
    "source": "artifact_001",
    "persona": "research_analyst",
    "session": "session_42",
})

for chunk in chunks:
    print(f"Chunk {chunk.id}: {len(chunk.text)} chars, "
          f"section: {chunk.metadata.get('section', 'N/A')}")

Stage 2: Embedding

Each chunk is embedded into vector space using the embedding provider from Chapter 1:

from fcc.search.index import SearchIndex
from fcc.search.providers import MockEmbeddingProvider

provider = MockEmbeddingProvider(dimensions=128)
index = SearchIndex(provider=provider)

for chunk in chunks:
    index.add(chunk.id, text=chunk.text, metadata=chunk.metadata)

Stage 3: Retrieval

When a persona needs context, the pipeline retrieves the most relevant chunks:

query = "What are the key competitive risks in the SaaS market?"

results = index.search(
    query,
    top_k=5,
    filters={"persona": "research_analyst"},
)

retrieved_context = "\n\n".join([
    f"[Source: {r.metadata['source']}]\n{r.text}"
    for r in results
])

Stage 4: Generation

The retrieved context is injected into the persona's prompt:

prompt = f"""You are the {persona.name}.
Role: {persona.spec.role}
Style: {persona.spec.style}

## Retrieved Context

The following excerpts from prior analyses are relevant to your task:

{retrieved_context}

## Your Task

{task_description}

## Constraints

- Ground your analysis in the retrieved context.
- Cite sources using [Source: ID] notation.
- Flag any claims not supported by the retrieved context.
"""

The persona now generates its output grounded in actual prior findings, not just the LLM's generic knowledge.

Chunking Strategies in Depth

The choice of chunking strategy significantly affects RAG quality. Here are the trade-offs:

Fixed-Size Chunking

Split text every N tokens, regardless of structure. Simple to implement but often breaks semantic units:

chunker = Chunker(strategy=ChunkingStrategy.FIXED_SIZE, chunk_size=500)

Pros: Predictable chunk sizes, simple implementation. Cons: Breaks sentences and paragraphs, loses structural context.

Section-Based Chunking

Split at section headings (Markdown #, ##, etc.). Preserves document structure:

chunker = Chunker(strategy=ChunkingStrategy.BY_SECTION)

Pros: Preserves semantic units, natural topic boundaries. Cons: Uneven chunk sizes, requires structured input.

Semantic Chunking

Split at natural semantic boundaries using sentence-level embeddings. Consecutive sentences with high similarity stay together; a drop in similarity triggers a chunk break:

chunker = Chunker(
    strategy=ChunkingStrategy.SEMANTIC,
    similarity_threshold=0.7,
    provider=embedding_provider,
)

Pros: Best semantic coherence, adapts to content structure. Cons: Requires embedding computation, slower.
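
To make the boundary rule concrete, here is a minimal sketch of similarity-drop detection between consecutive sentences. It assumes a generic embed(sentences) callable that returns one vector per sentence and uses cosine similarity; it illustrates the idea rather than the Chunker's actual implementation.

import numpy as np

def semantic_boundaries(sentences, embed, similarity_threshold=0.7):
    """Return the indices where a new chunk should start."""
    # `embed` is a hypothetical callable mapping strings to vectors.
    vectors = np.asarray(embed(sentences), dtype=float)
    # Normalize rows so consecutive dot products are cosine similarities.
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

    boundaries = [0]
    for i in range(1, len(sentences)):
        similarity = float(vectors[i - 1] @ vectors[i])
        # A similarity drop below the threshold marks a topic shift.
        if similarity < similarity_threshold:
            boundaries.append(i)
    return boundaries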

Recursive Chunking

Try to split at paragraph boundaries. If a paragraph exceeds the maximum chunk size, split it at sentence boundaries; if a sentence still exceeds the limit, split at word boundaries:

chunker = Chunker(
    strategy=ChunkingStrategy.RECURSIVE,
    max_chunk_size=800,
)

Pros: Good balance of coherence and size control. Cons: Moderate complexity.
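
The fallback cascade itself fits in a few lines. The helper below is an illustrative sketch of the paragraph-to-sentence-to-word recursion, measuring size in characters for simplicity; it is not the FCC implementation, and a production version would also merge undersized pieces back together.

def recursive_chunks(text, max_chunk_size=800, separators=("\n\n", ". ", " ")):
    """Split at the coarsest separator whose pieces fit, recursing as needed."""
    if len(text) <= max_chunk_size or not separators:
        return [text]
    coarsest, finer = separators[0], separators[1:]
    chunks = []
    for piece in text.split(coarsest):
        if len(piece) <= max_chunk_size:
            chunks.append(piece)
        else:
            # Still too large: fall back to the next, finer separator.
            chunks.extend(recursive_chunks(piece, max_chunk_size, finer))
    return chunks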

For FCC, section-based chunking is the default recommendation because FCC artifacts (findings, designs, reviews) are typically structured with headings. Fall back to recursive chunking for unstructured content.
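
One way to apply that recommendation is a small dispatch helper: use section-based chunking when the artifact contains Markdown headings, and fall back to recursive chunking otherwise. The heading heuristic below is an illustrative sketch, not part of the fcc API.

import re

from fcc.knowledge.chunking import Chunker, ChunkingStrategy

def chunker_for(artifact_text: str) -> Chunker:
    """Pick a chunking strategy based on whether the artifact has headings."""
    # Heuristic: any Markdown heading marker means the artifact is structured.
    if re.search(r"^#{1,6}\s", artifact_text, flags=re.MULTILINE):
        return Chunker(strategy=ChunkingStrategy.BY_SECTION)
    return Chunker(strategy=ChunkingStrategy.RECURSIVE, max_chunk_size=800)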

Retrieval Strategies

Basic Retrieval

Top-K nearest neighbors by embedding similarity. Fast and simple:

results = index.search(query, top_k=5)

Filtered Retrieval

Add metadata filters to narrow the search space:

results = index.search(query, top_k=5, filters={
    "persona": "research_analyst",
    "session": "session_42",
})

Hybrid Retrieval

Combine semantic search with keyword matching for better precision. Semantic search finds thematically relevant chunks; keyword matching ensures specific terms are present:

from fcc.knowledge.retrieval import HybridRetriever

retriever = HybridRetriever(index=index)
results = retriever.search(
    query=query,
    top_k=5,
    required_keywords=["SaaS", "pricing"],
    semantic_weight=0.7,
    keyword_weight=0.3,
)
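
Under the hood, the two signals are typically blended with the configured weights. The scoring rule below is one plausible sketch -- the keyword score is the fraction of required keywords present in the chunk -- and the real HybridRetriever may combine the signals differently.

def hybrid_score(chunk_text, semantic_score, required_keywords,
                 semantic_weight=0.7, keyword_weight=0.3):
    """Blend embedding similarity with keyword coverage for one chunk."""
    text = chunk_text.lower()
    hits = sum(1 for kw in required_keywords if kw.lower() in text)
    keyword_score = hits / len(required_keywords) if required_keywords else 0.0
    return semantic_weight * semantic_score + keyword_weight * keyword_score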

Re-ranking

After initial retrieval, re-rank results using a cross-encoder model that scores each (query, chunk) pair more precisely than embedding similarity:

from fcc.knowledge.retrieval import CrossEncoderReranker

reranker = CrossEncoderReranker(model="cross-encoder/ms-marco-MiniLM-L-6-v2")
reranked = reranker.rerank(query, results, top_k=3)

Re-ranking improves precision but adds latency. Use it when precision matters more than speed.

Integrating RAG into FCC Simulation

The simulation engine supports RAG integration through the ContextProvider interface:

from fcc.simulation.engine import SimulationEngine

engine = SimulationEngine(
    mode="ai",
    context_provider=rag_pipeline,
)

result = engine.run(scenario)

When a context_provider is configured, the engine automatically retrieves relevant context before generating each node's prompt. The retrieval query is derived from the node's input context and the persona's R.I.S.C.E.A.R. input specification.

This integration is transparent to the persona: the retrieved context arrives as part of its normal input, so the persona does not need to know that RAG is active -- it simply receives richer, more relevant context.
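
A minimal RAG-backed provider might look like the sketch below. The get_context method name and the node and persona attributes used to build the query are assumptions about the ContextProvider interface rather than its documented signature.

from fcc.search.index import SearchIndex

class RAGContextProvider:
    """Hypothetical ContextProvider that retrieves persona-scoped chunks."""

    def __init__(self, index: SearchIndex, top_k: int = 5):
        self.index = index
        self.top_k = top_k

    def get_context(self, node, persona) -> str:
        # Derive the query from the node's input plus the persona's
        # R.I.S.C.E.A.R. input spec (attribute names assumed for illustration).
        query = f"{node.input_summary}\n{persona.spec.inputs}"
        results = self.index.search(
            query,
            top_k=self.top_k,
            filters={"persona": persona.id},
        )
        return "\n\n".join(
            f"[Source: {r.metadata['source']}]\n{r.text}" for r in results
        )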

Evaluating RAG Quality

RAG quality is evaluated along three dimensions:

Retrieval Quality

Are the retrieved chunks relevant to the query? Measured by precision@k and recall@k against a labeled relevance set:

from fcc.knowledge.evaluation import RetrievalEvaluator

evaluator = RetrievalEvaluator(index=index)
metrics = evaluator.evaluate(
    queries=test_queries,
    relevance_labels=test_labels,
    top_k=5,
)
print(f"Precision@5: {metrics.precision:.3f}")
print(f"Recall@5: {metrics.recall:.3f}")

Grounding Quality

Is the generated output grounded in the retrieved context? Measured by checking whether claims in the output can be traced to specific chunks:

from fcc.knowledge.evaluation import GroundingEvaluator

evaluator = GroundingEvaluator()
score = evaluator.evaluate(
    output=persona_output,
    context=retrieved_context,
)
print(f"Grounding score: {score:.3f}")

End-to-End Quality

Does the RAG-augmented output score higher than the non-RAG output on FCC quality gates? Run the same scenario with and without RAG and compare:

engine_no_rag = SimulationEngine(mode="ai")
engine_with_rag = SimulationEngine(mode="ai", context_provider=rag_pipeline)

result_no_rag = engine_no_rag.run(scenario)
result_with_rag = engine_with_rag.run(scenario)

print(f"Without RAG: {result_no_rag.trace.gates_passed}/{result_no_rag.trace.gates_total}")
print(f"With RAG: {result_with_rag.trace.gates_passed}/{result_with_rag.trace.gates_total}")

Key Takeaways

  • RAG grounds persona outputs in retrieved context, reducing hallucination and improving accuracy.
  • The pipeline has four stages: chunking, embedding, retrieval, and generation.
  • Section-based chunking is the default for FCC's structured artifacts; recursive chunking is the fallback.
  • Hybrid retrieval (semantic + keyword) and re-ranking improve precision.
  • RAG integrates transparently with the simulation engine via the ContextProvider interface.
  • Evaluate RAG along three dimensions: retrieval quality, grounding quality, and end-to-end quality.

← Chapter 2: Knowledge Graphs | Next: Chapter 4 -- Federated Knowledge →