
Chapter 3: RAG Pipelines

Learning Objectives

By the end of this chapter you will be able to:

  1. Explain the Retrieval-Augmented Generation pattern and why it matters for FCC.
  2. Design a chunking strategy for FCC artifacts.
  3. Build a retrieval pipeline that combines semantic search with metadata filtering.
  4. Integrate RAG into FCC simulation to ground persona outputs in prior artifacts.
  5. Evaluate RAG quality using FCC's scoring engine.

The figure below shows the end-to-end RAG pipeline: chunker, embedding provider, search index, semantic retriever, persona-aware RAG stage, and grounded answer with source citations.

flowchart LR
    DOCS[Documents] --> CH[DocumentChunker]
    CH -->|Fixed / Paragraph / Code / Semantic| CHUNKS[Chunks]
    CHUNKS --> EMB[EmbeddingProvider]
    EMB --> IDX[(SearchIndex)]

    Q[User Query] --> EMB2[EmbeddingProvider]
    EMB2 --> RET[SemanticRetriever]
    IDX --> RET
    RET -->|Top-K chunks| RAG[RAGPipeline]
    PERSONA[Persona Context<br/>R.I.S.C.E.A.R.] -.-> RAG
    RAG --> ANS[Grounded Answer<br/>+ Source Citations]

    style IDX fill:#2196F3,color:#fff
    style RAG fill:#9C27B0,color:#fff
    style ANS fill:#4CAF50,color:#fff

Persona context is a first-class input rather than a post-hoc filter, so two personas querying the same corpus can produce different grounded answers without re-indexing.

What Is RAG?

Retrieval-Augmented Generation (RAG) is a pattern that combines information retrieval with language model generation. Instead of relying solely on the model's training data, RAG retrieves relevant documents from a corpus and includes them in the prompt. The model generates its response grounded in the retrieved documents, reducing hallucination and improving factual accuracy.

For FCC, RAG is transformative. Consider a Research Analyst persona in the Find phase. Without RAG, the persona relies entirely on the LLM's training data -- which may be outdated, incomplete, or irrelevant to your specific domain. With RAG, the persona retrieves findings from previous simulations, domain-specific documents, and organizational knowledge, then grounds its analysis in that retrieved context.

RAG turns FCC from a stateless workflow engine into a knowledge-grounded one. Each simulation builds on the accumulated knowledge of all previous simulations.

The RAG Pipeline

A RAG pipeline has four stages:

Stage 1: Chunking

Raw artifacts are split into chunks -- self-contained text segments that are small enough to embed efficiently but large enough to retain meaning. Chunking strategy depends on the artifact type:

| Artifact Type     | Chunking Strategy  | Typical Chunk Size |
|-------------------|--------------------|--------------------|
| Research findings | By section heading | 500--1000 tokens   |
| Design documents  | By subsection      | 500--800 tokens    |
| Code files        | By function/class  | Varies             |
| Reviews           | By criterion       | 200--500 tokens    |
| Meeting notes     | By topic           | 300--600 tokens    |

from fcc.knowledge.chunking import Chunker, ChunkingStrategy

chunker = Chunker(strategy=ChunkingStrategy.BY_SECTION)

chunks = chunker.chunk(artifact_text, metadata={
    "source": "artifact_001",
    "persona": "research_analyst",
    "session": "session_42",
})

for chunk in chunks:
    print(f"Chunk {chunk.id}: {len(chunk.text)} chars, "
          f"section: {chunk.metadata.get('section', 'N/A')}")

Stage 2: Embedding

Each chunk is embedded into vector space using the embedding provider from Chapter 1:

from fcc.search.index import SearchIndex
from fcc.search.providers import MockEmbeddingProvider

provider = MockEmbeddingProvider(dimensions=128)
index = SearchIndex(provider=provider)

for chunk in chunks:
    index.add(chunk.id, text=chunk.text, metadata=chunk.metadata)

Stage 3: Retrieval

When a persona needs context, the pipeline retrieves the most relevant chunks:

query = "What are the key competitive risks in the SaaS market?"

results = index.search(
    query,
    top_k=5,
    filters={"persona": "research_analyst"},
)

retrieved_context = "\n\n".join([
    f"[Source: {r.metadata['source']}]\n{r.text}"
    for r in results
])

Stage 4: Generation

The retrieved context is injected into the persona's prompt:

prompt = f"""You are the {persona.name}.
Role: {persona.spec.role}
Style: {persona.spec.style}

## Retrieved Context

The following excerpts from prior analyses are relevant to your task:

{retrieved_context}

## Your Task

{task_description}

## Constraints

- Ground your analysis in the retrieved context.
- Cite sources using [Source: ID] notation.
- Flag any claims not supported by the retrieved context.
"""

The persona now generates its output grounded in actual prior findings, not just the LLM's generic knowledge.

Chunking Strategies in Depth

The choice of chunking strategy significantly affects RAG quality. Here are the trade-offs:

Fixed-Size Chunking

Split text every N tokens, regardless of structure. Simple to implement but often breaks semantic units:

chunker = Chunker(strategy=ChunkingStrategy.FIXED_SIZE, chunk_size=500)

Pros: Predictable chunk sizes, simple implementation. Cons: Breaks sentences and paragraphs, loses structural context.

Section-Based Chunking

Split at section headings (Markdown #, ##, etc.). Preserves document structure:

chunker = Chunker(strategy=ChunkingStrategy.BY_SECTION)

Pros: Preserves semantic units, natural topic boundaries. Cons: Uneven chunk sizes, requires structured input.

Semantic Chunking

Split at natural semantic boundaries using sentence-level embeddings. Consecutive sentences with high similarity stay together; a drop in similarity triggers a chunk break:

chunker = Chunker(
    strategy=ChunkingStrategy.SEMANTIC,
    similarity_threshold=0.7,
    provider=embedding_provider,
)

Pros: Best semantic coherence, adapts to content structure. Cons: Requires embedding computation, slower.
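
To make the boundary rule concrete, here is a minimal sketch of similarity-drop detection between consecutive sentences. It assumes a generic embed(sentences) callable that returns one vector per sentence and uses cosine similarity; it illustrates the idea rather than the Chunker's actual implementation.

import numpy as np

def semantic_boundaries(sentences, embed, similarity_threshold=0.7):
    """Return the indices where a new chunk should start."""
    # `embed` is a hypothetical callable mapping strings to vectors.
    vectors = np.asarray(embed(sentences), dtype=float)
    # Normalize rows so consecutive dot products are cosine similarities.
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

    boundaries = [0]
    for i in range(1, len(sentences)):
        similarity = float(vectors[i - 1] @ vectors[i])
        # A similarity drop below the threshold marks a topic shift.
        if similarity < similarity_threshold:
            boundaries.append(i)
    return boundaries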

Recursive Chunking

Try to split at paragraph boundaries. If a paragraph exceeds the maximum chunk size, split it at sentence boundaries; if a sentence still exceeds the limit, split at word boundaries:

chunker = Chunker(
    strategy=ChunkingStrategy.RECURSIVE,
    max_chunk_size=800,
)

Pros: Good balance of coherence and size control. Cons: Moderate complexity.
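
The fallback cascade itself fits in a few lines. The helper below is an illustrative sketch of the paragraph-to-sentence-to-word recursion, measuring size in characters for simplicity; it is not the FCC implementation, and a production version would also merge undersized pieces back together.

def recursive_chunks(text, max_chunk_size=800, separators=("\n\n", ". ", " ")):
    """Split at the coarsest separator whose pieces fit, recursing as needed."""
    if len(text) <= max_chunk_size or not separators:
        return [text]
    coarsest, finer = separators[0], separators[1:]
    chunks = []
    for piece in text.split(coarsest):
        if len(piece) <= max_chunk_size:
            chunks.append(piece)
        else:
            # Still too large: fall back to the next, finer separator.
            chunks.extend(recursive_chunks(piece, max_chunk_size, finer))
    return chunks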

For FCC, section-based chunking is the default recommendation because FCC artifacts (findings, designs, reviews) are typically structured with headings. Fall back to recursive chunking for unstructured content.
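
One way to apply that recommendation is a small dispatch helper: use section-based chunking when the artifact contains Markdown headings, and fall back to recursive chunking otherwise. The heading heuristic below is an illustrative sketch, not part of the fcc API.

import re

from fcc.knowledge.chunking import Chunker, ChunkingStrategy

def chunker_for(artifact_text: str) -> Chunker:
    """Pick a chunking strategy based on whether the artifact has headings."""
    # Heuristic: any Markdown heading marker means the artifact is structured.
    if re.search(r"^#{1,6}\s", artifact_text, flags=re.MULTILINE):
        return Chunker(strategy=ChunkingStrategy.BY_SECTION)
    return Chunker(strategy=ChunkingStrategy.RECURSIVE, max_chunk_size=800)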

Retrieval Strategies

Basic Retrieval

Top-K nearest neighbors by embedding similarity. Fast and simple:

results = index.search(query, top_k=5)

Filtered Retrieval

Add metadata filters to narrow the search space:

results = index.search(query, top_k=5, filters={
    "persona": "research_analyst",
    "session": "session_42",
})

Hybrid Retrieval

Combine semantic search with keyword matching for better precision. Semantic search finds thematically relevant chunks; keyword matching ensures specific terms are present:

from fcc.knowledge.retrieval import HybridRetriever

retriever = HybridRetriever(index=index)
results = retriever.search(
    query=query,
    top_k=5,
    required_keywords=["SaaS", "pricing"],
    semantic_weight=0.7,
    keyword_weight=0.3,
)
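
Under the hood, the two signals are typically blended with the configured weights. The scoring rule below is one plausible sketch -- the keyword score is the fraction of required keywords present in the chunk -- and the real HybridRetriever may combine the signals differently.

def hybrid_score(chunk_text, semantic_score, required_keywords,
                 semantic_weight=0.7, keyword_weight=0.3):
    """Blend embedding similarity with keyword coverage for one chunk."""
    text = chunk_text.lower()
    hits = sum(1 for kw in required_keywords if kw.lower() in text)
    keyword_score = hits / len(required_keywords) if required_keywords else 0.0
    return semantic_weight * semantic_score + keyword_weight * keyword_score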

Re-ranking

After initial retrieval, re-rank results using a cross-encoder model that scores each (query, chunk) pair more precisely than embedding similarity:

from fcc.knowledge.retrieval import CrossEncoderReranker

reranker = CrossEncoderReranker(model="cross-encoder/ms-marco-MiniLM-L-6-v2")
reranked = reranker.rerank(query, results, top_k=3)

Re-ranking improves precision but adds latency. Use it when precision matters more than speed.

Integrating RAG into FCC Simulation

The simulation engine supports RAG integration through the ContextProvider interface:

from fcc.simulation.engine import SimulationEngine

engine = SimulationEngine(
    mode="ai",
    context_provider=rag_pipeline,
)

result = engine.run(scenario)

When a context_provider is configured, the engine automatically retrieves relevant context before generating each node's prompt. The retrieval query is derived from the node's input context and the persona's R.I.S.C.E.A.R. input specification.

This integration is transparent to the persona: the retrieved context arrives as part of its normal input, so the persona does not need to know that RAG is active -- it simply receives richer, more relevant context.
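
A minimal RAG-backed provider might look like the sketch below. The get_context method name and the node and persona attributes used to build the query are assumptions about the ContextProvider interface rather than its documented signature.

from fcc.search.index import SearchIndex

class RAGContextProvider:
    """Hypothetical ContextProvider that retrieves persona-scoped chunks."""

    def __init__(self, index: SearchIndex, top_k: int = 5):
        self.index = index
        self.top_k = top_k

    def get_context(self, node, persona) -> str:
        # Derive the query from the node's input plus the persona's
        # R.I.S.C.E.A.R. input spec (attribute names assumed for illustration).
        query = f"{node.input_summary}\n{persona.spec.inputs}"
        results = self.index.search(
            query,
            top_k=self.top_k,
            filters={"persona": persona.id},
        )
        return "\n\n".join(
            f"[Source: {r.metadata['source']}]\n{r.text}" for r in results
        )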

Evaluating RAG Quality

RAG quality is evaluated along three dimensions:

Retrieval Quality

Are the retrieved chunks relevant to the query? Measured by precision@k and recall@k against a labeled relevance set:

from fcc.knowledge.evaluation import RetrievalEvaluator

evaluator = RetrievalEvaluator(index=index)
metrics = evaluator.evaluate(
    queries=test_queries,
    relevance_labels=test_labels,
    top_k=5,
)
print(f"Precision@5: {metrics.precision:.3f}")
print(f"Recall@5: {metrics.recall:.3f}")

Grounding Quality

Is the generated output grounded in the retrieved context? Measured by checking whether claims in the output can be traced to specific chunks:

from fcc.knowledge.evaluation import GroundingEvaluator

evaluator = GroundingEvaluator()
score = evaluator.evaluate(
    output=persona_output,
    context=retrieved_context,
)
print(f"Grounding score: {score:.3f}")

End-to-End Quality

Does the RAG-augmented output score higher than the non-RAG output on FCC quality gates? Run the same scenario with and without RAG and compare:

engine_no_rag = SimulationEngine(mode="ai")
engine_with_rag = SimulationEngine(mode="ai", context_provider=rag_pipeline)

result_no_rag = engine_no_rag.run(scenario)
result_with_rag = engine_with_rag.run(scenario)

print(f"Without RAG: {result_no_rag.trace.gates_passed}/{result_no_rag.trace.gates_total}")
print(f"With RAG: {result_with_rag.trace.gates_passed}/{result_with_rag.trace.gates_total}")

Key Takeaways

  • RAG grounds persona outputs in retrieved context, reducing hallucination and improving accuracy.
  • The pipeline has four stages: chunking, embedding, retrieval, and generation.
  • Section-based chunking is the default for FCC's structured artifacts; recursive chunking is the fallback.
  • Hybrid retrieval (semantic + keyword) and re-ranking improve precision.
  • RAG integrates transparently with the simulation engine via the ContextProvider interface.
  • Evaluate RAG along three dimensions: retrieval quality, grounding quality, and end-to-end quality.

← Chapter 2: Knowledge Graphs | Next: Chapter 4 -- Federated Knowledge →