RAG Pipeline Demo

This demo walks through the FCC RAG (Retrieval-Augmented Generation) pipeline -- a document chunking, semantic retrieval, and persona-aware generation system that answers questions grounded in FCC framework documentation.


Table of Contents

  1. Overview
  2. Prerequisites
  3. Step 1: Chunk Documents
  4. Step 2: Index Chunks with the Retriever
  5. Step 3: Retrieve Relevant Chunks
  6. Step 4: Run a RAG Query
  7. Step 5: Persona-Aware Queries
  8. Step 6: Explore Chunking Strategies
  9. Chunking Strategies Reference
  10. Screenshots
  11. Next Steps

Overview

The fcc.rag module provides a three-stage pipeline:

  1. Chunking -- Split documents into manageable pieces using one of six strategies (fixed-size, paragraph, semantic, YAML block, code function, parent-child).
  2. Retrieval -- Embed chunks and find the most relevant ones for a given query using semantic similarity.
  3. Generation -- Build a persona-aware prompt from retrieved context and generate an answer using an AI client (or the built-in mock).

The pipeline works entirely in mock mode with no API keys -- the mock embedding provider produces 384-dimensional vectors, and the mock AI client returns concatenated source context as the answer.
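The FCC mock provider's exact scheme isn't shown here, but a deterministic mock embedding can be built by seeding a PRNG from a hash of the text, so the same input always maps to the same 384-dimensional vector. A minimal sketch (the function name mock_embed is illustrative, not part of the fcc API):

```python
import hashlib
import random

def mock_embed(text: str, dim: int = 384) -> list[float]:
    """Deterministic pseudo-embedding: identical text always yields the same vector."""
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

vec = mock_embed("What happens in the Critique phase?")
print(len(vec))  # 384
print(vec == mock_embed("What happens in the Critique phase?"))  # True
```

Determinism is the important property here: it makes retrieval results reproducible in tests without any model download or API call.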


Prerequisites

  • Python 3.10+ with FCC installed: pip install -e ".[dev]"
  • No API keys required -- mock mode is the default

Optional for production use:

  • sentence-transformers and numpy for real embeddings
  • Anthropic or OpenAI API key for real generation

Step 1: Chunk Documents

The DocumentChunker splits text into DocumentChunk instances. Each chunk carries a deterministic ID, source path, character offsets, and metadata.

from fcc.rag import DocumentChunker, ChunkingStrategy

chunker = DocumentChunker(
    strategy=ChunkingStrategy.PARAGRAPH,
    max_chunk_size=500,
)

text = """
# FCC Overview

The FCC (Find, Create, Critique) framework organizes agent collaboration
into three phases. Each phase involves specialized personas.

## Find Phase

The Find phase focuses on research and discovery. Personas like the
Research Coordinator (RC) and Literature Surveyor (LS) gather information
and identify constraints.

## Create Phase

The Create phase transforms findings into solutions. The Solution
Architect (SA) and Technical Writer (TW) produce deliverables.

## Critique Phase

The Critique phase applies quality gates and governance checks. The
Code Reviewer (CR) and Ethics Auditor (EA) refine the output.
"""

chunks = chunker.chunk(text, source_path="docs/overview.md")
print(f"Chunks produced: {len(chunks)}")
for chunk in chunks:
    print(f"  [{chunk.chunk_id[:8]}] {chunk.text[:60]}...")
    print(f"    Strategy: {chunk.strategy.value}")
    print(f"    Offsets: {chunk.start_offset}--{chunk.end_offset}")

Step 2: Index Chunks with the Retriever

The SemanticRetriever wraps an embedding provider and a search index to enable semantic similarity search over chunks.

from fcc.rag import SemanticRetriever
from fcc.search.embeddings import MockEmbeddingProvider

provider = MockEmbeddingProvider()
retriever = SemanticRetriever(embedding_provider=provider)

# Index the chunks
retriever.index_chunks(chunks)
print(f"Indexed chunks: {retriever.chunk_count}")

Step 3: Retrieve Relevant Chunks

Query the retriever to find chunks most relevant to a question.

results = retriever.retrieve("What happens in the Critique phase?", k=3)
print(f"Retrieved {len(results)} chunks:")
for result in results:
    print(f"  Score: {result.score:.4f}")
    print(f"  Source: {result.chunk.source_path}")
    print(f"  Text: {result.chunk.text[:80]}...")
    print()
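Under the hood, semantic retrieval ranks chunks by similarity between the query embedding and each chunk embedding; cosine similarity is the usual scoring function. A self-contained sketch of that ranking step (independent of the fcc retriever, using tiny 2-d vectors for clarity):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int = 3):
    """indexed: list of (chunk_text, vector); returns the top-k (score, text) pairs."""
    scored = [(cosine(query_vec, vec), text) for text, vec in indexed]
    return sorted(scored, reverse=True)[:k]

index = [("critique phase", [1.0, 0.0]), ("find phase", [0.0, 1.0])]
top = retrieve([0.9, 0.1], index, k=1)
print(top[0][1])  # critique phase
```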

Step 4: Run a RAG Query

The RAGPipeline combines retrieval and generation into a single call. In mock mode, it returns the retrieved context as the answer.

from fcc.rag import RAGPipeline

pipeline = RAGPipeline(retriever=retriever)

result = pipeline.query("What personas are involved in the Find phase?")
print(f"Question: {result.question}")
print(f"Answer: {result.answer[:200]}...")
print(f"Sources used: {len(result.sources)}")
print(f"Model: {result.model}")
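In mock mode the generation step is trivial: the answer is just the retrieved context stitched together. A sketch of that fallback behavior (illustrative, not the actual fcc mock client):

```python
def mock_generate(question: str, contexts: list[str]) -> str:
    """Mock answer: concatenate the retrieved source context, no model involved."""
    joined = "\n---\n".join(contexts)
    return f"[mock answer for: {question}]\n{joined}"

answer = mock_generate(
    "What personas are involved in the Find phase?",
    ["The Find phase focuses on research and discovery.", "Personas include RC and LS."],
)
print(answer)
```

This keeps the full pipeline exercisable in CI: a test can assert that the right chunks were retrieved simply by checking the mock answer's text.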

Step 5: Persona-Aware Queries

Pass a persona_id to shape the answer style. The pipeline injects a system prompt instructing the AI to respond as that persona.

result = pipeline.query(
    question="How should we evaluate solution quality?",
    persona_id="CR",
    k=3,
)
print(f"Persona: {result.persona_id}")
print(f"Answer: {result.answer[:200]}...")

# Or use the convenience method
result = pipeline.query_with_persona(
    question="What governance gates apply to data pipelines?",
    persona_id="EA",
    k=5,
)
print(f"Persona: {result.persona_id}")
print(f"Sources: {len(result.sources)}")

Step 6: Explore Chunking Strategies

Try different strategies to see how they affect chunk boundaries and retrieval quality.

from fcc.rag import DocumentChunker, ChunkingStrategy

# Fixed-size windowing
fixed_chunker = DocumentChunker(
    strategy=ChunkingStrategy.FIXED_SIZE,
    max_chunk_size=200,
)
fixed_chunks = fixed_chunker.chunk(text, source_path="docs/overview.md")
print(f"Fixed-size chunks: {len(fixed_chunks)}")

# YAML block splitting (for YAML files)
yaml_chunker = DocumentChunker(strategy=ChunkingStrategy.YAML_BLOCK)

# Code function splitting (for Python files)
code_chunker = DocumentChunker(strategy=ChunkingStrategy.CODE_FUNCTION)

# Parent-child hierarchical chunking
parent_child_chunker = DocumentChunker(
    strategy=ChunkingStrategy.PARENT_CHILD,
    max_chunk_size=300,
)
pc_chunks = parent_child_chunker.chunk(text, source_path="docs/overview.md")
print(f"Parent-child chunks: {len(pc_chunks)}")
for chunk in pc_chunks:
    parent = f" (parent={chunk.parent_chunk_id[:8]})" if chunk.parent_chunk_id else ""
    print(f"  [{chunk.chunk_id[:8]}]{parent} {chunk.text[:50]}...")

Chunking Strategies Reference

Strategy      | Enum Value    | Best For          | Description
--------------|---------------|-------------------|-----------------------------------------------------
Fixed Size    | FIXED_SIZE    | Uniform text      | Splits at character boundaries with optional overlap
Paragraph     | PARAGRAPH     | Markdown, prose   | Splits on blank lines between paragraphs
Semantic      | SEMANTIC      | Structured text   | Uses heading markers to identify semantic boundaries
YAML Block    | YAML_BLOCK    | YAML config files | Splits on top-level YAML keys
Code Function | CODE_FUNCTION | Python source     | Splits on def and class boundaries
Parent-Child  | PARENT_CHILD  | Hierarchical docs | Creates parent chunks with child sub-chunks
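Fixed-size windowing with overlap is the simplest strategy in the table. A minimal sketch of the sliding-window idea (window and overlap semantics may differ from the fcc implementation):

```python
def fixed_size_chunks(text: str, max_chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Slide a character window of max_chunk_size, advancing by (size - overlap)."""
    if overlap >= max_chunk_size:
        raise ValueError("overlap must be smaller than max_chunk_size")
    step = max_chunk_size - overlap
    return [text[i:i + max_chunk_size] for i in range(0, len(text), step)]

chunks = fixed_size_chunks("abcdefghij", max_chunk_size=4, overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The overlap ensures that a sentence cut by one window boundary is still intact in the neighboring chunk, at the cost of some duplicated text in the index.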

Screenshots

(Screenshot: RAG Pipeline Streamlit app showing chunk visualization.)

The RAG Pipeline Streamlit app provides an interactive interface for uploading documents, selecting chunking strategies, running queries, and inspecting retrieved source chunks alongside generated answers.


Next Steps

  • Explore the Knowledge Graph Demo to see how structured graph data complements the RAG pipeline.
  • Try the Federation Demo to index documents from multiple projects into a federated retriever.
  • See src/fcc/rag/ for the full source code.
  • Review src/fcc/data/schemas/rag.json for the JSON schema.
  • Check notebooks/14_rag_pipeline.ipynb for a Jupyter walkthrough.