# RAG Pipeline Demo
This demo walks through the FCC RAG (Retrieval-Augmented Generation) pipeline -- a document chunking, semantic retrieval, and persona-aware generation system that answers questions grounded in FCC framework documentation.
## Table of Contents
- Overview
- Prerequisites
- Step 1: Chunk Documents
- Step 2: Index Chunks with the Retriever
- Step 3: Retrieve Relevant Chunks
- Step 4: Run a RAG Query
- Step 5: Persona-Aware Queries
- Step 6: Explore Chunking Strategies
- Chunking Strategies Reference
- Screenshots
- Next Steps
## Overview

The `fcc.rag` module provides a three-stage pipeline:
- Chunking -- Split documents into manageable pieces using one of six strategies (fixed-size, paragraph, semantic, YAML block, code function, parent-child).
- Retrieval -- Embed chunks and find the most relevant ones for a given query using semantic similarity.
- Generation -- Build a persona-aware prompt from retrieved context and generate an answer using an AI client (or the built-in mock).
The pipeline works entirely in mock mode with no API keys -- the mock embedding provider produces 384-dimensional vectors, and the mock AI client returns concatenated source context as the answer.
## Prerequisites

- Python 3.10+ with FCC installed:

  ```shell
  pip install -e ".[dev]"
  ```

- No API keys required -- mock mode is the default

Optional for production use:

- `sentence-transformers` and `numpy` for real embeddings
- Anthropic or OpenAI API key for real generation
## Step 1: Chunk Documents

The `DocumentChunker` splits text into `DocumentChunk` instances. Each chunk carries a deterministic ID, source path, character offsets, and metadata.
```python
from fcc.rag import DocumentChunker, ChunkingStrategy

chunker = DocumentChunker(
    strategy=ChunkingStrategy.PARAGRAPH,
    max_chunk_size=500,
)

text = """
# FCC Overview

The FCC (Find, Create, Critique) framework organizes agent collaboration
into three phases. Each phase involves specialized personas.

## Find Phase

The Find phase focuses on research and discovery. Personas like the
Research Coordinator (RC) and Literature Surveyor (LS) gather information
and identify constraints.

## Create Phase

The Create phase transforms findings into solutions. The Solution
Architect (SA) and Technical Writer (TW) produce deliverables.

## Critique Phase

The Critique phase applies quality gates and governance checks. The
Code Reviewer (CR) and Ethics Auditor (EA) refine the output.
"""

chunks = chunker.chunk(text, source_path="docs/overview.md")

print(f"Chunks produced: {len(chunks)}")
for chunk in chunks:
    print(f"  [{chunk.chunk_id[:8]}] {chunk.text[:60]}...")
    print(f"    Strategy: {chunk.strategy.value}")
    print(f"    Offsets: {chunk.start_offset}--{chunk.end_offset}")
```
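The `PARAGRAPH` strategy splits on blank lines while tracking character offsets. A minimal, self-contained sketch of that behavior (illustrative only -- the function name and dict fields below approximate, not reproduce, the `fcc.rag` implementation):

```python
def split_paragraphs(text: str, max_chunk_size: int = 500) -> list[dict]:
    """Split text on blank lines, tracking character offsets for each chunk."""
    chunks = []
    offset = 0
    for block in text.split("\n\n"):
        stripped = block.strip()
        if not stripped:
            continue
        # Locate the stripped block in the original text to record true offsets
        start = text.index(stripped, offset)
        end = start + len(stripped)
        chunks.append({
            "text": stripped[:max_chunk_size],
            "start_offset": start,
            "end_offset": end,
        })
        offset = end
    return chunks
```

Running this over a two-paragraph string yields two chunks whose offsets point back into the source text, mirroring the `start_offset`/`end_offset` fields printed above.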
## Step 2: Index Chunks with the Retriever

The `SemanticRetriever` wraps an embedding provider and a search index to enable semantic similarity search over chunks.
```python
from fcc.rag import SemanticRetriever
from fcc.search.embeddings import MockEmbeddingProvider

provider = MockEmbeddingProvider()
retriever = SemanticRetriever(embedding_provider=provider)

# Index the chunks
retriever.index_chunks(chunks)
print(f"Indexed chunks: {retriever.chunk_count}")
```
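A deterministic mock provider like the one above can be approximated by seeding a PRNG from a stable hash of the text, so identical inputs always embed to the same 384-dimensional vector. A hypothetical sketch (the real `MockEmbeddingProvider` internals may differ):

```python
import hashlib
import math
import random

def mock_embed(text: str, dim: int = 384) -> list[float]:
    """Derive a deterministic unit vector from a stable hash of the text."""
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]
```

Determinism is the key property: re-indexing the same chunk always produces the same vector, so mock-mode retrieval results are reproducible across runs.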
## Step 3: Retrieve Relevant Chunks

Query the retriever to find chunks most relevant to a question.
```python
results = retriever.retrieve("What happens in the Critique phase?", k=3)

print(f"Retrieved {len(results)} chunks:")
for result in results:
    print(f"  Score: {result.score:.4f}")
    print(f"  Source: {result.chunk.source_path}")
    print(f"  Text: {result.chunk.text[:80]}...")
    print()
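Conceptually, retrieval embeds the query and ranks indexed chunks by cosine similarity. A self-contained sketch of that ranking step (illustrative only -- the retriever's actual index may use a more sophisticated structure):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is zero-length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float], indexed: list[tuple], k: int = 3) -> list[tuple]:
    """Rank (chunk_text, vector) pairs by similarity to the query vector."""
    scored = [(cosine(query_vec, vec), text) for text, vec in indexed]
    return sorted(scored, reverse=True)[:k]
```

The `score` printed above plays the same role as the first element of each returned tuple here: higher means more semantically similar to the query.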
## Step 4: Run a RAG Query

The `RAGPipeline` combines retrieval and generation into a single call. In mock mode, it returns the retrieved context as the answer.
```python
from fcc.rag import RAGPipeline

pipeline = RAGPipeline(retriever=retriever)

result = pipeline.query("What personas are involved in the Find phase?")
print(f"Question: {result.question}")
print(f"Answer: {result.answer[:200]}...")
print(f"Sources used: {len(result.sources)}")
print(f"Model: {result.model}")
```
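In mock mode the generation stage is essentially context assembly: the retrieved chunk texts are concatenated into the answer. A hypothetical sketch of that fallback (the function name and empty-context message are illustrative, not the mock client's actual behavior):

```python
def mock_generate(question: str, contexts: list[str]) -> str:
    """Concatenate retrieved context as the answer, mimicking a mock AI client."""
    if not contexts:
        return f"No context retrieved for: {question}"
    return "\n\n".join(contexts)
```

This is why mock-mode answers read like excerpts from the source documents: no model is involved, so answer quality depends entirely on what retrieval surfaced.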
## Step 5: Persona-Aware Queries

Pass a `persona_id` to shape the answer style. The pipeline injects a system prompt instructing the AI to respond as that persona.
```python
result = pipeline.query(
    question="How should we evaluate solution quality?",
    persona_id="CR",
    k=3,
)
print(f"Persona: {result.persona_id}")
print(f"Answer: {result.answer[:200]}...")

# Or use the convenience method
result = pipeline.query_with_persona(
    question="What governance gates apply to data pipelines?",
    persona_id="EA",
    k=5,
)
print(f"Persona: {result.persona_id}")
print(f"Sources: {len(result.sources)}")
```
## Step 6: Explore Chunking Strategies

Try different strategies to see how they affect chunk boundaries and retrieval quality.
```python
from fcc.rag import DocumentChunker, ChunkingStrategy

# Fixed-size windowing
fixed_chunker = DocumentChunker(
    strategy=ChunkingStrategy.FIXED_SIZE,
    max_chunk_size=200,
)
fixed_chunks = fixed_chunker.chunk(text, source_path="docs/overview.md")
print(f"Fixed-size chunks: {len(fixed_chunks)}")

# YAML block splitting (for YAML files)
yaml_chunker = DocumentChunker(strategy=ChunkingStrategy.YAML_BLOCK)

# Code function splitting (for Python files)
code_chunker = DocumentChunker(strategy=ChunkingStrategy.CODE_FUNCTION)

# Parent-child hierarchical chunking
parent_child_chunker = DocumentChunker(
    strategy=ChunkingStrategy.PARENT_CHILD,
    max_chunk_size=300,
)
pc_chunks = parent_child_chunker.chunk(text, source_path="docs/overview.md")
print(f"Parent-child chunks: {len(pc_chunks)}")
for chunk in pc_chunks:
    parent = f" (parent={chunk.parent_chunk_id[:8]})" if chunk.parent_chunk_id else ""
    print(f"  [{chunk.chunk_id[:8]}]{parent} {chunk.text[:50]}...")
```
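Parent-child chunking keeps a coarse parent chunk for context while slicing finer children for matching, with each child carrying its parent's ID. A hypothetical sketch of that linkage (block boundaries, ID scheme, and field names are illustrative, not the chunker's actual logic):

```python
import hashlib

def parent_child_chunks(text: str, child_size: int = 300) -> list[dict]:
    """Treat each blank-line-separated block as a parent; slice children from it."""
    chunks = []
    for block in filter(None, (b.strip() for b in text.split("\n\n"))):
        parent_id = hashlib.sha1(block.encode("utf-8")).hexdigest()
        chunks.append({"chunk_id": parent_id, "parent_chunk_id": None, "text": block})
        # Children are fixed-size slices of the parent, each pointing back to it
        for i in range(0, len(block), child_size):
            child_id = hashlib.sha1(f"{parent_id}:{i}".encode("utf-8")).hexdigest()
            chunks.append({
                "chunk_id": child_id,
                "parent_chunk_id": parent_id,
                "text": block[i:i + child_size],
            })
    return chunks
```

At query time the small children are what get matched, but the linked parent can be handed to the generator for fuller context.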
## Chunking Strategies Reference

| Strategy | Enum Value | Best For | Description |
|---|---|---|---|
| Fixed Size | `FIXED_SIZE` | Uniform text | Splits at character boundaries with optional overlap |
| Paragraph | `PARAGRAPH` | Markdown, prose | Splits on blank lines between paragraphs |
| Semantic | `SEMANTIC` | Structured text | Uses heading markers to identify semantic boundaries |
| YAML Block | `YAML_BLOCK` | YAML config files | Splits on top-level YAML keys |
| Code Function | `CODE_FUNCTION` | Python source | Splits on `def` and `class` boundaries |
| Parent-Child | `PARENT_CHILD` | Hierarchical docs | Creates parent chunks with child sub-chunks |
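The "optional overlap" in fixed-size chunking means each window repeats the tail of the previous one, which helps retrieval when an answer straddles a chunk boundary. A sketch of overlapping windowing (parameter names are illustrative):

```python
def fixed_size_chunks(text: str, max_chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Slide a fixed-width window over the text, stepping by size minus overlap."""
    if max_chunk_size <= overlap:
        raise ValueError("overlap must be smaller than max_chunk_size")
    step = max_chunk_size - overlap
    return [text[i:i + max_chunk_size] for i in range(0, len(text), step)]
```

With `max_chunk_size=4` and `overlap=2`, the string `"abcdefghij"` splits into `["abcd", "cdef", "efgh", "ghij", "ij"]`: each chunk shares its first two characters with the tail of the previous chunk.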
## Screenshots

The RAG Pipeline Streamlit app provides an interactive interface for uploading documents, selecting chunking strategies, running queries, and inspecting retrieved source chunks alongside generated answers.
## Next Steps
- Explore the Knowledge Graph Demo to see how structured graph data complements the RAG pipeline.
- Try the Federation Demo to index documents from multiple projects into a federated retriever.
- See `src/fcc/rag/` for the full source code.
- Review `src/fcc/data/schemas/rag.json` for the JSON schema.
- Check `notebooks/14_rag_pipeline.ipynb` for a Jupyter walkthrough.