RAG Pipeline¶
Duration: 75 minutes
Level: Advanced
Module: fcc.rag
This tutorial teaches you how to build retrieval-augmented generation (RAG) pipelines using the FCC framework. You will learn to chunk documents with 6 strategies, configure semantic retrieval, and run persona-aware queries that ground AI answers in your project's documentation.
Prerequisites¶
- Completed beginner/intermediate tutorials
- Familiarity with PersonaRegistry and the semantic search module
- Understanding of embeddings and similarity search concepts
Architecture Overview¶
The FCC RAG pipeline has three layers:
- DocumentChunker -- Splits documents into indexable chunks using configurable strategies
- SemanticRetriever -- Retrieves the most relevant chunks for a query using embedding similarity
- RAGPipeline -- Orchestrates retrieval and generation, with optional persona-aware prompting
All layers work with mock/deterministic implementations by default, requiring no API keys.
Document Chunking¶
The DocumentChunker splits text into chunks suitable for embedding and retrieval. Six strategies are available:
| Strategy | Enum | Description |
|---|---|---|
| Fixed Size | FIXED_SIZE | Windows of chunk_size characters with configurable overlap |
| Paragraph | PARAGRAPH | Split on double newlines (paragraph boundaries) |
| Semantic | SEMANTIC | Paragraph-based splitting (default fallback) |
| YAML Block | YAML_BLOCK | Split YAML by top-level keys |
| Code Function | CODE_FUNCTION | Split Python source by def/class definitions |
| Parent-Child | PARENT_CHILD | Hierarchical parent sections with child paragraphs |
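To see which strategies are available at runtime, you can iterate over the enum. This sketch assumes ChunkingStrategy is a standard Python Enum, which the .value accesses later in this tutorial suggest:
from fcc.rag.chunking import ChunkingStrategy
# Print every strategy name and its value (assuming a standard Enum).
for strategy in ChunkingStrategy:
    print(f"{strategy.name}: {strategy.value}")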
Basic Chunking¶
from fcc.rag.chunking import DocumentChunker, ChunkingStrategy
# Create a chunker with paragraph strategy
chunker = DocumentChunker(
strategy=ChunkingStrategy.PARAGRAPH,
chunk_size=512,
overlap=64,
)
text = """
# Research Plan
This document outlines the research methodology for the FCC project.
## Objectives
The primary objective is to validate the Find-Create-Critique cycle
across multiple persona categories.
## Methodology
We will use a mixed-methods approach combining quantitative simulation
metrics with qualitative expert review.
"""
chunks = chunker.chunk_text(text, source_path="research_plan.md")
for chunk in chunks:
    print(f"[{chunk.chunk_id[:8]}] ({chunk.strategy.value}) "
          f"offset={chunk.start_offset}: {chunk.text[:60]}...")
Fixed-Size Chunking with Overlap¶
chunker_fixed = DocumentChunker(
strategy=ChunkingStrategy.FIXED_SIZE,
chunk_size=200,
overlap=50,
)
chunks = chunker_fixed.chunk_text(text, source_path="plan.md")
print(f"Fixed-size chunks: {len(chunks)}")
for chunk in chunks:
    print(f" [{chunk.start_offset}:{chunk.end_offset}] "
          f"({len(chunk.text)} chars)")
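As a quick sanity check, you can compare the offsets of consecutive chunks. Assuming start_offset and end_offset are absolute character positions in the source text (as the printout above suggests), the shared span should roughly match the configured overlap:
# Rough overlap check between consecutive fixed-size chunks.
for prev, curr in zip(chunks, chunks[1:]):
    print(f" shared characters: {prev.end_offset - curr.start_offset}")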
Specialized Chunking Methods¶
The chunker provides format-specific methods beyond the generic chunk_text:
# YAML chunking -- splits by top-level keys
yaml_text = """
personas:
  RC:
    name: Research Crafter
    category: core
workflows:
  base:
    nodes: 5
"""
yaml_chunks = chunker.chunk_yaml(yaml_text, source_path="config.yaml")
for chunk in yaml_chunks:
    print(f" YAML key: {chunk.metadata.get('key', 'N/A')}")
# Python source chunking -- splits by def/class
python_text = '''
import os

class PersonaLoader:
    """Loads personas from YAML."""

    def load(self, path):
        pass

def validate_spec(spec):
    """Validate a persona spec."""
    return True
'''
py_chunks = chunker.chunk_python(python_text, source_path="loader.py")
for chunk in py_chunks:
    print(f" Python {chunk.metadata.get('type', 'unknown')}: "
          f"{chunk.text[:50]}...")
# Markdown chunking -- splits by headers
md_chunks = chunker.chunk_markdown(text, source_path="plan.md")
for chunk in md_chunks:
    header = chunk.metadata.get("header", "preamble")
    level = chunk.metadata.get("level", 0)
    print(f" H{level}: {header}")
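chunk_directory (see Directory Chunking below) selects one of these methods for you based on the file extension. If you ever need to do that routing yourself, a minimal sketch could look like the following; chunk_file is a hypothetical helper, not part of the FCC API:
from pathlib import Path
def chunk_file(chunker, path: Path):
    # Hypothetical helper: dispatch to a format-specific method by suffix.
    text = path.read_text()
    if path.suffix in {".yaml", ".yml"}:
        return chunker.chunk_yaml(text, source_path=str(path))
    if path.suffix == ".py":
        return chunker.chunk_python(text, source_path=str(path))
    if path.suffix == ".md":
        return chunker.chunk_markdown(text, source_path=str(path))
    return chunker.chunk_text(text, source_path=str(path))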
Parent-Child Chunking¶
The PARENT_CHILD strategy creates hierarchical chunks where parent sections contain child paragraphs. This enables context expansion during retrieval:
chunker_pc = DocumentChunker(strategy=ChunkingStrategy.PARENT_CHILD)
pc_chunks = chunker_pc.chunk_text(text, source_path="plan.md")
parents = [c for c in pc_chunks if c.metadata.get("type") == "parent"]
children = [c for c in pc_chunks if c.metadata.get("type") == "child"]
print(f"Parents: {len(parents)}, Children: {len(children)}")
for child in children:
    print(f" Child -> Parent: {child.parent_chunk_id[:8] if child.parent_chunk_id else 'none'}")
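Because every child records its parent's ID, you can also expand a child back to its enclosing section by hand, using only the fields shown above:
# Map parent IDs to parent chunks, then expand each child to its section.
parent_by_id = {p.chunk_id: p for p in parents}
for child in children:
    parent = parent_by_id.get(child.parent_chunk_id)
    if parent is not None:
        print(f" '{child.text[:40]}...' -> section '{parent.text[:40]}...'")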
Directory Chunking¶
Chunk all files in a directory, automatically selecting the right strategy based on file extension:
from pathlib import Path
all_chunks = chunker.chunk_directory(
Path("src/fcc"),
patterns=["*.py", "*.yaml", "*.yml", "*.md"],
)
print(f"Total chunks from directory: {len(all_chunks)}")
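A quick way to sanity-check the result is to count chunks per source file; this relies only on the source_path attribute shown earlier:
from collections import Counter
# Chunks per file -- useful for spotting files that dominate the index.
per_file = Counter(chunk.source_path for chunk in all_chunks)
for path, count in per_file.most_common(5):
    print(f" {count:4d} {path}")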
Semantic Retriever¶
The SemanticRetriever wraps a SearchIndex and maps search results back to DocumentChunk instances:
from fcc.rag.retriever import SemanticRetriever, RetrievalResult
from fcc.search.embeddings import MockEmbeddingProvider
# Build a retriever from chunks
retriever = SemanticRetriever.build_from_chunks(
chunks=all_chunks[:100], # Index 100 chunks
provider=MockEmbeddingProvider(),
)
# Retrieve relevant chunks
results = retriever.retrieve("persona validation workflow", k=5)
for rr in results:
    print(f" [{rr.score:.3f}] {rr.chunk.source_path}: "
          f"{rr.chunk.text[:80]}...")
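You can also filter results by score. The score scale depends on the embedding provider, and with MockEmbeddingProvider the values are deterministic but not meaningful, so the 0.5 cutoff below is purely illustrative:
# Keep only the more confident matches; 0.5 is an illustrative cutoff.
MIN_SCORE = 0.5
confident = [rr for rr in results if rr.score >= MIN_SCORE]
print(f"{len(confident)} of {len(results)} results above the threshold")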
Context Expansion¶
When using parent-child chunking, retrieve_with_context expands results with parent chunk text:
results_ctx = retriever.retrieve_with_context(
"research methodology", k=3
)
for rr in results_ctx:
    print(f" Score: {rr.score:.3f}")
    print(f" Chunk: {rr.chunk.text[:100]}...")
    if rr.context:
        print(f" Context: {rr.context[:100]}...")
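A common pattern is to assemble a grounding block for a prompt, preferring the expanded parent context when it exists and falling back to the chunk text otherwise; a minimal sketch:
# Prefer parent context, fall back to the chunk itself.
grounding = "\n\n".join((rr.context or rr.chunk.text) for rr in results_ctx)
print(f"Grounding block: {len(grounding)} characters")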
RAG Pipeline¶
The RAGPipeline orchestrates retrieval and generation. It combines a SemanticRetriever with an AI client (real or mock) to answer questions grounded in retrieved chunks:
from fcc.rag.pipeline import RAGPipeline, RAGResult
pipeline = RAGPipeline(retriever=retriever)
# Simple query
result = pipeline.query(
"How does the FCC validation workflow work?",
k=5,
)
print(f"Question: {result.question}")
print(f"Answer: {result.answer[:200]}...")
print(f"Sources: {len(result.sources)}")
print(f"Model: {result.model}")
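To see where an answer was grounded, walk result.sources. This sketch assumes each source carries the same score and chunk fields as a RetrievalResult (the directory example later in this tutorial accesses source.chunk.source_path):
# List the grounding sources, assuming they expose score and chunk.
for source in result.sources:
    print(f" [{source.score:.3f}] {source.chunk.source_path}")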
Persona-Aware Queries¶
Pass a persona_id to shape the answer style based on the persona's role:
# Query with persona context
result = pipeline.query(
"What are the key quality gates for documentation?",
persona_id="DE", # Documentation Evangelist
k=5,
)
print(f"Persona: {result.persona_id}")
print(f"Answer: {result.answer[:300]}...")
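To gauge how much the persona changes the framing, ask the same question as several personas and compare the openings. DE, RC, and MOS are persona IDs that appear elsewhere in this tutorial:
# Same question, different personas -- compare the answer style.
question = "What are the key quality gates for documentation?"
for pid in ("DE", "RC", "MOS"):
    r = pipeline.query(question, persona_id=pid, k=5)
    print(f"--- {pid} ---")
    print(f"{r.answer[:120]}...")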
Query with Full Persona Object¶
For richer persona context, use query_with_persona which injects the persona's R.I.S.C.E.A.R. role into the system prompt:
from fcc.personas.registry import PersonaRegistry
registry = PersonaRegistry.from_yaml_directory("src/fcc/data/personas")
de_persona = registry.get("DE")
result = pipeline.query_with_persona(
"Explain the documentation quality scoring methodology",
persona=de_persona,
k=5,
)
print(f"Answer styled as {result.persona_id}: {result.answer[:300]}...")
Building an Index from a Directory¶
Use the convenience method to chunk and index an entire directory:
pipeline.build_index_from_directory("src/fcc/data/personas")
# Now query against the persona YAML files
result = pipeline.query("Which personas handle data governance?")
for source in result.sources:
    print(f" Source: {source.chunk.source_path}")
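Several chunks often come from the same file, so it can help to collapse the sources into a unique citation list; this uses only the source_path attribute shown above:
# Deduplicate the cited files for a compact citation list.
cited = sorted({source.chunk.source_path for source in result.sources})
for path in cited:
    print(f" cited: {path}")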
End-to-End Pipeline Example¶
Here is a complete pipeline from raw documents to persona-aware answers:
from fcc.rag.chunking import DocumentChunker, ChunkingStrategy
from fcc.rag.retriever import SemanticRetriever
from fcc.rag.pipeline import RAGPipeline
from fcc.search.embeddings import MockEmbeddingProvider
from pathlib import Path
# 1. Chunk documents
chunker = DocumentChunker(strategy=ChunkingStrategy.PARAGRAPH)
chunks = chunker.chunk_directory(Path("src/fcc/data/personas"))
print(f"Step 1: {len(chunks)} chunks created")
# 2. Build retriever
provider = MockEmbeddingProvider()
retriever = SemanticRetriever.build_from_chunks(chunks, provider)
print(f"Step 2: Retriever built with {len(chunks)} indexed chunks")
# 3. Create pipeline
pipeline = RAGPipeline(retriever=retriever)
print("Step 3: Pipeline ready")
# 4. Query
result = pipeline.query(
"What persona handles ML model deployment?",
persona_id="MOS", # Model Ops Steward
k=3,
)
print(f"Step 4: Answer ({len(result.sources)} sources)")
print(f" {result.answer[:200]}...")
Summary¶
In this tutorial you learned how to:
- Chunk documents with 6 strategies (fixed, paragraph, semantic, YAML, code, parent-child)
- Use format-specific chunking for YAML, Python, and Markdown files
- Build a SemanticRetriever with context expansion for parent-child chunks
- Create a RAGPipeline with persona-aware queries
- Run end-to-end pipelines from raw documents to grounded answers
Next Steps¶
- Semantic Search -- Deep dive into the embedding and search layer
- Knowledge Graphs -- Represent your ecosystem as a queryable graph
- Federation -- Federate RAG pipelines across projects