RAG Pipeline

Duration: 75 minutes | Level: Advanced | Module: fcc.rag

This tutorial teaches you how to build retrieval-augmented generation (RAG) pipelines using the FCC framework. You will learn to chunk documents with six strategies, set up semantic retrieval, and run persona-aware queries that ground AI answers in your project's documentation.

Prerequisites

  • Completed beginner/intermediate tutorials
  • Familiarity with PersonaRegistry and the semantic search module
  • Understanding of embeddings and similarity search concepts

Architecture Overview

The FCC RAG pipeline has three layers:

  1. DocumentChunker -- Splits documents into indexable chunks using configurable strategies
  2. SemanticRetriever -- Retrieves the most relevant chunks for a query using embedding similarity
  3. RAGPipeline -- Orchestrates retrieval and generation, with optional persona-aware prompting

All three layers ship with mock, deterministic implementations by default, so no API keys are required.

Document Chunking

The DocumentChunker splits text into chunks suitable for embedding and retrieval. Six strategies are available:

  • Fixed Size (FIXED_SIZE) -- Windows of chunk_size characters with configurable overlap
  • Paragraph (PARAGRAPH) -- Split on double newlines (paragraph boundaries)
  • Semantic (SEMANTIC) -- Paragraph-based splitting (default fallback)
  • YAML Block (YAML_BLOCK) -- Split YAML by top-level keys
  • Code Function (CODE_FUNCTION) -- Split Python source by def/class definitions
  • Parent-Child (PARENT_CHILD) -- Hierarchical parent sections with child paragraphs
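
To make the fixed-size strategy concrete, here is a minimal standalone sketch of character windowing with overlap. This is illustrative logic only, not the FCC implementation, and the helper name fixed_size_chunks is hypothetical:

```python
def fixed_size_chunks(text: str, chunk_size: int = 200, overlap: int = 50):
    """Return (start, end, text) windows; consecutive windows share `overlap` chars."""
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append((start, end, text[start:end]))
        if end == len(text):
            break
        start += step
    return chunks

chunks = fixed_size_chunks("a" * 500, chunk_size=200, overlap=50)
print([(s, e) for s, e, _ in chunks])  # consecutive offsets overlap by 50 characters
```

Each window starts chunk_size - overlap characters after the previous one, which is why overlap must be strictly smaller than chunk_size.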

Basic Chunking

from fcc.rag.chunking import DocumentChunker, ChunkingStrategy

# Create a chunker with paragraph strategy
chunker = DocumentChunker(
    strategy=ChunkingStrategy.PARAGRAPH,
    chunk_size=512,
    overlap=64,
)

text = """
# Research Plan

This document outlines the research methodology for the FCC project.

## Objectives

The primary objective is to validate the Find-Create-Critique cycle
across multiple persona categories.

## Methodology

We will use a mixed-methods approach combining quantitative simulation
metrics with qualitative expert review.
"""

chunks = chunker.chunk_text(text, source_path="research_plan.md")
for chunk in chunks:
    print(f"[{chunk.chunk_id[:8]}] ({chunk.strategy.value}) "
          f"offset={chunk.start_offset}: {chunk.text[:60]}...")
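
At its core, the paragraph strategy used above reduces to splitting on blank lines. A minimal sketch of that idea (not the FCC implementation; paragraph_chunks is a hypothetical helper):

```python
def paragraph_chunks(text: str) -> list[str]:
    """Split text on double newlines, dropping empty fragments."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

print(paragraph_chunks("first para\n\nsecond para\n\n\nthird para"))  # three paragraphs
```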

Fixed-Size Chunking with Overlap

chunker_fixed = DocumentChunker(
    strategy=ChunkingStrategy.FIXED_SIZE,
    chunk_size=200,
    overlap=50,
)

chunks = chunker_fixed.chunk_text(text, source_path="plan.md")
print(f"Fixed-size chunks: {len(chunks)}")
for chunk in chunks:
    print(f"  [{chunk.start_offset}:{chunk.end_offset}] "
          f"({len(chunk.text)} chars)")

Specialized Chunking Methods

The chunker provides format-specific methods beyond the generic chunk_text:

# YAML chunking -- splits by top-level keys
yaml_text = """
personas:
  RC:
    name: Research Crafter
    category: core
workflows:
  base:
    nodes: 5
"""
yaml_chunks = chunker.chunk_yaml(yaml_text, source_path="config.yaml")
for chunk in yaml_chunks:
    print(f"  YAML key: {chunk.metadata.get('key', 'N/A')}")
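
Splitting YAML by top-level keys can be approximated without a YAML parser by grouping lines under each unindented key. The sketch below is a naive illustration of that boundary logic, not how chunk_yaml is actually implemented:

```python
def yaml_top_level_blocks(text: str) -> dict[str, list[str]]:
    """Group lines under each unindented 'key:' line (naive, no YAML parsing)."""
    blocks: dict[str, list[str]] = {}
    current_key = None
    for line in text.splitlines():
        if line and not line[0].isspace() and line.rstrip().endswith(":"):
            current_key = line.rstrip()[:-1]  # new top-level key starts a block
            blocks[current_key] = []
        elif current_key is not None:
            blocks[current_key].append(line)
    return blocks

blocks = yaml_top_level_blocks("personas:\n  RC:\n    name: X\nworkflows:\n  base: 1\n")
print(sorted(blocks))  # top-level keys become chunk boundaries
```

Real-world YAML (multi-line strings, documents separated by ---) needs a proper parser such as PyYAML; this sketch only shows where the boundaries fall.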

# Python source chunking -- splits by def/class
python_text = '''
import os

class PersonaLoader:
    """Loads personas from YAML."""
    def load(self, path):
        pass

def validate_spec(spec):
    """Validate a persona spec."""
    return True
'''
py_chunks = chunker.chunk_python(python_text, source_path="loader.py")
for chunk in py_chunks:
    print(f"  Python {chunk.metadata.get('type', 'unknown')}: "
          f"{chunk.text[:50]}...")

# Markdown chunking -- splits by headers
md_chunks = chunker.chunk_markdown(text, source_path="plan.md")
for chunk in md_chunks:
    header = chunk.metadata.get("header", "preamble")
    level = chunk.metadata.get("level", 0)
    print(f"  H{level}: {header}")
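
Header-based markdown splitting can be sketched with a single multiline regex; the helper below is illustrative only and is not the chunk_markdown implementation:

```python
import re

def markdown_sections(text: str):
    """Return (level, header, body) tuples split on ATX headers (# to ######)."""
    parts = re.split(r"(?m)^(#{1,6})\s+(.*)$", text)
    sections = []
    for i in range(1, len(parts), 3):
        level = len(parts[i])      # number of leading '#' characters
        header = parts[i + 1]
        body = parts[i + 2].strip()
        sections.append((level, header, body))
    return sections

print(markdown_sections("# Plan\nintro\n## Objectives\nvalidate cycle\n"))
```

re.split with capturing groups interleaves the captured header marks and titles with the section bodies, which is what makes the three-at-a-time iteration work.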

Parent-Child Chunking

The PARENT_CHILD strategy creates hierarchical chunks where parent sections contain child paragraphs. This enables context expansion during retrieval:

chunker_pc = DocumentChunker(strategy=ChunkingStrategy.PARENT_CHILD)
pc_chunks = chunker_pc.chunk_text(text, source_path="plan.md")

parents = [c for c in pc_chunks if c.metadata.get("type") == "parent"]
children = [c for c in pc_chunks if c.metadata.get("type") == "child"]
print(f"Parents: {len(parents)}, Children: {len(children)}")

for child in children:
    print(f"  Child -> Parent: {child.parent_chunk_id[:8] if child.parent_chunk_id else 'none'}")

Directory Chunking

Chunk all files in a directory, automatically selecting the right strategy based on file extension:

from pathlib import Path

all_chunks = chunker.chunk_directory(
    Path("src/fcc"),
    patterns=["*.py", "*.yaml", "*.yml", "*.md"],
)
print(f"Total chunks from directory: {len(all_chunks)}")
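
One plausible way to select a strategy per file is a suffix-to-strategy mapping. The table below is an assumption for illustration only, not the mapping chunk_directory actually uses:

```python
from pathlib import Path

# Hypothetical extension-to-strategy mapping, for illustration.
STRATEGY_BY_EXT = {
    ".py": "CODE_FUNCTION",
    ".yaml": "YAML_BLOCK",
    ".yml": "YAML_BLOCK",
    ".md": "PARAGRAPH",
}

def strategy_for(path: str) -> str:
    """Pick a strategy name from the file extension, defaulting to FIXED_SIZE."""
    return STRATEGY_BY_EXT.get(Path(path).suffix, "FIXED_SIZE")

print(strategy_for("src/fcc/loader.py"))
```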

Semantic Retriever

The SemanticRetriever wraps a SearchIndex and maps search results back to DocumentChunk instances:

from fcc.rag.retriever import SemanticRetriever, RetrievalResult
from fcc.search.embeddings import MockEmbeddingProvider

# Build a retriever from chunks
retriever = SemanticRetriever.build_from_chunks(
    chunks=all_chunks[:100],  # index only the first 100 chunks
    provider=MockEmbeddingProvider(),
)

# Retrieve relevant chunks
results = retriever.retrieve("persona validation workflow", k=5)
for rr in results:
    print(f"  [{rr.score:.3f}] {rr.chunk.source_path}: "
          f"{rr.chunk.text[:80]}...")
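
Under the hood, embedding-similarity retrieval is a nearest-neighbor search over vectors. The standalone sketch below shows the core scoring with cosine similarity; it is independent of the FCC SearchIndex and uses hypothetical names:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, indexed, k=2):
    """indexed: list of (chunk_text, vector). Return top-k (score, text) pairs."""
    scored = [(cosine(query_vec, vec), text) for text, vec in indexed]
    return sorted(scored, key=lambda pair: -pair[0])[:k]

index = [("alpha", [1.0, 0.0]), ("beta", [0.0, 1.0]), ("gamma", [0.7, 0.7])]
print(retrieve([1.0, 0.2], index, k=2))  # "alpha" ranks first
```

A real retriever precomputes and normalizes the index vectors so each query costs one dot product per chunk; the ranking logic is the same.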

Context Expansion

When using parent-child chunking, retrieve_with_context expands results with parent chunk text:

results_ctx = retriever.retrieve_with_context(
    "research methodology", k=3
)
for rr in results_ctx:
    print(f"  Score: {rr.score:.3f}")
    print(f"  Chunk: {rr.chunk.text[:100]}...")
    if rr.context:
        print(f"  Context: {rr.context[:100]}...")
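
Context expansion amounts to following a child chunk's parent_chunk_id back to its parent's text. A minimal sketch with plain dicts (hypothetical field names, not the FCC data model):

```python
def expand_with_parent(child_id, chunks):
    """Return (child_text, parent_text_or_None) by following parent links."""
    by_id = {c["id"]: c for c in chunks}
    child = by_id[child_id]
    parent = by_id.get(child.get("parent_id"))
    return child["text"], (parent["text"] if parent else None)

chunks = [
    {"id": "p1", "text": "## Methodology\nfull section text...", "parent_id": None},
    {"id": "c1", "text": "mixed-methods approach", "parent_id": "p1"},
]
text, ctx = expand_with_parent("c1", chunks)
print(text, "| context:", ctx.splitlines()[0])
```

The child chunk stays the unit of retrieval (it is small and focused), while the parent supplies the surrounding section for the generation step.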

RAG Pipeline

The RAGPipeline orchestrates retrieval and generation. It combines a SemanticRetriever with an AI client (real or mock) to answer questions grounded in retrieved chunks:

from fcc.rag.pipeline import RAGPipeline, RAGResult

pipeline = RAGPipeline(retriever=retriever)

# Simple query
result = pipeline.query(
    "How does the FCC validation workflow work?",
    k=5,
)
print(f"Question: {result.question}")
print(f"Answer: {result.answer[:200]}...")
print(f"Sources: {len(result.sources)}")
print(f"Model: {result.model}")
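
Conceptually, the generation step stitches the retrieved chunk texts into a grounding prompt before calling the model. The sketch below shows one plausible assembly; build_prompt is hypothetical and is not the RAGPipeline internals:

```python
def build_prompt(question, chunk_texts, persona_role=None):
    """Assemble a grounded prompt from retrieved chunk texts."""
    context = "\n\n".join(
        f"[source {i + 1}] {text}" for i, text in enumerate(chunk_texts)
    )
    system = f"You are acting as: {persona_role}.\n" if persona_role else ""
    return (
        f"{system}Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt(
    "What are the key quality gates?",
    ["Docs are scored 0-100.", "Gates fail below 70."],
    persona_role="Documentation Evangelist",
)
print(prompt.splitlines()[0])
```

Numbering the sources in the prompt is what lets the answer cite which chunk grounded each claim.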

Persona-Aware Queries

Pass a persona_id to shape the answer style based on the persona's role:

# Query with persona context
result = pipeline.query(
    "What are the key quality gates for documentation?",
    persona_id="DE",  # Documentation Evangelist
    k=5,
)
print(f"Persona: {result.persona_id}")
print(f"Answer: {result.answer[:300]}...")

Query with Full Persona Object

For richer persona context, use query_with_persona which injects the persona's R.I.S.C.E.A.R. role into the system prompt:

from fcc.personas.registry import PersonaRegistry

registry = PersonaRegistry.from_yaml_directory("src/fcc/data/personas")
de_persona = registry.get("DE")

result = pipeline.query_with_persona(
    "Explain the documentation quality scoring methodology",
    persona=de_persona,
    k=5,
)
print(f"Answer styled as {result.persona_id}: {result.answer[:300]}...")

Building an Index from a Directory

Use the convenience method to chunk and index an entire directory:

pipeline.build_index_from_directory("src/fcc/data/personas")

# Now query against the persona YAML files
result = pipeline.query("Which personas handle data governance?")
for source in result.sources:
    print(f"  Source: {source.chunk.source_path}")

End-to-End Pipeline Example

Here is a complete pipeline from raw documents to persona-aware answers:

from fcc.rag.chunking import DocumentChunker, ChunkingStrategy
from fcc.rag.retriever import SemanticRetriever
from fcc.rag.pipeline import RAGPipeline
from fcc.search.embeddings import MockEmbeddingProvider
from pathlib import Path

# 1. Chunk documents
chunker = DocumentChunker(strategy=ChunkingStrategy.PARAGRAPH)
chunks = chunker.chunk_directory(Path("src/fcc/data/personas"))
print(f"Step 1: {len(chunks)} chunks created")

# 2. Build retriever
provider = MockEmbeddingProvider()
retriever = SemanticRetriever.build_from_chunks(chunks, provider)
print(f"Step 2: Retriever built with {len(chunks)} indexed chunks")

# 3. Create pipeline
pipeline = RAGPipeline(retriever=retriever)
print("Step 3: Pipeline ready")

# 4. Query
result = pipeline.query(
    "What persona handles ML model deployment?",
    persona_id="MOS",  # Model Ops Steward
    k=3,
)
print(f"Step 4: Answer ({len(result.sources)} sources)")
print(f"  {result.answer[:200]}...")

Summary

In this tutorial you learned how to:

  • Chunk documents with six strategies (fixed size, paragraph, semantic, YAML block, code function, parent-child)
  • Use format-specific chunking for YAML, Python, and Markdown files
  • Build a SemanticRetriever with context expansion for parent-child chunks
  • Create a RAGPipeline with persona-aware queries
  • Run end-to-end pipelines from raw documents to grounded answers

Next Steps