RAG Pipeline Demo

This demo walks through the FCC RAG (Retrieval-Augmented Generation) pipeline -- a document chunking, semantic retrieval, and persona-aware generation system that answers questions grounded in FCC framework documentation.


Table of Contents

  1. Overview
  2. Prerequisites
  3. Step 1: Chunk Documents
  4. Step 2: Index Chunks with the Retriever
  5. Step 3: Retrieve Relevant Chunks
  6. Step 4: Run a RAG Query
  7. Step 5: Persona-Aware Queries
  8. Step 6: Explore Chunking Strategies
  9. Chunking Strategies Reference
  10. Screenshots
  11. Next Steps

Overview

The fcc.rag module provides a three-stage pipeline:

  1. Chunking -- Split documents into manageable pieces using one of six strategies (fixed-size, paragraph, semantic, YAML block, code function, parent-child).
  2. Retrieval -- Embed chunks and find the most relevant ones for a given query using semantic similarity.
  3. Generation -- Build a persona-aware prompt from retrieved context and generate an answer using an AI client (or the built-in mock).

The pipeline works entirely in mock mode with no API keys -- the mock embedding provider produces 384-dimensional vectors, and the mock AI client returns concatenated source context as the answer.
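The FCC mock provider's exact scheme isn't shown here, but a deterministic mock embedding can be built by seeding a PRNG from a hash of the text, so the same input always maps to the same 384-dimensional vector. A minimal sketch (the function name mock_embed is illustrative, not part of the fcc API):

```python
import hashlib
import random

def mock_embed(text: str, dim: int = 384) -> list[float]:
    """Deterministic pseudo-embedding: identical text always yields the same vector."""
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

vec = mock_embed("What happens in the Critique phase?")
print(len(vec))  # 384
print(vec == mock_embed("What happens in the Critique phase?"))  # True
```

Determinism is the important property here: it makes retrieval results reproducible in tests without any model download or API call.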


Prerequisites

  • Python 3.10+ with FCC installed: pip install -e ".[dev]"
  • No API keys required -- mock mode is the default

Optional for production use:

  • sentence-transformers and numpy for real embeddings
  • Anthropic or OpenAI API key for real generation

Step 1: Chunk Documents

The DocumentChunker splits text into DocumentChunk instances. Each chunk carries a deterministic ID, source path, character offsets, and metadata.

from fcc.rag import DocumentChunker, ChunkingStrategy

chunker = DocumentChunker(
    strategy=ChunkingStrategy.PARAGRAPH,
    max_chunk_size=500,
)

text = """
# FCC Overview

The FCC (Find, Create, Critique) framework organizes agent collaboration
into three phases. Each phase involves specialized personas.

## Find Phase

The Find phase focuses on research and discovery. Personas like the
Research Coordinator (RC) and Literature Surveyor (LS) gather information
and identify constraints.

## Create Phase

The Create phase transforms findings into solutions. The Solution
Architect (SA) and Technical Writer (TW) produce deliverables.

## Critique Phase

The Critique phase applies quality gates and governance checks. The
Code Reviewer (CR) and Ethics Auditor (EA) refine the output.
"""

chunks = chunker.chunk(text, source_path="docs/overview.md")
print(f"Chunks produced: {len(chunks)}")
for chunk in chunks:
    print(f"  [{chunk.chunk_id[:8]}] {chunk.text[:60]}...")
    print(f"    Strategy: {chunk.strategy.value}")
    print(f"    Offsets: {chunk.start_offset}--{chunk.end_offset}")

Step 2: Index Chunks with the Retriever

The SemanticRetriever wraps an embedding provider and a search index to enable semantic similarity search over chunks.

from fcc.rag import SemanticRetriever
from fcc.search.embeddings import MockEmbeddingProvider

provider = MockEmbeddingProvider()
retriever = SemanticRetriever(embedding_provider=provider)

# Index the chunks
retriever.index_chunks(chunks)
print(f"Indexed chunks: {retriever.chunk_count}")

Step 3: Retrieve Relevant Chunks

Query the retriever to find chunks most relevant to a question.

results = retriever.retrieve("What happens in the Critique phase?", k=3)
print(f"Retrieved {len(results)} chunks:")
for result in results:
    print(f"  Score: {result.score:.4f}")
    print(f"  Source: {result.chunk.source_path}")
    print(f"  Text: {result.chunk.text[:80]}...")
    print()
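Under the hood, semantic retrieval ranks chunks by similarity between the query embedding and each chunk embedding; cosine similarity is the usual scoring function. A self-contained sketch of that ranking step (independent of the fcc retriever, using tiny 2-d vectors for clarity):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int = 3):
    """indexed: list of (chunk_text, vector); returns the top-k (score, text) pairs."""
    scored = [(cosine(query_vec, vec), text) for text, vec in indexed]
    return sorted(scored, reverse=True)[:k]

index = [("critique phase", [1.0, 0.0]), ("find phase", [0.0, 1.0])]
top = retrieve([0.9, 0.1], index, k=1)
print(top[0][1])  # critique phase
```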

Step 4: Run a RAG Query

The RAGPipeline combines retrieval and generation into a single call. In mock mode, it returns the retrieved context as the answer.

from fcc.rag import RAGPipeline

pipeline = RAGPipeline(retriever=retriever)

result = pipeline.query("What personas are involved in the Find phase?")
print(f"Question: {result.question}")
print(f"Answer: {result.answer[:200]}...")
print(f"Sources used: {len(result.sources)}")
print(f"Model: {result.model}")
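In mock mode the generation step is trivial: the answer is just the retrieved context stitched together. A sketch of that fallback behavior (illustrative, not the actual fcc mock client):

```python
def mock_generate(question: str, contexts: list[str]) -> str:
    """Mock answer: concatenate the retrieved source context, no model involved."""
    joined = "\n---\n".join(contexts)
    return f"[mock answer for: {question}]\n{joined}"

answer = mock_generate(
    "What personas are involved in the Find phase?",
    ["The Find phase focuses on research and discovery.", "Personas include RC and LS."],
)
print(answer)
```

This keeps the full pipeline exercisable in CI: a test can assert that the right chunks were retrieved simply by checking the mock answer's text.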

Step 5: Persona-Aware Queries

Pass a persona_id to shape the answer style. The pipeline injects a system prompt instructing the AI to respond as that persona.

result = pipeline.query(
    question="How should we evaluate solution quality?",
    persona_id="CR",
    k=3,
)
print(f"Persona: {result.persona_id}")
print(f"Answer: {result.answer[:200]}...")

# Or use the convenience method
result = pipeline.query_with_persona(
    question="What governance gates apply to data pipelines?",
    persona_id="EA",
    k=5,
)
print(f"Persona: {result.persona_id}")
print(f"Sources: {len(result.sources)}")

Step 6: Explore Chunking Strategies

Try different strategies to see how they affect chunk boundaries and retrieval quality.

from fcc.rag import DocumentChunker, ChunkingStrategy

# Fixed-size windowing
fixed_chunker = DocumentChunker(
    strategy=ChunkingStrategy.FIXED_SIZE,
    max_chunk_size=200,
)
fixed_chunks = fixed_chunker.chunk(text, source_path="docs/overview.md")
print(f"Fixed-size chunks: {len(fixed_chunks)}")

# YAML block splitting (for YAML files)
yaml_chunker = DocumentChunker(strategy=ChunkingStrategy.YAML_BLOCK)

# Code function splitting (for Python files)
code_chunker = DocumentChunker(strategy=ChunkingStrategy.CODE_FUNCTION)

# Parent-child hierarchical chunking
parent_child_chunker = DocumentChunker(
    strategy=ChunkingStrategy.PARENT_CHILD,
    max_chunk_size=300,
)
pc_chunks = parent_child_chunker.chunk(text, source_path="docs/overview.md")
print(f"Parent-child chunks: {len(pc_chunks)}")
for chunk in pc_chunks:
    parent = f" (parent={chunk.parent_chunk_id[:8]})" if chunk.parent_chunk_id else ""
    print(f"  [{chunk.chunk_id[:8]}]{parent} {chunk.text[:50]}...")

Chunking Strategies Reference

Strategy      | Enum Value    | Best For          | Description
--------------|---------------|-------------------|-----------------------------------------------------
Fixed Size    | FIXED_SIZE    | Uniform text      | Splits at character boundaries with optional overlap
Paragraph     | PARAGRAPH     | Markdown, prose   | Splits on blank lines between paragraphs
Semantic      | SEMANTIC      | Structured text   | Uses heading markers to identify semantic boundaries
YAML Block    | YAML_BLOCK    | YAML config files | Splits on top-level YAML keys
Code Function | CODE_FUNCTION | Python source     | Splits on def and class boundaries
Parent-Child  | PARENT_CHILD  | Hierarchical docs | Creates parent chunks with child sub-chunks
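Fixed-size windowing with overlap is the simplest strategy in the table. A minimal sketch of the sliding-window idea (window and overlap semantics may differ from the fcc implementation):

```python
def fixed_size_chunks(text: str, max_chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Slide a character window of max_chunk_size, advancing by (size - overlap)."""
    if overlap >= max_chunk_size:
        raise ValueError("overlap must be smaller than max_chunk_size")
    step = max_chunk_size - overlap
    return [text[i:i + max_chunk_size] for i in range(0, len(text), step)]

chunks = fixed_size_chunks("abcdefghij", max_chunk_size=4, overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The overlap ensures that a sentence cut by one window boundary is still intact in the neighboring chunk, at the cost of some duplicated text in the index.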

Screenshots

(Screenshot: RAG Pipeline Streamlit app showing chunk visualization.)

The RAG Pipeline Streamlit app provides an interactive interface for uploading documents, selecting chunking strategies, running queries, and inspecting retrieved source chunks alongside generated answers.


Next Steps

  • Explore the Knowledge Graph Demo to see how structured graph data complements the RAG pipeline.
  • Try the Federation Demo to index documents from multiple projects into a federated retriever.
  • See src/fcc/rag/ for the full source code.
  • Review src/fcc/data/schemas/rag.json for the JSON schema.
  • Check notebooks/14_rag_pipeline.ipynb for a Jupyter walkthrough.