
Literature Review Agents

How to use the FCC RAG pipeline and persona system to conduct systematic literature reviews with AI-assisted retrieval, synthesis, and critique.

Overview

A literature review using FCC follows the natural Find-Create-Critique cycle:

  1. FIND: Ingest and index papers using the RAG pipeline
  2. CREATE: Synthesize findings using persona-aware queries
  3. CRITIQUE: Evaluate coverage, identify gaps, and refine

Setting Up the Pipeline

Step 1: Document Ingestion

Use the DocumentChunker to prepare your corpus:

from pathlib import Path

from fcc.rag.chunker import DocumentChunker

chunker = DocumentChunker(strategy="paragraph")

# Chunk two research papers; read_text() closes the file handle automatically
documents = [
    {
        "id": "smith2024",
        "content": Path("papers/smith2024.txt").read_text(),
        "metadata": {"authors": "Smith et al.", "year": 2024, "topic": "NLP"},
    },
    {
        "id": "jones2023",
        "content": Path("papers/jones2023.txt").read_text(),
        "metadata": {"authors": "Jones et al.", "year": 2023, "topic": "NLP"},
    },
]

all_chunks = []
for doc in documents:
    chunks = chunker.chunk(doc["content"], metadata=doc["metadata"])
    all_chunks.extend(chunks)

print(f"Total chunks: {len(all_chunks)}")

Step 2: Build the Search Index

Index chunks for semantic retrieval:

from fcc.rag.retriever import SemanticRetriever
from fcc.search.embeddings import MockEmbeddingProvider

# Use MockEmbeddingProvider for testing, or a real provider for production
embedder = MockEmbeddingProvider(dimensions=384)
retriever = SemanticRetriever(embedding_provider=embedder)

# Index all chunks
retriever.index(all_chunks)
print(f"Indexed {len(all_chunks)} chunks")
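Semantic retrieval works by embedding the query and every chunk into vectors, then ranking chunks by similarity. The following is a self-contained sketch of that idea, not FCC's actual implementation: the trigram-hashing embedding and cosine scoring here are illustrative assumptions standing in for a real embedding provider.

```python
import hashlib
import math

def toy_embed(text, dims=8):
    """Deterministic toy embedding: hash character trigrams into a fixed-size vector."""
    vec = [0.0] * dims
    for i in range(len(text) - 2):
        bucket = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    """Dot product of two unit vectors = cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

def rank(query, chunks, top_k=2):
    """Score each chunk against the query and return the top_k best matches."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]

chunks = [
    "Transformers use self-attention over token embeddings.",
    "We collected survey responses from 200 participants.",
    "Attention heads learn distinct syntactic relations.",
]
for score, chunk in rank("self-attention in transformers", chunks):
    print(f"[{score:.3f}] {chunk}")
```

A production provider replaces toy_embed with a learned sentence embedding, but the ranking logic is the same shape.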

Step 3: Persona-Aware Queries

Use the RAG pipeline with persona context for targeted retrieval:

from fcc.rag.pipeline import RAGPipeline

pipeline = RAGPipeline(chunker=chunker, retriever=retriever)

# Query as a Research Catalyst -- broad exploration
results = pipeline.query(
    query="What are the key advances in transformer architectures?",
    persona_id="research_catalyst",
    top_k=10,
)

for r in results:
    print(f"  [{r.score:.3f}] {r.chunk.metadata.get('authors', 'Unknown')}: "
          f"{r.chunk.content[:80]}...")

Step 4: Synthesis with Different Personas

Use different personas for different aspects of the review:

# Domain Expert -- focus on methodology
methods_results = pipeline.query(
    query="What experimental methodologies are used?",
    persona_id="domain_expert",
    top_k=5,
)

# Competitive Intelligence Analyst -- focus on comparisons
comparison_results = pipeline.query(
    query="How do different approaches compare in performance?",
    persona_id="competitive_intelligence_analyst",
    top_k=5,
)

Chunking Strategy Guide

Choose the right chunking strategy for your corpus:

Strategy    | Best For                       | Chunk Size
fixed       | Uniform-length documents       | Fixed character count
sentence    | Short papers, abstracts        | 1-3 sentences
paragraph   | Well-structured papers         | Paragraphs
semantic    | Dense technical content        | Semantic units
recursive   | Long documents with sections   | Hierarchical
custom      | Domain-specific formats        | User-defined
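To make the table concrete, here is a minimal sketch of what a paragraph strategy does: split on blank lines and attach the document's metadata to each resulting chunk. This is a simplified stand-in for illustration, not FCC's DocumentChunker.

```python
def paragraph_chunks(text, metadata=None):
    """Split text on blank lines and attach shared metadata to each chunk."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [{"content": p, "metadata": dict(metadata or {})} for p in paragraphs]

paper = """Abstract. We study transformer scaling.

Method. We train models from 10M to 1B parameters.

Results. Loss follows a power law in model size."""

for chunk in paragraph_chunks(paper, {"authors": "Smith et al."}):
    print(chunk["content"][:30])
```

For well-structured papers this keeps each chunk topically coherent, which is why paragraph chunking is the default recommendation in the workflow below.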

Building a Review Knowledge Graph

Capture relationships between papers, methods, and findings:

from fcc.knowledge.graph import KnowledgeGraph

kg = KnowledgeGraph()

# Add paper nodes
for doc in documents:
    kg.add_node(doc["id"], node_type="DOCUMENT", metadata=doc["metadata"])

# Add method nodes and relationships
kg.add_node("transformer", node_type="METHOD", metadata={"name": "Transformer"})
kg.add_edge("smith2024", "transformer", edge_type="USES")
kg.add_edge("jones2023", "transformer", edge_type="USES")

# Query: which papers use the same method?
related = kg.nodes_connected_to("transformer")
print(f"Papers using Transformer: {[n.id for n in related]}")

Workflow for Systematic Reviews

1. Define research question
2. Collect papers (manual or automated)
3. Ingest into RAG pipeline (paragraph chunking)
4. Build knowledge graph (papers, methods, findings)
5. Run persona-aware queries (Research Catalyst)
6. Synthesize findings (Build Champion)
7. Identify gaps (Domain Expert critique)
8. Iterate until coverage is sufficient
9. Export KG to RDF for supplementary materials
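Step 7 (identify gaps) can be approximated mechanically by comparing the subtopics your research question requires against the topic tags that retrieval actually surfaced. The sketch below is a hedged illustration: the required-topic list and the metadata "topic" tag are assumptions for the example, not part of the FCC API.

```python
def coverage_gaps(required_topics, retrieved_chunks):
    """Return required topics that no retrieved chunk is tagged with."""
    covered = {c["metadata"].get("topic") for c in retrieved_chunks}
    return sorted(set(required_topics) - covered)

required = ["architecture", "training data", "evaluation", "efficiency"]
retrieved = [
    {"content": "...", "metadata": {"topic": "architecture"}},
    {"content": "...", "metadata": {"topic": "evaluation"}},
]
print(coverage_gaps(required, retrieved))  # → ['efficiency', 'training data']
```

Feed the returned gaps back into step 5 as new persona-aware queries and repeat until the list is empty or the remaining gaps are acknowledged limitations.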