
Literature Review Agents

How to use the FCC RAG pipeline and persona system to conduct systematic literature reviews with AI-assisted retrieval, synthesis, and critique.

Overview

A literature review using FCC follows the natural Find-Create-Critique cycle:

  1. FIND: Ingest and index papers using the RAG pipeline
  2. CREATE: Synthesize findings using persona-aware queries
  3. CRITIQUE: Evaluate coverage, identify gaps, and refine

Setting Up the Pipeline

Step 1: Document Ingestion

Use the DocumentChunker to prepare your corpus:

from pathlib import Path

from fcc.rag.chunker import DocumentChunker

chunker = DocumentChunker(strategy="paragraph")

# Chunk two research papers; read_text() closes the file handle automatically
documents = [
    {
        "id": "smith2024",
        "content": Path("papers/smith2024.txt").read_text(),
        "metadata": {"authors": "Smith et al.", "year": 2024, "topic": "NLP"},
    },
    {
        "id": "jones2023",
        "content": Path("papers/jones2023.txt").read_text(),
        "metadata": {"authors": "Jones et al.", "year": 2023, "topic": "NLP"},
    },
]

all_chunks = []
for doc in documents:
    chunks = chunker.chunk(doc["content"], metadata=doc["metadata"])
    all_chunks.extend(chunks)

print(f"Total chunks: {len(all_chunks)}")

Step 2: Build the Search Index

Index chunks for semantic retrieval:

from fcc.rag.retriever import SemanticRetriever
from fcc.search.embeddings import MockEmbeddingProvider

# Use MockEmbeddingProvider for testing, or a real provider for production
embedder = MockEmbeddingProvider(dimensions=384)
retriever = SemanticRetriever(embedding_provider=embedder)

# Index all chunks
retriever.index(all_chunks)
print(f"Indexed {len(all_chunks)} chunks")
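Semantic retrieval works by embedding the query and every chunk into vectors, then ranking chunks by similarity. The following is a self-contained sketch of that idea, not FCC's actual implementation: the trigram-hashing embedding and cosine scoring here are illustrative assumptions standing in for a real embedding provider.

```python
import hashlib
import math

def toy_embed(text, dims=8):
    """Deterministic toy embedding: hash character trigrams into a fixed-size vector."""
    vec = [0.0] * dims
    for i in range(len(text) - 2):
        bucket = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    """Dot product of two unit vectors = cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

def rank(query, chunks, top_k=2):
    """Score each chunk against the query and return the top_k best matches."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]

chunks = [
    "Transformers use self-attention over token embeddings.",
    "We collected survey responses from 200 participants.",
    "Attention heads learn distinct syntactic relations.",
]
for score, chunk in rank("self-attention in transformers", chunks):
    print(f"[{score:.3f}] {chunk}")
```

A production provider replaces toy_embed with a learned sentence embedding, but the ranking logic is the same shape.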

Step 3: Persona-Aware Queries

Use the RAG pipeline with persona context for targeted retrieval:

from fcc.rag.pipeline import RAGPipeline

pipeline = RAGPipeline(chunker=chunker, retriever=retriever)

# Query as a Research Catalyst -- broad exploration
results = pipeline.query(
    query="What are the key advances in transformer architectures?",
    persona_id="research_catalyst",
    top_k=10,
)

for r in results:
    print(f"  [{r.score:.3f}] {r.chunk.metadata.get('authors', 'Unknown')}: "
          f"{r.chunk.content[:80]}...")

Step 4: Synthesis with Different Personas

Use different personas for different aspects of the review:

# Domain Expert -- focus on methodology
methods_results = pipeline.query(
    query="What experimental methodologies are used?",
    persona_id="domain_expert",
    top_k=5,
)

# Competitive Intelligence Analyst -- focus on comparisons
comparison_results = pipeline.query(
    query="How do different approaches compare in performance?",
    persona_id="competitive_intelligence_analyst",
    top_k=5,
)

Chunking Strategy Guide

Choose the right chunking strategy for your corpus:

Strategy    | Best For                       | Chunk Size
fixed       | Uniform-length documents       | Fixed character count
sentence    | Short papers, abstracts        | 1-3 sentences
paragraph   | Well-structured papers         | Paragraphs
semantic    | Dense technical content        | Semantic units
recursive   | Long documents with sections   | Hierarchical
custom      | Domain-specific formats        | User-defined
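To make the table concrete, here is a minimal sketch of what a paragraph strategy does: split on blank lines and attach the document's metadata to each resulting chunk. This is a simplified stand-in for illustration, not FCC's DocumentChunker.

```python
def paragraph_chunks(text, metadata=None):
    """Split text on blank lines and attach shared metadata to each chunk."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [{"content": p, "metadata": dict(metadata or {})} for p in paragraphs]

paper = """Abstract. We study transformer scaling.

Method. We train models from 10M to 1B parameters.

Results. Loss follows a power law in model size."""

for chunk in paragraph_chunks(paper, {"authors": "Smith et al."}):
    print(chunk["content"][:30])
```

For well-structured papers this keeps each chunk topically coherent, which is why paragraph chunking is the default recommendation in the workflow below.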

Building a Review Knowledge Graph

Capture relationships between papers, methods, and findings:

from fcc.knowledge.graph import KnowledgeGraph

kg = KnowledgeGraph()

# Add paper nodes
for doc in documents:
    kg.add_node(doc["id"], node_type="DOCUMENT", metadata=doc["metadata"])

# Add method nodes and relationships
kg.add_node("transformer", node_type="METHOD", metadata={"name": "Transformer"})
kg.add_edge("smith2024", "transformer", edge_type="USES")
kg.add_edge("jones2023", "transformer", edge_type="USES")

# Query: which papers use the same method?
related = kg.nodes_connected_to("transformer")
print(f"Papers using Transformer: {[n.id for n in related]}")

Workflow for Systematic Reviews

1. Define research question
2. Collect papers (manual or automated)
3. Ingest into RAG pipeline (paragraph chunking)
4. Build knowledge graph (papers, methods, findings)
5. Run persona-aware queries (Research Catalyst)
6. Synthesize findings (Build Champion)
7. Identify gaps (Domain Expert critique)
8. Iterate until coverage is sufficient
9. Export KG to RDF for supplementary materials
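Step 7 (identify gaps) can be approximated mechanically by comparing the subtopics your research question requires against the topic tags that retrieval actually surfaced. The sketch below is a hedged illustration: the required-topic list and the metadata "topic" tag are assumptions for the example, not part of the FCC API.

```python
def coverage_gaps(required_topics, retrieved_chunks):
    """Return required topics that no retrieved chunk is tagged with."""
    covered = {c["metadata"].get("topic") for c in retrieved_chunks}
    return sorted(set(required_topics) - covered)

required = ["architecture", "training data", "evaluation", "efficiency"]
retrieved = [
    {"content": "...", "metadata": {"topic": "architecture"}},
    {"content": "...", "metadata": {"topic": "evaluation"}},
]
print(coverage_gaps(required, retrieved))  # → ['efficiency', 'training data']
```

Feed the returned gaps back into step 5 as new persona-aware queries and repeat until the list is empty or the remaining gaps are acknowledged limitations.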