Literature Review Agents
How to use the FCC RAG pipeline and persona system to conduct systematic literature reviews with AI-assisted retrieval, synthesis, and critique.
Overview
A literature review using FCC follows the natural Find-Create-Critique cycle:
- FIND: Ingest and index papers using the RAG pipeline
- CREATE: Synthesize findings using persona-aware queries
- CRITIQUE: Evaluate coverage, identify gaps, and refine
Setting Up the Pipeline
Step 1: Document Ingestion
Use the DocumentChunker to prepare your corpus:
from pathlib import Path

from fcc.rag.chunker import DocumentChunker

chunker = DocumentChunker(strategy="paragraph")

# Load each paper together with its bibliographic metadata
documents = [
    {
        "id": "smith2024",
        "content": Path("papers/smith2024.txt").read_text(encoding="utf-8"),
        "metadata": {"authors": "Smith et al.", "year": 2024, "topic": "NLP"},
    },
    {
        "id": "jones2023",
        "content": Path("papers/jones2023.txt").read_text(encoding="utf-8"),
        "metadata": {"authors": "Jones et al.", "year": 2023, "topic": "NLP"},
    },
]

# Chunk every paper, passing its metadata along with the text
all_chunks = []
for doc in documents:
    chunks = chunker.chunk(doc["content"], metadata=doc["metadata"])
    all_chunks.extend(chunks)

print(f"Total chunks: {len(all_chunks)}")
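To make the paragraph strategy concrete, here is a minimal standalone sketch of what paragraph-level chunking might look like. It does not use or reproduce `DocumentChunker`; the `paragraph_chunks` helper, the `paragraph` metadata key, and the dictionary layout are assumptions made for illustration only.

```python
import re


def paragraph_chunks(text, metadata=None):
    """Split text on blank lines and attach a copy of the metadata to each chunk.

    Hypothetical helper for illustration; not part of the fcc API.
    """
    metadata = metadata or {}
    # One or more blank lines (possibly containing whitespace) separate paragraphs.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [
        {"content": p, "metadata": {**metadata, "paragraph": i}}
        for i, p in enumerate(paragraphs)
    ]


sample = "Transformers use attention.\n\nWe evaluate on two benchmarks.\n\n\nResults improve."
chunks_demo = paragraph_chunks(sample, metadata={"authors": "Smith et al.", "year": 2024})
for c in chunks_demo:
    print(c["metadata"]["paragraph"], c["content"])
```

Copying the metadata onto every chunk is what later lets a retrieved passage be traced back to its paper.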
Step 2: Build the Search Index
Index chunks for semantic retrieval:
from fcc.rag.retriever import SemanticRetriever
from fcc.search.embeddings import MockEmbeddingProvider
# Use MockEmbeddingProvider for testing, or a real provider for production
embedder = MockEmbeddingProvider(dimensions=384)
retriever = SemanticRetriever(embedding_provider=embedder)
# Index all chunks
retriever.index(all_chunks)
print(f"Indexed {len(all_chunks)} chunks")
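For intuition about what indexing and semantic retrieval involve, the sketch below embeds texts as vectors and ranks them by cosine similarity. It is a self-contained toy, not the fcc implementation: `toy_embed` (a hashed bag-of-words vector) and `top_k` are hypothetical names, and nothing here is claimed about how `MockEmbeddingProvider` or `SemanticRetriever` work internally.

```python
import hashlib
import math


def toy_embed(text, dimensions=256):
    """Hashed bag-of-words vector, L2-normalised. Illustration only."""
    vec = [0.0] * dimensions
    for token in text.lower().split():
        # A stable hash keeps results repeatable across runs.
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dimensions
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def top_k(query, texts, k=2):
    """Rank texts by cosine similarity to the query (all vectors are unit length)."""
    q = toy_embed(query)
    scored = [(sum(a * b for a, b in zip(q, toy_embed(t))), t) for t in texts]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


corpus = [
    "attention layers in transformer models",
    "protein folding with molecular dynamics",
    "transformer models for machine translation",
]
for score, text in top_k("transformer models", corpus):
    print(f"[{score:.3f}] {text}")
```

A real embedding provider would replace `toy_embed` with a learned model, so that passages with related meaning but different wording also score highly.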
Step 3: Persona-Aware Queries
Use the RAG pipeline with persona context for targeted retrieval:
from fcc.rag.pipeline import RAGPipeline

pipeline = RAGPipeline(chunker=chunker, retriever=retriever)

# Query as a Research Catalyst -- broad exploration
results = pipeline.query(
    query="What are the key advances in transformer architectures?",
    persona_id="research_catalyst",
    top_k=10,
)

for r in results:
    print(f" [{r.score:.3f}] {r.chunk.metadata.get('authors', 'Unknown')}: "
          f"{r.chunk.content[:80]}...")
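A broad query often returns several chunks from the same paper. One way to get a per-paper overview is to keep only the best-scoring hit for each paper, sketched below. The `Chunk` and `Result` dataclasses are hypothetical stand-ins that only mirror the attributes used in the loop above (`score`, `chunk.metadata`, `chunk.content`), and the scores and passages are invented placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:  # hypothetical stand-in, not the fcc class
    content: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Result:  # hypothetical stand-in, not the fcc class
    chunk: Chunk
    score: float


def best_hit_per_paper(results):
    """Keep the highest-scoring result for each distinct `authors` value."""
    best = {}
    for r in results:
        key = r.chunk.metadata.get("authors", "Unknown")
        if key not in best or r.score > best[key].score:
            best[key] = r
    # Highest score first, matching the usual ranking order.
    return sorted(best.values(), key=lambda r: r.score, reverse=True)


# Invented placeholder data, for demonstration only
demo_results = [
    Result(Chunk("passage A", {"authors": "Smith et al."}), 0.62),
    Result(Chunk("passage B", {"authors": "Jones et al."}), 0.80),
    Result(Chunk("passage C", {"authors": "Smith et al."}), 0.71),
    Result(Chunk("passage D"), 0.10),
]
for r in best_hit_per_paper(demo_results):
    print(r.chunk.metadata.get("authors", "Unknown"), r.score, r.chunk.content)
```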
Step 4: Synthesis with Different Personas
Use different personas for different aspects of the review:
# Domain Expert -- focus on methodology
methods_results = pipeline.query(
    query="What experimental methodologies are used?",
    persona_id="domain_expert",
    top_k=5,
)

# Competitive Intelligence Analyst -- focus on comparisons
comparison_results = pipeline.query(
    query="How do different approaches compare in performance?",
    persona_id="competitive_intelligence_analyst",
    top_k=5,
)
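Once several personas have been queried, a simple coverage check shows which papers each persona surfaced and which papers no query reached at all, which feeds directly into the CRITIQUE phase. The sketch below assumes the paper labels have already been pulled out of the results (for example from each chunk's `authors` metadata); `coverage_report` and the sample values are hypothetical.

```python
def coverage_report(hits_by_persona, corpus_papers):
    """Return (papers surfaced per persona, papers surfaced by nobody)."""
    surfaced = set()
    report = {}
    for persona, papers in hits_by_persona.items():
        unique = sorted(set(papers))  # several chunks may come from one paper
        report[persona] = unique
        surfaced.update(unique)
    missing = sorted(set(corpus_papers) - surfaced)
    return report, missing


# Invented placeholder data, for demonstration only
hits = {
    "domain_expert": ["Smith et al.", "Smith et al."],
    "competitive_intelligence_analyst": ["Smith et al."],
}
report, missing = coverage_report(hits, ["Smith et al.", "Jones et al."])
print(report)
print("Never retrieved:", missing)
```

A paper that no persona retrieves is a candidate gap: either the queries need rewording or the paper is off-topic for the review.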
Chunking Strategy Guide
Choose the right chunking strategy for your corpus:
| Strategy | Best For | Chunk Size |
|---|---|---|
| `fixed` | Uniform-length documents | Fixed number of characters |
| `sentence` | Short papers, abstracts | 1-3 sentences |
| `paragraph` | Well-structured papers | Paragraphs |
| `semantic` | Dense technical content | Semantic units |
| `recursive` | Long documents with sections | Hierarchical |
| `custom` | Domain-specific formats | User-defined |
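The difference between strategies is easiest to see side by side. The following standalone sketch contrasts a fixed-width splitter with a naive sentence splitter on the same text. Both functions are hypothetical illustrations under simple assumptions (character counts, sentence ends marked by `.`, `!`, or `?`) and are not the fcc implementations of `fixed` or `sentence`.

```python
import re


def fixed_chunks(text, size=40):
    """Cut every `size` characters, regardless of word or sentence boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def sentence_chunks(text, per_chunk=2):
    """Group sentences, splitting naively after '.', '!' or '?' followed by whitespace."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [" ".join(sentences[i:i + per_chunk]) for i in range(0, len(sentences), per_chunk)]


text = (
    "We propose a new model. It uses sparse attention. "
    "Training takes two days. Results improve on both benchmarks."
)
print(fixed_chunks(text))     # pieces may start or end mid-word
print(sentence_chunks(text))  # pieces always end on a sentence boundary
```

Fixed-width chunks are predictable in size but can split a claim in half, while sentence or paragraph chunks keep each claim intact at the cost of uneven lengths.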
Building a Review Knowledge Graph
Capture relationships between papers, methods, and findings:
from fcc.knowledge.graph import KnowledgeGraph

kg = KnowledgeGraph()

# Add paper nodes
for doc in documents:
    kg.add_node(doc["id"], node_type="DOCUMENT", metadata=doc["metadata"])

# Add method nodes and relationships
kg.add_node("transformer", node_type="METHOD", metadata={"name": "Transformer"})
kg.add_edge("smith2024", "transformer", edge_type="USES")
kg.add_edge("jones2023", "transformer", edge_type="USES")

# Query: which papers use the same method?
related = kg.nodes_connected_to("transformer")
print(f"Papers using Transformer: {[n.id for n in related]}")
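The same "who shares a method" question can be illustrated without any graph library, which helps clarify what the query computes. The sketch below stores edges as plain `(source, edge_type, target)` tuples and finds pairs of papers with at least one method in common. The tuple layout and the `shared_methods` helper are assumptions for illustration, not the `KnowledgeGraph` data model.

```python
from collections import defaultdict
from itertools import combinations

# Same two relationships as in the example above
edges = [
    ("smith2024", "USES", "transformer"),
    ("jones2023", "USES", "transformer"),
]


def shared_methods(edges, edge_type="USES"):
    """Map each pair of papers to the set of methods both of them use."""
    papers_by_method = defaultdict(set)
    for source, etype, target in edges:
        if etype == edge_type:
            papers_by_method[target].add(source)

    shared = defaultdict(set)
    for method, papers in papers_by_method.items():
        # Sorting makes each pair appear in one canonical order.
        for pair in combinations(sorted(papers), 2):
            shared[pair].add(method)
    return dict(shared)


print(shared_methods(edges))
```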
Workflow for Systematic Reviews
1. Define research question
2. Collect papers (manual or automated)
3. Ingest into RAG pipeline (paragraph chunking)
4. Build knowledge graph (papers, methods, findings)
5. Run persona-aware queries (Research Catalyst)
6. Synthesize findings (Build Champion)
7. Identify gaps (Domain Expert critique)
8. Iterate until coverage is sufficient
9. Export KG to RDF for supplementary materials
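For the final export step, the sketch below shows roughly what a knowledge graph looks like when serialised as RDF in the N-Triples format. How fcc itself exports RDF is not covered here; this is a hand-rolled illustration in which the `https://example.org/review/` namespace, the `to_ntriples` helper, and the plain-dict node layout are all assumptions. It also assumes node ids and property names contain only IRI-safe characters.

```python
BASE = "https://example.org/review/"  # placeholder namespace (assumption)


def iri(local_name):
    """Wrap a local name in the placeholder namespace as an N-Triples IRI."""
    return f"<{BASE}{local_name}>"


def literal(value):
    """Render a value as a plain string literal with basic escaping."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(nodes, edges):
    """Serialise node properties and edges, one triple per line."""
    lines = []
    for node_id, props in nodes.items():
        for key, value in props.items():
            lines.append(f"{iri(node_id)} {iri(key)} {literal(value)} .")
    for source, edge_type, target in edges:
        lines.append(f"{iri(source)} {iri(edge_type.lower())} {iri(target)} .")
    return "\n".join(lines)


nodes = {"smith2024": {"authors": "Smith et al.", "year": 2024}}
kg_edges = [("smith2024", "USES", "transformer")]
nt = to_ntriples(nodes, kg_edges)
print(nt)
```

For supplementary materials, an established vocabulary for bibliographic data would be preferable to an ad hoc namespace like the placeholder used here.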
Related Resources
- FAIR Workflow -- FAIR compliance for research data
- Reproducibility Guide -- Reproducible research pipelines
- Research Methodology -- FCC as research instrument
- Notebook 17_rag_pipeline.ipynb -- Full RAG pipeline tutorial
- SAMPLE_PROMPTS.md -- Prompts A2, A9 for RAG examples