Semantic Search Prompts

Ten prompts for building embedding indexes, searching personas by description, finding similar actions, and configuring the FCC semantic search module. Each prompt includes a code snippet and a description of the expected output.


Table of Contents

  1. Build a Persona Search Index
  2. Search Personas by Description
  3. Build an Action Search Index
  4. Find Similar Actions
  5. Search with Category Filter
  6. Compare Embedding Providers
  7. Batch Index All Personas
  8. Search by Archetype
  9. Find Personas by Skill Match
  10. Cross-Index Search

1. Build a Persona Search Index

Create a searchable index of all FCC personas using embeddings.

from fcc.search.persona_index import PersonaSearchIndex
from fcc.search.embeddings import MockEmbeddingProvider
from fcc.personas.registry import PersonaRegistry
from fcc._resources import get_personas_dir

provider = MockEmbeddingProvider(dimension=384)
registry = PersonaRegistry.from_yaml_directory(get_personas_dir())

index = PersonaSearchIndex(embedding_provider=provider)
index.build(registry)
print(f"Indexed {len(index)} personas")

Expected output: A PersonaSearchIndex containing embeddings for all 102 personas, searchable by natural language descriptions. The index uses each persona's role, archetype, and expected output fields to generate embeddings.
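To make the indexing step concrete, here is a minimal standalone sketch of how a deterministic mock embedding and a field-based index could work. It is not the FCC implementation: `mock_embed`, `persona_text`, the field key `expected_output`, and the two sample personas are hypothetical stand-ins chosen for illustration.

```python
import hashlib
import math


def mock_embed(text: str, dimension: int = 384) -> list[float]:
    """Hypothetical stand-in for a mock provider: hash the text into a unit vector."""
    values: list[float] = []
    counter = 0
    while len(values) < dimension:
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        # Map each byte (0..255) into the range [-1, 1].
        values.extend((byte / 127.5) - 1.0 for byte in digest)
        counter += 1
    values = values[:dimension]
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]


def persona_text(persona: dict) -> str:
    """Join the fields named in the text above into one string to embed."""
    return " | ".join(persona.get(key, "") for key in ("role", "archetype", "expected_output"))


# Illustrative personas only; these are not real FCC entries.
personas = [
    {"id": "P1", "role": "Example Reviewer", "archetype": "The Checker", "expected_output": "Review notes"},
    {"id": "P2", "role": "Example Builder", "archetype": "The Maker", "expected_output": "Working prototype"},
]

toy_index = {p["id"]: mock_embed(persona_text(p)) for p in personas}
print(f"Indexed {len(toy_index)} personas with {len(toy_index['P1'])}-dim vectors")
```

Because the vector depends only on the input text, repeated runs give identical embeddings, which is what makes a mock provider convenient in tests. The vectors carry no semantic meaning.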


2. Search Personas by Description

Find personas that match a natural language description of what you need.

results = index.search("I need someone who can validate data quality and governance compliance", top_k=5)
for result in results:
    print(f"{result.persona_id:6s} | {result.name:35s} | Score: {result.score:.3f}")

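Expected output: The top 5 personas ranked by similarity to the query. With a semantically meaningful provider, expected results include DGS (Data Governance Specialist), GCA (Governance Compliance Auditor), QGD (Quality Gate Designer), and other governance-related personas. The MockEmbeddingProvider used in prompt 1 is intended for testing, so its rankings may not reflect meaning.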
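The ranking step behind a search like this can be sketched as a top-k selection over similarity scores. The sketch below assumes cosine similarity, which the source does not confirm for FCC; `Hit`, `cosine`, and `top_k` are hypothetical names, and the two-dimensional vectors are toy data.

```python
import math
from typing import NamedTuple


class Hit(NamedTuple):
    item_id: str
    score: float


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], vectors: dict[str, list[float]], k: int = 5) -> list[Hit]:
    """Score every stored vector against the query and keep the k best."""
    hits = [Hit(item_id, cosine(query_vec, vec)) for item_id, vec in vectors.items()]
    hits.sort(key=lambda hit: hit.score, reverse=True)
    return hits[:k]


# Toy vectors: "A" points almost the same way as the query, "B" is nearly orthogonal.
vectors = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
results = top_k([1.0, 0.1], vectors, k=2)
for hit in results:
    print(f"{hit.item_id:6s} | Score: {hit.score:.3f}")
```

Here the query ranks "A" first and "C" second, and "B" is dropped by the `k=2` cutoff.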


3. Build an Action Search Index

Index all 312 workflow actions for semantic search.

from fcc.search.action_index import ActionSearchIndex
from fcc.search.embeddings import MockEmbeddingProvider
from fcc.workflow.actions import WorkflowActionRegistry

provider = MockEmbeddingProvider(dimension=384)
action_registry = WorkflowActionRegistry.load_default()

index = ActionSearchIndex(embedding_provider=provider)
index.build(action_registry)
print(f"Indexed {len(index)} actions across {len(action_registry.action_types)} types")

Expected output: An ActionSearchIndex containing embeddings for all 312 actions across 6 action types (scaffold, refactor, debug, test, compare, document). Each action is indexed by its description and associated persona.


4. Find Similar Actions

Search for actions that match a task description.

results = index.search("refactor a data pipeline to improve performance", top_k=5)
for result in results:
    print(f"{result.action_type:10s} | {result.persona_id:6s} | {result.description[:50]}...")

Expected output: The top 5 actions most relevant to pipeline refactoring, likely including refactor actions from POR (Pipeline Orchestrator), SQC (SQL Query Crafter), and TAL (ETL Architect).


5. Search with Category Filter

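Restrict persona search to a specific category. Here index must be the PersonaSearchIndex from prompt 1; prompt 3 rebinds the name index to an action index, so rebuild or rename the persona index if you run the prompts in order.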

results = index.search(
    "model evaluation and scoring",
    top_k=5,
    filter_category="ml_lifecycle"
)
for result in results:
    print(f"{result.persona_id:6s} | {result.name:35s} | Category: {result.category}")

Expected output: Only personas from the ml_lifecycle category are returned. Expected results include ESC (Experiment Scientist), IAN (Impact Analyst), and IRE (Interpretability Researcher).


6. Compare Embedding Providers

Switch between embedding providers to evaluate search quality. The snippet reuses the registry object loaded in prompt 1.

from fcc.search.embeddings import MockEmbeddingProvider
from fcc.search.persona_index import PersonaSearchIndex

# Mock provider (deterministic, 384-dim)
mock_provider = MockEmbeddingProvider(dimension=384)
mock_index = PersonaSearchIndex(embedding_provider=mock_provider)
mock_index.build(registry)

# Search with mock
mock_results = mock_index.search("data governance")

# With a real provider (requires sentence-transformers):
# from fcc.search.embeddings import SentenceTransformerProvider
# real_provider = SentenceTransformerProvider(model_name="all-MiniLM-L6-v2")
# real_index = PersonaSearchIndex(embedding_provider=real_provider)
# real_index.build(registry)
# real_results = real_index.search("data governance")

print("Mock results:", [r.persona_id for r in mock_results[:5]])

Expected output: Different providers produce different ranking orders. The MockEmbeddingProvider is deterministic and useful for testing, while SentenceTransformerProvider produces semantically meaningful embeddings for production use.
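To put a number on how much two providers disagree, one simple option is the Jaccard overlap of their top-k result IDs. This metric is a suggestion, not part of FCC; `overlap_at_k` and the ID lists below are hypothetical.

```python
def overlap_at_k(ids_a: list[str], ids_b: list[str], k: int = 5) -> float:
    """Jaccard overlap of the top-k IDs from two rankings (1.0 means identical sets)."""
    top_a, top_b = set(ids_a[:k]), set(ids_b[:k])
    if not top_a and not top_b:
        return 1.0
    return len(top_a & top_b) / len(top_a | top_b)


# Illustrative rankings; in practice these would come from
# [r.persona_id for r in mock_results] and the equivalent for a real provider.
mock_ids = ["P1", "P2", "P3", "P4", "P5"]
real_ids = ["P3", "P9", "P1", "P7", "P8"]

print(f"Overlap@5: {overlap_at_k(mock_ids, real_ids, k=5):.2f}")
```

The two lists share P1 and P3 out of eight distinct IDs, so the overlap is 0.25. The metric ignores order within the top k; a rank-aware measure would be needed to compare orderings.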


7. Batch Index All Personas

Index all personas with progress tracking.

from fcc.search.persona_index import PersonaSearchIndex
from fcc.search.embeddings import MockEmbeddingProvider
from fcc.personas.registry import PersonaRegistry
from fcc._resources import get_personas_dir

provider = MockEmbeddingProvider(dimension=384)
registry = PersonaRegistry.from_yaml_directory(get_personas_dir())

index = PersonaSearchIndex(embedding_provider=provider)
categories = registry.categories()
for category in sorted(categories):
    personas = registry.by_category(category)
    print(f"Indexing {category}: {len(personas)} personas")
    for persona in personas:
        index.add(persona)
print(f"Total indexed: {len(index)}")

Expected output: Progress output showing each of the 20 core categories being indexed, with persona counts per category. Total indexed count should be 102 core personas (or 147 when also indexing the 6 vertical packs) plus any plugin personas.


8. Search by Archetype

Find all personas that share a similar archetype (e.g., "The Investigator").

results = index.search("investigator who researches and gathers information", top_k=10)
investigators = [r for r in results if r.score > 0.7]
for r in investigators:
    persona = registry.get(r.persona_id)
    print(f"{r.persona_id:6s} | {persona.riscear.archetype:30s} | Score: {r.score:.3f}")

Expected output: Personas with investigator-like archetypes ranked by similarity. Expected high-scoring matches include RC (The Investigator), DSS (The Data Hunter), RIC (The Curator), and FDS (FAIR Data Steward).


9. Find Personas by Skill Match

Search for personas whose skills match a set of required capabilities.

skill_query = "Python, SQL, data pipeline orchestration, ETL, Airflow"
results = index.search(skill_query, top_k=5)
for result in results:
    persona = registry.get(result.persona_id)
    skills = persona.riscear.role_skills or []
    print(f"{result.persona_id:6s} | Skills: {', '.join(skills[:3])}...")

Expected output: Personas whose skill profiles best match the queried capabilities. Expected matches include POR (Pipeline Orchestrator), SQC (SQL Query Crafter), TAL (ETL Architect), and ASC (Automation Script Crafter).


10. Cross-Index Search

Search across both persona and action indexes to find the best persona-action pair for a task.

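from fcc.search.persona_index import PersonaSearchIndex
from fcc.search.action_index import ActionSearchIndex
from fcc.search.embeddings import MockEmbeddingProvider
from fcc.personas.registry import PersonaRegistry
from fcc.workflow.actions import WorkflowActionRegistry
from fcc._resources import get_personas_dir

# Build both indexes with the same provider (same steps as prompts 1 and 3)
provider = MockEmbeddingProvider(dimension=384)
persona_index = PersonaSearchIndex(embedding_provider=provider)
persona_index.build(PersonaRegistry.from_yaml_directory(get_personas_dir()))
action_index = ActionSearchIndex(embedding_provider=provider)
action_index.build(WorkflowActionRegistry.load_default())

task = "validate API schemas against OpenAPI specifications"

# Run the same query against each index
persona_results = persona_index.search(task, top_k=3)
action_results = action_index.search(task, top_k=3)

print("Best personas for this task:")
for r in persona_results:
    print(f"  {r.persona_id}: {r.name} (score: {r.score:.3f})")

print("\nBest actions for this task:")
for r in action_results:
    print(f"  {r.action_type}/{r.persona_id}: {r.description[:50]}... (score: {r.score:.3f})")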

Expected output: A combined view showing which personas are best suited for the task and which specific actions they can perform. This enables task-to-persona-to-action mapping for workflow automation.
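The task-to-persona-to-action mapping can be sketched as a join of the two result lists on persona_id. The averaging rule, the `pair_results` helper, and the sample tuples below are assumptions made for illustration; FCC may combine results differently.

```python
def pair_results(
    persona_hits: list[tuple[str, float]],
    action_hits: list[tuple[str, str, float]],
) -> list[tuple[str, str, float]]:
    """Join persona and action hits on persona_id and rank by the mean of both scores."""
    persona_scores = dict(persona_hits)
    pairs = []
    for action_type, persona_id, action_score in action_hits:
        if persona_id in persona_scores:
            combined = (persona_scores[persona_id] + action_score) / 2
            pairs.append((persona_id, action_type, combined))
    pairs.sort(key=lambda pair: pair[2], reverse=True)
    return pairs


# Illustrative hits: (persona_id, score) and (action_type, persona_id, score).
persona_hits = [("P1", 0.82), ("P2", 0.74)]
action_hits = [("test", "P2", 0.90), ("document", "P1", 0.60), ("debug", "P9", 0.88)]

pairs = pair_results(persona_hits, action_hits)
for persona_id, action_type, score in pairs:
    print(f"{persona_id} -> {action_type} (combined score: {score:.3f})")
```

The debug action is dropped because its persona (P9) is absent from the persona hits, and P2/test ranks above P1/document. Averaging weights both indexes equally; a weighted sum is an easy variation if one index should matter more.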