
API Reference: Semantic Search

This document covers the FCC semantic search subsystem, which provides embedding-based document search with pluggable backends, persona-specific search, and action-specific search.

```mermaid
flowchart LR
    Q[Query Text] --> EP[EmbeddingProvider]
    EP -->|embed| V[Query Vector]
    V --> SI[SearchIndex]
    SI -->|cosine similarity| B[(InMemory or Numpy Backend)]
    B --> SR[SearchResult list]
    SR -->|ranked by score| Out[Top-K Results]
```

EmbeddingProvider Protocol

fcc.search.embeddings.EmbeddingProvider is a runtime_checkable Protocol defining the contract for all embedding providers.

Required Methods

| Method | Signature | Description |
|---|---|---|
| `embed` | `(text: str) -> tuple[float, ...]` | Embed a single text into a vector |
| `embed_batch` | `(texts: list[str]) -> list[tuple[float, ...]]` | Embed multiple texts |
| `dimension` | `() -> int` | Return the vector dimensionality |

Any class implementing these three methods satisfies the protocol via structural subtyping.
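Because the protocol is `runtime_checkable`, `isinstance` checks pass for any class with the three methods, without inheritance. A minimal sketch, using a local stand-in for the protocol (the canonical definition lives in `fcc.search.embeddings`):

```python
from typing import Protocol, runtime_checkable

# Local stand-in mirroring the protocol described above.
@runtime_checkable
class EmbeddingProvider(Protocol):
    def embed(self, text: str) -> tuple[float, ...]: ...
    def embed_batch(self, texts: list[str]) -> list[tuple[float, ...]]: ...
    def dimension(self) -> int: ...

class ConstantProvider:
    """Toy provider: every text maps to the same 4-dim vector."""

    def embed(self, text: str) -> tuple[float, ...]:
        return (0.0, 0.0, 0.0, 1.0)

    def embed_batch(self, texts: list[str]) -> list[tuple[float, ...]]:
        return [self.embed(t) for t in texts]

    def dimension(self) -> int:
        return 4

# Structural subtyping: no subclassing of the protocol required.
print(isinstance(ConstantProvider(), EmbeddingProvider))  # True
```

Note that `runtime_checkable` only verifies method *presence*, not signatures, so a conforming class can still misbehave if its return types differ from the contract.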


MockEmbeddingProvider

fcc.search.embeddings.MockEmbeddingProvider produces deterministic 384-dimensional vectors using MD5 hashing, matching the dimensionality of all-MiniLM-L6-v2. It is suitable for testing without external dependencies.

```python
from fcc.search.embeddings import MockEmbeddingProvider

provider = MockEmbeddingProvider()
vec = provider.embed("research coordinator")
print(len(vec))              # 384
print(provider.dimension())  # 384

# Batch embedding
vecs = provider.embed_batch(["text 1", "text 2", "text 3"])
print(len(vecs))  # 3
```
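To illustrate the idea of hash-based deterministic embeddings, here is a self-contained sketch that stretches MD5 digests over a fixed number of floats. The exact construction inside `MockEmbeddingProvider` is not shown in this document, so treat this as an illustration of the technique, not the library's code:

```python
import hashlib
import struct

def mock_embed(text: str, dim: int = 384) -> tuple[float, ...]:
    # Deterministic pseudo-embedding: same text always yields the same vector.
    values: list[float] = []
    counter = 0
    while len(values) < dim:
        digest = hashlib.md5(f"{text}:{counter}".encode()).digest()
        # Each 16-byte digest yields four 4-byte unsigned ints.
        for (n,) in struct.iter_unpack(">I", digest):
            values.append(n / 0xFFFFFFFF)  # normalize to [0, 1]
        counter += 1
    return tuple(values[:dim])

vec = mock_embed("research coordinator")
print(len(vec))  # 384
```

Vectors produced this way carry no semantic signal, which is exactly why they are safe for deterministic tests but useless for real relevance ranking.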

SentenceTransformerProvider

fcc.search.embeddings.SentenceTransformerProvider wraps the sentence-transformers library for production-grade embeddings. The model is lazy-loaded on first use.

```python
from fcc.search.embeddings import SentenceTransformerProvider, st_available

if st_available():
    provider = SentenceTransformerProvider(model_name="all-MiniLM-L6-v2")
    vec = provider.embed("research coordinator")
    print(provider.dimension())  # 384
```

Raises ImportError at construction time if sentence-transformers is not installed.
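The guard shown above exists because of this constructor-time failure. A self-contained sketch of how such an availability probe can be written (an assumption about the mechanism; `st_available()` itself may be implemented differently):

```python
import importlib.util

def st_available_sketch() -> bool:
    # True when the sentence-transformers package is importable,
    # without actually importing (and loading) it.
    return importlib.util.find_spec("sentence_transformers") is not None
```

Probing with `find_spec` avoids paying the import cost of a heavyweight ML library just to check for its presence.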


SearchIndex

fcc.search.index.SearchIndex combines an EmbeddingProvider with a SearchBackend into a high-level search API with persistence support and an embedding cache.

Adding Documents

```python
from fcc.search.index import SearchIndex

index = SearchIndex()  # Uses MockEmbeddingProvider + InMemoryBackend

index.add_document("doc1", "Research methodology overview", {"type": "guide"})
index.add_document("doc2", "Code review best practices", {"type": "guide"})
index.add_document("doc3", "API design patterns", {"type": "reference"})

print(index.count())  # 3
```

Searching

```python
results = index.search("research methods", k=5)
for r in results:
    print(f"{r.doc_id}: {r.score:.3f} - {r.text[:50]}")
```

Each SearchResult has doc_id, text, score, and metadata fields.
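The result shape can be pictured as a small dataclass. This is a sketch built from the fields named above; the canonical definition lives in `fcc.search.models` and may differ in field order and defaults:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class SearchResult:
    """Sketch of the result shape described in the text above."""
    doc_id: str
    text: str
    score: float  # cosine similarity; higher means more relevant
    metadata: dict[str, Any] = field(default_factory=dict)

r = SearchResult(
    doc_id="doc1",
    text="Research methodology overview",
    score=0.91,
    metadata={"type": "guide"},
)
print(r.metadata.get("type"))  # guide
```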

Remove and Check

```python
index.remove("doc3")    # Returns True if found
index.contains("doc3")  # False
```

Persistence

```python
from pathlib import Path

# Save to JSON
index.save(Path("/tmp/my_index.json"))

# Load from JSON
restored = SearchIndex.load(Path("/tmp/my_index.json"))
```

Backends

Two backend implementations are available:

| Backend | Import | Description |
|---|---|---|
| `InMemoryBackend` | `fcc.search.index` | Pure-Python dict-based storage with cosine similarity |
| `NumpyBackend` | `fcc.search.index` | Numpy-accelerated batch cosine similarity (requires numpy) |

```python
from fcc.search.index import SearchIndex, NumpyBackend

index = SearchIndex(backend=NumpyBackend())
```
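Both backends rank documents by cosine similarity between the query vector and each stored vector. A reference implementation of the metric (a sketch for clarity, not the library's code):

```python
import math

def cosine_similarity(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    # score = (a . b) / (|a| * |b|); 1.0 for identical directions,
    # 0.0 for orthogonal vectors. Zero vectors score 0.0 by convention.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

print(cosine_similarity((1.0, 0.0), (1.0, 0.0)))  # 1.0
print(cosine_similarity((1.0, 0.0), (0.0, 1.0)))  # 0.0
```

The pure-Python loop above is what makes `NumpyBackend` attractive: computing all similarities as one matrix-vector product is far faster for large indexes.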

PersonaSearchIndex

fcc.search.persona_search.PersonaSearchIndex enables natural-language discovery of personas by searching over concatenated role, archetype, and responsibility text.

```python
from fcc.search.persona_search import PersonaSearchIndex
from fcc.personas.registry import PersonaRegistry
from fcc._resources import get_personas_dir

registry = PersonaRegistry.from_yaml_directory(get_personas_dir())
psi = PersonaSearchIndex.from_registry(registry)

# Search by natural language
results = psi.search_personas("security and compliance auditing", k=5)
for r in results:
    print(f"{r.doc_id}: {r.score:.3f} ({r.metadata.get('name')})")

# Find similar personas
similar = psi.similar_personas("RC", k=3)
```

How It Works

Each persona is indexed with:

- `doc_id`: the persona's ID (e.g., `"RC"`)
- `text`: concatenated `riscear.role` + `riscear.archetype` + `riscear.responsibilities`
- `metadata`: `category`, `name`, `persona_id`
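The concatenation step can be sketched with a hypothetical helper (the field names follow the scheme above; the real code reads them from each persona's riscear block):

```python
def persona_search_text(role: str, archetype: str, responsibilities: list[str]) -> str:
    # Join the persona's descriptive fields into one searchable document.
    return " ".join([role, archetype, *responsibilities])

text = persona_search_text(
    role="Research Coordinator",
    archetype="Organizer",
    responsibilities=["plan studies", "track findings"],
)
print(text)  # Research Coordinator Organizer plan studies track findings
```

Folding all three fields into one document is what lets a query like "security and compliance auditing" match a persona whose role alone never uses those words.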


ActionSearchIndex

fcc.search.action_search.ActionSearchIndex enables natural-language discovery of workflow actions by searching over description and execution steps.

```python
from fcc.search.action_search import ActionSearchIndex
from fcc.workflow.actions import WorkflowActionRegistry
from fcc._resources import get_actions_dir

action_registry = WorkflowActionRegistry.from_yaml_directory(get_actions_dir())
asi = ActionSearchIndex.from_registry(action_registry)

# Search by natural language
results = asi.search_actions("generate test coverage report", k=5)
for r in results:
    print(f"{r.doc_id}: {r.score:.3f}")
```

How It Works

Each action is indexed with:

- `doc_id`: `"{persona_id}:{action_type}"` (e.g., `"RC:scaffold"`)
- `text`: concatenated `description` + `execution_steps`
- `metadata`: `persona_id`, `action_type`
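The scheme above can be sketched with a hypothetical helper that builds one index entry per action (field names are assumptions drawn from the description, not the library's API):

```python
def action_document(persona_id: str, action_type: str,
                    description: str, execution_steps: list[str]) -> dict:
    # One searchable document per (persona, action) pair.
    return {
        "doc_id": f"{persona_id}:{action_type}",  # composite key, e.g. "RC:scaffold"
        "text": " ".join([description, *execution_steps]),
        "metadata": {"persona_id": persona_id, "action_type": action_type},
    }

doc = action_document("RC", "scaffold", "Scaffold a project",
                      ["create dirs", "write config"])
print(doc["doc_id"])  # RC:scaffold
```

The composite `doc_id` keeps action types unique across personas, so two personas can each define a `scaffold` action without colliding in the index.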


Import Paths Summary

| Class | Import Path |
|---|---|
| `EmbeddingProvider` | `fcc.search.embeddings` |
| `MockEmbeddingProvider` | `fcc.search.embeddings` |
| `SentenceTransformerProvider` | `fcc.search.embeddings` |
| `SearchIndex` | `fcc.search.index` |
| `InMemoryBackend` | `fcc.search.index` |
| `NumpyBackend` | `fcc.search.index` |
| `PersonaSearchIndex` | `fcc.search.persona_search` |
| `ActionSearchIndex` | `fcc.search.action_search` |
| `SearchResult` | `fcc.search.models` |

See Also