# API Reference: Semantic Search
This document covers the FCC semantic search subsystem, which provides embedding-based document search with pluggable backends, persona-specific search, and action-specific search.
```mermaid
flowchart LR
    Q[Query Text] --> EP[EmbeddingProvider]
    EP -->|embed| V[Query Vector]
    V --> SI[SearchIndex]
    SI -->|cosine similarity| B[(InMemory or Numpy Backend)]
    B --> SR[SearchResult list]
    SR -->|ranked by score| Out[Top-K Results]
```
## EmbeddingProvider Protocol
fcc.search.embeddings.EmbeddingProvider is a runtime_checkable Protocol defining the contract for all embedding providers.
### Required Methods

| Method | Signature | Description |
|---|---|---|
| `embed` | `(text: str) -> tuple[float, ...]` | Embed a single text into a vector |
| `embed_batch` | `(texts: list[str]) -> list[tuple[float, ...]]` | Embed multiple texts |
| `dimension` | `() -> int` | Return the vector dimensionality |
Any class implementing these three methods satisfies the protocol via structural subtyping.
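To illustrate structural subtyping, here is a self-contained sketch: the protocol below is a local re-statement of the documented contract (the real one lives in fcc.search.embeddings), and ZeroProvider is a toy class invented for this example.

```python
from typing import Protocol, runtime_checkable

# Local re-statement of the documented contract, for illustration only.
@runtime_checkable
class EmbeddingProvider(Protocol):
    def embed(self, text: str) -> tuple[float, ...]: ...
    def embed_batch(self, texts: list[str]) -> list[tuple[float, ...]]: ...
    def dimension(self) -> int: ...

class ZeroProvider:
    """Toy provider: every text maps to an 8-dimensional zero vector."""

    def embed(self, text: str) -> tuple[float, ...]:
        return (0.0,) * 8

    def embed_batch(self, texts: list[str]) -> list[tuple[float, ...]]:
        return [self.embed(t) for t in texts]

    def dimension(self) -> int:
        return 8

# Structural subtyping: no inheritance from the protocol is needed.
assert isinstance(ZeroProvider(), EmbeddingProvider)
```

Note that runtime_checkable isinstance checks only verify that the methods exist, not that their signatures match.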
## MockEmbeddingProvider
fcc.search.embeddings.MockEmbeddingProvider produces deterministic 384-dimensional vectors using MD5 hashing, matching the dimensionality of all-MiniLM-L6-v2. It is suitable for testing without external dependencies.
```python
from fcc.search.embeddings import MockEmbeddingProvider

provider = MockEmbeddingProvider()
vec = provider.embed("research coordinator")
print(len(vec))              # 384
print(provider.dimension())  # 384

# Batch embedding
vecs = provider.embed_batch(["text 1", "text 2", "text 3"])
print(len(vecs))             # 3
```
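For intuition, one plausible way such a deterministic, hash-based embedding could be built is to expand an MD5 digest into the required number of floats. This is a sketch of the general technique, not the library's actual implementation:

```python
import hashlib

def mock_embed(text: str, dim: int = 384) -> tuple[float, ...]:
    """Deterministic pseudo-embedding: expand MD5 digests into `dim` floats."""
    values: list[float] = []
    counter = 0
    while len(values) < dim:
        # Each salted digest yields 16 bytes; keep hashing until we have enough.
        digest = hashlib.md5(f"{text}:{counter}".encode()).digest()
        values.extend(b / 255.0 for b in digest)  # map bytes into [0, 1]
        counter += 1
    return tuple(values[:dim])

vec = mock_embed("research coordinator")
assert len(vec) == 384
assert vec == mock_embed("research coordinator")  # same input, same vector
```

The key property is determinism: identical inputs always produce identical vectors, so tests are reproducible without loading a model.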
## SentenceTransformerProvider
fcc.search.embeddings.SentenceTransformerProvider wraps the sentence-transformers library for production-grade embeddings. The model is lazy-loaded on first use.
```python
from fcc.search.embeddings import SentenceTransformerProvider, st_available

if st_available():
    provider = SentenceTransformerProvider(model_name="all-MiniLM-L6-v2")
    vec = provider.embed("research coordinator")
    print(provider.dimension())  # 384
```
Raises ImportError at construction time if sentence-transformers is not installed.
## SearchIndex
fcc.search.index.SearchIndex combines an EmbeddingProvider with a SearchBackend into a high-level search API with persistence support and an embedding cache.
### Adding Documents
```python
from fcc.search.index import SearchIndex

index = SearchIndex()  # Uses MockEmbeddingProvider + InMemoryBackend
index.add_document("doc1", "Research methodology overview", {"type": "guide"})
index.add_document("doc2", "Code review best practices", {"type": "guide"})
index.add_document("doc3", "API design patterns", {"type": "reference"})
print(index.count())  # 3
```
### Searching
```python
results = index.search("research methods", k=5)
for r in results:
    print(f"{r.doc_id}: {r.score:.3f} - {r.text[:50]}")
```
Each SearchResult has doc_id, text, score, and metadata fields.
### Remove and Check
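The body of this section is missing from the source. As a self-contained sketch of the removal and membership pattern a dict-backed index typically follows (the method names `remove_document` and `contains` are assumptions for illustration, not confirmed fcc API; consult the SearchIndex source for the real names):

```python
class ToyIndex:
    """Dict-backed sketch of document removal and membership checks.

    Method names here are hypothetical, chosen to mirror add_document.
    """

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def add_document(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text

    def contains(self, doc_id: str) -> bool:
        return doc_id in self._docs

    def remove_document(self, doc_id: str) -> bool:
        # Returns True if the document existed and was removed.
        return self._docs.pop(doc_id, None) is not None

idx = ToyIndex()
idx.add_document("doc1", "Research methodology overview")
assert idx.contains("doc1")
assert idx.remove_document("doc1")
assert not idx.contains("doc1")
```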
### Persistence
```python
from pathlib import Path

# Save to JSON
index.save(Path("/tmp/my_index.json"))

# Load from JSON
restored = SearchIndex.load(Path("/tmp/my_index.json"))
```
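The on-disk format is not specified here. As a minimal sketch of the JSON round-trip pattern such persistence typically uses, assuming documents, cached vectors, and metadata serialize as plain JSON values (the layout below is illustrative, not the library's actual schema):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical serialized shape: doc_id -> text, vector, metadata.
docs = {
    "doc1": {
        "text": "Research methodology overview",
        "vector": [0.1, 0.2, 0.3],
        "metadata": {"type": "guide"},
    },
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_index.json"
    path.write_text(json.dumps(docs))        # save
    restored = json.loads(path.read_text())  # load
    assert restored == docs                  # lossless round trip
```

Storing the vectors alongside the text is what lets a loaded index skip re-embedding every document.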
## Backends
Two backend implementations are available:
| Backend | Import | Description |
|---|---|---|
| `InMemoryBackend` | `fcc.search.index` | Pure-Python dict-based storage with cosine similarity |
| `NumpyBackend` | `fcc.search.index` | Numpy-accelerated batch cosine similarity (requires numpy) |
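Both backends rank results by cosine similarity: the dot product of two vectors divided by the product of their norms. A pure-Python version of the metric (a sketch of the computation, not the backend's actual code) looks like this:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = dot(a, b) / (|a| * |b|); 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # guard against zero vectors
    return dot / (norm_a * norm_b)

assert cosine_similarity([1.0, 0.0], [1.0, 0.0]) == 1.0  # identical direction
assert cosine_similarity([1.0, 0.0], [0.0, 1.0]) == 0.0  # orthogonal
```

A numpy backend computes the same quantity for all stored vectors at once via a single matrix-vector product, which is why it is faster for large indexes.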
## PersonaSearchIndex
fcc.search.persona_search.PersonaSearchIndex enables natural-language discovery of personas by searching over concatenated role, archetype, and responsibility text.
```python
from fcc.search.persona_search import PersonaSearchIndex
from fcc.personas.registry import PersonaRegistry
from fcc._resources import get_personas_dir

registry = PersonaRegistry.from_yaml_directory(get_personas_dir())
psi = PersonaSearchIndex.from_registry(registry)

# Search by natural language
results = psi.search_personas("security and compliance auditing", k=5)
for r in results:
    print(f"{r.doc_id}: {r.score:.3f} ({r.metadata.get('name')})")

# Find similar personas
similar = psi.similar_personas("RC", k=3)
```
### How It Works
Each persona is indexed with:
- `doc_id`: The persona's ID (e.g., `"RC"`)
- `text`: Concatenated riscear.role + riscear.archetype + riscear.responsibilities
- `metadata`: category, name, persona_id
## ActionSearchIndex
fcc.search.action_search.ActionSearchIndex enables natural-language discovery of workflow actions by searching over description and execution steps.
```python
from fcc.search.action_search import ActionSearchIndex
from fcc.workflow.actions import WorkflowActionRegistry
from fcc._resources import get_actions_dir

action_registry = WorkflowActionRegistry.from_yaml_directory(get_actions_dir())
asi = ActionSearchIndex.from_registry(action_registry)

# Search by natural language
results = asi.search_actions("generate test coverage report", k=5)
for r in results:
    print(f"{r.doc_id}: {r.score:.3f}")
```
### How It Works
Each action is indexed with:
- `doc_id`: `"{persona_id}:{action_type}"` (e.g., `"RC:scaffold"`)
- `text`: Concatenated description + execution_steps
- `metadata`: persona_id, action_type
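The composite doc_id scheme described above can be sketched as a simple formatting helper (the function name is hypothetical; only the `"{persona_id}:{action_type}"` format comes from the source):

```python
def action_doc_id(persona_id: str, action_type: str) -> str:
    """Compose the documented '{persona_id}:{action_type}' document ID."""
    return f"{persona_id}:{action_type}"

assert action_doc_id("RC", "scaffold") == "RC:scaffold"
```

Encoding both fields into the ID keeps each persona/action pair unique in the index and lets callers recover both parts with a single `split(":")`.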
## Import Paths Summary
| Class | Import Path |
|---|---|
| `EmbeddingProvider` | `fcc.search.embeddings` |
| `MockEmbeddingProvider` | `fcc.search.embeddings` |
| `SentenceTransformerProvider` | `fcc.search.embeddings` |
| `SearchIndex` | `fcc.search.index` |
| `InMemoryBackend` | `fcc.search.index` |
| `NumpyBackend` | `fcc.search.index` |
| `PersonaSearchIndex` | `fcc.search.persona_search` |
| `ActionSearchIndex` | `fcc.search.action_search` |
| `SearchResult` | `fcc.search.models` |
## See Also
- Knowledge Graph API -- Graph construction and export
- RAG API -- Retrieval-augmented generation using search
- Personas API -- Persona models and registry