ADR-004: Embedding Provider Protocol
Date: 2026-03-29
Status: Accepted
Context
FCC's semantic search system (Book 3, Chapter 1) requires an embedding provider to convert text artifacts into dense vectors for nearest-neighbor search. Multiple embedding providers exist (OpenAI, Anthropic, Cohere, local models via Sentence Transformers), and the framework must support all of them without coupling to any specific provider.
We evaluated three design patterns for the provider interface:
- Abstract Base Class (ABC). Define an `EmbeddingProviderABC` that all providers must inherit from. Registration via the plugin system.
- Protocol (structural subtyping). Define an `EmbeddingProvider` as a `typing.Protocol`. Any class with the right methods is a valid provider, no inheritance required.
- Function-based interface. Define embedding as a plain function `embed(texts: list[str]) -> list[list[float]]` with no class wrapper.
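For comparison, the two alternatives that were not chosen might look roughly like the sketch below. The names `EmbeddingProviderABC` and `embed` come from the options above; the members of the ABC and the `ToyProvider` class are assumptions made purely for illustration.

```python
from abc import ABC, abstractmethod


# Option 1 (ABC): every provider must import and subclass this base class,
# which ties third-party code to the framework at import time.
class EmbeddingProviderABC(ABC):
    @property
    @abstractmethod
    def dimensions(self) -> int: ...

    @abstractmethod
    def embed_batch(self, texts: list[str]) -> list[list[float]]: ...


class ToyProvider(EmbeddingProviderABC):  # hypothetical provider
    @property
    def dimensions(self) -> int:
        return 2

    def embed_batch(self, texts: list[str]) -> list[list[float]]:
        # Placeholder vectors (text length, 1.0); not a real embedding.
        return [[float(len(t)), 1.0] for t in texts]


# Option 3 (function): no class wrapper, but a bare function has no natural
# place to report dimensionality or hold client configuration.
def embed(texts: list[str]) -> list[list[float]]:
    return ToyProvider().embed_batch(texts)
```

The Protocol option sits between the two: it keeps the class shape of the ABC but drops the inheritance requirement.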
Key requirements:
- Support multiple providers without import-time coupling.
- Enable third-party providers without requiring them to import FCC.
- Provide a MockEmbeddingProvider for deterministic testing.
- Support batch embedding for efficiency.
- Report embedding dimensionality for validation.
Decision
We will use a Protocol-based embedding provider interface with a MockEmbeddingProvider for testing.
The Protocol definition:
from typing import Protocol

class EmbeddingProvider(Protocol):
    @property
    def dimensions(self) -> int: ...  # length of every returned vector

    def embed_batch(self, texts: list[str]) -> list[list[float]]: ...  # primary method

    def embed(self, text: str) -> list[float]: ...  # single-text convenience
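A minimal sketch of what structural subtyping means in practice: `LengthProvider` (a hypothetical class) never imports or inherits from `EmbeddingProvider`, yet a type checker accepts it wherever the Protocol is expected. `embed_corpus` is likewise an illustrative helper, not part of FCC.

```python
from typing import Protocol


class EmbeddingProvider(Protocol):
    @property
    def dimensions(self) -> int: ...
    def embed_batch(self, texts: list[str]) -> list[list[float]]: ...
    def embed(self, text: str) -> list[float]: ...


class LengthProvider:  # hypothetical; note that it has no base class
    @property
    def dimensions(self) -> int:
        return 1

    def embed_batch(self, texts: list[str]) -> list[list[float]]:
        # Toy "embedding": a one-element vector holding the text length.
        return [[float(len(t))] for t in texts]

    def embed(self, text: str) -> list[float]:
        return self.embed_batch([text])[0]


def embed_corpus(provider: EmbeddingProvider, texts: list[str]) -> list[list[float]]:
    # Typed against the Protocol only; any conforming object is accepted.
    return provider.embed_batch(texts)


vectors = embed_corpus(LengthProvider(), ["a", "abc"])
```

In a real deployment `LengthProvider` could live in a separate package that has no dependency on FCC at all.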
The MockEmbeddingProvider:
class MockEmbeddingProvider:
    def __init__(self, dimensions: int = 128):
        self._dimensions = dimensions

    @property
    def dimensions(self) -> int:
        return self._dimensions

    def embed_batch(self, texts: list[str]) -> list[list[float]]:
        return [self._hash_embed(t) for t in texts]

    def embed(self, text: str) -> list[float]:
        # Convenience wrapper around the batch method.
        return self.embed_batch([text])[0]

    def _hash_embed(self, text: str) -> list[float]:
        # Deterministic hash-based embedding
        ...
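The body of `_hash_embed` is elided above. One possible way to get deterministic vectors, shown here only as an assumption about what such a method could do, is to expand a SHA-256 digest of the text into floats in the range [-1, 1]. The standalone function name `hash_embed` and the counter-based digest scheme are illustrative choices, not the FCC implementation.

```python
import hashlib


def hash_embed(text: str, dimensions: int = 128) -> list[float]:
    """Map text to a deterministic pseudo-embedding (illustrative only)."""
    values: list[float] = []
    counter = 0
    while len(values) < dimensions:
        # Hash the text together with a counter to get as many bytes as needed.
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        # Scale each byte (0..255) into the range [-1.0, 1.0].
        values.extend(b / 127.5 - 1.0 for b in digest)
        counter += 1
    return values[:dimensions]
```

A digest from `hashlib` is used rather than the built-in `hash()` because Python randomizes string hashes per process by default, which would make the vectors differ between test runs. The resulting vectors carry no semantic meaning; they are only stable.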
Consequences
Positive
- No import coupling. Third-party embedding libraries (OpenAI, Sentence Transformers, custom servers) do not need to import anything from FCC. As long as a class implements `dimensions`, `embed_batch`, and `embed`, it is a valid provider.
- Duck typing friendly. Existing embedding clients can be wrapped with minimal adapter code. In many cases, the existing client already has the right method signatures.
- Batch-first API. The primary method is `embed_batch`, which is 10-100x more efficient than single-text embedding for corpus indexing. The `embed` method is a convenience wrapper.
- Dimension validation. The `dimensions` property allows the SearchIndex to validate that all embeddings have consistent dimensionality, catching provider misconfiguration early.
- Deterministic testing. The MockEmbeddingProvider generates consistent vectors from text hashes, enabling deterministic tests and CI pipelines without API calls or cost.
- Plugin compatible. Providers can be registered via the `fcc.plugins.providers` entry-point group for automatic discovery.
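The batch-first and dimension-validation points can be sketched together. `TinyIndex` is a hypothetical stand-in (the real SearchIndex API is not shown in this ADR), and `FixedProvider` is a hypothetical provider whose declared and actual vector lengths can be made to disagree.

```python
class FixedProvider:  # hypothetical provider returning constant vectors
    def __init__(self, dimensions: int, actual: int):
        self._dimensions = dimensions  # what the provider claims
        self._actual = actual          # length of the vectors it really produces

    @property
    def dimensions(self) -> int:
        return self._dimensions

    def embed_batch(self, texts: list[str]) -> list[list[float]]:
        return [[0.0] * self._actual for _ in texts]

    def embed(self, text: str) -> list[float]:
        return self.embed_batch([text])[0]


class TinyIndex:  # hypothetical stand-in for SearchIndex
    def __init__(self, provider):
        self.provider = provider
        self.vectors: list[list[float]] = []

    def add(self, texts: list[str]) -> None:
        # One batched call for the whole input rather than one call per text.
        batch = self.provider.embed_batch(texts)
        for vec in batch:
            if len(vec) != self.provider.dimensions:
                raise ValueError(
                    f"expected {self.provider.dimensions} dimensions, got {len(vec)}"
                )
        self.vectors.extend(batch)
```

With `FixedProvider(4, 4)` the call to `add` succeeds; with `FixedProvider(4, 3)` it raises `ValueError` before any vector is stored, which is the kind of early misconfiguration check described above.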
Negative
- No runtime type enforcement. Protocols use structural subtyping, so conformance is checked by a static type checker such as mypy, not at runtime. A provider with a typo in a method name (`embed_btach`) is not caught until the missing method is called.
- No shared utility code. A provider that satisfies the Protocol structurally, without inheriting from it, gets no default method implementations or shared helper methods, unlike a subclass of an ABC. Each provider implements everything from scratch.
- Discovery friction. Developers must know the Protocol's method signatures to implement a provider. There is no base class to inherit from that provides stubs or documentation.
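The typo scenario from the first point can be reproduced in a few lines. `TypoProvider` and `index_texts` are hypothetical names; the sketch only shows that construction succeeds and the error surfaces at the first call.

```python
class TypoProvider:  # hypothetical provider with a misspelled method
    @property
    def dimensions(self) -> int:
        return 2

    def embed_btach(self, texts: list[str]) -> list[list[float]]:  # typo
        return [[0.0, 0.0] for _ in texts]


def index_texts(provider, texts: list[str]) -> list[list[float]]:
    # Untyped on purpose: nothing checks the provider before this call.
    return provider.embed_batch(texts)


provider = TypoProvider()  # constructing the provider raises nothing

try:
    index_texts(provider, ["hello"])
    error = None
except AttributeError as exc:
    error = exc  # the missing method is only discovered here
```

If `index_texts` annotated its parameter with the Protocol, a static checker such as mypy would reject the call before the code ever ran.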
Mitigations
- A `validate_provider(provider)` function checks at registration time that the provider implements all required methods and that `dimensions` returns a positive integer. This catches most implementation errors early.
- The Protocol's docstring and the Book 3 Chapter 1 documentation provide clear implementation guidance.
- The MockEmbeddingProvider serves as a reference implementation that developers can study.
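A minimal sketch of what such a registration-time check could look like. Only the name `validate_provider` and the two checks (required methods present, `dimensions` a positive integer) come from this ADR; the exception types, messages, and the two example classes are assumptions.

```python
def validate_provider(provider: object) -> None:
    """Raise if the provider is missing required members (illustrative)."""
    for name in ("embed_batch", "embed"):
        if not callable(getattr(provider, name, None)):
            raise TypeError(f"provider is missing required method {name!r}")

    dims = getattr(provider, "dimensions", None)
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(dims, int) or isinstance(dims, bool) or dims <= 0:
        raise ValueError(f"dimensions must be a positive integer, got {dims!r}")


class GoodProvider:  # hypothetical conforming provider
    dimensions = 3

    def embed_batch(self, texts):
        return [[0.0] * 3 for _ in texts]

    def embed(self, text):
        return self.embed_batch([text])[0]


class BadProvider:  # hypothetical: embed_batch is misspelled
    dimensions = 3

    def embed_btach(self, texts):
        return []

    def embed(self, text):
        return []
```

Calling `validate_provider(GoodProvider())` returns silently, while `validate_provider(BadProvider())` raises `TypeError` at registration rather than at first use. A check like this inspects names only, not signatures or return types, which is consistent with the claim that it catches most, not all, implementation errors.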