
ADR-004: Embedding Provider Protocol

Date: 2026-03-29
Status: Accepted

Context

FCC's semantic search system (Book 3, Chapter 1) requires an embedding provider to convert text artifacts into dense vectors for nearest-neighbor search. Multiple embedding providers exist (for example OpenAI, Cohere, and local models via Sentence Transformers), and the framework must support all of them without coupling to any specific provider.

We evaluated three design patterns for the provider interface:

  1. Abstract Base Class (ABC). Define an EmbeddingProviderABC that all providers must inherit from. Registration via the plugin system.
  2. Protocol (structural subtyping). Define an EmbeddingProvider as a typing.Protocol. Any class with the right methods is a valid provider; no inheritance is required.
  3. Function-based interface. Define embedding as a plain function embed(texts: list[str]) -> list[list[float]] with no class wrapper.

Key requirements:

  • Support multiple providers without import-time coupling.
  • Enable third-party providers without requiring them to import FCC.
  • Provide a MockEmbeddingProvider for deterministic testing.
  • Support batch embedding for efficiency.
  • Report embedding dimensionality for validation.

Decision

We will use a Protocol-based embedding provider interface with a MockEmbeddingProvider for testing.

The Protocol definition:

from typing import Protocol


class EmbeddingProvider(Protocol):
    @property
    def dimensions(self) -> int: ...
    def embed_batch(self, texts: list[str]) -> list[list[float]]: ...
    def embed(self, text: str) -> list[float]: ...
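To illustrate structural subtyping, the sketch below defines a toy provider that never imports or inherits from EmbeddingProvider, yet is accepted wherever one is expected. CharCountProvider and index_texts are hypothetical names used only for illustration, and the two-number "embedding" is deliberately trivial.

```python
from typing import Protocol


class EmbeddingProvider(Protocol):
    @property
    def dimensions(self) -> int: ...
    def embed_batch(self, texts: list[str]) -> list[list[float]]: ...
    def embed(self, text: str) -> list[float]: ...


# Hypothetical third-party class: no import of, or inheritance from, EmbeddingProvider.
class CharCountProvider:
    @property
    def dimensions(self) -> int:
        return 2

    def embed_batch(self, texts: list[str]) -> list[list[float]]:
        # Toy vector per input: [character count, space count].
        return [[float(len(t)), float(t.count(" "))] for t in texts]

    def embed(self, text: str) -> list[float]:
        return self.embed_batch([text])[0]


def index_texts(provider: EmbeddingProvider, texts: list[str]) -> list[list[float]]:
    # Hypothetical consumer: a type checker accepts CharCountProvider here
    # because its members match the Protocol structurally.
    return provider.embed_batch(texts)


vectors = index_texts(CharCountProvider(), ["hello world", "fcc"])
print(vectors)  # [[11.0, 1.0], [3.0, 0.0]]
```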

The MockEmbeddingProvider:

class MockEmbeddingProvider:
    def __init__(self, dimensions: int = 128):
        self._dimensions = dimensions

    @property
    def dimensions(self) -> int:
        return self._dimensions

    def embed_batch(self, texts: list[str]) -> list[list[float]]:
        return [self._hash_embed(t) for t in texts]

    def embed(self, text: str) -> list[float]:
        return self.embed_batch([text])[0]

    def _hash_embed(self, text: str) -> list[float]:
        # Deterministic hash-based embedding
        ...
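The ADR leaves the body of _hash_embed elided. One possible deterministic scheme, offered as an assumption rather than FCC's actual implementation, expands SHA-256 digests of the text into one byte per dimension, maps each byte to [-1, 1], and L2-normalizes the result. hashlib is used instead of the built-in hash(), because hash() of a str is randomized per process by default and would break determinism across CI runs.

```python
import hashlib
import math


class MockEmbeddingProvider:
    def __init__(self, dimensions: int = 128):
        self._dimensions = dimensions

    @property
    def dimensions(self) -> int:
        return self._dimensions

    def embed_batch(self, texts: list[str]) -> list[list[float]]:
        return [self._hash_embed(t) for t in texts]

    def embed(self, text: str) -> list[float]:
        return self.embed_batch([text])[0]

    def _hash_embed(self, text: str) -> list[float]:
        # Assumed scheme (not specified by the ADR): hash (counter, text) with
        # SHA-256 repeatedly until there is at least one byte per dimension.
        data = bytearray()
        counter = 0
        while len(data) < self._dimensions:
            block = hashlib.sha256(counter.to_bytes(4, "big") + text.encode("utf-8"))
            data.extend(block.digest())
            counter += 1
        # Map each byte from [0, 255] to [-1.0, 1.0].
        raw = [(b / 127.5) - 1.0 for b in data[: self._dimensions]]
        # L2-normalize so every vector has unit length.
        norm = math.sqrt(sum(x * x for x in raw)) or 1.0
        return [x / norm for x in raw]
```

Identical texts always produce identical vectors, but semantically similar texts do not produce nearby ones, so a mock like this suits pipeline and plumbing tests rather than relevance tests.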

Consequences

Positive

  • No import coupling. Third-party embedding libraries (OpenAI, Sentence Transformers, custom servers) do not need to import anything from FCC. As long as a class implements dimensions, embed_batch, and embed, it is a valid provider.
  • Duck-typing friendly. Existing embedding clients can be wrapped with minimal adapter code, and a client whose methods already match the Protocol's names and signatures needs no adapter at all.
  • Batch-first API. The primary method is embed_batch, which can be roughly 10-100x more efficient than embedding texts one at a time during corpus indexing, because it amortizes per-request network and model overhead across the whole batch. The embed method is a convenience wrapper.
  • Dimension validation. The dimensions property allows the SearchIndex to validate that all embeddings have consistent dimensionality, catching provider misconfiguration early.
  • Deterministic testing. The MockEmbeddingProvider generates consistent vectors from text hashes, enabling deterministic tests and CI pipelines without API calls or cost.
  • Plugin compatible. Providers can be registered via the fcc.plugins.providers entry-point group for automatic discovery.
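The adapter and dimension-validation points above can be sketched together. LegacyEncoder (with its encode method and width attribute), LegacyEncoderAdapter, and check_dimensions are all hypothetical; the ADR does not specify how SearchIndex actually performs its validation.

```python
from typing import Protocol


class EmbeddingProvider(Protocol):
    @property
    def dimensions(self) -> int: ...
    def embed_batch(self, texts: list[str]) -> list[list[float]]: ...
    def embed(self, text: str) -> list[float]: ...


class LegacyEncoder:
    """Hypothetical existing client whose member names do not match the Protocol."""

    width = 3

    def encode(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t)), 0.0, 1.0] for t in texts]


class LegacyEncoderAdapter:
    """Thin adapter that renames members so LegacyEncoder satisfies EmbeddingProvider."""

    def __init__(self, client: LegacyEncoder) -> None:
        self._client = client

    @property
    def dimensions(self) -> int:
        return self._client.width

    def embed_batch(self, texts: list[str]) -> list[list[float]]:
        # Batch-first: a single client call for the whole list.
        return self._client.encode(texts)

    def embed(self, text: str) -> list[float]:
        # Convenience wrapper around the batch method.
        return self.embed_batch([text])[0]


def check_dimensions(provider: EmbeddingProvider, vectors: list[list[float]]) -> None:
    """Hypothetical index-side check that every vector matches provider.dimensions."""
    for i, vec in enumerate(vectors):
        if len(vec) != provider.dimensions:
            raise ValueError(
                f"vector {i} has {len(vec)} dimensions, expected {provider.dimensions}"
            )


provider = LegacyEncoderAdapter(LegacyEncoder())
vectors = provider.embed_batch(["alpha", "be"])
check_dimensions(provider, vectors)  # passes: both vectors have 3 components
```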

Negative

  • No runtime type enforcement. Protocols use structural subtyping, which is checked by static type checkers such as mypy, not at runtime. A provider with a typo in a method name (embed_btach) will not be caught until embed_batch is actually called on it, unless the calling code is type-checked.
  • No shared utility code. Unlike an ABC, a Protocol that providers do not inherit from cannot supply default method implementations or shared helper methods (a Protocol may define method bodies, but only classes that explicitly subclass it inherit them). Each provider implements everything from scratch.
  • Discovery friction. Developers must know the Protocol's method signatures to implement a provider. There is no base class to inherit from that provides stubs or documentation.
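The runtime-enforcement gap described above can be reproduced in a few lines. TypoProvider is a hypothetical provider with the embed_btach misspelling; the interpreter accepts it, and the error surfaces only when embed_batch is called.

```python
from typing import Optional, Protocol


class EmbeddingProvider(Protocol):
    @property
    def dimensions(self) -> int: ...
    def embed_batch(self, texts: list[str]) -> list[list[float]]: ...
    def embed(self, text: str) -> list[float]: ...


class TypoProvider:
    """Hypothetical provider with a misspelled batch method."""

    @property
    def dimensions(self) -> int:
        return 4

    def embed_btach(self, texts: list[str]) -> list[list[float]]:  # typo for embed_batch
        return [[0.0] * 4 for _ in texts]

    def embed(self, text: str) -> list[float]:
        return self.embed_btach([text])[0]


# A static checker such as mypy flags this assignment because TypoProvider
# lacks embed_batch; the Python interpreter does not enforce the annotation.
provider: EmbeddingProvider = TypoProvider()

# The convenience method still works, so the mistake can hide in casual testing.
single = provider.embed("hello")

error: Optional[AttributeError] = None
try:
    provider.embed_batch(["hello"])  # fails only when the missing method is called
except AttributeError as exc:
    error = exc
print(type(error).__name__)  # AttributeError
```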

Mitigations

  • A validate_provider(provider) function checks at registration time that the provider implements all required methods and that dimensions returns a positive integer. This catches most implementation errors early.
  • The Protocol's docstring and the Book 3 Chapter 1 documentation provide clear implementation guidance.
  • The MockEmbeddingProvider serves as a reference implementation that developers can study.
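The validate_provider mitigation can be sketched as follows. The ADR names the function and its two checks but not its body, so this implementation, including the REQUIRED_METHODS tuple, the exception types, and the GoodProvider and TypoProvider test classes, is a hypothetical illustration.

```python
from typing import Any

# Member names taken from the EmbeddingProvider Protocol in this ADR.
REQUIRED_METHODS = ("embed_batch", "embed")


def validate_provider(provider: Any) -> None:
    """Hypothetical sketch: raise if provider cannot act as an EmbeddingProvider."""
    name = type(provider).__name__
    for method in REQUIRED_METHODS:
        if not callable(getattr(provider, method, None)):
            raise TypeError(f"{name} is missing required method {method!r}")
    dims = getattr(provider, "dimensions", None)
    if dims is None:
        raise TypeError(f"{name} is missing required property 'dimensions'")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(dims, bool) or not isinstance(dims, int) or dims <= 0:
        raise ValueError(f"{name}.dimensions must be a positive int, got {dims!r}")


class GoodProvider:
    @property
    def dimensions(self) -> int:
        return 8

    def embed_batch(self, texts: list[str]) -> list[list[float]]:
        return [[0.0] * 8 for _ in texts]

    def embed(self, text: str) -> list[float]:
        return self.embed_batch([text])[0]


class TypoProvider:
    dimensions = 8

    def embed_btach(self, texts: list[str]) -> list[list[float]]:  # typo
        return [[0.0] * 8 for _ in texts]

    def embed(self, text: str) -> list[float]:
        return self.embed_btach([text])[0]


validate_provider(GoodProvider())  # passes silently
try:
    validate_provider(TypoProvider())
except TypeError as exc:
    print(exc)  # TypoProvider is missing required method 'embed_batch'
```

Running this check at registration time turns the embed_btach typo into an immediate, descriptive error instead of a failure deep inside an indexing run.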