HearthNet / docs /modules /M11-embedding.md
GitHub Actions
Add all-to-all internet mesh over relay hub (P1-P3) + user-story screenshot proof
8f53c4c
|
Raw
History Blame Contribute Delete
6.09 kB

A newer version of the Gradio SDK is available: 6.19.0

Upgrade

M11 β€” Embedding Service

Spec version: v1.0 Depends on: M03 (bus), X04 (config), X03 (observability), sentence-transformers, torch Depended on by: M05 (RAG uses embed.text), M06 (marketplace.search uses embed.text)


1. Responsibility

Provide capabilities embed.text@1.0 (and Phase 2 embed.image@1.0). Wrap one or more embedding backends, register them with the bus, batch incoming requests for throughput.

Embeddings are separated from llm.* because their workload is different: small models, high throughput, batchable, often CPU-runnable.


2. File layout

hearthnet/services/embedding/
β”œβ”€β”€ __init__.py
β”œβ”€β”€ service.py             # EmbeddingService
└── backends.py            # SentenceTransformerBackend, OllamaEmbedBackend (Phase 2)

3. Public API

3.1 backends.py

# hearthnet/services/embedding/backends.py
from typing import Protocol

class EmbeddingBackend(Protocol):
    name:        str      # "sentence_transformers" | "ollama" | "hf_api"
    model:       str      # "BAAI/bge-small-en-v1.5"
    dim:         int      # 384 for bge-small
    max_input:   int      # max chars per text

    async def embed(self, texts: list[str], *, normalize: bool = True) -> list[list[float]]: ...
    async def warm(self) -> None: ...
    async def close(self) -> None: ...
    def health(self) -> dict: ...


class SentenceTransformerBackend:
    """Local backend using sentence-transformers + torch."""

    def __init__(self, model: str, device: str = "auto"):
        """device: 'auto' picks cuda if available else cpu."""

    async def embed(self, texts, *, normalize=True): ...
    async def warm(self): ...
    async def close(self): ...
    def health(self): ...

3.2 service.py

# hearthnet/services/embedding/service.py
class EmbeddingService:
    name    = "embedding"
    version = "1.0"

    def __init__(self, config: EmbeddingConfig):
        self._backend: EmbeddingBackend = SentenceTransformerBackend(
            model=config.model, device=config.device
        )

    def capabilities(self) -> list[tuple[CapabilityDescriptor, Callable, ParamsPredicate]]:
        """Returns one entry for embed.text@1.0."""

    async def start(self) -> None: ...
    async def stop(self) -> None: ...
    def health(self) -> dict: ...

    # --- handler ---

    async def handle_embed_text(self, req: RouteRequest) -> dict:
        """Implements embed.text@1.0 (CONTRACT Β§4.3)."""

3.3 Capability descriptor (embed.text@1.0)

descriptor = CapabilityDescriptor(
    name="embed.text",
    version=(1, 0),
    stability="stable",
    request_schema={
        "type": "object",
        "required": ["params", "input"],
        "properties": {
            "params": {"type": "object", "properties": {
                "model": {"type": "string"},
            }, "required": ["model"]},
            "input": {"type": "object", "required": ["texts"], "properties": {
                "texts":     {"type": "array", "items": {"type": "string"}, "minItems": 1, "maxItems": 256},
                "normalize": {"type": "boolean", "default": True},
            }},
        },
    },
    response_schema={
        "type": "object",
        "required": ["output", "meta"],
        "properties": {
            "output": {"type": "object", "required": ["embeddings", "dim"], "properties": {
                "embeddings": {"type": "array", "items": {"type": "array", "items": {"type": "number"}}},
                "dim":        {"type": "integer"},
            }},
            "meta":   {"type": "object", "required": ["model", "ms"]},
        },
    },
    stream_schema=None,
    params={"model": "<from backend>"},
    max_concurrent=8,
    trust_required="member",
    timeout_seconds=15,
    idempotent=True,
)

3.4 params_compatible predicate

def params_compatible(offered: dict, requested: dict) -> bool:
    # request must specify model; must match offered exactly
    return requested.get("model") == offered.get("model")

4. Behaviour

4.1 Batching (optional optimisation; Phase 1.5)

If multiple requests arrive within a small window (e.g. 20 ms), combine their texts arrays into one backend call. Demultiplex results back. MVP: no batching (per-request), simpler.

4.2 Validation

  • 256 texts β†’ bad_request

  • Any text > 8192 chars β†’ bad_request
  • Unknown model β†’ not_found (model not loaded; consider asking another node)

4.3 Resource sizing

max_concurrent = 8 is a sensible default on CPU. On GPU, increase via subclass that overrides max_concurrent. The number is declared in the manifest so the bus can throttle correctly.


5. Errors

Only the universal codes from CONTRACT Β§9:

  • bad_request β€” texts too long, too many, etc.
  • not_found β€” model not loaded
  • internal_error β€” backend crash

6. Configuration

From X04 Β§3:

config.embedding.model   # "BAAI/bge-small-en-v1.5"
config.embedding.device  # "auto"

7. Tests

Unit

  • test_descriptor_schema_validates_meta_schema
  • test_handler_rejects_oversized_text
  • test_handler_rejects_too_many_texts
  • test_params_compatible_exact_model_match
  • test_embed_normalises_to_unit_length

Integration

  • test_rag_calls_embed_via_bus β€” RAG ingests, embeds via bus.call(), no direct service-to-service import
  • test_remote_embed_fallback β€” local backend not loaded, bus routes to peer

8. Cross-references

What Where
embed.text@1.0 wire spec CONTRACT Β§4.3
Service protocol M03 Β§4
Consumed by RAG M05 Β§5
Consumed by marketplace.search M06 Β§5

9. Open questions

  1. Phase 2 embed.image@1.0 (CLIP) β€” adds an image backend; params includes modality. Reserved.
  2. Batching policy β€” measured trade-off; deferred. Defaults to immediate dispatch in MVP.