M11 - Embedding Service
Spec version: v1.0
Depends on: M03 (bus), X04 (config), X03 (observability), sentence-transformers, torch
Depended on by: M05 (RAG uses embed.text), M06 (marketplace.search uses embed.text)
1. Responsibility
Provide the embed.text@1.0 capability (and, in Phase 2, embed.image@1.0). The service wraps one or more embedding backends, registers them with the bus, and batches incoming requests for throughput.
Embeddings are separated from llm.* because their workload is different: small models, high throughput, batchable, often CPU-runnable.
2. File layout
hearthnet/services/embedding/
├── __init__.py
├── service.py    # EmbeddingService
└── backends.py   # SentenceTransformerBackend, OllamaEmbedBackend (Phase 2)
3. Public API
3.1 backends.py
# hearthnet/services/embedding/backends.py
from typing import Protocol


class EmbeddingBackend(Protocol):
    name: str       # "sentence_transformers" | "ollama" | "hf_api"
    model: str      # "BAAI/bge-small-en-v1.5"
    dim: int        # 384 for bge-small
    max_input: int  # max chars per text

    async def embed(self, texts: list[str], *, normalize: bool = True) -> list[list[float]]: ...
    async def warm(self) -> None: ...
    async def close(self) -> None: ...
    def health(self) -> dict: ...


class SentenceTransformerBackend:
    """Local backend using sentence-transformers + torch."""

    def __init__(self, model: str, device: str = "auto"):
        """device: 'auto' picks cuda if available else cpu."""

    async def embed(self, texts, *, normalize=True): ...
    async def warm(self): ...
    async def close(self): ...
    def health(self): ...
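To make the protocol concrete, here is a minimal sketch of a class that satisfies EmbeddingBackend structurally. It is not the sentence-transformers backend: HashingBackend, its model name toy/hash-8, and its 8-dimensional output are hypothetical stand-ins chosen so the example runs with the standard library only. It shows the expected shape of embed() and what normalize=True is assumed to mean (unit L2 norm).

```python
import asyncio
import hashlib
import math
from typing import Protocol


class EmbeddingBackend(Protocol):
    name: str
    model: str
    dim: int
    max_input: int

    async def embed(self, texts: list[str], *, normalize: bool = True) -> list[list[float]]: ...
    async def warm(self) -> None: ...
    async def close(self) -> None: ...
    def health(self) -> dict: ...


class HashingBackend:
    """Hypothetical toy backend: deterministic vectors derived from SHA-256."""

    name = "hashing"      # assumption: not one of the names listed in the spec
    model = "toy/hash-8"  # assumption: illustrative model id
    dim = 8
    max_input = 8192

    async def embed(self, texts, *, normalize=True):
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Map the first `dim` bytes to floats in [-1, 1].
            vec = [(b - 127.5) / 127.5 for b in digest[: self.dim]]
            if normalize:
                # Scale to unit L2 norm so dot product equals cosine similarity.
                norm = math.sqrt(sum(x * x for x in vec)) or 1.0
                vec = [x / norm for x in vec]
            vectors.append(vec)
        return vectors

    async def warm(self):
        # One throwaway call, mirroring how a real backend might load weights.
        await self.embed(["warm-up"])

    async def close(self):
        return None

    def health(self):
        return {"status": "ok", "model": self.model, "dim": self.dim}


async def demo() -> list[list[float]]:
    backend: EmbeddingBackend = HashingBackend()  # structural typing, no inheritance
    await backend.warm()
    return await backend.embed(["hello", "world"])
```

A stand-in like this can also serve in unit tests that should not download a model.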
3.2 service.py
# hearthnet/services/embedding/service.py
class EmbeddingService:
    name = "embedding"
    version = "1.0"

    def __init__(self, config: EmbeddingConfig):
        self._backend: EmbeddingBackend = SentenceTransformerBackend(
            model=config.model, device=config.device
        )

    def capabilities(self) -> list[tuple[CapabilityDescriptor, Callable, ParamsPredicate]]:
        """Returns one entry for embed.text@1.0."""

    async def start(self) -> None: ...
    async def stop(self) -> None: ...
    def health(self) -> dict: ...

    # --- handler ---
    async def handle_embed_text(self, req: RouteRequest) -> dict:
        """Implements embed.text@1.0 (CONTRACT §4.3)."""
3.3 Capability descriptor (embed.text@1.0)
descriptor = CapabilityDescriptor(
    name="embed.text",
    version=(1, 0),
    stability="stable",
    request_schema={
        "type": "object",
        "required": ["params", "input"],
        "properties": {
            "params": {"type": "object", "properties": {
                "model": {"type": "string"},
            }, "required": ["model"]},
            "input": {"type": "object", "required": ["texts"], "properties": {
                "texts": {"type": "array", "items": {"type": "string"}, "minItems": 1, "maxItems": 256},
                "normalize": {"type": "boolean", "default": True},
            }},
        },
    },
    response_schema={
        "type": "object",
        "required": ["output", "meta"],
        "properties": {
            "output": {"type": "object", "required": ["embeddings", "dim"], "properties": {
                "embeddings": {"type": "array", "items": {"type": "array", "items": {"type": "number"}}},
                "dim": {"type": "integer"},
            }},
            "meta": {"type": "object", "required": ["model", "ms"]},
        },
    },
    stream_schema=None,
    params={"model": "<from backend>"},
    max_concurrent=8,
    trust_required="member",
    timeout_seconds=15,
    idempotent=True,
)
3.4 params_compatible predicate
def params_compatible(offered: dict, requested: dict) -> bool:
    # The request must specify a model, and it must match the offered model exactly.
    model = requested.get("model")
    return model is not None and model == offered.get("model")
4. Behaviour
4.1 Batching (optional optimisation; Phase 1.5)
If multiple requests arrive within a short window (e.g. 20 ms), combine their texts arrays into one backend call, then demultiplex the results back to each caller. The MVP does no batching: each request is dispatched on its own, which is simpler.
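The batching idea can be sketched with asyncio as follows. This is an illustration under assumptions, not the planned implementation: MicroBatcher and embed_fn are hypothetical names, the window is a fixed timer started by the first request, and the sketch does not cap the merged batch at the 256-text limit.

```python
import asyncio


class MicroBatcher:
    """Hypothetical helper: merges concurrent requests into one backend call."""

    def __init__(self, embed_fn, window_s: float = 0.02):
        self._embed_fn = embed_fn  # async callable: list[str] -> list[list[float]]
        self._window_s = window_s  # batching window, 20 ms by default
        self._pending = []         # (texts, future) pairs waiting for the flush
        self._flush_task = None

    async def embed(self, texts):
        future = asyncio.get_running_loop().create_future()
        self._pending.append((texts, future))
        if self._flush_task is None:
            # The first request of a window starts the timer.
            self._flush_task = asyncio.create_task(self._flush_later())
        return await future

    async def _flush_later(self):
        await asyncio.sleep(self._window_s)
        batch, self._pending = self._pending, []
        self._flush_task = None
        merged = [text for texts, _ in batch for text in texts]
        try:
            vectors = await self._embed_fn(merged)
        except Exception as exc:
            # A backend failure is reported to every caller in the batch.
            for _, future in batch:
                if not future.done():
                    future.set_exception(exc)
            return
        # Demultiplex: each caller gets the slice matching its own texts.
        offset = 0
        for texts, future in batch:
            if not future.done():
                future.set_result(vectors[offset : offset + len(texts)])
            offset += len(texts)


async def demo():
    calls = []

    async def fake_embed(texts):
        # Stand-in backend: records each call, returns one 1-d vector per text.
        calls.append(list(texts))
        return [[float(len(t))] for t in texts]

    batcher = MicroBatcher(fake_embed, window_s=0.01)
    a, b = await asyncio.gather(batcher.embed(["a", "bb"]), batcher.embed(["ccc"]))
    return a, b, calls
```

In the demo, two concurrent requests result in a single backend call, and each caller receives only its own vectors. The trade-off is added latency of up to one window per request, which is why immediate dispatch is a reasonable MVP default.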
4.2 Validation
- More than 256 texts → bad_request
- Any text > 8192 chars → bad_request
- Unknown model → not_found (model not loaded; consider asking another node)
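The rules above can be sketched as a single validation function. EmbedError and validate_embed_request are hypothetical names; how errors are actually raised and mapped to wire codes is defined by the bus contract, not here. The empty-list check follows from minItems: 1 in the request schema.

```python
MAX_TEXTS = 256   # from the request schema (maxItems)
MAX_CHARS = 8192  # per-text character limit


class EmbedError(Exception):
    """Hypothetical error type carrying one of the universal error codes."""

    def __init__(self, code: str, message: str):
        super().__init__(message)
        self.code = code


def validate_embed_request(texts: list[str], model: str, loaded_model: str) -> None:
    """Raise EmbedError if the request violates the validation rules."""
    if not texts:
        raise EmbedError("bad_request", "texts must contain at least one item")
    if len(texts) > MAX_TEXTS:
        raise EmbedError("bad_request", f"too many texts: {len(texts)} > {MAX_TEXTS}")
    for index, text in enumerate(texts):
        if len(text) > MAX_CHARS:
            raise EmbedError(
                "bad_request", f"text {index} too long: {len(text)} > {MAX_CHARS} chars"
            )
    if model != loaded_model:
        # The caller may retry through the bus, which can route to another node.
        raise EmbedError("not_found", f"model not loaded: {model}")


def error_code(texts: list[str], model: str, loaded_model: str):
    """Return the error code for a request, or None if it is valid."""
    try:
        validate_embed_request(texts, model, loaded_model)
    except EmbedError as exc:
        return exc.code
    return None
```

Size checks run before the model check so that a malformed request is rejected as bad_request regardless of which node receives it; that ordering is a design choice of this sketch.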
4.3 Resource sizing
max_concurrent = 8 is a sensible default on CPU. On GPU, raise it via a subclass that overrides max_concurrent. The number is declared in the manifest so the bus can throttle correctly.
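As an illustration of what throttling to max_concurrent could look like, the sketch below gates handler calls with an asyncio.Semaphore. ConcurrencyLimiter is a hypothetical name, and the bus's real throttling mechanism (M03) may differ; the in_flight and peak counters exist only to make the effect observable.

```python
import asyncio


class ConcurrencyLimiter:
    """Hypothetical gate: at most max_concurrent handler calls run at once."""

    def __init__(self, max_concurrent: int = 8):
        self._sem = asyncio.Semaphore(max_concurrent)
        self.in_flight = 0
        self.peak = 0

    async def run(self, handler, *args):
        async with self._sem:  # waits here while the limit is reached
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await handler(*args)
            finally:
                self.in_flight -= 1


async def demo(n_requests: int = 20, max_concurrent: int = 8):
    limiter = ConcurrencyLimiter(max_concurrent)

    async def handler(i):
        await asyncio.sleep(0.01)  # stand-in for an embedding call
        return i

    results = await asyncio.gather(
        *(limiter.run(handler, i) for i in range(n_requests))
    )
    return results, limiter.peak
```

With 20 requests and a limit of 8, all requests complete but no more than 8 are ever in flight at the same time.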
5. Errors
Only the universal codes from CONTRACT §9:
- bad_request: texts too long, too many, etc.
- not_found: model not loaded
- internal_error: backend crash
6. Configuration
From X04 §3:
config.embedding.model # "BAAI/bge-small-en-v1.5"
config.embedding.device # "auto"
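Section 3.1 describes the "auto" device value as picking cuda if available, else cpu. A minimal sketch of that resolution follows; resolve_device is a hypothetical helper, and falling back to "cpu" when torch is not importable is an assumption of this sketch rather than something the spec states.

```python
def resolve_device(device: str = "auto") -> str:
    """Map the configured device string to a concrete device name."""
    if device != "auto":
        # Explicit values such as "cpu" or "cuda:0" are passed through unchanged.
        return device
    try:
        import torch  # imported lazily so the function works without torch installed
    except ImportError:
        return "cpu"  # assumption: no torch means CPU-only
    return "cuda" if torch.cuda.is_available() else "cpu"
```

For example, resolve_device("cuda:0") returns "cuda:0" unchanged, while resolve_device("auto") returns either "cuda" or "cpu" depending on the host.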
7. Tests
Unit
- test_descriptor_schema_validates_meta_schema
- test_handler_rejects_oversized_text
- test_handler_rejects_too_many_texts
- test_params_compatible_exact_model_match
- test_embed_normalises_to_unit_length
Integration
- test_rag_calls_embed_via_bus: RAG ingests, embeds via bus.call(), no direct service-to-service import
- test_remote_embed_fallback: local backend not loaded, bus routes to peer
8. Cross-references
| What | Where |
|---|---|
| embed.text@1.0 wire spec | CONTRACT §4.3 |
| Service protocol | M03 Β§4 |
| Consumed by RAG | M05 Β§5 |
| Consumed by marketplace.search | M06 Β§5 |
9. Open questions
- Phase 2 embed.image@1.0 (CLIP): adds an image backend; params includes modality. Reserved.
- Batching policy: measured trade-off; deferred. Defaults to immediate dispatch in MVP.