Elephant Embeddings V1 Text Small

elephant-embeddings-v1-text-small is the text embedding model in the Agentic Intelligence Lab Elephant Embeddings V1 family.

This ModelScope release is maintained by agentic-intelligence-lab to make Elephant embedding models easier to download and deploy in mainland China. It mirrors the upstream Hugging Face model llm-semantic-router/eggon-embed and republishes it under a consistent Elephant model namespace.

Positioning

This model is a multilingual long-context text embedding model for agent-native retrieval and semantic matching. It is designed for systems where embeddings are on the runtime hot path:

  • agent memory recall
  • knowledge retrieval and RAG
  • tool, skill, and route matching
  • long-horizon state search
  • multilingual semantic indexing
  • clustering and deduplication

The model combines a 32K-token context window, the ModernBERT encoder architecture, and 2D Matryoshka training (nested representations across both embedding dimensions and encoder layers), so one embedding space can serve multiple latency, storage, and quality budgets.

Model at a glance

| Item | Value |
| --- | --- |
| Family | Elephant Embeddings V1 |
| Maintainer | Agentic Intelligence Lab |
| Model type | Text embedding model |
| Modalities | Text |
| Languages | Multilingual |
| Architecture | ModernBERT encoder with YaRN scaling |
| Parameters | ~307M |
| Hidden size | 768 |
| Layers | 22 |
| Context length | 32,768 tokens |
| Pooling | Mean pooling |
| Similarity | Cosine |
| Matryoshka dimensions | 768, 512, 256, 128, 64 |
| Upstream source | llm-semantic-router/eggon-embed |
| License | Apache 2.0 |
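
The table lists mean pooling and cosine similarity. As a rough illustration of what those two steps compute, the sketch below runs them with NumPy on toy token vectors. The function names `mean_pool` and `cosine` are hypothetical and the hidden size of 4 is only for readability; in normal use the Sentence Transformers packaging performs the pooling for you.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padded positions (mask == 0)."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid divide-by-zero
    return summed / counts

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy batch: 2 sequences, 3 token positions, hidden size 4 (the real model uses 768).
tokens = np.array([
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]],  # last position is padding
    [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]],
])
mask = np.array([[1, 1, 0], [1, 1, 1]])

pooled = mean_pool(tokens, mask)
similarity = cosine(pooled[0], pooled[1])
```

The padded position in the first sequence does not affect its pooled vector, and the two pooled vectors point in the same direction, so their cosine similarity is 1 even though their lengths differ.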

Why it fits agentic workloads

Agentic systems call embedding models repeatedly: before retrieval, during routing, while matching tools, when searching memory, and when compressing or reranking state. This model is optimized for that operating pattern rather than for a single offline benchmark.

Key advantages:

  • One semantic space across the stack: routing, retrieval, memory lookup, and semantic matching can share one vector space.
  • Budget-adaptive vectors: truncate full 768-dimensional vectors to 512d, 256d, 128d, or 64d for cheaper indexes and faster candidate generation.
  • Long-context representation: encode larger notes, traces, tool descriptions, and document chunks before aggressive chunking is required.
  • Practical deployment size: a 307M-class encoder is easier to host than much larger embedding models when inference is frequent.
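
To make the storage side of budget-adaptive vectors concrete, here is a back-of-the-envelope calculation of raw vector storage at each Matryoshka dimension. It assumes float32 values (4 bytes each) and a hypothetical corpus of one million vectors; real indexes add their own overhead, and quantized storage would be smaller.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32

def index_size_bytes(num_vectors, dim, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw vector storage only; ignores index overhead and metadata."""
    return num_vectors * dim * bytes_per_value

num_vectors = 1_000_000  # hypothetical corpus size
for dim in (768, 512, 256, 128, 64):
    size_mb = index_size_bytes(num_vectors, dim) / 1e6
    print(f"{dim:>3}d: {size_mb:,.0f} MB")
```

Under these assumptions the 64d tier needs one twelfth of the space of the full 768d vectors, which is the main reason to use it for very large memory banks or route sets.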

Recommended use cases

| Scenario | Recommended dimension | Notes |
| --- | --- | --- |
| Broad route matching | 64d or 128d | Cheap candidate generation over large route/tool sets |
| Large memory-bank search | 64d or 256d | Lower storage and bandwidth cost |
| Main RAG retrieval | 256d or 512d | Balanced quality and cost |
| High-confidence matching | 768d | Best semantic fidelity |
| Long-document indexing | 768d | Preserve richer context before chunking |
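
One way to combine these tiers is a coarse-to-fine search: shortlist candidates with truncated vectors, then rescore only the shortlist at 768d. The sketch below is a minimal illustration, not the maintainers' pipeline. The names `two_stage_search` and `l2_normalize`, the shortlist size, and the random stand-in vectors are all assumptions; a real system would precompute and index the truncated corpus vectors instead of slicing them per query.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def two_stage_search(query, corpus, coarse_dim=64, shortlist=50, top_k=5):
    """Shortlist with truncated vectors, then rerank the shortlist at full dimension."""
    # Stage 1: cheap candidate generation on the first `coarse_dim` dimensions.
    q_coarse = l2_normalize(query[None, :coarse_dim])[0]
    c_coarse = l2_normalize(corpus[:, :coarse_dim])
    coarse_scores = c_coarse @ q_coarse
    candidates = np.argsort(-coarse_scores)[:shortlist]

    # Stage 2: cosine scores at full dimension, computed only for the shortlist.
    full_scores = corpus[candidates] @ query
    order = np.argsort(-full_scores)[:top_k]
    return candidates[order], full_scores[order]

# Random unit vectors stand in for real model output (768d, already normalized).
rng = np.random.default_rng(0)
corpus = l2_normalize(rng.standard_normal((1000, 768)))
query = corpus[42]  # a query identical to document 42

ids, scores = two_stage_search(query, corpus)
```

Because the query equals document 42, that document should come back first with a score of about 1.0. With real embeddings, the shortlist size controls how much recall the coarse stage gives up in exchange for speed.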

Quick start on ModelScope

pip install modelscope sentence-transformers torch

from modelscope import snapshot_download
from sentence_transformers import SentenceTransformer

repo_id = "agentic-intelligence-lab/elephant-embeddings-v1-text-small"
local_dir = snapshot_download(repo_id)

model = SentenceTransformer(local_dir)

texts = [
    "Find tool descriptions related to browser automation.",
    "检索和用户历史偏好相关的记忆。",
    "Retrieve notes about deployment failures in staging.",
]

embeddings = model.encode(texts, normalize_embeddings=True)
print(embeddings.shape)  # (3, 768)

Matryoshka truncation

import torch.nn.functional as F
from modelscope import snapshot_download
from sentence_transformers import SentenceTransformer

local_dir = snapshot_download("agentic-intelligence-lab/elephant-embeddings-v1-text-small")
model = SentenceTransformer(local_dir)

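# Example inputs, in the same style as the quick start
texts = [
    "Find tool descriptions related to browser automation.",
    "Retrieve notes about deployment failures in staging.",
]

embeddings = model.encode(texts, convert_to_tensor=True, normalize_embeddings=True)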

# Balanced retrieval tier
embeddings_256d = F.normalize(embeddings[:, :256], p=2, dim=1)

# Low-cost routing or large memory-bank tier
embeddings_64d = F.normalize(embeddings[:, :64], p=2, dim=1)
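
The truncation code re-normalizes after slicing because the prefix of a unit-length vector is generally shorter than unit length, so dot products on raw prefixes would no longer be cosine similarities. The NumPy sketch below shows the same step for vectors that are already stored, for example loaded from an index. The helper name `truncate_embeddings` is hypothetical, and random unit vectors stand in for real embeddings.

```python
import numpy as np

def truncate_embeddings(embeddings, dim):
    """Keep the first `dim` dimensions and rescale each row to unit length."""
    if dim > embeddings.shape[1]:
        raise ValueError("dim exceeds the stored embedding width")
    prefix = embeddings[:, :dim]
    return prefix / np.linalg.norm(prefix, axis=1, keepdims=True)

# Random unit vectors stand in for stored 768d embeddings.
rng = np.random.default_rng(0)
full = rng.standard_normal((4, 768))
full /= np.linalg.norm(full, axis=1, keepdims=True)

raw_prefix_norms = np.linalg.norm(full[:, :256], axis=1)  # below 1.0 before rescaling
unit_256d = truncate_embeddings(full, 256)
```

Keeping the full 768d vectors in cold storage and deriving the smaller tiers this way means a tier can be changed later without re-encoding the corpus.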

Evaluation snapshot

| Metric | Score |
| --- | --- |
| MTEB mean, 24 tasks | 61.4 |
| STS Benchmark | 80.5 |
| Dimension retention | 99% @ 256d, 98% @ 64d |
| Layer speedup | 3.3× @ 6L, 5.8× @ 3L |
| Long-context retrieval R@1, 4K tokens | 68.8% |
| Long-context retrieval R@10, 4K tokens | 81.2% |

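These results indicate that truncated vectors retain most of the full-dimension quality and that using fewer layers yields a several-fold speedup, which makes the model useful for systems that must balance quality, latency, vector size, and deployment simplicity.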

Files

| File | Description |
| --- | --- |
| model.safetensors | Model weights |
| config.json | ModernBERT configuration |
| tokenizer.json / tokenizer_config.json | Tokenizer assets |
| modules.json / 1_Pooling/config.json | Sentence Transformers packaging |
| README.md | This model card |

Lineage

This ModelScope package is published by agentic-intelligence-lab as part of the Elephant model release line. It mirrors the upstream Hugging Face model llm-semantic-router/eggon-embed and leaves the model artifacts unchanged; only the repository name and the model card presentation differ.

Limitations

  • Full 768-dimensional embeddings are recommended for important final-stage retrieval decisions.
  • Aggressive dimension or layer reduction trades quality for speed and storage efficiency.
  • Very long inputs are supported, but they still increase compute and memory cost.
  • The model is optimized for retrieval and semantic similarity, not text generation.

Citation

@misc{elephant-embeddings-v1-text-small,
  title={Elephant Embeddings V1 Text Small},
  author={Agentic Intelligence Lab},
  year={2026},
  url={https://modelscope.cn/models/agentic-intelligence-lab/elephant-embeddings-v1-text-small}
}

License

Apache 2.0
