Elephant Embeddings V1 Text Small
elephant-embeddings-v1-text-small is the text embedding model in the Agentic Intelligence Lab Elephant Embeddings V1 family.
This ModelScope release is maintained by agentic-intelligence-lab to make Elephant embedding models easier to download and deploy in mainland China. It mirrors and renames the upstream HuggingFace model llm-semantic-router/eggon-embed under a consistent Elephant model namespace.
Positioning
This model is a multilingual long-context text embedding model for agent-native retrieval and semantic matching. It is designed for systems where embeddings are on the runtime hot path:
- agent memory recall
- knowledge retrieval and RAG
- tool, skill, and route matching
- long-horizon state search
- multilingual semantic indexing
- clustering and deduplication
The model combines 32K context, ModernBERT encoder architecture, and 2D Matryoshka training so one embedding space can serve multiple latency, storage, and quality budgets.
Model at a glance
| Item | Value |
|---|---|
| Family | Elephant Embeddings V1 |
| Maintainer | Agentic Intelligence Lab |
| Model type | Text embedding model |
| Modalities | Text |
| Languages | Multilingual |
| Architecture | ModernBERT encoder with YaRN scaling |
| Parameters | ~307M |
| Hidden size | 768 |
| Layers | 22 |
| Context length | 32,768 tokens |
| Pooling | Mean pooling |
| Similarity | Cosine |
| Matryoshka dimensions | 768, 512, 256, 128, 64 |
| Upstream source | llm-semantic-router/eggon-embed |
| License | Apache 2.0 |
Why it fits agentic workloads
Agentic systems call embedding models repeatedly: before retrieval, during routing, while matching tools, when searching memory, and when compressing or reranking state. This model is optimized for that operating pattern rather than for a single offline benchmark.
Key advantages:
- One semantic space across the stack: routing, retrieval, memory lookup, and semantic matching can share one vector space.
- Budget-adaptive vectors: truncate full 768-dimensional vectors to 256d, 128d, or 64d for cheaper indexes and faster candidate generation.
- Long-context representation: encode larger notes, traces, tool descriptions, and document chunks before aggressive chunking is required.
- Practical deployment size: a 307M-class encoder is easier to host than much larger embedding models when inference is frequent.
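To make the storage side of budget-adaptive vectors concrete, here is a rough back-of-the-envelope sketch. It assumes float32 storage, a hypothetical corpus of one million vectors, and no index overhead; none of these figures come from the model card, and `index_size_bytes` is an illustrative helper rather than part of any library.

```python
# Rough index-size arithmetic for the Matryoshka dimensions listed in this card.
# Assumptions (not from the model card): float32 values, 1,000,000 vectors,
# and no index overhead such as graph links or metadata.
MATRYOSHKA_DIMS = [768, 512, 256, 128, 64]
BYTES_PER_FLOAT32 = 4


def index_size_bytes(num_vectors: int, dim: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw storage needed for num_vectors dense vectors of the given dimension."""
    return num_vectors * dim * bytes_per_value


num_vectors = 1_000_000  # hypothetical corpus size
for dim in MATRYOSHKA_DIMS:
    size_mb = index_size_bytes(num_vectors, dim) / 1e6
    print(f"{dim:>4}d: {size_mb:,.0f} MB")
```

Under these assumptions the raw vectors take about 3.07 GB at 768d and about 0.26 GB at 64d, a 12× reduction before any quantization.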
Recommended use cases
| Scenario | Recommended dimension | Notes |
|---|---|---|
| Broad route matching | 64d or 128d | Cheap candidate generation over large route/tool sets |
| Large memory-bank search | 64d or 256d | Lower storage and bandwidth cost |
| Main RAG retrieval | 256d or 512d | Balanced quality and cost |
| High-confidence matching | 768d | Best semantic fidelity |
| Long-document indexing | 768d | Preserve richer context before chunking |
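
One way the tiers in this table could be combined is a two-stage search: shortlist candidates with truncated vectors, then re-score only the shortlist at full dimension. The sketch below is an illustration under stated assumptions, not the maintainers' pipeline. It uses random NumPy vectors as stand-ins for model output, and `two_stage_search`, `coarse_dim`, and `num_candidates` are hypothetical names.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def two_stage_search(query, corpus, coarse_dim=64, num_candidates=100, top_k=5):
    """Shortlist with truncated vectors, then re-score the shortlist at full dimension.

    query: (d,) full-dimension embedding; corpus: (n, d) full-dimension embeddings.
    Returns the indices of the top_k corpus rows, best match first.
    """
    # Stage 1: cheap candidate generation on the first coarse_dim dimensions.
    q_small = l2_normalize(query[None, :coarse_dim])[0]
    c_small = l2_normalize(corpus[:, :coarse_dim])
    coarse_scores = c_small @ q_small
    num_candidates = min(num_candidates, corpus.shape[0])
    candidates = np.argsort(-coarse_scores)[:num_candidates]

    # Stage 2: cosine similarity at full dimension, computed for the shortlist only.
    q_full = l2_normalize(query[None, :])[0]
    c_full = l2_normalize(corpus[candidates])
    fine_scores = c_full @ q_full
    order = np.argsort(-fine_scores)[:top_k]
    return candidates[order]


# Stand-in data: random vectors instead of real embeddings (assumption).
rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 768)).astype(np.float32)
query = corpus[42] + 0.01 * rng.normal(size=768).astype(np.float32)
print(two_stage_search(query, corpus))
```

In a real deployment the stage-1 vectors would normally be precomputed and stored in their own index, so the truncation and normalization cost is paid once at indexing time rather than on every query.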
Quick start on ModelScope
```bash
pip install modelscope sentence-transformers torch
```

```python
from modelscope import snapshot_download
from sentence_transformers import SentenceTransformer

repo_id = "agentic-intelligence-lab/elephant-embeddings-v1-text-small"
local_dir = snapshot_download(repo_id)
model = SentenceTransformer(local_dir)

texts = [
    "Find tool descriptions related to browser automation.",
    "检索和用户历史偏好相关的记忆。",  # "Retrieve memories related to the user's historical preferences."
    "Retrieve notes about deployment failures in staging.",
]
embeddings = model.encode(texts, normalize_embeddings=True)
print(embeddings.shape)  # (3, 768)
```
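Because `normalize_embeddings=True` returns unit-length vectors, cosine similarity between them reduces to a plain dot product. The sketch below shows this with three hand-made vectors standing in for model output, so it runs without downloading the model; the values are illustrative only.

```python
import numpy as np

# Stand-in for model.encode(texts, normalize_embeddings=True):
# three hand-made vectors scaled to unit length (not real embeddings).
raw = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
])
embeddings = raw / np.linalg.norm(raw, axis=1, keepdims=True)

# For unit-length rows, the matrix of dot products is the cosine-similarity matrix.
similarity = embeddings @ embeddings.T
print(similarity.shape)
```

With real embeddings, the same `embeddings @ embeddings.T` product gives pairwise cosine scores directly, which is why normalized vectors are convenient for inner-product indexes.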
Matryoshka truncation
```python
import torch.nn.functional as F
from modelscope import snapshot_download
from sentence_transformers import SentenceTransformer

local_dir = snapshot_download("agentic-intelligence-lab/elephant-embeddings-v1-text-small")
model = SentenceTransformer(local_dir)
texts = [  # same example inputs as in the quick start
    "Find tool descriptions related to browser automation.",
    "检索和用户历史偏好相关的记忆。",
    "Retrieve notes about deployment failures in staging.",
]
embeddings = model.encode(texts, convert_to_tensor=True, normalize_embeddings=True)

# Balanced retrieval tier: keep the first 256 dimensions, then re-normalize
embeddings_256d = F.normalize(embeddings[:, :256], p=2, dim=1)
# Low-cost routing or large memory-bank tier
embeddings_64d = F.normalize(embeddings[:, :64], p=2, dim=1)
```
Evaluation snapshot
| Metric | Score |
|---|---|
| MTEB mean, 24 tasks | 61.4 |
| STS Benchmark | 80.5 |
| Dimension retention | 99% @ 256d, 98% @ 64d |
| Layer speedup | 3.3× @ 6L, 5.8× @ 3L |
| Long-context retrieval R@1, 4K tokens | 68.8% |
| Long-context retrieval R@10, 4K tokens | 81.2% |
These results make the model useful for systems that must balance quality, latency, vector size, and deployment simplicity.
Files
| File | Description |
|---|---|
| `model.safetensors` | Model weights |
| `config.json` | ModernBERT configuration |
| `tokenizer.json` / `tokenizer_config.json` | Tokenizer assets |
| `modules.json` / `1_Pooling/config.json` | Sentence Transformers packaging |
| `README.md` | This model card |
Lineage
This ModelScope package is published by agentic-intelligence-lab as part of the Elephant model release line. It mirrors the upstream HuggingFace model llm-semantic-router/eggon-embed and keeps the model artifacts unchanged except for the repository naming and model card presentation.
Limitations
- Full 768-dimensional embeddings are recommended for important final-stage retrieval decisions.
- Aggressive dimension or layer reduction trades quality for speed and storage efficiency.
- Very long inputs are supported, but they still increase compute and memory cost.
- The model is optimized for retrieval and semantic similarity, not text generation.
Citation
@misc{elephant-embeddings-v1-text-small,
title={Elephant Embeddings V1 Text Small},
author={Agentic Intelligence Lab},
year={2026},
url={https://modelscope.cn/models/agentic-intelligence-lab/elephant-embeddings-v1-text-small}
}
License
Apache 2.0
Model tree for agentic-in/elephant-embeddings-v1-text-small
- Base model: jhu-clsp/mmBERT-base
Evaluation results
- Spearman on STS Benchmark (self-reported): 80.5