Elephant Embeddings V1 Text Small

elephant-embeddings-v1-text-small is the text embedding model in the Agentic Intelligence Lab Elephant Embeddings V1 family.

This ModelScope release is maintained by agentic-intelligence-lab to make Elephant embedding models easier to download and deploy in mainland China. It mirrors the upstream Hugging Face model llm-semantic-router/eggon-embed, renamed under a consistent Elephant model namespace.

Positioning

This model is a multilingual long-context text embedding model for agent-native retrieval and semantic matching. It is designed for systems where embeddings are on the runtime hot path:

agent memory recall
knowledge retrieval and RAG
tool, skill, and route matching
long-horizon state search
multilingual semantic indexing
clustering and deduplication

The model combines a 32K-token context window, a ModernBERT encoder architecture, and 2D Matryoshka training, which covers both embedding dimensions and encoder layers, so one embedding space can serve multiple latency, storage, and quality budgets.

Model at a glance

Item	Value
Family	Elephant Embeddings V1
Maintainer	Agentic Intelligence Lab
Model type	Text embedding model
Modalities	Text
Languages	Multilingual
Architecture	ModernBERT encoder with YaRN scaling
Parameters	~307M
Hidden size	768
Layers	22
Context length	32,768 tokens
Pooling	Mean pooling
Similarity	Cosine
Matryoshka dimensions	768, 512, 256, 128, 64
Upstream source	`llm-semantic-router/eggon-embed`
License	Apache 2.0

Why it fits agentic workloads

Agentic systems call embedding models repeatedly: before retrieval, during routing, while matching tools, when searching memory, and when compressing or reranking state. This model is optimized for that operating pattern rather than for a single offline benchmark.

Key advantages:

One semantic space across the stack: routing, retrieval, memory lookup, and semantic matching can share one vector space.
Budget-adaptive vectors: truncate full 768-dimensional vectors to 512d, 256d, 128d, or 64d (re-normalizing afterwards) for cheaper indexes and faster candidate generation.
Long-context representation: encode larger notes, traces, tool descriptions, and document chunks before aggressive chunking is required.
Practical deployment size: a 307M-class encoder is easier to host than much larger embedding models when inference is frequent.
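To make the storage side of budget-adaptive vectors concrete, the following sketch computes the raw size of a flat float32 index at each Matryoshka dimension. The corpus size of 1,000,000 vectors, the float32 storage type, and the `index_bytes` helper are illustrative assumptions rather than figures from this card, and real vector indexes add overhead that is not modeled here.

```python
# Illustrative arithmetic only: raw vector storage for a flat float32 index.
# The corpus size below is a hypothetical example, not a figure from this card.
BYTES_PER_FLOAT32 = 4
MATRYOSHKA_DIMS = (768, 512, 256, 128, 64)  # dimensions listed in "Model at a glance"


def index_bytes(num_vectors: int, dim: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw storage for num_vectors embeddings of size dim (no index overhead)."""
    return num_vectors * dim * bytes_per_value


num_vectors = 1_000_000  # hypothetical corpus size
for dim in MATRYOSHKA_DIMS:
    size_mib = index_bytes(num_vectors, dim) / 2**20
    ratio = dim / MATRYOSHKA_DIMS[0]
    print(f"{dim:>4}d: {size_mib:8.1f} MiB ({ratio:.0%} of full size)")
```

Storage scales linearly with dimension, so a 64d index needs one twelfth of the space of the full 768d index.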

Recommended use cases

Scenario	Recommended dimension	Notes
Broad route matching	64d or 128d	Cheap candidate generation over large route/tool sets
Large memory-bank search	64d or 256d	Lower storage and bandwidth cost
Main RAG retrieval	256d or 512d	Balanced quality and cost
High-confidence matching	768d	Best semantic fidelity
Long-document indexing	768d	Preserve richer context before chunking
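One way to combine these tiers is a coarse-to-fine search: shortlist candidates with truncated vectors, then rescore only the shortlist at full dimension. The sketch below is a minimal pure-Python illustration under stated assumptions; `two_stage_search`, its parameters, and the toy 4-dimensional vectors are hypothetical stand-ins, and a production system would precompute the truncated vectors in a vector index instead of truncating at query time.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(vec: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0.0 else list(vec)


def truncate(vec: Vector, dim: int) -> List[float]:
    """Keep the first `dim` components, then re-normalize."""
    return l2_normalize(vec[:dim])


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def two_stage_search(query: Vector, docs: Sequence[Vector],
                     coarse_dim: int = 64, shortlist: int = 100,
                     top_k: int = 10) -> List[int]:
    """Return indices of the top_k documents after coarse shortlisting."""
    # Stage 1: cheap cosine scores on truncated vectors.
    q_coarse = truncate(query, coarse_dim)
    coarse = [(dot(q_coarse, truncate(d, coarse_dim)), i) for i, d in enumerate(docs)]
    candidates = [i for _, i in sorted(coarse, reverse=True)[:shortlist]]

    # Stage 2: full-dimension cosine scores on the shortlist only.
    q_full = l2_normalize(query)
    fine = [(dot(q_full, l2_normalize(docs[i])), i) for i in candidates]
    return [i for _, i in sorted(fine, reverse=True)[:top_k]]


# Toy 4-dimensional vectors standing in for real 768d embeddings (hypothetical data).
docs = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.4, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
query = [0.9, 0.1, 0.5, 0.0]
best = two_stage_search(query, docs, coarse_dim=2, shortlist=2, top_k=1)
print(best)
```

The shortlist size controls the trade-off: a larger shortlist lets the full-dimension pass recover more of the candidates that the coarse pass ranked too low, at a higher rescoring cost.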

Quick start on ModelScope

pip install modelscope sentence-transformers torch

from modelscope import snapshot_download
from sentence_transformers import SentenceTransformer

repo_id = "agentic-intelligence-lab/elephant-embeddings-v1-text-small"
local_dir = snapshot_download(repo_id)

model = SentenceTransformer(local_dir)

texts = [
    "Find tool descriptions related to browser automation.",
    "检索和用户历史偏好相关的记忆。",
    "Retrieve notes about deployment failures in staging.",
]

embeddings = model.encode(texts, normalize_embeddings=True)
print(embeddings.shape)  # (3, 768)
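Because the quick start normalizes the embeddings, cosine similarity reduces to a dot product. As an example of the deduplication use case listed earlier, the sketch below performs greedy near-duplicate removal on unit-length vectors. The `deduplicate` helper, the 0.95 threshold, and the toy 2-dimensional vectors are assumptions for illustration, not values recommended by this card.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def deduplicate(embeddings: Sequence[Sequence[float]], threshold: float = 0.95) -> List[int]:
    """Return indices kept after greedy near-duplicate removal.

    Assumes every row is L2-normalized, so the dot product equals cosine similarity.
    An item is dropped when it is at least `threshold` similar to an item already kept.
    """
    kept: List[int] = []
    for i, vec in enumerate(embeddings):
        if all(dot(vec, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy unit-length vectors standing in for model.encode(..., normalize_embeddings=True).
toy_embeddings = [
    [1.0, 0.0],
    [0.6, 0.8],
    [0.8, 0.6],  # cosine 0.96 with the previous row, so it is treated as a duplicate
    [0.0, 1.0],
]
kept_indices = deduplicate(toy_embeddings, threshold=0.95)
print(kept_indices)
```

This greedy pass compares each item against every kept item, so it is quadratic in the worst case; for large collections an approximate nearest-neighbor index would be the more practical choice.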

Matryoshka truncation

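import torch.nn.functional as F
from modelscope import snapshot_download
from sentence_transformers import SentenceTransformer

local_dir = snapshot_download("agentic-intelligence-lab/elephant-embeddings-v1-text-small")
model = SentenceTransformer(local_dir)

texts = [
    "Find tool descriptions related to browser automation.",
    "Retrieve notes about deployment failures in staging.",
]

# Encode at the full 768 dimensions, as a torch tensor so it can be sliced
embeddings = model.encode(texts, convert_to_tensor=True, normalize_embeddings=True)

# Balanced retrieval tier: keep the first 256 dimensions, then re-normalize
embeddings_256d = F.normalize(embeddings[:, :256], p=2, dim=1)

# Low-cost routing or large memory-bank tier: keep the first 64 dimensions
embeddings_64d = F.normalize(embeddings[:, :64], p=2, dim=1)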
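The re-normalization step after slicing matters because a prefix of a unit-length vector is generally shorter than unit length, so dot products on raw prefixes would no longer equal cosine similarity. The following sketch shows this with a toy 4-dimensional vector; the values and helper names are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math
from typing import List, Sequence


def l2_norm(vec: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in vec))


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = l2_norm(vec)
    return [x / norm for x in vec] if norm > 0.0 else list(vec)


# Toy 4-dimensional stand-in for a full, normalized embedding (hypothetical values).
full = l2_normalize([3.0, 4.0, 12.0, 84.0])

prefix = full[:2]                    # plain truncation: no longer unit length
renormalized = l2_normalize(prefix)  # same direction as the prefix, unit length again

print(f"full norm:         {l2_norm(full):.4f}")
print(f"prefix norm:       {l2_norm(prefix):.4f}")
print(f"renormalized norm: {l2_norm(renormalized):.4f}")
```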

Evaluation snapshot

Metric	Score
MTEB mean, 24 tasks	61.4
STS Benchmark (Spearman, self-reported)	80.5
Dimension retention	99% @ 256d, 98% @ 64d
Layer speedup	3.3× @ 6L, 5.8× @ 3L
Long-context retrieval R@1, 4K tokens	68.8%
Long-context retrieval R@10, 4K tokens	81.2%

Taken together, these results suggest the model suits systems that must balance quality, latency, vector size, and deployment simplicity.

Files

File	Description
`model.safetensors`	Model weights
`config.json`	ModernBERT configuration
`tokenizer.json` / `tokenizer_config.json`	Tokenizer assets
`modules.json` / `1_Pooling/config.json`	Sentence Transformers packaging
`README.md`	This model card

Lineage

This ModelScope package is published by agentic-intelligence-lab as part of the Elephant model release line. It mirrors the upstream Hugging Face model llm-semantic-router/eggon-embed; the model artifacts are unchanged, and only the repository name and model card presentation differ.

Limitations

Full 768-dimensional embeddings are recommended for important final-stage retrieval decisions.
Aggressive dimension or layer reduction trades quality for speed and storage efficiency.
Very long inputs are supported, but they still increase compute and memory cost.
The model is optimized for retrieval and semantic similarity, not text generation.

Citation

@misc{elephant-embeddings-v1-text-small,
  title={Elephant Embeddings V1 Text Small},
  author={Agentic Intelligence Lab},
  year={2026},
  url={https://modelscope.cn/models/agentic-intelligence-lab/elephant-embeddings-v1-text-small}
}

License

Apache 2.0
