Tarka-Embedding-250M-V1 is a lightweight 250M-parameter embedding model that produces 1024-dimensional dense text representations. It is built for a wide range of downstream tasks, including semantic similarity, search, RAG, and general text understanding, while remaining efficient enough for both on-device and large-scale production environments.
The model was developed through a multi-stage distillation and structural compression workflow. Starting from a 28-layer teacher model, we conducted a layer-wise contribution analysis and found that only a small subset of layers significantly impacts the final representation quality. This allowed us to remove redundant layers and iteratively refine the architecture, resulting in just 6 decoder layers while retaining most of the embedding performance.
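The exact analysis is not published here, but the idea can be illustrated with a short, hypothetical sketch: ablate one decoder layer at a time and measure how far the mean-pooled embeddings drift from the full model's output. The `mean_pool` helper and the assumption of a Qwen-style decoder exposing a `model.layers` list are ours, not part of the released code.

```python
# Illustrative layer-wise contribution analysis (not the exact Tarka procedure).
import torch
import torch.nn.functional as F
from copy import deepcopy

def mean_pool(hidden_states, attention_mask):
    mask = attention_mask.unsqueeze(-1).float()
    return (hidden_states * mask).sum(dim=1) / mask.sum(dim=1)

@torch.no_grad()
def layer_contribution_scores(model, tokenizer, texts):
    """Cosine similarity between full-model and single-layer-ablated embeddings.

    A low score for layer i means removing it changes the representation a lot,
    i.e. the layer contributes heavily; high-score layers are pruning candidates.
    """
    batch = tokenizer(texts, padding=True, return_tensors="pt")
    reference = mean_pool(model(**batch, use_cache=False).last_hidden_state,
                          batch["attention_mask"])
    scores = []
    for i in range(len(model.layers)):  # assumes a Qwen-style decoder stack
        pruned = deepcopy(model)
        del pruned.layers[i]  # ablate a single decoder layer
        ablated = mean_pool(pruned(**batch, use_cache=False).last_hidden_state,
                            batch["attention_mask"])
        scores.append(F.cosine_similarity(reference, ablated).mean().item())
    return scores
```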
As part of this process, we also released an intermediate 300M-parameter checkpoint, Tarka-Embedding-300M-V1-Preview, which served as a stepping stone for further pruning and experimentation. Tarka-Embedding-250M-V1 represents the final, optimized model in this series, delivering strong results across MTEB tasks with a fraction of the computational and memory overhead.
Find more information about Tarka-Embedding-250M-V1 in our blog post.
Try our demo: https://huggingface.co/spaces/Tarka-AIR/Tarka-Embedding
## Model Details
Tarka-Embedding-250M-V1 has the following features:
- Model Type: Text Embedding
- Supported Languages: 100+ Languages
- Number of Parameters: 250M
- Context Length: Supports up to 32K tokens; optimal performance is observed with inputs under 4K tokens (see the snippet below)
- Embedding Dimension: 1024
While our training data includes samples from multiple languages, the model was primarily optimized for English, so performance may be comparatively lower on non-English inputs.
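If your inputs can run long, you may want to cap the sequence length near that sweet spot. A minimal sketch using Sentence Transformers' `max_seq_length` attribute (the 4096 cap is an illustrative choice, not a required setting):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Tarka-AIR/Tarka-Embedding-250M-V1", trust_remote_code=True)
# Truncate inputs to ~4K tokens, where quality is best; longer inputs
# (up to 32K) are accepted but may embed less accurately.
model.max_seq_length = 4096
```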
## Evaluation
### MTEB (Eng v2)
| MTEB English / Models | Param. | Mean(Task) | Mean(Type) | Class. | Clust. | Pair Class. | Rerank. | Retri. | STS | Summ. |
|---|---|---|---|---|---|---|---|---|---|---|
| multilingual-e5-large-instruct | 0.6B | 65.53 | 61.21 | 75.54 | 49.89 | 86.24 | 48.74 | 53.47 | 84.72 | 29.89 |
| NV-Embed-v2 | 7.8B | 69.81 | 65.00 | 87.19 | 47.66 | 88.69 | 49.61 | 62.84 | 83.82 | 35.21 |
| GritLM-7B | 7.2B | 67.07 | 63.22 | 81.25 | 50.82 | 87.29 | 49.59 | 54.95 | 83.03 | 35.65 |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.20 | 63.26 | 85.84 | 53.54 | 87.52 | 49.25 | 50.25 | 82.51 | 33.94 |
| stella_en_1.5B_v5 | 1.5B | 69.43 | 65.32 | 89.38 | 57.06 | 88.02 | 50.19 | 52.42 | 83.27 | 36.91 |
| gte-Qwen2-7B-instruct | 7.6B | 70.72 | 65.77 | 88.52 | 58.97 | 85.90 | 50.47 | 58.09 | 82.69 | 35.74 |
| gemini-embedding-exp-03-07 | - | 73.30 | 67.67 | 90.05 | 59.39 | 87.70 | 48.59 | 64.35 | 85.29 | 38.28 |
| Qwen3-Embedding-0.6B | 0.6B | 70.70 | 64.88 | 85.76 | 54.05 | 84.37 | 48.18 | 61.83 | 86.57 | 33.43 |
| Tarka-Embedding-250M-V1 | 0.25B | 67.57 | 62.38 | 84.91 | 53.00 | 83.57 | 46.10 | 54.25 | 83.38 | 31.42 |
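The table can be approximately reproduced with the open-source `mteb` package. The sketch below is hypothetical in its details: the benchmark name and output path follow recent `mteb` releases and may differ from the exact official evaluation setup.

```python
# Sketch of an MTEB(eng, v2) run; exact task versions may differ from the
# official numbers reported above.
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Tarka-AIR/Tarka-Embedding-250M-V1", trust_remote_code=True)
benchmark = mteb.get_benchmark("MTEB(eng, v2)")
evaluation = mteb.MTEB(tasks=benchmark)
results = evaluation.run(model, output_folder="mteb_results")
```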
## Usage
For best performance, use Flash Attention 2 with bfloat16:
```python
from sentence_transformers import SentenceTransformer

# We recommend enabling flash_attention_2 for better acceleration and memory savings.
model = SentenceTransformer(
    "Tarka-AIR/Tarka-Embedding-250M-V1",
    trust_remote_code=True,
    model_kwargs={
        "attn_implementation": "flash_attention_2",
        "device_map": "cuda",
        "torch_dtype": "bfloat16",
    },
    tokenizer_kwargs={"padding_side": "left"},
)

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt.
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument.
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.8073, 0.1786],
#         [0.1667, 0.6602]])
```
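The similarity matrix can be used directly for simple retrieval. A small follow-on sketch, continuing from the code above:

```python
# Rank documents for each query by similarity and print the best match.
best = similarity.argmax(dim=1)
for query, idx in zip(queries, best):
    print(f"{query!r} -> {documents[idx]!r}")
```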
## Acknowledgments
Special thanks to:
- The Qwen, Jasper, and Stella teams for providing the base model and foundational research.
Gratitude is also extended to the open-source community for creating the tools, frameworks, and datasets that enabled fine-tuning and evaluation of this model.
Disclaimer: The creator of this model is not responsible for any misuse, damages, or legal issues arising from its use.