Tarka-Embedding-250M-V1 is a lightweight 250M-parameter embedding model that produces 1024-dimensional dense text representations. It is built for a wide range of downstream tasks, including semantic similarity, search, RAG, and general text understanding, while remaining efficient enough for both on-device and large-scale production environments.

The model was developed through a multi-stage distillation and structural compression workflow. Starting from a 28-layer teacher model, we conducted layer-wise contribution analysis and found that only a small subset of layers significantly impacts the final representation quality. This allowed us to remove redundant layers and iteratively refine the architecture, resulting in just 6 decoder layers while retaining most of the embedding performance.
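The exact analysis pipeline is not published here, but the idea behind layer-wise contribution analysis can be illustrated with a simple ablation study: embed a probe set with and without each layer and measure how much the representations move. The sketch below is a toy illustration only; the stacked residual linear blocks stand in for the teacher's decoder layers and do not reflect the real architecture or procedure.

# Toy sketch of layer-wise contribution analysis via identity ablation.
# A small stack of residual layers stands in for the 28-layer teacher.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
dim, n_layers = 64, 8                      # toy sizes, not the real model
layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_layers)])
probe = torch.randn(32, dim)               # stand-in for embedded probe sentences

def embed(x, skip=None):
    """Run the stack, optionally skipping one layer (identity ablation)."""
    h = x
    for i, layer in enumerate(layers):
        if i == skip:
            continue
        h = torch.tanh(layer(h)) + h       # residual block, as in a decoder
    return F.normalize(h, dim=-1)

baseline = embed(probe)
for i in range(n_layers):
    ablated = embed(probe, skip=i)
    # High similarity after removing a layer => that layer contributes little
    # and is a candidate for pruning.
    sim = F.cosine_similarity(baseline, ablated, dim=-1).mean().item()
    print(f"layer {i}: mean cosine similarity without it = {sim:.3f}")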

As part of this process, we also released an intermediate 300M-parameter checkpoint, Tarka-Embedding-300M-V1-Preview, which served as a stepping stone for further pruning and experimentation. Tarka-Embedding-250M-V1 represents the final, optimized model in this series, delivering strong results across MTEB tasks with a fraction of the computational and memory overhead.

Find more information about Tarka-Embedding-250M-V1 in our blog post.

🚀 Try our demo: https://huggingface.co/spaces/Tarka-AIR/Tarka-Embedding

Model Details

Tarka-Embedding-250M-V1 has the following features:

  • Model Type: Text Embedding
  • Supported Languages: 100+ Languages
  • Number of Parameters: 250M
  • Context Length: Supports up to 32K tokens; optimal performance is observed with inputs under 4K tokens (see the sketch after this list)
  • Embedding Dimension: 1024
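Because quality is best on inputs under roughly 4K tokens, you may want to cap the sequence length when embedding long documents. Below is a minimal Sentence Transformers sketch; the 4096 cap is our suggestion for illustration, not an official setting of the model.

from sentence_transformers import SentenceTransformer

# Cap the maximum sequence length at 4K tokens, the range where quality is
# reported to be best. 4096 is a suggested value, not an official default.
model = SentenceTransformer("Tarka-AIR/Tarka-Embedding-250M-V1", trust_remote_code=True)
model.max_seq_length = 4096

embeddings = model.encode(["A very long document ..."])
print(embeddings.shape)  # (1, 1024) – 1024-dimensional embedding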

While our training data includes samples from multiple languages, the model was primarily optimized for English, so performance may be comparatively lower on non-English inputs.

Evaluation

MTEB (Eng v2)

Model | Param. | Mean (Task) | Mean (Type) | Class. | Clust. | Pair Class. | Rerank. | Retri. | STS | Summ.
multilingual-e5-large-instruct | 0.6B | 65.53 | 61.21 | 75.54 | 49.89 | 86.24 | 48.74 | 53.47 | 84.72 | 29.89
NV-Embed-v2 | 7.8B | 69.81 | 65.00 | 87.19 | 47.66 | 88.69 | 49.61 | 62.84 | 83.82 | 35.21
GritLM-7B | 7.2B | 67.07 | 63.22 | 81.25 | 50.82 | 87.29 | 49.59 | 54.95 | 83.03 | 35.65
gte-Qwen2-1.5B-instruct | 1.5B | 67.20 | 63.26 | 85.84 | 53.54 | 87.52 | 49.25 | 50.25 | 82.51 | 33.94
stella_en_1.5B_v5 | 1.5B | 69.43 | 65.32 | 89.38 | 57.06 | 88.02 | 50.19 | 52.42 | 83.27 | 36.91
gte-Qwen2-7B-instruct | 7.6B | 70.72 | 65.77 | 88.52 | 58.97 | 85.9 | 50.47 | 58.09 | 82.69 | 35.74
gemini-embedding-exp-03-07 | - | 73.3 | 67.67 | 90.05 | 59.39 | 87.7 | 48.59 | 64.35 | 85.29 | 38.28
Qwen3-Embedding-0.6B | 0.6B | 70.70 | 64.88 | 85.76 | 54.05 | 84.37 | 48.18 | 61.83 | 86.57 | 33.43
Tarka-Embedding-250M-V1 | 0.25B | 67.57 | 62.38 | 84.91 | 53.0 | 83.57 | 46.1 | 54.25 | 83.38 | 31.42
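Results of this kind can be reproduced with the open-source mteb package. The sketch below shows one way to run the English v2 benchmark; the benchmark name and method signatures vary across mteb versions, so treat them as assumptions to verify against the version you have installed.

# Sketch of running MTEB(eng, v2) with the `mteb` package.
# API details differ between mteb releases; verify locally before relying on this.
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Tarka-AIR/Tarka-Embedding-250M-V1", trust_remote_code=True)

benchmark = mteb.get_benchmark("MTEB(eng, v2)")   # English v2 task collection
evaluation = mteb.MTEB(tasks=benchmark)
results = evaluation.run(model, output_folder="results/tarka-250m")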

Usage

For best performance, use Flash Attention 2 with bfloat16.

from sentence_transformers import SentenceTransformer

# We recommend enabling flash_attention_2 for better acceleration and memory savings.
model = SentenceTransformer(
    "Tarka-AIR/Tarka-Embedding-250M-V1",
    trust_remote_code=True,
    model_kwargs={
        "attn_implementation": "flash_attention_2",
        "device_map": "cuda",
        "torch_dtype": "bfloat16",
    },
    tokenizer_kwargs={"padding_side": "left"},
)

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)

# tensor([[0.8073, 0.1786],
#        [0.1667, 0.6602]])
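The similarity matrix above already contains everything needed for simple retrieval. As a small follow-on sketch, reusing the queries, documents, and similarity variables from the snippet above, each query can be matched to its highest-scoring document:

# Simple retrieval on top of the similarity matrix: for each query,
# pick the document with the highest cosine similarity.
best = similarity.argmax(dim=1)
for query, idx in zip(queries, best.tolist()):
    print(f"{query!r} -> {documents[idx]!r}")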

Acknowledgments

Special thanks to:

  • The Qwen, Jasper, and Stella teams for providing the base model and foundational research.

Gratitude is also extended to the open-source community for creating the tools, frameworks, and datasets that enabled fine-tuning and evaluation of this model.

Disclaimer: The creators of this model are not responsible for any misuse, damages, or legal issues arising from its use.
