Tarka-Embedding-250M-V1 is a lightweight 250M-parameter embedding model that produces 1024-dimensional dense text representations. It is built for a wide range of downstream tasks, including semantic similarity, search, RAG, and general text understanding, while remaining efficient enough for both on-device and large-scale production environments.
The model was developed through a multi-stage distillation and structural compression workflow. Starting from a 28-layer teacher model, we conducted a layer-wise contribution analysis and found that only a small subset of layers significantly impacts the final representation quality. This allowed us to remove redundant layers and iteratively refine the architecture, resulting in just 6 decoder layers while retaining most of the embedding performance.
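The exact analysis is not published here, but the idea can be illustrated with a short, hypothetical sketch: ablate one decoder layer at a time and measure how far the mean-pooled embeddings drift from the full model's output. The `mean_pool` helper and the assumption of a Qwen-style decoder exposing a `model.layers` list are ours, not part of the released code.

```python
# Illustrative layer-wise contribution analysis (not the exact Tarka procedure).
import torch
import torch.nn.functional as F
from copy import deepcopy

def mean_pool(hidden_states, attention_mask):
    mask = attention_mask.unsqueeze(-1).float()
    return (hidden_states * mask).sum(dim=1) / mask.sum(dim=1)

@torch.no_grad()
def layer_contribution_scores(model, tokenizer, texts):
    """Cosine similarity between full-model and single-layer-ablated embeddings.

    A low score for layer i means removing it changes the representation a lot,
    i.e. the layer contributes heavily; high-score layers are pruning candidates.
    """
    batch = tokenizer(texts, padding=True, return_tensors="pt")
    reference = mean_pool(model(**batch, use_cache=False).last_hidden_state,
                          batch["attention_mask"])
    scores = []
    for i in range(len(model.layers)):  # assumes a Qwen-style decoder stack
        pruned = deepcopy(model)
        del pruned.layers[i]  # ablate a single decoder layer
        ablated = mean_pool(pruned(**batch, use_cache=False).last_hidden_state,
                            batch["attention_mask"])
        scores.append(F.cosine_similarity(reference, ablated).mean().item())
    return scores
```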
As part of this process, we also released an intermediate 300M-parameter checkpoint, Tarka-Embedding-300M-V1-Preview, which served as a stepping stone for further pruning and experimentation. Tarka-Embedding-250M-V1 represents the final, optimized model in this series, delivering strong results across MTEB tasks with a fraction of the computational and memory overhead.
Find more information about Tarka-Embedding-250M-V1 in our blog post.
Try our demo: https://huggingface.co/spaces/Tarka-AIR/Tarka-Embedding
## Model Details
Tarka-Embedding-250M-V1 has the following features:
- Model Type: Text Embedding
- Supported Languages: 100+ Languages
- Number of Parameters: 250M
- Context Length: Supports up to 32K tokens; optimal performance is observed with inputs under 4K tokens (see the snippet below)
- Embedding Dimension: 1024
While our training data includes samples from multiple languages, the model was primarily optimized for English, so performance may be comparatively lower on non-English inputs.
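If your inputs can run long, you may want to cap the sequence length near that sweet spot. A minimal sketch using Sentence Transformers' `max_seq_length` attribute (the 4096 cap is an illustrative choice, not a required setting):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Tarka-AIR/Tarka-Embedding-250M-V1", trust_remote_code=True)
# Truncate inputs to ~4K tokens, where quality is best; longer inputs
# (up to 32K) are accepted but may embed less accurately.
model.max_seq_length = 4096
```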
## Evaluation
### MTEB (Eng v2)
| MTEB English / Models | Param. | Mean(Task) | Mean(Type) | Class. | Clust. | Pair Class. | Rerank. | Retri. | STS | Summ. |
|---|---|---|---|---|---|---|---|---|---|---|
| multilingual-e5-large-instruct | 0.6B | 65.53 | 61.21 | 75.54 | 49.89 | 86.24 | 48.74 | 53.47 | 84.72 | 29.89 |
| NV-Embed-v2 | 7.8B | 69.81 | 65.00 | 87.19 | 47.66 | 88.69 | 49.61 | 62.84 | 83.82 | 35.21 |
| GritLM-7B | 7.2B | 67.07 | 63.22 | 81.25 | 50.82 | 87.29 | 49.59 | 54.95 | 83.03 | 35.65 |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.20 | 63.26 | 85.84 | 53.54 | 87.52 | 49.25 | 50.25 | 82.51 | 33.94 |
| stella_en_1.5B_v5 | 1.5B | 69.43 | 65.32 | 89.38 | 57.06 | 88.02 | 50.19 | 52.42 | 83.27 | 36.91 |
| gte-Qwen2-7B-instruct | 7.6B | 70.72 | 65.77 | 88.52 | 58.97 | 85.90 | 50.47 | 58.09 | 82.69 | 35.74 |
| gemini-embedding-exp-03-07 | - | 73.30 | 67.67 | 90.05 | 59.39 | 87.70 | 48.59 | 64.35 | 85.29 | 38.28 |
| Qwen3-Embedding-0.6B | 0.6B | 70.70 | 64.88 | 85.76 | 54.05 | 84.37 | 48.18 | 61.83 | 86.57 | 33.43 |
| Tarka-Embedding-250M-V1 | 0.25B | 67.57 | 62.38 | 84.91 | 53.00 | 83.57 | 46.10 | 54.25 | 83.38 | 31.42 |
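The table can be approximately reproduced with the open-source `mteb` package. The sketch below is hypothetical in its details: the benchmark name and output path follow recent `mteb` releases and may differ from the exact official evaluation setup.

```python
# Sketch of an MTEB(eng, v2) run; exact task versions may differ from the
# official numbers reported above.
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Tarka-AIR/Tarka-Embedding-250M-V1", trust_remote_code=True)
benchmark = mteb.get_benchmark("MTEB(eng, v2)")
evaluation = mteb.MTEB(tasks=benchmark)
results = evaluation.run(model, output_folder="mteb_results")
```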
## Usage
For best performance, use Flash Attention 2 with bfloat16:
```python
from sentence_transformers import SentenceTransformer

# We recommend enabling flash_attention_2 for better acceleration and memory savings.
model = SentenceTransformer(
    "Tarka-AIR/Tarka-Embedding-250M-V1",
    trust_remote_code=True,
    model_kwargs={
        "attn_implementation": "flash_attention_2",
        "device_map": "cuda",
        "torch_dtype": "bfloat16",
    },
    tokenizer_kwargs={"padding_side": "left"},
)

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt.
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument.
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.8073, 0.1786],
#         [0.1667, 0.6602]])
```
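The similarity matrix can be used directly for simple retrieval. A small follow-on sketch, continuing from the code above:

```python
# Rank documents for each query by similarity and print the best match.
best = similarity.argmax(dim=1)
for query, idx in zip(queries, best):
    print(f"{query!r} -> {documents[idx]!r}")
```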
## Acknowledgments
Special thanks to:
- The Qwen, Jasper, and Stella teams for providing the base model and foundational research.
Gratitude is also extended to the open-source community for creating the tools, frameworks, and datasets that enabled fine-tuning and evaluation of this model.
Disclaimer: The creator of this model is not responsible for any misuse, damages, or legal issues arising from its use.