Tarka Embedding 30M V1

Features

  • Model compressed by roughly 20x.
  • Recovers approx. 86% of the performance on the MTEB(Eng, v2) benchmark.
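As a rough sanity check, the two figures above can be recomputed from the results table in this card. This is only a sketch: it assumes that Qwen3 Embedding 0.6B is the reference model and that the (L) configuration is the one being compared, neither of which this card states explicitly. The variable names are illustrative.

```python
# Figures copied from the MTEB(Eng, V2) results table in this card.
# Assumption: Qwen3 Embedding 0.6B is the reference model (not stated in the card).
reference = {"params_b": 0.6, "mean_task": 70.7, "mean_task_type": 64.88}
# Assumption: the (L) configuration is the one the headline figures describe.
tarka_l = {"params_b": 0.03, "mean_task": 60.43, "mean_task_type": 56.69}


def ratio(numerator, denominator):
    """Return numerator / denominator as a plain float."""
    return numerator / denominator


# How many times smaller the compressed model is, by parameter count.
compression = ratio(reference["params_b"], tarka_l["params_b"])
# Fraction of the reference score retained on each aggregate metric.
recovered_task = ratio(tarka_l["mean_task"], reference["mean_task"])
recovered_task_type = ratio(tarka_l["mean_task_type"], reference["mean_task_type"])

print(f"Size ratio: {compression:.1f}x")
print(f"Mean (Task) recovered: {recovered_task:.1%}")
print(f"Mean (TaskType) recovered: {recovered_task_type:.1%}")
```

Under these assumptions the size ratio is about 20x, and the retained score is about 85.5% on Mean (Task) and 87.4% on Mean (TaskType), which brackets the approx. 86% figure.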

For more details, refer to the blog post.

Results

MTEB(Eng, V2)

| Model | Parameters (B) | Mean (Task) | Mean (TaskType) | Classification | Clustering | Pair Classification | Reranking | Retrieval | STS | Summarization |
|---|---|---|---|---|---|---|---|---|---|---|
| all-MiniLM-L6-v2 | 0.023 | 59.03 | 55.93 | 69.25 | 44.9 | 82.37 | 47.14 | 42.92 | 78.95 | 25.96 |
| gte-micro-v4 | 0.019 | 58.9 | 56.04 | 73.04 | 43.89 | 82.67 | 44.78 | 39.51 | 79.78 | 28.59 |
| snowflake-arctic-embed-xs | 0.023 | 59.77 | 56.12 | 67 | 42.44 | 81.33 | 45.26 | 52.65 | 76.21 | 27.96 |
| gte-micro | 0.017 | 53.89 | 52.5 | 67.47 | 41.86 | 80.76 | 43.16 | 27.66 | 77.86 | 28.76 |
| Qwen3 Embedding 0.6B | 0.6 | 70.7 | 64.88 | 85.76 | 54.05 | 84.37 | 48.18 | 61.83 | 86.57 | 33.43 |
| Tarka Embedding 30M V1 (S) | 0.03 | 46.07 | 45.22 | 60.37 | 41.37 | 66.29 | 38.34 | 19.56 | 64.15 | 26.44 |
| Tarka Embedding 30M V1 (M) | 0.03 | 51.96 | 49.88 | 66.52 | 43.47 | 70.66 | 40.12 | 30.15 | 69.81 | 28.42 |
| Tarka Embedding 30M V1 (L) | 0.03 | 60.43 | 56.69 | 79.2 | 46.99 | 78.24 | 43.32 | 42.5 | 76.92 | 29.63 |

Usage

from sentence_transformers import SentenceTransformer

# We recommend enabling flash_attention_2 for faster inference and lower memory use
# (this requires the flash-attn package and a supported CUDA GPU).
model = SentenceTransformer(
    "Tarka-AIR/Tarka-Embedding-30M-V1",
    trust_remote_code=True,
    model_kwargs={
        "attn_implementation": "flash_attention_2",
        "device_map": "cuda",
        "torch_dtype": "bfloat16",
    },
    tokenizer_kwargs={"padding_side": "left"},
)

# Select the inference subnetwork: "S", "M", or "L" (see the S/M/L rows in the results table)
model[0].auto_model.configure_subnetwork("L")

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)

# tensor([[0.8371, 0.1740],
#         [0.2176, 0.6293]])
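For readers unfamiliar with the similarity step, here is a minimal pure-Python sketch of what a cosine similarity matrix contains. It uses small hand-written toy vectors rather than real embeddings, and the helper names `cosine_similarity` and `similarity_matrix` are illustrative only; they are not part of sentence_transformers or of this model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(queries, documents):
    """Row i, column j holds the similarity of query i to document j."""
    return [[cosine_similarity(q, d) for d in documents] for q in queries]


# Toy 2-dimensional vectors standing in for real embeddings.
toy_queries = [[1.0, 0.0], [0.0, 2.0]]
toy_documents = [[3.0, 0.0], [1.0, 1.0]]

scores = similarity_matrix(toy_queries, toy_documents)

# For each query, pick the index of the highest-scoring document.
best_match = [row.index(max(row)) for row in scores]
print(scores)
print(best_match)
```

Each row of the matrix corresponds to one query and each column to one document, so the largest value in a row identifies the best-matching document for that query, as in the printed tensor above.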