Tarka Embedding 30M V1

Features

Model compressed by 20x (about 0.03B parameters, versus 0.6B for Qwen3 Embedding 0.6B in the Results table).

Recovers approximately 86% of the uncompressed model's performance on the MTEB (Eng, v2) benchmark.

For more details, refer to the blog post.
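The two headline figures can be checked against the Results table with simple arithmetic. This sketch rests on two assumptions the card does not state: that Qwen3 Embedding 0.6B is the reference model, and that "86%" is the ratio of the L variant's mean score to that model's. The card does not say which mean is used. The Mean (Task) and Mean (TaskType) ratios come out at about 85.5% and 87.4%, which bracket the quoted figure.

```python
# Figures copied from the MTEB (Eng, v2) Results table.
# Assumption: Qwen3 Embedding 0.6B is the uncompressed reference model.
baseline = {"params_b": 0.6, "mean_task": 70.7, "mean_task_type": 64.88}
tarka_l = {"params_b": 0.03, "mean_task": 60.43, "mean_task_type": 56.69}

# Parameter-count ratio (reference / compressed).
compression = baseline["params_b"] / tarka_l["params_b"]

# Share of the reference model's mean score kept by the L sub-network.
recovery_task = tarka_l["mean_task"] / baseline["mean_task"]
recovery_task_type = tarka_l["mean_task_type"] / baseline["mean_task_type"]

print(f"compression: {compression:.1f}x")
print(f"recovery (Mean Task): {recovery_task:.1%}")
print(f"recovery (Mean TaskType): {recovery_task_type:.1%}")
```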

Results

MTEB (Eng, v2)

Model	Parameters (B)	Mean (Task)	Mean (TaskType)	Classification	Clustering	Pair Classification	Reranking	Retrieval	STS	Summarization
all-MiniLM-L6-v2	0.023	59.03	55.93	69.25	44.9	82.37	47.14	42.92	78.95	25.96
gte-micro-v4	0.019	58.9	56.04	73.04	43.89	82.67	44.78	39.51	79.78	28.59
snowflake-arctic-embed-xs	0.023	59.77	56.12	67	42.44	81.33	45.26	52.65	76.21	27.96
gte-micro	0.017	53.89	52.5	67.47	41.86	80.76	43.16	27.66	77.86	28.76
Qwen3 Embedding 0.6B	0.6	70.7	64.88	85.76	54.05	84.37	48.18	61.83	86.57	33.43
Tarka Embedding 30M V1 (S)	0.03	46.07	45.22	60.37	41.37	66.29	38.34	19.56	64.15	26.44
Tarka Embedding 30M V1 (M)	0.03	51.96	49.88	66.52	43.47	70.66	40.12	30.15	69.81	28.42
Tarka Embedding 30M V1 (L)	0.03	60.43	56.69	79.2	46.99	78.24	43.32	42.5	76.92	29.63
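The Mean (TaskType) column matches the unweighted average of the seven category columns, which makes it easy to see where the compressed model loses ground. The sketch below recomputes that average and the per-category retention of the L variant relative to Qwen3 Embedding 0.6B. Using Qwen3 Embedding 0.6B as the reference is an assumption. All scores are copied from the table above.

```python
# Category columns of the MTEB (Eng, v2) table, in table order.
CATEGORIES = ["Classification", "Clustering", "Pair Classification",
              "Reranking", "Retrieval", "STS", "Summarization"]
qwen3_06b = [85.76, 54.05, 84.37, 48.18, 61.83, 86.57, 33.43]
tarka_l = [79.2, 46.99, 78.24, 43.32, 42.5, 76.92, 29.63]


def mean(xs):
    """Unweighted arithmetic mean."""
    return sum(xs) / len(xs)


# The average of the seven category scores reproduces the Mean (TaskType) column.
print(round(mean(qwen3_06b), 2))  # 64.88
print(round(mean(tarka_l), 2))    # 56.69

# Fraction of the reference score kept by the L sub-network, per category.
retention = {c: t / q for c, t, q in zip(CATEGORIES, tarka_l, qwen3_06b)}
for cat, r in sorted(retention.items(), key=lambda kv: kv[1]):
    print(f"{cat:20s} {r:.1%}")
```

On these numbers, Retrieval retains the least (about 68.7%) and Pair Classification the most (about 92.7%).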

Usage

from sentence_transformers import SentenceTransformer

# We recommend enabling flash_attention_2 for better acceleration and memory savings
# (this requires the flash-attn package and a supported CUDA GPU).
model = SentenceTransformer(
    "Tarka-AIR/Tarka-Embedding-30M-V1",
    trust_remote_code=True,
    model_kwargs={
        "attn_implementation": "flash_attention_2",
        "device_map": "cuda",
        "torch_dtype": "bfloat16",
    },
    tokenizer_kwargs={"padding_side": "left"},
)

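# Configure the inference mode: "L", "M" or "S", matching the sub-network rows in the Results table.
# configure_subnetwork comes from the model's custom code, loaded via trust_remote_code=True.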
model[0].auto_model.configure_subnetwork("L")

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt.
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument.
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)

# tensor([[0.8371, 0.1740],
#         [0.2176, 0.6293]])
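In the snippet above, `model.similarity` computes cosine similarity, as the comment states, and returns one row per query and one column per document. In the printed output, each query scores highest against its matching document (0.8371 and 0.6293). The dependency-free sketch below shows what that cosine computation does and how to turn the matrix into a best-match ranking. The vectors are made-up toy values, not real Tarka embeddings, and `cosine_similarity_matrix` is a hypothetical helper, not part of Sentence Transformers.

```python
import math


def cosine_similarity_matrix(a, b):
    """Return a len(a) x len(b) matrix of cosine similarities between row vectors."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    return [[dot(u, v) / (norm(u) * norm(v)) for v in b] for u in a]


# Hypothetical 3-dimensional vectors standing in for real embeddings.
query_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
document_embeddings = [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]]

sim = cosine_similarity_matrix(query_embeddings, document_embeddings)

# For each query (row), take the index of the most similar document (column).
best = [max(range(len(row)), key=row.__getitem__) for row in sim]
print(sim)
print(best)
```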


Model details

Format: Safetensors
Model size: 28.1M params
Tensor type: BF16