potion-mxbai-128d-v2

An ultra-compact static embedding model at just 3.9MB: 16x smaller than the 512D baseline while retaining 98% of its quality.

Highlights

  • 69.83 avg on full MTEB English (STS + Classification + PairClassification, 25 tasks)
  • 3.9MB with int8 quantization (16x smaller than 512D baseline)
  • 80-88x faster than all-MiniLM-L6-v2 on CPU (~18K vs ~200 sentences/sec)
  • Pure numpy inference, no GPU needed
  • Native int8 support via model2vec v0.7 with zero quality loss
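The int8 packing behind the size numbers above can be illustrated with a generic symmetric per-vector quantization scheme in numpy. This is a sketch of the idea, not model2vec's exact implementation: each row stores one float32 scale plus int8 values, cutting storage to roughly a quarter of float32.

```python
import numpy as np

def quantize_int8(emb: np.ndarray):
    """Symmetric per-vector int8 quantization: one float scale per row."""
    scale = np.abs(emb).max(axis=1, keepdims=True) / 127.0
    q = np.round(emb / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover approximate float32 embeddings from int8 values and scales."""
    return q.astype(np.float32) * scale

# Toy embedding matrix standing in for real model weights.
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 128)).astype(np.float32)

q, scale = quantize_int8(emb)
recon = dequantize(q, scale)

print(q.nbytes / emb.nbytes)      # 0.25 -- int8 is 4x smaller than float32
print(np.abs(emb - recon).max())  # small rounding error, well under 0.05
```

The per-row scale keeps the rounding error proportional to each vector's magnitude, which is why quantization loses essentially nothing for similarity tasks.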

How It Was Made

  1. Teacher: mixedbread-ai/mxbai-embed-large-v1 (335M params, BERT-large architecture)
  2. Distillation: model2vec distillation with 256-dim PCA and corpus-informed vocabulary
  3. Tokenlearn pre-training: Contrastive-loss training on ~217K C4 English sentences
  4. Born-again self-distillation: A second round of contrastive training using the model's own sentence embeddings as targets, closing the teacher-student representation gap (+0.49 avg)
  5. PCA to 128D: The 256D born-again embeddings are PCA-reduced to 128 dimensions, retaining 77% of variance while halving the size. This outperforms training natively at 128D because the 256D born-again model captures richer structure.
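Step 5 above can be sketched with plain numpy: center the 256D token embedding matrix, take an SVD, and project onto the top 128 principal components. The matrix here is a random stand-in (the real pipeline operates on the born-again model's vectors, and the 77% explained-variance figure applies to those, not to random data).

```python
import numpy as np

# Toy stand-in for the 256D born-again token embedding matrix.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(29524, 256)).astype(np.float32)

# Center, then project onto the top 128 principal components.
mean = vectors.mean(axis=0)
centered = vectors - mean
_, s, vt = np.linalg.svd(centered, full_matrices=False)
reduced = centered @ vt[:128].T

# Fraction of variance retained by the kept components
# (reported as 0.77 for the real model; random data will differ).
explained = (s[:128] ** 2).sum() / (s ** 2).sum()

print(reduced.shape)  # (29524, 128)
```

Because PCA is applied after training, the 256D model gets to learn its richer structure first; the projection then keeps the directions that carry the most variance.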

Benchmark Results (Full MTEB English Suite)

| Model | STS | Classification | PairClassification | Avg | Size (int8) |
|---|---|---|---|---|---|
| potion-mxbai-2m-512d | 74.15 | 65.44 | 76.80 | 72.13 | ~125MB |
| potion-mxbai-256d-v2 | 73.79 | 63.23 | 77.33 | 71.45 | 7.5MB |
| potion-mxbai-128d-v2 (this) | 72.56 | 61.48 | 75.45 | 69.83 | 3.9MB |

Evaluated on 25 tasks (10 STS, 12 Classification, 3 PairClassification), English subsets only, identical eval code across all models.

Usage

```python
from model2vec import StaticModel

# Loads the int8-quantized model (3.9MB)
model = StaticModel.from_pretrained("blobbybob/potion-mxbai-128d-v2")

embeddings = model.encode(["Hello world", "Static embeddings are fast"])
```

With Sentence Transformers:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("blobbybob/potion-mxbai-128d-v2")
embeddings = model.encode(["Hello world", "Static embeddings are fast"])
```
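Downstream, the 128D vectors are typically compared with cosine similarity. A minimal sketch, using random placeholder arrays in place of `model.encode(...)` output so it runs without downloading the model:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between two batches of embeddings."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Placeholder 128D embeddings standing in for model.encode(...) output.
rng = np.random.default_rng(0)
queries = rng.normal(size=(2, 128))
docs = rng.normal(size=(5, 128))

sims = cosine_sim(queries, docs)  # shape (2, 5), values in [-1, 1]
best = sims.argmax(axis=1)        # index of the closest doc per query
```

Substituting real `model.encode` output for the random arrays gives a complete nearest-neighbor search over a small corpus.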

When to use this model

  • You need the smallest possible embedding model that still works well
  • Deploying on mobile, IoT, or edge devices with strict memory limits
  • Embedding millions of documents where storage cost matters (3.9MB vs 100MB+)
  • You need instant loading: at 3.9MB the entire model can sit in on-chip CPU cache
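The storage argument in the list above is simple arithmetic: at 128 dimensions and one int8 byte per dimension, each document vector costs 128 bytes. A back-of-envelope sketch (ignoring index overhead, and assuming the 512D baseline stores float32):

```python
# Storage cost for embedding a corpus: int8, 128 dims.
dims = 128
bytes_per_doc = dims * 1  # int8: 1 byte per dimension
n_docs = 1_000_000

total_mb = n_docs * bytes_per_doc / 1e6
print(total_mb)           # 128.0 MB for a million documents

# The same corpus at 512 dims in float32:
baseline_mb = n_docs * 512 * 4 / 1e6
print(baseline_mb)        # 2048.0 MB, a 16x difference
```

At a million documents the embeddings themselves shrink from ~2GB to ~128MB, which is where the per-vector savings start to dominate the model-size savings.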

Model Family

| Model | Avg | Size (int8) | Best for |
|---|---|---|---|
| potion-mxbai-2m-512d | 72.13 | ~125MB | Maximum quality |
| potion-mxbai-256d-v2 | 71.45 | 7.5MB | Best quality/size balance |
| potion-mxbai-128d-v2 | 69.83 | 3.9MB | Compact deployments |
| potion-mxbai-micro | 68.91 | 0.7MB | Ultra-tiny / embedded |

Training Details

  • Featurization: ~217K C4 sentences encoded by mxbai-embed-large-v1
  • Training: Tokenlearn contrastive loss + born-again self-distillation, batch size 256
  • Vocabulary: 29,524 tokens (corpus-informed vocabulary from mxbai teacher tokenizer)
  • Dimensions: 128 (PCA from 256D born-again model)
  • Compute: Local RTX 2070

Citation

```bibtex
@article{minishlab2024model2vec,
  author = {Tulkens, Stephan and {van Dongen}, Thomas},
  title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
  year = {2024},
  url = {https://github.com/MinishLab/model2vec}
}
```