potion-mxbai-2m-512d

A high-quality static embedding model that outperforms potion-base-32M, trained on 2M C4 sentences with tokenlearn contrastive pre-training.

Highlights

  • 72.13 avg on full MTEB English (STS + Classification + PairClassification, 25 tasks, English subsets only)
  • 80-88x faster than all-MiniLM-L6-v2 on CPU (~16K vs ~200 sentences/sec)
  • ~125MB model size (29K vocab x 512 dims, float32)
  • Pure numpy inference; no GPU needed
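
Static-embedding inference is just a table lookup followed by mean pooling, which is why it runs fast on CPU with plain numpy. A minimal sketch of the idea (the toy vocabulary and random table below stand in for the real 29K x 512 weights):

```python
import numpy as np

# Toy stand-ins for the real vocabulary and 29K x 512 embedding table.
vocab = {"hello": 0, "world": 1, "static": 2, "embeddings": 3, "are": 4, "fast": 5}
rng = np.random.default_rng(0)
table = rng.normal(size=(len(vocab), 512)).astype(np.float32)

def encode(sentence: str) -> np.ndarray:
    """Look up each known token and mean-pool: no neural forward pass."""
    ids = [vocab[t] for t in sentence.lower().split() if t in vocab]
    vec = table[ids].mean(axis=0)
    return vec / np.linalg.norm(vec)  # L2-normalize for cosine similarity

emb = encode("Hello world")
print(emb.shape)  # (512,)
```

The real model additionally applies a learned tokenizer and per-token weighting, but the lookup-and-pool core is why throughput is two orders of magnitude above a transformer encoder.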

How It Was Made

  1. Teacher: mixedbread-ai/mxbai-embed-large-v1 (335M params, BERT-large architecture)
  2. Custom vocabulary: Built from 2M C4 English sentences
  3. Distillation: model2vec distillation with 512-dim PCA
  4. Tokenlearn pre-training: Contrastive-loss training on the same 2M C4 sentences
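
The PCA step in item 3 compresses the teacher's token embeddings (1024-dim for mxbai-embed-large-v1) down to 512 dims. model2vec handles this internally; as an illustration of what that projection does, with random data standing in for real teacher embeddings:

```python
import numpy as np

# Random stand-in for teacher token embeddings: toy vocab of 2000 tokens,
# each 1024-dim (the mxbai-embed-large-v1 hidden size).
rng = np.random.default_rng(0)
teacher = rng.normal(size=(2000, 1024)).astype(np.float32)

# PCA via SVD: center, find principal axes, keep the top 512 components.
mean = teacher.mean(axis=0)
centered = teacher - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
student = centered @ vt[:512].T

print(student.shape)  # (2000, 512)
```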

Benchmark Results (Full MTEB English Suite)

| Model | STS | Classification | PairClassification | Avg | Size |
|---|---|---|---|---|---|
| potion-mxbai-2m-512d (this) | 74.15 | 65.44 | 76.80 | 72.13 | ~125MB |
| potion-mxbai-256d-v2 | 71.92 | 63.05 | 73.99 | 69.65 | 7.2MB (int8) |
| potion-mxbai-128d-v2 | 70.81 | 60.62 | 72.46 | 67.97 | 3.6MB (int8) |

Evaluated on 25 tasks (10 STS, 12 Classification, 3 PairClassification), English subsets only.

Model Family

| Model | Avg | Size | Best for |
|---|---|---|---|
| potion-mxbai-2m-512d | 72.13 | ~125MB | Maximum quality |
| potion-mxbai-256d-v2 | 69.65 | 7.2MB (int8) | Best quality/size balance |
| potion-mxbai-128d-v2 | 67.97 | 3.6MB (int8) | Extreme size constraints |

Usage

from model2vec import StaticModel

model = StaticModel.from_pretrained("blobbybob/potion-mxbai-2m-512d")
embeddings = model.encode(["Hello world", "Static embeddings are fast"])

With Sentence Transformers:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("blobbybob/potion-mxbai-2m-512d")
embeddings = model.encode(["Hello world", "Static embeddings are fast"])

Training Details

  • Featurization: 2M C4 sentences encoded by mxbai-embed-large-v1
  • Training: Tokenlearn contrastive loss, batch size 256
  • Total cost: ~$3-4 on Modal
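
The contrastive objective pulls each student sentence embedding toward its teacher target while pushing it away from the other sentences in the batch. A generic in-batch InfoNCE sketch (not tokenlearn's exact implementation; random vectors stand in for real embeddings, and the temperature value is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, dim, temperature = 256, 512, 0.05

# Student embeddings and noisy teacher "targets" as stand-ins.
student = rng.normal(size=(batch, dim))
teacher = student + 0.1 * rng.normal(size=(batch, dim))

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Cosine-similarity logits; the positive pair sits on the diagonal.
logits = normalize(student) @ normalize(teacher).T / temperature
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))  # cross-entropy against the diagonal
print(float(loss))
```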

Citation

@article{minishlab2024model2vec,
  author = {Tulkens, Stephan and {van Dongen}, Thomas},
  title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
  year = {2024},
  url = {https://github.com/MinishLab/model2vec}
}
Evaluation results

  • MTEB STS (English, 10 tasks): spearman_cosine 74.15 (self-reported)
  • MTEB Classification (English, 12 tasks): accuracy 65.44 (self-reported)
  • MTEB PairClassification (English, 3 tasks): ap 76.80 (self-reported)