# potion-mxbai-2m-512d
A high-quality static embedding model that outperforms potion-base-32M, trained on 2M C4 sentences with tokenlearn contrastive pre-training.
## Highlights
- 72.13 avg on full MTEB English (STS + Classification + PairClassification, 25 tasks, English subsets only)
- 80-88x faster than all-MiniLM-L6-v2 on CPU (~16K vs ~200 sentences/sec)
- ~125MB model size (29K vocab x 512 dims, float32)
- Pure numpy inference, no GPU needed
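Inference stays numpy-only because a static model is just a token-vector lookup followed by mean pooling. A minimal sketch of that idea, with a toy vocabulary and random vectors standing in for the actual distilled weights:

```python
import numpy as np

# Toy static-embedding table: token id -> 512-dim vector.
# In the real model these rows come from distillation, not random init.
rng = np.random.default_rng(0)
vocab = {"hello": 0, "world": 1, "static": 2, "embeddings": 3}
table = rng.standard_normal((len(vocab), 512)).astype(np.float32)

def encode(sentence: str) -> np.ndarray:
    """Mean-pool the static vectors of the tokens in the sentence."""
    ids = [vocab[t] for t in sentence.lower().split() if t in vocab]
    return table[ids].mean(axis=0)

emb = encode("Hello world")
print(emb.shape)  # (512,)
```

Since there is no forward pass through transformer layers, throughput is bounded by tokenization and a single matrix lookup, which is where the CPU speedup over all-MiniLM-L6-v2 comes from.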
## How It Was Made
- Teacher: mixedbread-ai/mxbai-embed-large-v1 (335M params, BERT-large architecture)
- Custom vocabulary: Built from 2M C4 English sentences
- Distillation: model2vec distillation with 512-dim PCA
- Tokenlearn pre-training: Contrastive loss training on 2M C4 sentences using tokenlearn
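Conceptually, the distillation step encodes each vocabulary token with the teacher once, then reduces the resulting embedding matrix to 512 dims with PCA. A rough sketch of just the PCA reduction, using random stand-ins for the teacher outputs (the real pipeline uses mxbai-embed-large-v1's 1024-dim embeddings over the custom 29K vocabulary):

```python
import numpy as np

rng = np.random.default_rng(42)
vocab_size, teacher_dim, target_dim = 1000, 1024, 512

# Stand-in for one teacher embedding per vocabulary token.
token_embs = rng.standard_normal((vocab_size, teacher_dim)).astype(np.float32)

# PCA to 512 dims: center, then project onto the top right-singular vectors.
centered = token_embs - token_embs.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
reduced = centered @ vt[:target_dim].T

print(reduced.shape)  # (1000, 512)
```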
## Benchmark Results (Full MTEB English Suite)
| Model | STS | Classification | PairClassification | Avg | Size |
|---|---|---|---|---|---|
| potion-mxbai-2m-512d (this) | 74.15 | 65.44 | 76.80 | 72.13 | ~125MB |
| potion-mxbai-256d-v2 | 71.92 | 63.05 | 73.99 | 69.65 | 7.2MB (int8) |
| potion-mxbai-128d-v2 | 70.81 | 60.62 | 72.46 | 67.97 | 3.6MB (int8) |
Evaluated on 25 tasks (10 STS, 12 Classification, 3 PairClassification), English subsets only.
## Model Family
| Model | Avg | Size | Best for |
|---|---|---|---|
| potion-mxbai-2m-512d | 72.13 | ~125MB | Maximum quality |
| potion-mxbai-256d-v2 | 69.65 | 7.2MB (int8) | Best quality/size balance |
| potion-mxbai-128d-v2 | 67.97 | 3.6MB (int8) | Extreme size constraints |
## Usage
```python
from model2vec import StaticModel

model = StaticModel.from_pretrained("blobbybob/potion-mxbai-2m-512d")
embeddings = model.encode(["Hello world", "Static embeddings are fast"])
```
With Sentence Transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("blobbybob/potion-mxbai-2m-512d")
embeddings = model.encode(["Hello world", "Static embeddings are fast"])
```
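The returned embeddings are plain float arrays, so downstream similarity is just vector math. A self-contained cosine-similarity helper (stand-in vectors shown; in practice pass in rows of `model.encode(...)` output):

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in vectors; in practice use model.encode(...) outputs.
a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, 0.0, 1.0])
print(cosine_sim(a, b))  # 1.0
```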
## Training Details
- Featurization: 2M C4 sentences encoded by mxbai-embed-large-v1
- Training: Tokenlearn contrastive loss, batch size 256
- Total cost: ~$3-4 on Modal
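The contrastive objective in the pre-training step pulls each student embedding toward its own teacher embedding and away from the other pairs in the batch. A minimal InfoNCE-style sketch of such a loss in numpy (shapes and temperature are illustrative, not the exact tokenlearn recipe):

```python
import numpy as np

def info_nce_loss(student, teacher, temperature=0.07):
    """In-batch contrastive loss: each student vector should match
    its own teacher vector against all others in the batch."""
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    logits = (s @ t.T) / temperature             # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive pairs sit on the diagonal.
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
loss = info_nce_loss(rng.standard_normal((256, 512)),
                     rng.standard_normal((256, 512)))
```

With batch size 256, each sentence is contrasted against 255 in-batch negatives per step.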
## Citation

```bibtex
@article{minishlab2024model2vec,
  author = {Tulkens, Stephan and {van Dongen}, Thomas},
  title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
  year = {2024},
  url = {https://github.com/MinishLab/model2vec}
}
```