potion-mxbai-128d-v2
An ultra-compact static embedding model at just 3.9MB, roughly 32x smaller than the ~125MB 512D baseline while retaining about 97% of its quality.
Highlights
- 69.83 avg on full MTEB English (STS + Classification + PairClassification, 25 tasks)
- 3.9MB with int8 quantization (~32x smaller than the ~125MB 512D baseline)
- 80-88x faster than all-MiniLM-L6-v2 on CPU (~18K vs ~200 sentences/sec)
- Pure numpy inference, no GPU needed
- Native int8 support via model2vec v0.7 with zero quality loss
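For intuition about why inference is pure numpy: a static embedding model is just a token-to-row lookup table plus mean pooling. A toy sketch, with a made-up vocabulary, a random int8 table, and a hypothetical dequantization scale (not the real model weights or tokenizer):

```python
import numpy as np

# Toy stand-in for an int8-quantized static embedding table
vocab = {"hello": 0, "world": 1, "static": 2, "fast": 3}
rng = np.random.default_rng(0)
scale = 0.05  # hypothetical per-model dequantization scale
table_int8 = rng.integers(-127, 128, size=(len(vocab), 128), dtype=np.int8)

def encode(sentence: str) -> np.ndarray:
    """Mean-pool the dequantized vectors of known tokens."""
    rows = [table_int8[vocab[t]] for t in sentence.lower().split() if t in vocab]
    vecs = np.asarray(rows, dtype=np.float32) * scale
    return vecs.mean(axis=0)

emb = encode("Hello world")
print(emb.shape)  # (128,)
```

No matrix multiplies or attention: one lookup per token and one average, which is where the ~18K sentences/sec CPU throughput comes from.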
How It Was Made
- Teacher: mixedbread-ai/mxbai-embed-large-v1 (335M params, BERT-large architecture)
- Distillation: model2vec distillation with 256-dim PCA and corpus-informed vocabulary
- Tokenlearn pre-training: Contrastive loss training on ~217K C4 English sentences using tokenlearn
- Born-again self-distillation: A second round of contrastive training using the model's own sentence embeddings as targets, closing the teacher-student representation gap (+0.49 avg)
- PCA to 128D: The 256D born-again embeddings are PCA-reduced to 128 dimensions, retaining 77% of variance while halving the size. This outperforms training natively at 128D because the 256D born-again model captures richer structure.
Benchmark Results (Full MTEB English Suite)
| Model | STS | Classification | PairClassification | Avg | Size (int8) |
|---|---|---|---|---|---|
| potion-mxbai-2m-512d | 74.15 | 65.44 | 76.80 | 72.13 | ~125MB |
| potion-mxbai-256d-v2 | 73.79 | 63.23 | 77.33 | 71.45 | 7.5MB |
| potion-mxbai-128d-v2 (this) | 72.56 | 61.48 | 75.45 | 69.83 | 3.9MB |
Evaluated on 25 tasks (10 STS, 12 Classification, 3 PairClassification), English subsets only, identical eval code across all models.
Usage
```python
from model2vec import StaticModel

# INT8 quantized (3.9MB)
model = StaticModel.from_pretrained("blobbybob/potion-mxbai-128d-v2")
embeddings = model.encode(["Hello world", "Static embeddings are fast"])
```
With Sentence Transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("blobbybob/potion-mxbai-128d-v2")
embeddings = model.encode(["Hello world", "Static embeddings are fast"])
```
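Embeddings from either API can be compared with cosine similarity. A minimal numpy sketch with placeholder 128D vectors standing in for real `model.encode(...)` output:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder embeddings in place of model.encode(...) output
a = np.ones(128, dtype=np.float32)
b = np.ones(128, dtype=np.float32)
print(cosine_sim(a, b))  # 1.0 for identical vectors
```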
When to use this model
- You need the smallest possible embedding model that still works well
- Deploying on mobile, IoT, or edge devices with strict memory limits
- Embedding millions of documents where storage cost matters (3.9MB vs 100MB+)
- You need near-instant loading: at 3.9MB the entire model fits comfortably in on-chip CPU cache
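The storage argument is easy to sanity-check. Assuming one vector per document and no index overhead, 128 int8 dimensions cost 128 bytes per document, versus 2KB for a 512D float32 vector:

```python
# Back-of-envelope vector storage for one million documents
dims, docs = 128, 1_000_000
int8_bytes = dims * 1 * docs      # int8: 1 byte per dimension
fp32_512_bytes = 512 * 4 * docs   # 512D float32 comparison point
print(int8_bytes / 1e6, "MB")     # 128.0 MB
print(fp32_512_bytes / 1e6, "MB") # 2048.0 MB
```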
Model Family
| Model | Avg | Size (int8) | Best for |
|---|---|---|---|
| potion-mxbai-2m-512d | 72.13 | ~125MB | Maximum quality |
| potion-mxbai-256d-v2 | 71.45 | 7.5MB | Best quality/size balance |
| potion-mxbai-128d-v2 | 69.83 | 3.9MB | Compact deployments |
| potion-mxbai-micro | 68.91 | 0.7MB | Ultra-tiny / embedded |
Training Details
- Featurization: ~217K C4 sentences encoded by mxbai-embed-large-v1
- Training: Tokenlearn contrastive loss + born-again self-distillation, batch size 256
- Vocabulary: 29,524 tokens (corpus-informed vocabulary from mxbai teacher tokenizer)
- Dimensions: 128 (PCA from 256D born-again model)
- Compute: Local RTX 2070
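The born-again self-distillation step can be illustrated with a toy numpy sketch. It uses a plain MSE objective for clarity (the actual training used a contrastive loss), and all sizes, the corpus, and the learning rate are made up: a re-initialized student table is trained so its mean-pooled sentence embeddings match those of a frozen teacher table.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 50, 16                             # toy vocab size and dimension
teacher = rng.normal(size=(V, D))         # frozen teacher table (the trained model, in spirit)
student = rng.normal(size=(V, D)) * 0.1   # re-initialized student table

# Toy corpus: sentences as lists of token ids
corpus = [rng.integers(0, V, size=5) for _ in range(200)]

def error(table):
    """Mean distance between student and teacher sentence embeddings."""
    return np.mean([np.linalg.norm(table[ids].mean(0) - teacher[ids].mean(0))
                    for ids in corpus])

before = error(student)
lr = 0.5
for _ in range(200):
    for ids in corpus:
        target = teacher[ids].mean(axis=0)       # teacher sentence embedding
        pred = student[ids].mean(axis=0)         # student sentence embedding
        grad = (pred - target) / len(ids)        # gradient of 0.5*MSE w.r.t. each token row
        np.subtract.at(student, ids, lr * grad)  # ufunc.at handles repeated token ids
after = error(student)
print(before, "->", after)  # error shrinks as the student matches the teacher
```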
Citation
```bibtex
@article{minishlab2024model2vec,
  author = {Tulkens, Stephan and {van Dongen}, Thomas},
  title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
  year = {2024},
  url = {https://github.com/MinishLab/model2vec}
}
```