potion-mxbai-512d
A static embedding model that outperforms potion-base-32M, the previous best static embedding model, by using a stronger teacher model and matching its architecture. Trained on 1M C4 sentences.
Update: A newer version trained on 2M sentences is available: potion-mxbai-2m-512d, scoring 71.28 avg (+0.55 over this model, +2.23 STS).
Highlights
- 70.73 avg on MTEB English (STS + Classification + PairClassification) vs potion-base-32M's 69.96
- +3.62 STS points over potion-base-32M (69.36 vs 65.74)
- 500x faster than transformer-based embedding models on CPU
- ~32MB model size (63K vocab x 512 dims, float16)
- Pure numpy inference, no GPU needed
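To illustrate why a static model can run on numpy alone, here is a minimal toy sketch of static-embedding inference: one table lookup per token, followed by mean pooling. The vocabulary, the random embedding matrix, the 8-dim size, and the whitespace tokenizer are all stand-ins invented for this example; the real model ships a learned vocabulary with 512-dim vectors and uses a subword tokenizer.

```python
import numpy as np

# Toy stand-ins (hypothetical): a real model ships a learned vocabulary and matrix.
vocab = {"hello": 0, "world": 1, "static": 2, "embeddings": 3, "are": 4, "fast": 5}
dim = 8  # illustrative only; the model above uses 512
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(len(vocab), dim)).astype(np.float16)


def encode(sentence: str) -> np.ndarray:
    """Embed a sentence as the mean of its token vectors (unknown tokens are skipped)."""
    # Assumption: lowercase + whitespace split instead of a real subword tokenizer.
    ids = [vocab[tok] for tok in sentence.lower().split() if tok in vocab]
    if not ids:
        return np.zeros(dim, dtype=np.float32)
    # One row lookup per token, then an average. No attention layers are involved.
    return embedding_matrix[ids].astype(np.float32).mean(axis=0)


vec = encode("Static embeddings are fast")
print(vec.shape)  # (8,)
```

Because the per-sentence work is a handful of row lookups and one average, the cost grows only with the number of tokens.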
How It Was Made
- Teacher: mixedbread-ai/mxbai-embed-large-v1 (335M params, BERT-large architecture, MTEB 64.68)
- Custom vocabulary: 56K tokens built from 1M C4 English sentences via corpus frequency analysis
- Distillation: model2vec distillation with 512-dim PCA
- Tokenlearn pre-training: Contrastive loss training on 1M C4 sentences using tokenlearn
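The first three steps above can be sketched in miniature. This is a toy illustration under stated assumptions, not the actual pipeline: `build_vocabulary` and `pca_reduce` are hypothetical helpers, the regex tokenizer is an assumption, and random vectors stand in for the teacher's token embeddings. The real run used 1M C4 sentences, a 56K vocabulary, and 512 PCA dimensions.

```python
import re
from collections import Counter

import numpy as np


def build_vocabulary(sentences, max_tokens):
    """Keep the most frequent tokens in the corpus (hypothetical helper)."""
    counts = Counter()
    for sentence in sentences:
        # Assumption: simple lowercase alphanumeric tokenization.
        counts.update(re.findall(r"[a-z0-9]+", sentence.lower()))
    return [token for token, _ in counts.most_common(max_tokens)]


def pca_reduce(matrix, n_components):
    """Project rows onto their top principal components via SVD (hypothetical helper)."""
    centered = matrix - matrix.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


corpus = [
    "The cat sat on the mat",
    "The dog sat on the log",
    "A cat and a dog",
]
vocab = build_vocabulary(corpus, max_tokens=5)

# Stand-in for teacher embeddings of each vocabulary token (random, for illustration).
rng = np.random.default_rng(0)
teacher_vectors = rng.normal(size=(len(vocab), 16))

static_vectors = pca_reduce(teacher_vectors, n_components=4)
print(vocab[0], static_vectors.shape)
```

In the real pipeline the teacher vectors would come from passing each vocabulary token through mxbai-embed-large-v1, and the reduced matrix would then be refined by the tokenlearn training step.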
Benchmark Results
| Model | STS | Classification | PairClassification | Avg |
|---|---|---|---|---|
| potion-mxbai-512d (this) | 69.36 | 65.52 | 77.32 | 70.73 |
| potion-base-32M | 65.74 | 65.96 | 78.17 | 69.96 |
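The Avg column is consistent with an unweighted mean of the three category scores. Assuming that is how it was computed, the figures can be reproduced from the table:

```python
# Category scores copied from the table above.
scores = {
    "potion-mxbai-512d": {"STS": 69.36, "Classification": 65.52, "PairClassification": 77.32},
    "potion-base-32M": {"STS": 65.74, "Classification": 65.96, "PairClassification": 78.17},
}

# Assumption: Avg is the plain mean over the three categories.
averages = {
    name: round(sum(categories.values()) / len(categories), 2)
    for name, categories in scores.items()
}
sts_gain = round(scores["potion-mxbai-512d"]["STS"] - scores["potion-base-32M"]["STS"], 2)

print(averages)  # {'potion-mxbai-512d': 70.73, 'potion-base-32M': 69.96}
print(sts_gain)  # 3.62
```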
Per-task breakdown vs potion-base-32M
| Task | potion-mxbai-512d | potion-base-32M | Diff |
|---|---|---|---|
| STS22 | 65.79 | 36.69 | +29.09 |
| STS17 | 34.21 | 29.85 | +4.36 |
| EmotionClassification | 51.59 | 48.29 | +3.30 |
| STS12 | 65.37 | 62.72 | +2.64 |
| ImdbClassification | 71.73 | 70.13 | +1.61 |
| TweetSentimentExtraction | 57.69 | 56.58 | +1.11 |
| STS15 | 81.62 | 80.76 | +0.86 |
| BIOSSES | 78.27 | 77.56 | +0.72 |
| STS13 | 77.84 | 77.59 | +0.24 |
| SICK-R | 65.78 | 65.67 | +0.12 |
| SprintDuplicateQuestions | 92.60 | 92.55 | +0.04 |
potion-mxbai-512d wins on 11 of 25 tasks; the table above lists those 11.
Usage
from model2vec import StaticModel
model = StaticModel.from_pretrained("blobbybob/potion-mxbai-512d")
embeddings = model.encode(["Hello world", "Static embeddings are fast"])
With Sentence Transformers
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("blobbybob/potion-mxbai-512d")
embeddings = model.encode(["Hello world", "Static embeddings are fast"])
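A common next step is comparing embeddings by cosine similarity. The sketch below uses a hypothetical helper, `cosine_similarity_matrix`, written in plain numpy; it is demonstrated on small hand-made vectors so it runs without downloading the model, and it should accept the array returned by `model.encode(...)` in the same way.

```python
import numpy as np


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between rows (hypothetical helper)."""
    emb = np.asarray(embeddings, dtype=np.float32)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    # Clip the norms to avoid dividing by zero for all-zero rows.
    unit = emb / np.clip(norms, 1e-12, None)
    return unit @ unit.T


# Hand-made vectors standing in for real embeddings.
demo = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
sims = cosine_similarity_matrix(demo)
print(sims.shape)  # (3, 3)
```

Rows 0 and 1 point in the same direction, so their similarity is 1; row 2 is orthogonal to both, so its similarity to them is 0.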
Training Details
- Compute: Modal A10G GPU
- Featurization: ~2 hours (1M C4 sentences through mxbai-embed-large-v1)
- Training: ~24 minutes (7 epochs, contrastive loss, early stopping)
- Total cost: ~$5-6 on Modal
Citation
@article{minishlab2024model2vec,
author = {Tulkens, Stephan and {van Dongen}, Thomas},
title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
year = {2024},
url = {https://github.com/MinishLab/model2vec}
}