potion-mxbai-2m-512d

A high-quality static embedding model that outperforms potion-base-32M, trained on 2M C4 sentences with tokenlearn contrastive pre-training.

Highlights

  • 72.13 avg on full MTEB English (STS + Classification + PairClassification, 25 tasks, English subsets only)
  • 80-88x faster than all-MiniLM-L6-v2 on CPU (~16K vs ~200 sentences/sec)
  • ~125MB model size (29K vocab x 512 dims, float32)
  • Pure numpy inference; no GPU needed
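
Static-embedding inference is just a table lookup followed by mean pooling, which is why it runs fast on CPU with plain numpy. A minimal sketch of the idea (the toy vocabulary and random table below stand in for the real 29K x 512 weights):

```python
import numpy as np

# Toy stand-ins for the real vocabulary and 29K x 512 embedding table.
vocab = {"hello": 0, "world": 1, "static": 2, "embeddings": 3, "are": 4, "fast": 5}
rng = np.random.default_rng(0)
table = rng.normal(size=(len(vocab), 512)).astype(np.float32)

def encode(sentence: str) -> np.ndarray:
    """Look up each known token and mean-pool: no neural forward pass."""
    ids = [vocab[t] for t in sentence.lower().split() if t in vocab]
    vec = table[ids].mean(axis=0)
    return vec / np.linalg.norm(vec)  # L2-normalize for cosine similarity

emb = encode("Hello world")
print(emb.shape)  # (512,)
```

The real model additionally applies a learned tokenizer and per-token weighting, but the lookup-and-pool core is why throughput is two orders of magnitude above a transformer encoder.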

How It Was Made

  1. Teacher: mixedbread-ai/mxbai-embed-large-v1 (335M params, BERT-large architecture)
  2. Custom vocabulary: Built from 2M C4 English sentences
  3. Distillation: model2vec distillation with 512-dim PCA
  4. Tokenlearn pre-training: Contrastive-loss training on the same 2M C4 sentences
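
The PCA step in item 3 compresses the teacher's token embeddings (1024-dim for mxbai-embed-large-v1) down to 512 dims. model2vec handles this internally; as an illustration of what that projection does, with random data standing in for real teacher embeddings:

```python
import numpy as np

# Random stand-in for teacher token embeddings: toy vocab of 2000 tokens,
# each 1024-dim (the mxbai-embed-large-v1 hidden size).
rng = np.random.default_rng(0)
teacher = rng.normal(size=(2000, 1024)).astype(np.float32)

# PCA via SVD: center, find principal axes, keep the top 512 components.
mean = teacher.mean(axis=0)
centered = teacher - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
student = centered @ vt[:512].T

print(student.shape)  # (2000, 512)
```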

Benchmark Results (Full MTEB English Suite)

| Model | STS | Classification | PairClassification | Avg | Size |
|---|---|---|---|---|---|
| potion-mxbai-2m-512d (this) | 74.15 | 65.44 | 76.80 | 72.13 | ~125MB |
| potion-mxbai-256d-v2 | 71.92 | 63.05 | 73.99 | 69.65 | 7.2MB (int8) |
| potion-mxbai-128d-v2 | 70.81 | 60.62 | 72.46 | 67.97 | 3.6MB (int8) |

Evaluated on 25 tasks (10 STS, 12 Classification, 3 PairClassification), English subsets only.

Model Family

| Model | Avg | Size | Best for |
|---|---|---|---|
| potion-mxbai-2m-512d | 72.13 | ~125MB | Maximum quality |
| potion-mxbai-256d-v2 | 69.65 | 7.2MB (int8) | Best quality/size balance |
| potion-mxbai-128d-v2 | 67.97 | 3.6MB (int8) | Extreme size constraints |

Usage

from model2vec import StaticModel

model = StaticModel.from_pretrained("blobbybob/potion-mxbai-2m-512d")
embeddings = model.encode(["Hello world", "Static embeddings are fast"])

With Sentence Transformers:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("blobbybob/potion-mxbai-2m-512d")
embeddings = model.encode(["Hello world", "Static embeddings are fast"])

Training Details

  • Featurization: 2M C4 sentences encoded by mxbai-embed-large-v1
  • Training: Tokenlearn contrastive loss, batch size 256
  • Total cost: ~$3-4 on Modal
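
The contrastive objective pulls each student sentence embedding toward its teacher target while pushing it away from the other sentences in the batch. A generic in-batch InfoNCE sketch (not tokenlearn's exact implementation; random vectors stand in for real embeddings, and the temperature value is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, dim, temperature = 256, 512, 0.05

# Student embeddings and noisy teacher "targets" as stand-ins.
student = rng.normal(size=(batch, dim))
teacher = student + 0.1 * rng.normal(size=(batch, dim))

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Cosine-similarity logits; the positive pair sits on the diagonal.
logits = normalize(student) @ normalize(teacher).T / temperature
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))  # cross-entropy against the diagonal
print(float(loss))
```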

Citation

@article{minishlab2024model2vec,
  author = {Tulkens, Stephan and {van Dongen}, Thomas},
  title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
  year = {2024},
  url = {https://github.com/MinishLab/model2vec}
}
Evaluation results

  • MTEB STS (English, 10 tasks): spearman_cosine 74.15 (self-reported)
  • MTEB Classification (English, 12 tasks): accuracy 65.44 (self-reported)
  • MTEB PairClassification (English, 3 tasks): ap 76.80 (self-reported)