potion-mxbai-128d-v2

An ultra-compact static embedding model at just 3.9MB: 16x smaller than the 512D baseline while retaining 98% of its quality.

Highlights

  • 69.83 avg on full MTEB English (STS + Classification + PairClassification, 25 tasks)
  • 3.9MB with int8 quantization (16x smaller than 512D baseline)
  • 80-88x faster than all-MiniLM-L6-v2 on CPU (~18K vs ~200 sentences/sec)
  • Pure numpy inference, no GPU needed
  • Native int8 support via model2vec v0.7 with zero quality loss
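The int8 packing behind the size numbers above can be illustrated with a generic symmetric per-vector quantization scheme in numpy. This is a sketch of the idea, not model2vec's exact implementation: each row stores one float32 scale plus int8 values, cutting storage to roughly a quarter of float32.

```python
import numpy as np

def quantize_int8(emb: np.ndarray):
    """Symmetric per-vector int8 quantization: one float scale per row."""
    scale = np.abs(emb).max(axis=1, keepdims=True) / 127.0
    q = np.round(emb / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover approximate float32 embeddings from int8 values and scales."""
    return q.astype(np.float32) * scale

# Toy embedding matrix standing in for real model weights.
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 128)).astype(np.float32)

q, scale = quantize_int8(emb)
recon = dequantize(q, scale)

print(q.nbytes / emb.nbytes)      # 0.25 -- int8 is 4x smaller than float32
print(np.abs(emb - recon).max())  # small rounding error, well under 0.05
```

The per-row scale keeps the rounding error proportional to each vector's magnitude, which is why quantization loses essentially nothing for similarity tasks.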

How It Was Made

  1. Teacher: mixedbread-ai/mxbai-embed-large-v1 (335M params, BERT-large architecture)
  2. Distillation: model2vec distillation with 256-dim PCA and corpus-informed vocabulary
  3. Tokenlearn pre-training: Contrastive-loss training on ~217K C4 English sentences
  4. Born-again self-distillation: A second round of contrastive training using the model's own sentence embeddings as targets, closing the teacher-student representation gap (+0.49 avg)
  5. PCA to 128D: The 256D born-again embeddings are PCA-reduced to 128 dimensions, retaining 77% of variance while halving the size. This outperforms training natively at 128D because the 256D born-again model captures richer structure.
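Step 5 above can be sketched with plain numpy: center the 256D token embedding matrix, take an SVD, and project onto the top 128 principal components. The matrix here is a random stand-in (the real pipeline operates on the born-again model's vectors, and the 77% explained-variance figure applies to those, not to random data).

```python
import numpy as np

# Toy stand-in for the 256D born-again token embedding matrix.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(29524, 256)).astype(np.float32)

# Center, then project onto the top 128 principal components.
mean = vectors.mean(axis=0)
centered = vectors - mean
_, s, vt = np.linalg.svd(centered, full_matrices=False)
reduced = centered @ vt[:128].T

# Fraction of variance retained by the kept components
# (reported as 0.77 for the real model; random data will differ).
explained = (s[:128] ** 2).sum() / (s ** 2).sum()

print(reduced.shape)  # (29524, 128)
```

Because PCA is applied after training, the 256D model gets to learn its richer structure first; the projection then keeps the directions that carry the most variance.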

Benchmark Results (Full MTEB English Suite)

| Model | STS | Classification | PairClassification | Avg | Size (int8) |
|---|---|---|---|---|---|
| potion-mxbai-2m-512d | 74.15 | 65.44 | 76.80 | 72.13 | ~125MB |
| potion-mxbai-256d-v2 | 73.79 | 63.23 | 77.33 | 71.45 | 7.5MB |
| potion-mxbai-128d-v2 (this) | 72.56 | 61.48 | 75.45 | 69.83 | 3.9MB |

Evaluated on 25 tasks (10 STS, 12 Classification, 3 PairClassification), English subsets only, identical eval code across all models.

Usage

```python
from model2vec import StaticModel

# Loads the int8-quantized model (3.9MB)
model = StaticModel.from_pretrained("blobbybob/potion-mxbai-128d-v2")

embeddings = model.encode(["Hello world", "Static embeddings are fast"])
```

With Sentence Transformers:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("blobbybob/potion-mxbai-128d-v2")
embeddings = model.encode(["Hello world", "Static embeddings are fast"])
```
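Downstream, the 128D vectors are typically compared with cosine similarity. A minimal sketch, using random placeholder arrays in place of `model.encode(...)` output so it runs without downloading the model:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between two batches of embeddings."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Placeholder 128D embeddings standing in for model.encode(...) output.
rng = np.random.default_rng(0)
queries = rng.normal(size=(2, 128))
docs = rng.normal(size=(5, 128))

sims = cosine_sim(queries, docs)  # shape (2, 5), values in [-1, 1]
best = sims.argmax(axis=1)        # index of the closest doc per query
```

Substituting real `model.encode` output for the random arrays gives a complete nearest-neighbor search over a small corpus.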

When to use this model

  • You need the smallest possible embedding model that still works well
  • Deploying on mobile, IoT, or edge devices with strict memory limits
  • Embedding millions of documents where storage cost matters (3.9MB vs 100MB+)
  • You need instant loading: at 3.9MB the entire model can sit in on-chip CPU cache
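The storage argument in the list above is simple arithmetic: at 128 dimensions and one int8 byte per dimension, each document vector costs 128 bytes. A back-of-envelope sketch (ignoring index overhead, and assuming the 512D baseline stores float32):

```python
# Storage cost for embedding a corpus: int8, 128 dims.
dims = 128
bytes_per_doc = dims * 1  # int8: 1 byte per dimension
n_docs = 1_000_000

total_mb = n_docs * bytes_per_doc / 1e6
print(total_mb)           # 128.0 MB for a million documents

# The same corpus at 512 dims in float32:
baseline_mb = n_docs * 512 * 4 / 1e6
print(baseline_mb)        # 2048.0 MB, a 16x difference
```

At a million documents the embeddings themselves shrink from ~2GB to ~128MB, which is where the per-vector savings start to dominate the model-size savings.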

Model Family

| Model | Avg | Size (int8) | Best for |
|---|---|---|---|
| potion-mxbai-2m-512d | 72.13 | ~125MB | Maximum quality |
| potion-mxbai-256d-v2 | 71.45 | 7.5MB | Best quality/size balance |
| potion-mxbai-128d-v2 | 69.83 | 3.9MB | Compact deployments |
| potion-mxbai-micro | 68.91 | 0.7MB | Ultra-tiny / embedded |

Training Details

  • Featurization: ~217K C4 sentences encoded by mxbai-embed-large-v1
  • Training: Tokenlearn contrastive loss + born-again self-distillation, batch size 256
  • Vocabulary: 29,524 tokens (corpus-informed vocabulary from mxbai teacher tokenizer)
  • Dimensions: 128 (PCA from 256D born-again model)
  • Compute: Local RTX 2070

Citation

```bibtex
@article{minishlab2024model2vec,
  author = {Tulkens, Stephan and {van Dongen}, Thomas},
  title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
  year = {2024},
  url = {https://github.com/MinishLab/model2vec}
}
```