Upload folder using huggingface_hub
README.md CHANGED

@@ -47,10 +47,8 @@ embeddings = model.encode(["Example sentence"])
 Model2vec creates a small, static model that outperforms other static embedding models by a large margin on all tasks on MTEB. This model is pre-trained using Tokenlearn. It's created using the following steps:
 
 - Distillation: first, a model is distilled from a sentence transformer model using Model2Vec.
-- Training data creation: the sentence transformer model is used to create training data by creating mean output embeddings on a large corpus.
+- Training data creation: the sentence transformer model is used to create training data by creating mean output embeddings on a large corpus. In this case, 2 million sentences from the C4 dataset were used from 101 different languages, sampled using temperature-smoothed sampling proportional to the language size.
 - Training: the distilled model is trained on the training data using Tokenlearn.
-- Post-training re-regularization: after training, the model is re-regularized by weighting the tokens based on their
-  frequency, applying PCA, and finally applying SIF weighting.
 
 The results for this model can be found on the [Model2Vec results page](https://github.com/MinishLab/model2vec/blob/main/results/README.md).
 
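The "temperature-smoothed sampling proportional to the language size" mentioned in the added line can be sketched as follows. This is a minimal illustration, not code from Tokenlearn or Model2Vec: the exponent convention (raising raw language proportions to a power below 1) and the sentence counts are assumptions made for the example.

```python
import numpy as np

def temperature_smoothed_probs(counts, alpha=0.3):
    """Turn per-language corpus sizes into sampling probabilities.

    Raw proportions are raised to the power `alpha` (0 < alpha < 1) and
    renormalized. This flattens the distribution: low-resource languages
    are sampled more often than their raw share of the corpus, while the
    ordering by language size is preserved.
    """
    counts = np.asarray(counts, dtype=float)
    probs = counts / counts.sum()        # raw proportions
    smoothed = probs ** alpha            # temperature smoothing
    return smoothed / smoothed.sum()     # renormalize to a distribution

# Hypothetical sentence counts for a large, a medium, and a small language.
counts = [1_000_000, 100_000, 10_000]
print(temperature_smoothed_probs(counts))
```

With `alpha=1` this reduces to sampling exactly proportional to corpus size; smaller values pull the distribution toward uniform.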