m2v-gte-multilingual-768

A static embedding model distilled from Alibaba-NLP/gte-multilingual-base using Model2Vec.

Model Description

| Property | Value |
|---|---|
| Dimensions | 768 |
| Vocabulary | ~250,000 tokens |
| Base Model | Alibaba-NLP/gte-multilingual-base |
| Distillation Method | Model2Vec (PCA + SIF weighting) |
| Speed | ~3,000+ texts/second (CPU) |
| Languages | 70+ (inherited from GTE) |
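Model2Vec builds static embeddings by encoding each vocabulary token once with the base transformer, reducing the dimensionality with PCA, and down-weighting frequent tokens with SIF-style (Zipf-based) weighting. The sketch below illustrates that pipeline on random stand-in data; the `1/rank` weight is a rough proxy, not the exact Model2Vec formula, and real distillation uses the base model's tokenizer and forward passes.

```python
import numpy as np

# Stand-ins: one "transformer" embedding per vocabulary token, with tokens
# assumed to be sorted by frequency (rank 1 = most frequent).
rng = np.random.default_rng(0)
vocab_size, base_dim, out_dim = 1000, 64, 16
token_embeddings = rng.normal(size=(vocab_size, base_dim))

# PCA via SVD: project centered embeddings onto the top principal components.
centered = token_embeddings - token_embeddings.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
reduced = centered @ vt[:out_dim].T  # shape: (vocab_size, out_dim)

# SIF-style reweighting: frequent (low-rank) tokens get smaller weights.
ranks = np.arange(1, vocab_size + 1)
weights = np.log(1 + 1.0 / ranks)  # illustrative proxy for SIF weighting
static_vectors = reduced * weights[:, None]

# A sentence embedding is then just the mean of its tokens' static vectors.
token_ids = rng.integers(0, vocab_size, size=12)
sentence_embedding = static_vectors[token_ids].mean(axis=0)
print(sentence_embedding.shape)  # (16,)
```

Because inference is only a table lookup plus a mean, there is no transformer forward pass at encode time, which is where the large speedup comes from.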

MTEB Benchmark Results

| Task | Language | Accuracy | F1 |
|---|---|---|---|
| Banking77Classification | EN | 52.7% | 51.3% |
| AmazonReviewsClassification | DE | 28.6% | 27.9% |

Note: These scores are typical for static embedding models. The advantage is speed (~3000 texts/s vs ~20 texts/s for transformer models).

Comparison with Other Static Models

| Model | Banking77 (EN) |
|---|---|
| GloVe | ~35% |
| FastText | ~40% |
| m2v-gte-multilingual-768 | 52.7% |
| potion-base-8M (official) | ~55% |

Custom Task Performance

Tested on multilabel text classification (German educational content, 44 labels):

| Metric | Score |
|---|---|
| F1 Macro | 82.9% |
| F1 Micro | 88.2% |
| Precision Macro | 90.9% |
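Macro F1 averages the per-label F1 scores, treating all 44 labels equally, while micro F1 pools true/false positives and negatives across labels and is therefore dominated by frequent labels. A minimal self-contained sketch of both metrics for multilabel predictions (the sample data is made up for illustration):

```python
def f1_scores(y_true, y_pred, n_labels):
    """Compute (macro F1, micro F1) for multilabel data given as label sets."""
    tp = [0] * n_labels
    fp = [0] * n_labels
    fn = [0] * n_labels
    for true_set, pred_set in zip(y_true, y_pred):
        for label in range(n_labels):
            t, p = label in true_set, label in pred_set
            tp[label] += t and p       # correct positive prediction
            fp[label] += p and not t   # predicted but not true
            fn[label] += t and not p   # true but missed
    def f1(t, f_p, f_n):
        return 2 * t / (2 * t + f_p + f_n) if t else 0.0
    macro = sum(f1(tp[i], fp[i], fn[i]) for i in range(n_labels)) / n_labels
    micro = f1(sum(tp), sum(fp), sum(fn))
    return macro, micro

# Two samples, three labels: sample 0 predicted perfectly ({0, 1}),
# sample 1's single true label {2} was missed entirely.
macro, micro = f1_scores([{0, 1}, {2}], [{0, 1}, set()], n_labels=3)
print(round(macro, 3), round(micro, 3))  # 0.667 0.8
```

The gap between the two (micro higher than macro here) mirrors the table above: micro F1 exceeds macro F1 when rare labels are predicted less reliably than common ones.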

Usage

With Model2Vec (recommended)

```python
from model2vec import StaticModel

model = StaticModel.from_pretrained("JanSchachtschabel/m2v-gte-multilingual-768")
embeddings = model.encode(["Beispieltext auf Deutsch", "Example text in English"])
```
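The returned embeddings are typically compared with cosine similarity, e.g. for semantic search or deduplication. The sketch below uses a random stand-in array in place of the real `model.encode(...)` output (a `(n_texts, 768)` float array), so it runs without downloading the model:

```python
import numpy as np

# Stand-in for `model.encode(...)` output: one 768-dim vector per text.
embeddings = np.random.default_rng(1).normal(size=(2, 768))

# Cosine similarity: dot product of L2-normalized vectors, in [-1, 1].
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
similarity = float(normed[0] @ normed[1])
print(-1.0 <= similarity <= 1.0)  # True
```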

With Sentence Transformers

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("JanSchachtschabel/m2v-gte-multilingual-768")
embeddings = model.encode(["Beispieltext auf Deutsch"])
```

Installation

```shell
pip install model2vec
# or
pip install sentence-transformers
```

License

This model is released under the Apache 2.0 License.

Attribution

Citation

If you use this model, please cite:

```bibtex
@article{zhang2024mgte,
  title={mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval},
  author={Zhang, Xin and Zhang, Yanzhao and Long, Dingkun and Xie, Pengjun and Zhang, Meishan and Zhang, Min},
  journal={arXiv preprint arXiv:2407.19669},
  year={2024}
}

@software{minishlab2024model2vec,
  author = {Tulkens, Stephan and {van Dongen}, Thomas},
  title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
  year = {2024},
  url = {https://github.com/MinishLab/model2vec}
}
```

Acknowledgments

  • Alibaba DAMO Academy for the excellent GTE multilingual model
  • Minish Lab for the Model2Vec distillation framework