How to use thenlper/gte-large with sentence-transformers:
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("thenlper/gte-large")

    sentences = [
        "That is a happy person",
        "That is a happy dog",
        "That is a very happy person",
        "Today is a sunny day"
    ]
    embeddings = model.encode(sentences)

    similarities = model.similarity(embeddings, embeddings)
    print(similarities.shape)
    # [4, 4]
Are any of your pretrained models available for commercial use?
Most of the models listed at https://www.sbert.net/docs/sentence_transformer/pretrained_models.html appear to be trained on MS MARCO. My understanding is that any model trained on that dataset cannot be used commercially. So I am confused why, for example, https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 is listed as Apache 2.0 when its training data includes MS MARCO.
After reading the Qwen3 Embedding paper (Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models), I was hopeful, because the paper describes its training data as synthetic and the abstract references Apache 2.0 models. However, Table 6 lists MS MARCO as one of their training datasets.
In any case, do you know of pretrained models from anyone else that can be used commercially?