Are any of your pretrained models available for commercial use?

#25

by sjonnal3 - opened Jun 29, 2025


Most of the models listed at https://www.sbert.net/docs/sentence_transformer/pretrained_models.html appear to be trained on MS Marco. My understanding is that any model trained on that dataset cannot be used commercially. So I am confused why, for example, https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 is listed as Apache 2.0 when its training data includes MS Marco.

After reading the Qwen3 paper (Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models), I was hopeful, because it says the training data is synthetic and the abstract references Apache 2.0 models. However, Table 6 lists MS Marco as one of the training datasets.

In any case, do you know of pretrained models from anyone else that can be used commercially?
