Max Input Length Documentation
#1
by sondalex - opened
Hi, the repository README mentions:
By default, input text longer than 128 word pieces is truncated.
However, the max_seq_length attribute of the model loaded through sentence_transformers returns 512:
from sentence_transformers import SentenceTransformer
model_st = SentenceTransformer('all-mpnet-base-v1')
model_st.max_seq_length
# 512
The same value is returned by the Hugging Face Transformers approach:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-mpnet-base-v1')
tokenizer.model_max_length
# 512
Shouldn't the README be updated from 128 to 512?
Output of pip freeze:
...
sentence-transformers==2.2.2
huggingface-hub==0.10.1
transformers==4.23.1
torch==1.12.1
...
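One way to settle which limit actually applies is to test it empirically: append extra text after a long run of filler words and see whether the embedding changes. The sketch below is only an illustration of that idea. The helper names tail_is_ignored and make_fake_encoder are hypothetical, and the stand-in encoder exists only so the snippet runs without downloading anything; with the real model you would pass model.encode instead. It also assumes each filler word maps to roughly one word piece and ignores the couple of positions taken by special tokens.

```python
def tail_is_ignored(encode, n_words, word="hello", tail_word="goodbye",
                    tail_len=50, tol=1e-4):
    """Return True if text appended after n_words filler words leaves the
    embedding unchanged, which suggests truncation at or before n_words."""
    base = " ".join([word] * n_words)
    extended = base + " " + " ".join([tail_word] * tail_len)
    emb_base = [float(x) for x in encode(base)]
    emb_ext = [float(x) for x in encode(extended)]
    return all(abs(a - b) <= tol for a, b in zip(emb_base, emb_ext))


def make_fake_encoder(limit):
    """Hypothetical stand-in for a model: 'embeds' only the first
    `limit` whitespace-separated tokens by counting two known words."""
    def encode(text):
        kept = text.split()[:limit]
        return [float(kept.count("hello")), float(kept.count("goodbye"))]
    return encode


# Stand-in encoders that truncate at 128 and 512 tokens respectively.
print(tail_is_ignored(make_fake_encoder(128), n_words=200))  # True
print(tail_is_ignored(make_fake_encoder(512), n_words=200))  # False

# With the real model (not run here):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-mpnet-base-v1")
#   tail_is_ignored(model.encode, n_words=200)
```

If the call with the real model returned True for n_words=200, that would point to an effective limit below 200 word pieces; False would suggest the longer limit is in effect.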
I have the same question! I'm looking to embed text up to the maximum sequence length of 512. Am I right to assume it won't be truncated at 128, despite what the README says?
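Rather than relying on whichever default is loaded, it may be safer to set the limit explicitly before encoding. Here is a minimal sketch, assuming the model object exposes a writable max_seq_length attribute (as SentenceTransformer does). The helper pin_max_seq_length is hypothetical, and a SimpleNamespace stands in for the model so the snippet runs offline.

```python
from types import SimpleNamespace


def pin_max_seq_length(model, requested, tokenizer_limit):
    """Set model.max_seq_length to `requested`, capped at the tokenizer's
    limit, and return the value that was actually applied."""
    if requested < 1:
        raise ValueError("requested length must be a positive integer")
    model.max_seq_length = min(requested, tokenizer_limit)
    return model.max_seq_length


# Stand-in object (assumption: it mimics only the max_seq_length attribute).
stand_in = SimpleNamespace(max_seq_length=128)
print(pin_max_seq_length(stand_in, 512, tokenizer_limit=512))   # 512
print(pin_max_seq_length(stand_in, 1024, tokenizer_limit=512))  # 512

# With the real model (not run here):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-mpnet-base-v1")
#   pin_max_seq_length(model, 512, model.tokenizer.model_max_length)
```

Capping at the tokenizer limit is a design choice in this sketch: the thread reports tokenizer.model_max_length as 512, so asking for more than that is unlikely to be meaningful.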
That's a great observation. Thank you for posting this.