Has all-MiniLM-L6-v2 ever changed its architecture? I have two different models
I have two models of all-MiniLM-L6-v2 in ONNX format that I converted to TensorRT and have been using for a while.
One model has the inputs:
- input_ids
- attention_mask

and the outputs:
- token_embeddings
- sentence_embedding
The other model has the inputs:
- input_ids
- token_type_ids
- attention_mask

and the output:
- last_hidden_state
Can someone explain this, or has anyone run into this?
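For reference, this is roughly how I listed the input & output names from the two ONNX files (a quick sketch with onnxruntime; the file paths are placeholders):

```python
import onnxruntime as ort

# Placeholder paths: substitute the actual ONNX files
for path in ["model_a.onnx", "model_b.onnx"]:
    session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    print(path)
    print("  inputs: ", [i.name for i in session.get_inputs()])
    print("  outputs:", [o.name for o in session.get_outputs()])
```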
Hello!
No, it hasn't changed its architecture. However, you can load this model in two ways:
- via Sentence Transformers, where your inputs are input_ids & attention_mask and the outputs are token_embeddings & sentence_embedding, and
- via transformers, where your inputs are input_ids, token_type_ids & attention_mask and the output is last_hidden_state.

See also the model card for a bit more info on this.
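For example (a rough sketch of the two loading paths; untested here):

```python
from sentence_transformers import SentenceTransformer
from transformers import AutoModel, AutoTokenizer

# Option 1: Sentence Transformers -- pooling is built in
st_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
embedding = st_model.encode("This is an example sentence")  # shape: (384,)

# Option 2: plain transformers -- you get raw token-level hidden states
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
inputs = tokenizer("This is an example sentence", return_tensors="pt")
# inputs contains input_ids, token_type_ids & attention_mask
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 384])
```

With the plain transformers path you still need to apply mean pooling over last_hidden_state (masked with attention_mask) to get sentence embeddings, as described on the model card.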
- Tom Aarsen
I'm loading the model via transformers using the AutoModel.from_pretrained() method and providing inputs of input_ids & attention_mask from the tokenizer (similarly loaded with AutoTokenizer); however, the model is outputting last_hidden_state & pooler_output. Previously, when I loaded the model this way, I got the inputs & outputs described by @tomaarsen with AutoModel without issue, yet loading the model via the SentenceTransformer class only outputs the pooled embeddings. Has something changed with the API or the model, or am I doing something wrong?
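Roughly what I'm running (a simplified sketch; the model id assumed here is the Hub repo sentence-transformers/all-MiniLM-L6-v2):

```python
from transformers import AutoModel, AutoTokenizer

model_id = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

encoded = tokenizer("This is an example sentence", return_tensors="pt")
# Passing only the two inputs I mentioned, no token_type_ids
outputs = model(
    input_ids=encoded["input_ids"],
    attention_mask=encoded["attention_mask"],
)
print(outputs.keys())  # odict_keys(['last_hidden_state', 'pooler_output'])
```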