Expected a one dimensional embeddings vector, got a multi-dimensional value

#11

by pmishra - opened Jan 13, 2024

Jan 13, 2024

I am trying to get embedding vectors for an array of strings. I expected the output for one string to have dimension 1024, but I got an array of shape (1, n, 1024), where n varies between the strings in the array. Can someone explain this behaviour?

SeanLee97

WhereIsAI org Jan 14, 2024

Which way did you use it? Could you attach your code here?

I guess you are using the transformers approach. The n is the padded sequence length.

pmishra

Jan 14, 2024

Sure. I used Hugging Face inference for the model embeddings. Here, const embedding is the HfInference instance:

SeanLee97

WhereIsAI org Jan 14, 2024

@pmishra hi, the embeddings you got with shape (1, n, 1024) are the per-token embeddings: one 1024-dimensional vector for each of the n tokens.

You can use the first token's (i.e., the CLS token) embedding as the sentence embedding, as follows:

vecs = val[:, 0, :]
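
If the output arrives as plain nested lists (as it would from a JSON response) rather than a NumPy array or tensor, the same slicing can be done by hand. This is only a minimal sketch: `cls_pool`, `out_a`, and `out_b` are hypothetical names, and the tiny 4-dimensional vectors stand in for the real 1024-dimensional ones.

```python
def cls_pool(val):
    """Nested-list equivalent of val[:, 0, :] for shape (batch, n, dim).

    Keeps only the first token (the CLS token) of each sequence.
    """
    return [sequence[0] for sequence in val]


# Stand-in outputs for two strings (made-up values, dim 4 instead of 1024).
# n differs per string: 3 tokens for the first, 2 for the second.
out_a = [[[0.1, 0.2, 0.3, 0.4], [1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]]]
out_b = [[[0.5, 0.6, 0.7, 0.8], [3.0, 3.0, 3.0, 3.0]]]

# One fixed-size sentence vector per string, regardless of n.
sentence_vecs = [cls_pool(out)[0] for out in (out_a, out_b)]
print(len(sentence_vecs), len(sentence_vecs[0]))  # 2 4
```

Because every string contributes exactly one vector of the hidden size, the varying n no longer matters once the first token has been picked out.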

pmishra

Jan 14, 2024

Thank you for the response!
Is there a difference in the amount of information captured by the [CLS] token compared with the rest of the sentence tokens? Will this be enough to carry out vector search?
