Expected a one dimensional embeddings vector, got a multi-dimensional value
I am trying to get embedding vectors for an array of strings. I expected the output for one string to have dimension 1024, but I got an array of shape (1, n, 1024), where n varies among the strings in the array. Can someone explain this behaviour?
Which way did you use it? Could you attach your code here?
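I guess you used the Transformers way. There, n is the token sequence length (including padding when strings are batched): the model returns one 1024-dimensional vector per token, so the output has shape (1, n, 1024) until you pool it down to a single vector.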
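To make the shapes concrete, here is a minimal sketch of two common ways to reduce a (1, n, hidden) output to one vector per string. It runs on a toy nested list rather than a real model output, so `fake_output`, `cls_pool`, and `mean_pool` are illustrative names, not part of any library. It also assumes there are no padding tokens; with padded batches, mean pooling would normally be weighted by the attention mask.

```python
from typing import List

Vector = List[float]


def cls_pool(output: List[List[Vector]]) -> Vector:
    """Return the first token's vector ([CLS] in BERT-style models)."""
    return list(output[0][0])


def mean_pool(output: List[List[Vector]]) -> Vector:
    """Average the token vectors along the sequence axis (no padding assumed)."""
    tokens = output[0]
    n = len(tokens)
    return [sum(column) / n for column in zip(*tokens)]


# Toy stand-in for a model output: batch of 1, n = 3 tokens, hidden size 4.
# A real output for this model would have hidden size 1024.
fake_output = [[[1.0, 2.0, 3.0, 4.0],
                [3.0, 2.0, 1.0, 0.0],
                [2.0, 2.0, 2.0, 2.0]]]

cls_vector = cls_pool(fake_output)
mean_vector = mean_pool(fake_output)
print(len(cls_vector), len(mean_vector))
```

Either function returns a single vector whose length equals the hidden size, regardless of n.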
Thank you for the response!
Is there a difference in extent of information captured by the [CLS] token and rest of the sentence tokens? Will this be enough to carry out vector search?
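For reference, vector search over pooled embeddings usually comes down to ranking stored vectors by cosine similarity to a query vector. The sketch below shows that step on toy 3-dimensional vectors. The names `cosine_similarity`, `search`, `corpus_vectors`, and `query_vector` are hypothetical, and it does not show which pooling strategy works best for this model; that would need to be checked on real data.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query: Sequence[float],
           corpus: Sequence[Sequence[float]],
           top_k: int = 2) -> List[Tuple[int, float]]:
    """Return (index, similarity) for the top_k most similar corpus vectors."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(corpus)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy pooled embeddings; real ones for this model would have 1024 dimensions.
corpus_vectors = [[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [1.0, 1.0, 0.0]]
query_vector = [1.0, 0.0, 0.0]

results = search(query_vector, corpus_vectors)
print(results)
```

Running the same search once with [CLS] vectors and once with mean-pooled vectors on a small labelled sample is one way to compare the two empirically.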
