config.json for pooling is incorrect
The config JSON for pooling includes arguments that the Pooling function does not accept.
The following are in the config (not in this order):
"word_embedding_dimension": 768,
"pooling_mode_cls_token": false,
"pooling_mode_mean_tokens": true,
"pooling_mode_max_tokens": false,
"pooling_mode_mean_sqrt_len_tokens": false,
"pooling_mode_weightedmean_tokens": false,
"pooling_mode_lasttoken": false
The Pooling function only accepts the first five arguments listed above.
The model will not instantiate unless the last two keys are removed from the config.
I'm cloning the repo and using:
model = SentenceTransformer("local_path")
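One way to confirm which keys are being rejected is to compare the config against the constructor's signature. The following is a minimal sketch, not part of the original report: `unsupported_keys` and `OldPooling` are hypothetical names, and `OldPooling` is only a stand-in that mimics a Pooling class accepting the first five arguments. With sentence-transformers installed, one could presumably pass `sentence_transformers.models.Pooling` instead, and the result would depend on the installed version.

```python
import inspect


def unsupported_keys(cls, config):
    """Return the config keys that cls.__init__ does not accept as parameters."""
    params = inspect.signature(cls.__init__).parameters
    # A constructor that takes **kwargs accepts every key.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return [key for key in config if key not in params]


# Hypothetical stand-in for a Pooling class that accepts only five arguments.
class OldPooling:
    def __init__(self, word_embedding_dimension,
                 pooling_mode_cls_token=False,
                 pooling_mode_mean_tokens=True,
                 pooling_mode_max_tokens=False,
                 pooling_mode_mean_sqrt_len_tokens=False):
        pass


# The keys quoted in the post above.
config = {
    "word_embedding_dimension": 768,
    "pooling_mode_cls_token": False,
    "pooling_mode_mean_tokens": True,
    "pooling_mode_max_tokens": False,
    "pooling_mode_mean_sqrt_len_tokens": False,
    "pooling_mode_weightedmean_tokens": False,
    "pooling_mode_lasttoken": False,
}

rejected = unsupported_keys(OldPooling, config)
print(rejected)
```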
Hi, thanks a lot for your interest in INSTRUCTOR!
Because we have overridden several classes of the sentence-transformers library, you may need to install the InstructorEmbedding package by following the instructions at https://github.com/HKUNLP/instructor-embedding#installation.
After that, you can use our INSTRUCTOR model as
from InstructorEmbedding import INSTRUCTOR
model = INSTRUCTOR('hkunlp/instructor-large')
Feel free to add any further questions or comments!
No issues using your recommended method. I was also able to get the cloning method to work by removing the unaccepted keys. Are there any negative consequences to removing the following keys from the config?
"pooling_mode_weightedmean_tokens": false,
"pooling_mode_lasttoken": false
It's working great for my embedding task. Just curious about this.
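For reference, the key removal described above can be scripted rather than done by hand. This is a minimal sketch under assumptions: `strip_unsupported_keys` is a hypothetical helper, and the path in the usage comment assumes the cloned repo keeps the pooling config at `1_Pooling/config.json`, as in the usual sentence-transformers layout. It rewrites the file in place, so keeping a backup may be wise.

```python
import json
from pathlib import Path

# The two keys named in this thread.
UNSUPPORTED_KEYS = ("pooling_mode_weightedmean_tokens", "pooling_mode_lasttoken")


def strip_unsupported_keys(config_path, keys=UNSUPPORTED_KEYS):
    """Remove the given keys from a JSON config file in place; return the cleaned dict."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    cleaned = {k: v for k, v in config.items() if k not in keys}
    path.write_text(json.dumps(cleaned, indent=2) + "\n", encoding="utf-8")
    return cleaned


# Example usage (assumed path inside the cloned repo):
# strip_unsupported_keys("local_path/1_Pooling/config.json")
```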
Hi, thanks a lot for your comments!
If you remove the unnecessary keys and use the SentenceTransformer library, it seems that you will not be able to add instructions when computing embeddings.
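To illustrate what adding instructions looks like with the InstructorEmbedding package, here is a minimal sketch. It assumes the package is installed and that `encode` takes a list of `[instruction, text]` pairs, as shown in the project README. `build_pairs` and `embed_with_instruction` are hypothetical helper names, and the instruction string and title are only sample inputs.

```python
def build_pairs(instruction, texts):
    """Pair each text with the instruction, as [instruction, text] lists."""
    return [[instruction, text] for text in texts]


def embed_with_instruction(instruction, texts, model_name="hkunlp/instructor-large"):
    """Load an INSTRUCTOR model and encode the texts with the given instruction."""
    # Imported lazily so build_pairs works without the package installed.
    from InstructorEmbedding import INSTRUCTOR

    model = INSTRUCTOR(model_name)
    return model.encode(build_pairs(instruction, texts))


pairs = build_pairs(
    "Represent the Science title:",
    ["3D ActionSLAM: wearable person tracking in multi-floor environments"],
)
print(pairs)

# Downloads the model on first use, so it is left commented out here:
# embeddings = embed_with_instruction("Represent the Science title:", [pairs[0][1]])
```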