Instructions to use Alibaba-NLP/gte-Qwen2-7B-instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use Alibaba-NLP/gte-Qwen2-7B-instruct with sentence-transformers:
from sentence_transformers import SentenceTransformer model = SentenceTransformer("Alibaba-NLP/gte-Qwen2-7B-instruct", trust_remote_code=True) sentences = [ "That is a happy person", "That is a happy dog", "That is a very happy person", "Today is a sunny day" ] embeddings = model.encode(sentences) similarities = model.similarity(embeddings, embeddings) print(similarities.shape) # [4, 4] - Transformers
How to use Alibaba-NLP/gte-Qwen2-7B-instruct with Transformers:
# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("Alibaba-NLP/gte-Qwen2-7B-instruct", trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained("Alibaba-NLP/gte-Qwen2-7B-instruct", trust_remote_code=True) - Notebooks
- Google Colab
- Kaggle
Running this model without flash-attn
Hi, is it possible to run this model without flash-attn? It seems to be a requirement in your code at the moment (e.g. modeling_qwen.py).
Hi, I managed to run it without installing flash-attn on SageMaker. Looking at the code, it checks if it's installed and if it is not, it skips the import
from transformers.utils import (
add_start_docstrings,
add_start_docstrings_to_model_forward,
is_flash_attn_2_available,
is_flash_attn_greater_or_equal_2_10,
logging,
replace_return_docstrings,
)
if is_flash_attn_2_available():
from flash_attn import flash_attn_func, flash_attn_varlen_func
from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input # noqa
_flash_supports_window_size = "window_size" in list(inspect.signature(flash_attn_func).parameters)
Bumping this request: This model creates a a bad user experience - https://github.com/michaelfeil/infinity/issues/308 -> flash-attn is checked and raises.
I'd suggest to PR this remote code to the actual transformers lib for a better experience. The embedding model does not employ window_attn, hence there should be little need for FA2.