Issue running model with Sentence Transformers
I keep getting:
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)
when using this model. I've tried everything and it just doesn't seem to work. I'm on the latest versions of all libraries.
We have not verified compatibility across all dependencies with the latest versions. It is strongly recommended to use the versions specified in the README to ensure optimal performance and stability.
Requirements
Python: 3.10.12
Sentence Transformers: 3.4.1
Transformers: 4.51.1
PyTorch: 2.7.1
Accelerate: 1.3.0
Datasets: 3.2.0
Tokenizers: 0.21.2
mteb: 1.38.30
vllm: 0.10.1.1
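For convenience, the pinned versions above can be installed in one command (package names as published on PyPI; whether all pins resolve together may depend on your platform):

```shell
pip install "sentence-transformers==3.4.1" "transformers==4.51.1" \
    "torch==2.7.1" "accelerate==1.3.0" "datasets==3.2.0" \
    "tokenizers==0.21.2" "mteb==1.38.30" "vllm==0.10.1.1"
```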
Thank you for responding. I will try these versions.
Hi, sorry, I just tried rerunning my code with all of these versions, but still no luck. It runs fine with Qwen3 and other models, but this one just doesn't seem to work. I keep getting the above error.
Please paste your code here, along with the full error output if possible. Thanks.
Thanks for responding. So this is what I've been running,
model = SentenceTransformer(
    "Kingsoft-LLM/QZhou-Embedding",
    model_kwargs={
        "device_map": "auto",
        "attn_implementation": "flash_attention_2",
        "torch_dtype": "auto",
    },
    tokenizer_kwargs={"padding_side": "left"},
    trust_remote_code=True,
)
embedding_options = {"show_progress_bar": True, "convert_to_tensor": True, "batch_size": batch_size}
embeddings = model.encode(texts, **embedding_options)
I examined the list of texts and nothing seemed off. I tried toggling flash attention off and changing the datatype, but that didn't help either. The error message is the same as above.
The code you provided ran successfully on our machine and produced the expected results, so the problem is unlikely to be in the code itself. The error message you shared earlier did not include the Traceback information; providing that would help us pinpoint which dependency library is causing the error. We suspect it may be a version compatibility issue.
Thanks for following up again. This is the entire stack trace,
Traceback (most recent call last):
embeddings = model.encode_document(texts, **embedding_options)
File "/lib/python3.9/site-packages/sentence_transformers/SentenceTransformer.py", line 688, in encode_document
return self.encode(
File "/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/lib/python3.9/site-packages/sentence_transformers/SentenceTransformer.py", line 1094, in encode
out_features = self.forward(features, **kwargs)
File "/lib/python3.9/site-packages/sentence_transformers/SentenceTransformer.py", line 1175, in forward
input = module(input, **module_kwargs)
File "/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "/lib/python3.9/site-packages/sentence_transformers/models/Transformer.py", line 261, in forward
outputs = self.auto_model(**trans_features, **kwargs, return_dict=True)
File "/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "/gpuhome/sks6765/.cache/huggingface/modules/transformers_modules/Kingsoft-LLM/QZhou-Embedding/95b9d058e21b8d520bcbb1dd5df6765e3520ff5e/modeling_qzhou.py", line 765, in forward
inputs_embeds = self.embed_tokens(input_ids)
File "/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 192, in forward
return F.embedding(
File "/lib/python3.9/site-packages/torch/nn/functional.py", line 2546, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)
I also changed encode_document to encode, but the result is the same.
Apologies for the delayed response. We've identified that the issue is related to Torch. Could you please check the version of Torch in your environment by running:
pip show torch
Our model has been tested with Torch versions 2.7.1 and 2.4.1. If issues persist, try initializing via Transformers and applying an Int type conversion to the token ids after tokenization. We are available to provide further support and analysis.
This is an issue of the input_ids not matching the model's expected input type, which is unrelated to the content passed to encode_document.
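For reference, the failure mode and the suggested cast can be sketched with plain PyTorch (dummy tensors stand in for the tokenizer output, and the embedding table stands in for the model's embed_tokens layer; this is an illustration, not the model's actual code):

```python
import torch

# Dummy embedding table standing in for the model's embed_tokens layer
embed_tokens = torch.nn.Embedding(16, 8)

# Token ids that arrived as floats (e.g. after an upstream dtype cast)
input_ids = torch.tensor([[0.0, 5.0, 9.0]])

try:
    embed_tokens(input_ids)  # reproduces the RuntimeError above
except RuntimeError as err:
    print("RuntimeError:", err)

# The suggested fix: cast the ids back to an integer (Long) type
hidden = embed_tokens(input_ids.long())
print(hidden.shape)  # torch.Size([1, 3, 8])
```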
Hey, thank you so much for your assistance. Yes, I can confirm that loading the model directly in Transformers instead of sentence_transformers works fine. Of course, it's a bit more work to wrap everything nicely in batch mode, but that's not too big an issue. Just confirming: I'm using the latest versions of all libraries, and I think your model is fine; it just doesn't play well with sentence_transformers.
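The batch-mode wrapping mentioned above might look like the following generic helper; `encode_fn` is a placeholder for whatever Transformers forward pass is used (the dummy encoder below is purely illustrative):

```python
import torch

def encode_in_batches(texts, encode_fn, batch_size=32):
    """Apply encode_fn to texts in fixed-size chunks and concatenate the results."""
    chunks = [
        encode_fn(texts[i : i + batch_size])
        for i in range(0, len(texts), batch_size)
    ]
    return torch.cat(chunks, dim=0)

# Dummy encoder standing in for a real tokenizer + model forward pass
def dummy_encode(batch):
    return torch.zeros(len(batch), 4)

out = encode_in_batches([f"text {i}" for i in range(10)], dummy_encode, batch_size=3)
print(out.shape)  # torch.Size([10, 4])
```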
Also, while the model does support flash_attention, embedding quality is honestly much better when using sdpa, though that comes at the cost of speed.
Indeed, version mismatches between libraries such as Sentence Transformers, PyTorch, and Transformers can significantly affect whether a model loads correctly, and this often depends on the environment or machine setup. We are delighted that our collaboration helped you get the model running successfully.
Regarding the attention mode selection, SDPA does perform slightly better, which may be because the model was trained and loaded with SDPA by default. If conditions permit on your end, further fine-tuning on your business data before deployment would be beneficial; we believe this will lead to improved results.
Thank you so much for all your help and congratulations on the great job for creating this model!