Issue running model with Sentence Transformers

#3
by Saptarshi7 - opened

I keep getting,

RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)

when using this model. I've tried everything & it just doesn't seem to work. I'm on the latest version of all libraries.

Kingsoft AI org

We have not verified compatibility across all dependencies with the latest versions. It is strongly recommended to use the versions specified in the README to ensure optimal performance and stability.

Kingsoft AI org

Requirements
Python: 3.10.12
Sentence Transformers: 3.4.1
Transformers: 4.51.1
PyTorch: 2.7.1
Accelerate: 1.3.0
Datasets: 3.2.0
Tokenizers: 0.21.2
mteb: 1.38.30
vllm: 0.10.1.1
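One way to pin the environment to the versions listed above (a sketch, assuming these are the PyPI package names; pick the torch wheel that matches your CUDA build):

```shell
pip install "sentence-transformers==3.4.1" "transformers==4.51.1" \
            "torch==2.7.1" "accelerate==1.3.0" "datasets==3.2.0" \
            "tokenizers==0.21.2" "mteb==1.38.30" "vllm==0.10.1.1"
```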

Thank you for responding. I will try these versions.

Hi, sorry, I just tried rerunning my code with all of these versions, but still no luck. It runs fine with Qwen3 and other models, but this one just doesn't seem to work. I keep getting the same error.

Kingsoft AI org
•
edited Sep 23

Please paste your code here, along with the error output if possible, thanks. 😊

Thanks for responding. So this is what I've been running,

model = SentenceTransformer(
    "Kingsoft-LLM/QZhou-Embedding",
    model_kwargs={
        "device_map": "auto",
        "attn_implementation": "flash_attention_2",
        "torch_dtype": "auto",
    },
    tokenizer_kwargs={"padding_side": "left"},
    trust_remote_code=True,
)

embedding_options = {"show_progress_bar": True, "convert_to_tensor": True, "batch_size": batch_size}

embeddings = model.encode(texts, **embedding_options)

I examined the list of texts and nothing seemed off. I also tried turning flash attention off and changing the dtype, but that didn't help either. The error message is the same as above.

Kingsoft AI org

The code you provided ran successfully on our machine and produced the expected results, so this does not look like a deep issue with the model itself. The error message you shared earlier did not include the traceback; providing that would help us pinpoint which dependency library is causing the error. We suspect a version compatibility issue. 😁😁😁

Thanks for following up again. This is the entire stack trace,

Traceback (most recent call last):
    embeddings = model.encode_document(texts, **embedding_options)
  File "/lib/python3.9/site-packages/sentence_transformers/SentenceTransformer.py", line 688, in encode_document
    return self.encode(
  File "/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/lib/python3.9/site-packages/sentence_transformers/SentenceTransformer.py", line 1094, in encode
    out_features = self.forward(features, **kwargs)
  File "/lib/python3.9/site-packages/sentence_transformers/SentenceTransformer.py", line 1175, in forward
    input = module(input, **module_kwargs)
  File "/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.9/site-packages/sentence_transformers/models/Transformer.py", line 261, in forward
    outputs = self.auto_model(**trans_features, **kwargs, return_dict=True)
  File "/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "/gpuhome/sks6765/.cache/huggingface/modules/transformers_modules/Kingsoft-LLM/QZhou-Embedding/95b9d058e21b8d520bcbb1dd5df6765e3520ff5e/modeling_qzhou.py", line 765, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 192, in forward
    return F.embedding(
  File "/lib/python3.9/site-packages/torch/nn/functional.py", line 2546, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)

I changed encode_document to encode but the result is the same.
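The traceback shows F.embedding receiving float indices, so a quick way to confirm where things go wrong is to inspect the dtypes of the tokenized features before the forward pass. A small debugging sketch (not from the thread; the helper name is made up):

```python
import torch

def find_bad_index_keys(features: dict) -> list:
    """Return the keys whose tensors cannot be used as embedding indices.

    nn.Embedding only accepts Long (int64) or Int (int32) index tensors,
    which is exactly what the RuntimeError above is complaining about.
    """
    bad = []
    for key in ("input_ids", "attention_mask"):
        t = features.get(key)
        if isinstance(t, torch.Tensor) and t.dtype not in (torch.int32, torch.int64):
            bad.append(key)
    return bad
```

Calling this on the output of `model.tokenize(texts)` for the SentenceTransformer above should report `input_ids` if the float-indices problem is present.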

Kingsoft AI org
•
edited Sep 24

Apologies for the delayed response. We've identified that the issue is related to Torch. Could you please check the version of Torch in your environment by running:

pip show torch

Our model has been tested with Torch versions 2.7.1 and 2.4.1. If issues persist, try initializing the model via Transformers and casting the token ids to an integer type after tokenization. We are available to provide further support and analysis.

Kingsoft AI org

This is an issue of input_ids not matching the model's expected input type, which is unrelated to the content passed to encode_document. 😼😼😼

Hey, thank you so much for your assistance. I can confirm that loading the model directly with Transformers instead of sentence_transformers works fine. It's a bit more work to wrap everything nicely in batch mode, but that's not too big of an issue. Just to confirm: I'm using the latest versions of all libraries, and I think your model is fine; it just doesn't play well with sentence_transformers.
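The "wrap everything in batch mode" part could be sketched like this under plain Transformers (helper names are made up; the pooling is an assumption to check against the model card):

```python
import torch

def batched(items, batch_size):
    """Yield successive batch_size-sized slices of items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

@torch.no_grad()
def encode_all(texts, tokenizer, model, batch_size=32):
    """Encode a list of texts in batches, returning one stacked tensor."""
    chunks = []
    for batch in batched(texts, batch_size):
        enc = tokenizer(batch, padding=True, truncation=True, return_tensors="pt")
        enc["input_ids"] = enc["input_ids"].long()  # the fix from this thread
        out = model(**enc)
        # Last-token pooling with left padding (assumption, see model card).
        chunks.append(out.last_hidden_state[:, -1])
    return torch.cat(chunks)
```

With left padding the last position always holds a real token, which is why the tokenizer was configured with `padding_side="left"` in the first place.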

Also, while the model does support flash_attention, retrieval quality is honestly much better when using sdpa. But that comes at the cost of speed. 😥

Kingsoft AI org

Indeed, version mismatches between libraries such as Sentence Transformers, PyTorch, and Transformers can significantly affect whether models load correctly. This often depends on the environment or machine setup. We are delighted that our collaboration helped you successfully run the model.

Regarding the attention implementation, SDPA does perform slightly better, likely because the model was trained and loaded using SDPA by default. If conditions permit on your end, further fine-tuning on your business data before deployment would be beneficial; we believe this will lead to improved results.

Thank you so much for all your help and congratulations on the great job for creating this model!

Saptarshi7 changed discussion status to closed
