Issue running model with Sentence Transformers

#3
by Saptarshi7 - opened

I keep getting,

RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)

when using this model. I've tried everything & it just doesn't seem to work. I'm on the latest version of all libraries.

Kingsoft AI org

We have not verified compatibility across all dependencies with the latest versions. It is strongly recommended to use the versions specified in the README to ensure optimal performance and stability.

Kingsoft AI org

Requirements
Python: 3.10.12
Sentence Transformers: 3.4.1
Transformers: 4.51.1
PyTorch: 2.7.1
Accelerate: 1.3.0
Datasets: 3.2.0
Tokenizers: 0.21.2
mteb: 1.38.30
vllm: 0.10.1.1
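One way to pin the environment to the versions listed above (a sketch, assuming these are the PyPI package names; pick the torch wheel that matches your CUDA build):

```shell
pip install "sentence-transformers==3.4.1" "transformers==4.51.1" \
            "torch==2.7.1" "accelerate==1.3.0" "datasets==3.2.0" \
            "tokenizers==0.21.2" "mteb==1.38.30" "vllm==0.10.1.1"
```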

Thank you for responding. I will try these versions.

Hi, sorry, I just tried rerunning my code with all of these versions, but still no luck. It runs fine with Qwen3 and other models, but this one just doesn't seem to work. I keep getting the same error.

Kingsoft AI org
•
edited Sep 23

Please paste your code here, along with the error output if possible, thanks. 😊

Thanks for responding. So this is what I've been running,

model = SentenceTransformer(
    "Kingsoft-LLM/QZhou-Embedding",
    model_kwargs={
        "device_map": "auto",
        "attn_implementation": "flash_attention_2",
        "torch_dtype": "auto",
    },
    tokenizer_kwargs={"padding_side": "left"},
    trust_remote_code=True,
)

embedding_options = {"show_progress_bar": True, "convert_to_tensor": True, "batch_size": batch_size}

embeddings = model.encode(texts, **embedding_options)

I examined the list of texts and nothing seemed off. I also tried turning flash attention off and changing the dtype, but that didn't help either. The error message is the same as above.

Kingsoft AI org

The code you provided ran successfully on our machine and produced the expected results, so this does not look like a deep issue with the model itself. The error message you shared earlier did not include the traceback; providing that would help us pinpoint which dependency library is causing the error. We suspect a version compatibility issue. 😁😁😁

Thanks for following up again. This is the entire stack trace,

Traceback (most recent call last):
    embeddings = model.encode_document(texts, **embedding_options)
  File "/lib/python3.9/site-packages/sentence_transformers/SentenceTransformer.py", line 688, in encode_document
    return self.encode(
  File "/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/lib/python3.9/site-packages/sentence_transformers/SentenceTransformer.py", line 1094, in encode
    out_features = self.forward(features, **kwargs)
  File "/lib/python3.9/site-packages/sentence_transformers/SentenceTransformer.py", line 1175, in forward
    input = module(input, **module_kwargs)
  File "/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.9/site-packages/sentence_transformers/models/Transformer.py", line 261, in forward
    outputs = self.auto_model(**trans_features, **kwargs, return_dict=True)
  File "/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "/gpuhome/sks6765/.cache/huggingface/modules/transformers_modules/Kingsoft-LLM/QZhou-Embedding/95b9d058e21b8d520bcbb1dd5df6765e3520ff5e/modeling_qzhou.py", line 765, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 192, in forward
    return F.embedding(
  File "/lib/python3.9/site-packages/torch/nn/functional.py", line 2546, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)

I changed encode_document to encode but the result is the same.
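The traceback shows F.embedding receiving float indices, so a quick way to confirm where things go wrong is to inspect the dtypes of the tokenized features before the forward pass. A small debugging sketch (not from the thread; the helper name is made up):

```python
import torch

def find_bad_index_keys(features: dict) -> list:
    """Return the keys whose tensors cannot be used as embedding indices.

    nn.Embedding only accepts Long (int64) or Int (int32) index tensors,
    which is exactly what the RuntimeError above is complaining about.
    """
    bad = []
    for key in ("input_ids", "attention_mask"):
        t = features.get(key)
        if isinstance(t, torch.Tensor) and t.dtype not in (torch.int32, torch.int64):
            bad.append(key)
    return bad
```

Calling this on the output of `model.tokenize(texts)` for the SentenceTransformer above should report `input_ids` if the float-indices problem is present.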

Kingsoft AI org
•
edited Sep 24

Apologies for the delayed response. We've identified that the issue is related to Torch. Could you please check the version of Torch in your environment by running:

pip show torch

Our model has been tested with Torch versions 2.7.1 and 2.4.1. If issues persist, try initializing the model via Transformers and casting the token ids to an integer type after tokenization. We are available to provide further support and analysis.

Kingsoft AI org

This is an issue of input_ids not matching the model's expected input type, which is unrelated to the content passed to encode_document. 😼😼😼

Hey, thank you so much for your assistance. I can confirm that loading the model directly with Transformers instead of sentence_transformers works fine. It's a bit more work to wrap everything nicely in batch mode, but that's not too big of an issue. Just to confirm: I'm using the latest versions of all libraries, and I think your model is fine; it just doesn't play well with sentence_transformers.
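The "wrap everything in batch mode" part could be sketched like this under plain Transformers (helper names are made up; the pooling is an assumption to check against the model card):

```python
import torch

def batched(items, batch_size):
    """Yield successive batch_size-sized slices of items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

@torch.no_grad()
def encode_all(texts, tokenizer, model, batch_size=32):
    """Encode a list of texts in batches, returning one stacked tensor."""
    chunks = []
    for batch in batched(texts, batch_size):
        enc = tokenizer(batch, padding=True, truncation=True, return_tensors="pt")
        enc["input_ids"] = enc["input_ids"].long()  # the fix from this thread
        out = model(**enc)
        # Last-token pooling with left padding (assumption, see model card).
        chunks.append(out.last_hidden_state[:, -1])
    return torch.cat(chunks)
```

With left padding the last position always holds a real token, which is why the tokenizer was configured with `padding_side="left"` in the first place.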

Also, while the model does support flash_attention, retrieval quality is honestly much better when using sdpa. But that comes at the cost of speed. 😥

Kingsoft AI org

Indeed, version mismatches between libraries such as Sentence Transformers, PyTorch, and Transformers can significantly affect whether models load correctly. This often depends on the environment or machine setup. We are delighted that our collaboration helped you successfully run the model.

Regarding the attention implementation, SDPA does perform slightly better, likely because the model was trained and loaded using SDPA by default. If conditions permit on your end, further fine-tuning on your business data before deployment would be beneficial; we believe this will lead to improved results.

Thank you so much for all your help and congratulations on the great job for creating this model!

Saptarshi7 changed discussion status to closed
