Outputs are always identical when running inference with transformers

#76

by zhusl-cpu - opened 11 days ago

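With the standard transformers inference code below, the model always produces exactly the same output for the same input, no matter how I adjust the sampling parameters.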

from transformers import AutoModelForCausalLM, AutoTokenizer

model_directory = '...'
tokenizer = AutoTokenizer.from_pretrained(model_directory)
model = AutoModelForCausalLM.from_pretrained(
    model_directory,
    device_map='cuda'
)

messages = [
    {
        "role": "user",
        "content": "你好"
    },
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False
)
model_inputs = tokenizer([text]*10, return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=200,
    temperature=1.0,
    top_k=15,
    top_p=0.9,
)
for i,generated_id in enumerate(generated_ids):
    output_ids = generated_id[len(model_inputs.input_ids[0]):].tolist()
    content = tokenizer.decode(output_ids, skip_special_tokens=True)
    print(f"[{i}] {content}")

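Whatever the input, and however I set the sampling parameters temperature, top_k, and top_p, all of the outputs are identical. This happens with several models in the Qwen3.5 series.

Does transformers not yet support running Qwen3.5 inference directly in code, does Qwen3.5 need different code from ordinary text-generation models, or is this a problem with my environment or package versions?

Example output: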

[0] 你好！有什么我可以帮你的吗？😊

[1] 你好！有什么我可以帮你的吗？😊

[2] 你好！有什么我可以帮你的吗？😊

[3] 你好！有什么我可以帮你的吗？😊

[4] 你好！有什么我可以帮你的吗？😊

[5] 你好！有什么我可以帮你的吗？😊

[6] 你好！有什么我可以帮你的吗？😊

[7] 你好！有什么我可以帮你的吗？😊

[8] 你好！有什么我可以帮你的吗？😊

[9] 你好！有什么我可以帮你的吗？😊
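Ten byte-identical completions are what deterministic (greedy) decoding would produce, since it always takes the highest-scoring token and never consults temperature or top_k. The toy sketch below illustrates that effect in plain Python. It does not use transformers, and the `next_token` helper and `toy_logits` values are made up for illustration; it is not a diagnosis of the code above.

```python
import math
import random

def next_token(logits, do_sample, temperature=1.0, top_k=None, rng=random):
    """Pick one token from a {token: logit} dict (hypothetical toy helper)."""
    if not do_sample:
        # Greedy decoding: plain argmax. temperature and top_k are never read.
        return max(logits, key=logits.get)
    # Sampling: keep the top_k highest logits, scale by temperature,
    # then draw from the resulting softmax distribution.
    items = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        items = items[:top_k]
    scaled = [logit / temperature for _, logit in items]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices([tok for tok, _ in items], weights=weights, k=1)[0]

# Made-up scores for four candidate first tokens.
toy_logits = {"hello": 2.0, "hi": 1.8, "hey": 1.5, "greetings": 0.5}

# Greedy: every combination of temperature and top_k yields the same token.
greedy = {
    next_token(toy_logits, do_sample=False, temperature=t, top_k=k)
    for t in (0.5, 1.0, 2.0)
    for k in (1, 2, 4)
}

# Sampling: repeated draws produce more than one distinct token.
rng = random.Random(0)
sampled = {
    next_token(toy_logits, do_sample=True, temperature=1.0, top_k=4, rng=rng)
    for _ in range(200)
}

print("greedy :", greedy)
print("sampled:", sampled)
```

In the greedy branch the set collapses to a single token regardless of the parameters, which matches the symptom of changing temperature, top_k, and top_p without any visible effect.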

My environment:
python 3.11
cuda 12.8
torch 2.8.0+cu128
transformers 5.5.4

Some other packages that may be relevant:
causal_conv1d 1.6.1
flash-linear-attention 0.4.2
flash_attn 2.8.3

zhusl-cpu changed discussion title from "使用transformers推理时表输出结果总是相同" to "使用transformers推理时输出结果总是相同" (the corrected title drops a stray character) 11 days ago

zhusl-cpu

11 days ago

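The problem is solved.
It seems that generate() for Qwen3.5 no longer accepts sampling parameters passed in directly and only accepts them through a GenerationConfig, and that do_sample defaults to False.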
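A minimal sketch of what that fix might look like, assuming the finding above holds: the sampling settings go into a `GenerationConfig` with `do_sample=True`, and that object is passed to `generate()`. The parameter values are the ones from the original post; the `generate_samples` helper and its argument names are hypothetical, and this has not been verified against Qwen3.5 or transformers 5.5.4.

```python
from transformers import GenerationConfig

# Sampling settings collected in one object. do_sample=True is the key switch:
# without it, decoding is deterministic and the other three are ignored.
sampling_config = GenerationConfig(
    do_sample=True,
    temperature=1.0,
    top_k=15,
    top_p=0.9,
    max_new_tokens=200,
)

def generate_samples(model, tokenizer, text, num_copies=10):
    """Hypothetical helper: sample num_copies completions of one prompt.

    model and tokenizer are assumed to be loaded as in the original post.
    """
    model_inputs = tokenizer(
        [text] * num_copies, return_tensors="pt"
    ).to(model.device)
    generated_ids = model.generate(
        **model_inputs,
        generation_config=sampling_config,
    )
    # Strip the prompt tokens so that only the newly generated text is decoded.
    prompt_length = model_inputs.input_ids.shape[1]
    return [
        tokenizer.decode(ids[prompt_length:].tolist(), skip_special_tokens=True)
        for ids in generated_ids
    ]
```

With the model, tokenizer, and `text` from the original post, `generate_samples(model, tokenizer, text)` would return ten strings that should now differ from one another. In earlier transformers releases, `do_sample=True` could also be passed to `generate()` directly as a keyword argument; whether that still works in this setup is not confirmed by the thread.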

zhusl-cpu changed discussion status to closed 11 days ago
