endless response

#5
by ramidahbash - opened

I have tried this AWQ version.
I deployed it using vllm 0.10.2 and 4 H100 GPUs and the response never ends, it looks like he in a conversation with itself so the response is the a question to himself and he answer it in a never ending loop.
Setting the temperature to 1.0 doesn't help.

QuantTrio org

for example?

even with "hello" prompt it responds:
hello how are you
hello again
i see the user said hello again

and it keeps going without stopping, he make a conversation from his own answers inside the output.

QuantTrio org

This repo is tested with Hopper devices.
Cases like "hello" is also included in the tests.

It's highly likely that your vllm environment is either damaged or not properly installed...

Sign up or log in to comment