Endless response

#6
by ramidahbash - opened

I have tried this AWQ version.
I deployed it using vllm 0.10.2 and 4 H100 GPUs and the response never ends, it looks like he in a conversation with itself so the response is the a question to himself and he answer it in a never ending loop.
Setting the temperature to 1.0 doesn't help.

Sign up or log in to comment