Answer loop
When using tabbyAPI + Open WebUI with the 4bpw version, the model gets the correct answer, but after finishing it starts answering again, over and over. It might be a chat template problem. Do you know how to solve it?
The HF config format is very confusing, and model makers don't always agree on how to interpret it, so deducing things like EOS tokens gets very convoluted.
While EXL3 seems to get it right in this case, the token ID list wasn't being propagated correctly to TabbyAPI. I've updated TabbyAPI so it should handle this model correctly now.
If you want a quick fix without updating Tabby, you can edit the model's config.json and copy the eos_token_id key from the text_config section to the top level of the file.
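If you'd rather script the manual edit than do it by hand, something like the following should work. It is only a rough sketch: the function names hoist_eos_token_id and patch_config_file are made up for illustration, and it assumes config.json has a text_config object that contains eos_token_id. It overwrites any top-level eos_token_id that already exists and keeps a .bak copy of the original file.

```python
import json
from pathlib import Path


def hoist_eos_token_id(config: dict) -> dict:
    """Return a copy of `config` with text_config.eos_token_id also set at the top level.

    Hypothetical helper: if text_config or its eos_token_id is missing,
    the config is returned unchanged.
    """
    patched = dict(config)  # shallow copy, the input dict is not modified
    text_config = config.get("text_config", {})
    if "eos_token_id" in text_config:
        # Copy (not move) the key, so text_config keeps its own entry.
        patched["eos_token_id"] = text_config["eos_token_id"]
    return patched


def patch_config_file(path: str) -> None:
    """Patch a config.json in place, saving the original as config.json.bak."""
    config_path = Path(path)
    original = config_path.read_text(encoding="utf-8")
    backup_path = config_path.with_name(config_path.name + ".bak")
    backup_path.write_text(original, encoding="utf-8")
    patched = hoist_eos_token_id(json.loads(original))
    config_path.write_text(json.dumps(patched, indent=2) + "\n", encoding="utf-8")
```

Call patch_config_file with the path to the config.json inside your local model directory, then reload the model in Tabby.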
I updated tabbyAPI and it works flawlessly now. Thank you.