4-bit and 8-bit bnb quants only generate empty strings or one token repeated endlessly
#32
by nicorinn-google - opened
I'm using the exact same notebook for the 27b-it and 9b-it versions, so the issue is definitely related to this model. Any idea what the cause might be?
Hi @nicorinn-google, please pass torch_dtype=torch.bfloat16 when loading the model with from_pretrained(). There's a PR to update the model card examples here: #33.