Model unstable
#3
by rageltman - opened
It seems the 235B did not take to the treatment as well as the 30B did. At the suggested settings I'm seeing severe deviation and wandering, repetition, and garbage tokens in the output. With the temperature lowered to 0.65, it just loops on its analysis of a small code block. The upstream Qwen 235B works great at the exact same settings.
Using sampling from generation_config: temp=Some(0.65), top_k=Some(20), top_p=Some(0.95), freq_penalty=Some(1.05), pres_penalty=Some(1.2)
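To rule out a settings mismatch between the two runs, the sampling line from each log could be parsed and diffed. This is only a rough sketch, assuming both runs print the line in the same `name=Some(value)` format shown above; `parse_sampling` and `diff_sampling` are hypothetical helper names, not part of any library.

```python
import re

# Log line copied from the report above.
LOG_LINE = (
    "Using sampling from generation_config: temp=Some(0.65), top_k=Some(20), "
    "top_p=Some(0.95), freq_penalty=Some(1.05), pres_penalty=Some(1.2)"
)


def parse_sampling(line: str) -> dict:
    """Extract name=Some(value) and name=None pairs from a sampling log line."""
    params = {}
    # Group 1 is the parameter name, group 2 the value inside Some(...).
    # For a bare None, group 2 is an empty string.
    for name, value in re.findall(r"(\w+)=(?:Some\(([^)]*)\)|None)", line):
        params[name] = float(value) if value else None
    return params


def diff_sampling(a: dict, b: dict) -> dict:
    """Return the parameters whose values differ, as (a_value, b_value) pairs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    this_run = parse_sampling(LOG_LINE)
    # Replace the second argument with the line logged by the upstream run.
    upstream_run = parse_sampling(LOG_LINE)
    print(this_run)
    print(diff_sampling(this_run, upstream_run))
```

An empty diff would confirm that both models really were sampled with identical parameters, which would point at the weights rather than the configuration.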