Model unstable

by rageltman - opened Jan 12

Seems that the 235 did not take as well to the treatment as the 30B. I'm seeing severe deviation and wandering, repetition, and garbage tokens in the output at the suggested settings. With the temperature lowered to 0.65, it just loops on its analysis of a small code block. The upstream Qwen 235 works great at the exact same settings.

Using sampling from generation_config: temp=Some(0.65), top_k=Some(20), top_p=Some(0.95), freq_penalty=Some(1.05), pres_penalty=Some(1.2)
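For anyone trying to reproduce this, here is a rough way to put a number on the looping so the two models can be compared at identical settings. It is only a sketch: `repeated_ngram_fraction` is a hypothetical helper name, not part of any library, and splitting on whitespace is an assumption that stands in for real tokenization.

```python
from collections import Counter


def repeated_ngram_fraction(text: str, n: int = 4) -> float:
    """Return the fraction of word n-grams that repeat an earlier n-gram.

    0.0 means no n-gram occurs twice; values near 1.0 mean the text is
    almost entirely a loop. Words are split on whitespace (an assumption,
    not the model's tokenizer).
    """
    words = text.split()
    if len(words) < n:
        return 0.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(ngrams)
```

Running it on the output of both models for the same prompt and the same sampling settings should make the difference visible without having to eyeball the text.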
