Model degrades after ~64000 tokens
#30
by ceoofcapybaras - opened
I'm noticing significantly worse output quality at around 70k tokens, and more frequent looping and full derailing at around 95k tokens, when running the MXFP4 version with llama.cpp. Can someone help me with the correct context-scaling parameters (RoPE or similar) that need to be set in llama.cpp?
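For context, the llama.cpp flags I'm asking about look roughly like this — a sketch only, with a placeholder model filename and guessed values; whether any of these are needed (and which values are right) depends on the model's native training context, and many GGUFs already ship correct RoPE metadata that these flags would override:

```shell
# Hypothetical invocation — model path and numeric values are placeholders.
# -c sets the context window; the rope/yarn flags control context extension.
llama-cli \
  -m model-mxfp4.gguf \
  -c 131072 \
  --rope-scaling yarn \
  --rope-scale 4 \
  --yarn-orig-ctx 32768
```

Alternatively, linear scaling uses `--rope-scaling linear` with `--rope-freq-scale` (or `--rope-freq-base`) instead of the YaRN flags. I'm unsure which mode, if any, this model expects, hence the question.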