Model degrades after ~64000 tokens
#30
by ceoofcapybaras - opened
I'm noticing significantly worse output quality at around 70k tokens, and more frequent looping and full derailing at around 95k tokens, when running the MXFP4 version with llama.cpp. Can someone help me with the correct context-scaling parameters (RoPE or similar) that need to be set in llama.cpp?
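For context, the llama.cpp flags I'm asking about look roughly like this — a sketch only, with a placeholder model filename and guessed values; whether any of these are needed (and which values are right) depends on the model's native training context, and many GGUFs already ship correct RoPE metadata that these flags would override:

```shell
# Hypothetical invocation — model path and numeric values are placeholders.
# -c sets the context window; the rope/yarn flags control context extension.
llama-cli \
  -m model-mxfp4.gguf \
  -c 131072 \
  --rope-scaling yarn \
  --rope-scale 4 \
  --yarn-orig-ctx 32768
```

Alternatively, linear scaling uses `--rope-scaling linear` with `--rope-freq-scale` (or `--rope-freq-base`) instead of the YaRN flags. I'm unsure which mode, if any, this model expects, hence the question.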