Should `rope_scaling.beta_fast` be `1.0`?
Discussion #6, opened by jukofyork
The previous K2 models used 1.0 for this value, and 32.0 appears to have turned up in the past as a copy-and-paste bug from the deepseek3 architecture:
https://github.com/radixark/miles/issues/335
Does v2.5 use 32.0, or is this a config bug that needs fixing?
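For context, here is a minimal sketch of what the choice between 1.0 and 32.0 changes. It follows the YaRN correction-range logic as found in DeepSeek-V3-style reference modeling code. This is an assumption: it has not been checked against the Kimi source. `beta_fast` picks the lowest frequency-pair index where blending starts, and `beta_slow` picks the index where full interpolation begins. The helper names (`correction_dim`, `correction_range`, `ramp`, `summarize`) are hypothetical. The defaults `dim=64`, `base=10000.0` and `orig_max_pos=4096` are illustrative values, not the K2/K2.5 config. `beta_slow=1.0` is assumed in both calls.

```python
import math

# Hypothetical helpers sketching the YaRN correction range as used in
# DeepSeek-V3-style reference code (assumed; not verified against Kimi).


def correction_dim(num_rotations: float, dim: int, base: float,
                   orig_max_pos: int) -> float:
    """Frequency-pair index that completes `num_rotations` full turns
    over the original context length."""
    return (dim * math.log(orig_max_pos / (num_rotations * 2 * math.pi))) / (
        2 * math.log(base))


def correction_range(beta_fast: float, beta_slow: float, dim: int,
                     base: float, orig_max_pos: int) -> tuple:
    """(low, high) pair indices bounding the YaRN blending ramp."""
    low = math.floor(correction_dim(beta_fast, dim, base, orig_max_pos))
    high = math.ceil(correction_dim(beta_slow, dim, base, orig_max_pos))
    return max(low, 0), min(high, dim - 1)


def ramp(low: int, high: int, n: int) -> list:
    """0.0 at or below `low` (original frequency kept), 1.0 at or above
    `high` (frequency divided by the scaling factor), linear in between."""
    span = high - low if high != low else 0.001  # avoid division by zero
    return [min(max((i - low) / span, 0.0), 1.0) for i in range(n)]


def summarize(beta_fast: float, beta_slow: float, dim: int = 64,
              base: float = 10000.0, orig_max_pos: int = 4096) -> tuple:
    """Count how many of the dim // 2 frequency pairs are kept, blended,
    or fully interpolated. Defaults are illustrative assumptions only."""
    low, high = correction_range(beta_fast, beta_slow, dim, base, orig_max_pos)
    r = ramp(low, high, dim // 2)
    kept = sum(1 for x in r if x == 0.0)
    interpolated = sum(1 for x in r if x == 1.0)
    blended = len(r) - kept - interpolated
    return (low, high), kept, blended, interpolated


print(summarize(32.0, 1.0))  # -> ((10, 23), 11, 12, 9)
print(summarize(1.0, 1.0))   # -> ((22, 23), 23, 0, 9)
```

Under these assumed values, `beta_fast=32.0` gives a ramp that blends 12 frequency pairs. With `beta_fast=1.0` the ramp collapses to a single step, so no pairs are blended at all.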
K2.5 uses rope_scaling.beta_fast=32.0 :)
Oh thanks! Will close now.
jukofyork changed discussion status to closed
I've been investigating some more, and it seems the original Kimi models may have used the wrong value (rope_scaling.beta_fast=1.0), going by what the paper says:
But when I investigated why this doesn't seem to affect llama.cpp:
I'm not sure whether this comment is out of date, but llama.cpp actually appears to use a fixed value of 32.0.
The same seems to be true for ik_llama.cpp:
jukofyork changed discussion status to open
