Should `rope_scaling.beta_fast` be `1.0`?

#6
by jukofyork - opened

The previous K2 models used 1.0 for this value, and it appears to have been a copy-and-paste bug inherited from the DeepSeek-V3 architecture:

https://github.com/radixark/miles/issues/335

Does v2.5 use 32.0 or is this a config bug that needs fixing?

Moonshot AI org

K2.5 uses rope_scaling.beta_fast=32.0 :)


Oh thanks! Will close now.

jukofyork changed discussion status to closed

I've been investigating some more, and it seems the original Kimi models may have used the wrong value (rope_scaling.beta_fast=1.0), as the paper says:

(screenshot of the relevant parameters from the paper)

However, when I investigated why this doesn't seem to affect llama.cpp:

https://github.com/ggml-org/llama.cpp/blob/83bcdf7217dc06ac67ff5f7322bdd89f46664c04/src/llama-cparams.h#L22

I'm not sure if that comment is out of date, but it looks like llama.cpp actually uses a fixed value of 32.0 regardless of what the config says.

It seems the same is true for ik_llama.cpp too:

https://github.com/ikawrakow/ik_llama.cpp/blob/811f8c339373b0d5f9c30d7021fe3d66629b2750/src/llama-cparams.h#L22
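For anyone wondering why the value matters: in YaRN, beta_fast and beta_slow set the rotation counts that bound the "correction range" of RoPE dimensions, i.e. which dimensions get interpolated vs left alone. A minimal sketch of that mapping (mirroring llama.cpp's rope_yarn_corr_dim; the head dim, original context, and base below are illustrative assumptions, not Kimi's actual config):

```python
import math

def yarn_corr_dim(n_dims: int, n_ctx_orig: int, n_rot: float, base: float) -> float:
    # Dimension index at which a RoPE frequency completes `n_rot` full
    # rotations over the original training context (YaRN correction range).
    return n_dims * math.log(n_ctx_orig / (n_rot * 2 * math.pi)) / (2 * math.log(base))

# Illustrative values only (assumptions for the sketch):
n_dims, n_ctx_orig, base = 128, 4096, 10000.0

lo_32 = yarn_corr_dim(n_dims, n_ctx_orig, 32.0, base)  # beta_fast = 32.0
lo_1  = yarn_corr_dim(n_dims, n_ctx_orig, 1.0,  base)  # beta_fast = 1.0
print(lo_32, lo_1)
```

With beta_fast=1.0 the correction range starts at a much higher dimension index than with 32.0, so far more dimensions end up interpolated, which is why the two configs are not equivalent.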

jukofyork changed discussion status to open

@bigeagle can you confirm if the older Kimi models should really have used rope_scaling.beta_fast=1.0 or if it was a config bug and should have been rope_scaling.beta_fast=32.0?
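For reference, a quick way to check what a downloaded checkpoint actually ships (the sample config contents below are illustrative assumptions, written out just so the sketch is self-contained):

```python
import json

# Write an illustrative config.json (values are assumptions for the sketch,
# not copied from any actual Kimi release):
sample = {"rope_scaling": {"type": "yarn", "beta_fast": 32.0, "beta_slow": 1.0}}
with open("config.json", "w") as f:
    json.dump(sample, f)

# Read it back the way you'd inspect a real checkpoint's config.json:
with open("config.json") as f:
    cfg = json.load(f)

beta_fast = cfg.get("rope_scaling", {}).get("beta_fast")
print(beta_fast)
```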
