Does Kimi-K2.5 only score 72.57 on tau2-bench?
#87, opened by Alicia-Ross
I noticed that Kimi-K2.5 scores very low on tau2-bench: only 72.5, according to https://huggingface.co/inclusionAI/Ring-2.5-1T. But I saw scores of 80+ on GLM-5 (80.2, https://huggingface.co/zai-org/GLM-5) and Step-3.5-Flash (85.4, https://huggingface.co/stepfun-ai/Step-3.5-Flash). Is there something wrong with their testing?