Question on some benchmark results.

#32
by qsstcl - opened

Thanks to the authors! When I evaluated Kimi-VL-A3B-Instruct with lmms-eval, some of my benchmark results did not match the official ones. I set the temperature to 0.2 as suggested.
Benchmark   Mine   Official   (accuracy, %)
MMMU        52.1   57.0
MMStar      48.9   61.3

Other benchmarks, such as InfoVQA and MMBench-EN, come out approximately the same as the official results. Could you please check these results and help confirm the reason for the gap?
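To put the gap in concrete terms, here is a small sketch that computes how far each reproduced score falls below the official one, using only the numbers in the table. The dictionary layout and the `score_gaps` helper are hypothetical, written just for this illustration.

```python
# Scores copied from the table (accuracy, %); the structure is illustrative only.
results = {
    "mmmu": {"mine": 52.1, "official": 57.0},
    "mmstar": {"mine": 48.9, "official": 61.3},
}


def score_gaps(results):
    """Return official minus reproduced score per benchmark, rounded to 0.1 point."""
    return {name: round(s["official"] - s["mine"], 1) for name, s in results.items()}


for name, gap in score_gaps(results).items():
    print(f"{name}: {gap} points below official")
```

This gives a 4.9-point gap on MMMU and a 12.4-point gap on MMStar, so the MMStar discrepancy is more than twice as large.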

When I checked the log of the MMStar evaluation, I found that many outputs are in Chinese even though the input is in English. Could this be the reason for the mismatch?
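One way to quantify "many outputs are in Chinese" is to count the responses that contain Chinese characters. The sketch below assumes the model responses have already been pulled out of the lmms-eval log into a list of strings; the log's field names vary by version and are not assumed here. `chinese_fraction` is a hypothetical helper, and it only checks the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), which covers most everyday Chinese text.

```python
import re

# Basic CJK Unified Ideographs block; rarer extension blocks are not covered.
CJK_PATTERN = re.compile(r"[\u4e00-\u9fff]")


def chinese_fraction(responses):
    """Return the fraction of responses containing at least one CJK ideograph."""
    if not responses:
        return 0.0
    flagged = sum(1 for r in responses if CJK_PATTERN.search(r))
    return flagged / len(responses)


# Toy example: one of these three responses contains Chinese text.
responses = ["The answer is B.", "答案是 B。", "B"]
print(f"{chinese_fraction(responses):.1%} of responses contain Chinese")
```

Comparing this fraction on MMStar against a benchmark that matches the official score (for example MMBench-EN) would show whether the language issue is specific to the benchmarks with the larger gaps.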

qsstcl changed discussion status to closed
