Independent evaluation results
#78
by yaronr - opened
Dear THUDM team,
I'm pleased to share our independent evaluation of the model using our implementation of the MMLU-Pro benchmark.
I hope you find this useful.