Independent evaluation results

#78
by yaronr - opened

Dear THUDM team,

I'm pleased to share our independent evaluation of the model using our implementation of the MMLU-Pro benchmark.

I hope you find this useful.

Great discussion! For anyone wanting to quickly test this, Crazyrouter offers API access to this model. No infrastructure setup needed — just an API key and the standard OpenAI SDK.
