model_name,coding,creative writing,current news,general culture,grammar,history,logics,math,science,technology,Average (All Topics)
claude-3.5-haiku,17.2096,9.5021,13.1592,11.2295,9.2722,13.7099,7.9069,9.5013,10.8395,11.2754,11.51902452
claude-opus-4-1,97.988,31.7501,44.9092,36.3513,30.166,53.8647,38.4643,32.1546,48.1318,52.2831,48.62490598
claude-sonnet-4,75.4533,18.119,34.977,23.5653,18.9687,35.1883,19.9602,24.1877,36.5889,34.5893,33.66639032
deepSeek-R1-0528,238.2024,38.1537,62.9681,48.2227,51.7076,61.9942,302.2976,271.5234,62.5406,66.1298,119.174235
deepSeek-V3-0324,55.0977,17.8165,32.0701,23.041,24.2812,31.001,88.9007,51.2641,29.3776,43.4225,40.30336432
gemini-2.5-flash,95.8841,16.7289,32.1003,18.3471,24.8943,29.1208,133.8567,52.6201,31.5185,36.1925,48.7078753
gemini-2.5-flash-lite,26.1777,2.8563,5.6249,3.6215,3.9845,6.8374,86.5258,41.5112,6.4054,9.5453,19.15509939
gemini-2.5-pro,83.393,29.2572,49.1594,36.6978,36.9932,47.2089,166.9207,93.4631,49.4694,52.5929,65.03115036
gemma-3-27b-it,62.2708,12.4686,27.5139,18.3155,19.2946,25.0176,24.3704,40.4719,35.3548,23.2773,29.7215
GLM-4.5,147.0394,20.9164,43.4854,31.8055,38.4103,42.5121,224.8967,165.5218,43.1608,49.7589,80.74437254
GLM-4.5-Air,105.9142,13.7172,31.2173,19.5371,45.6936,30.9208,206.3031,140.9465,38.1786,44.156,68.34050587
gpt-4.1,58.0164,11.4923,24.717,14.4706,15.9619,23.1579,80.4132,46.8127,21.9879,23.3983,32.86274006
gpt-5,120.9672,50.0373,78.8355,55.0956,56.6508,73.3966,156.5955,151.2006,83.2029,76.5455,89.99818067
gpt-5-mini,102.5153,25.7197,48.4212,31.5236,35.4217,44.406,159.7168,84.2748,52.1692,56.7156,65.89701176
gpt-5-nano,98.4218,38.2174,52.23,38.2242,48.2844,54.4789,136.6573,86.4832,48.3263,53.912,66.4959839
gpt-oss-120b,49.9796,11.3648,28.9885,18.2911,14.4024,22.6241,25.4515,36.6005,27.8547,28.8135,27.00733404
grok-3-mini,45.7707,12.9635,17.7188,13.6096,20.4385,20.4502,40.5232,32.54,25.8136,23.9,26.12147499
grok-4,92.6961,28.4581,49.7663,33.7181,41.4523,48.4394,138.4335,124.3858,48.7183,50.103,60.95525411
Kimi-K2-Instruct,69.4739,29.2635,65.4032,45.9564,46.6415,75.2439,114.3645,58.6696,72.5513,57.1711,65.0222057
llama-3_1-Nemotron-Ultra-253B-v1,88.5095,28.8905,29.2174,22.7454,32.2121,36.1473,174.681,134.8929,38.2931,32.3295,61.53657957
Llama-3_3-Nemotron-Super-49B-v1,46.3574,15.1811,28.7779,22.9057,21.5446,27.2551,67.4261,33.0439,31.8722,25.0215,32.63831081
llama-4-maverick,21.5707,4.76,7.8189,5.8389,6.1681,8.3187,21.0604,11.3275,8.7158,6.8338,10.65014104
llama-4-Scout-17B-16E-Instruct,20.2177,6.4339,9.1924,7.529,7.9851,9.8133,11.1171,13.0721,12.2545,8.2341,10.86684261
magistral-small-2506,11.4551,7.1952,7.5178,5.7532,6.1051,8.6988,79.2722,37.9617,7.0901,7.2786,17.53939687
mistral-large-2411,51.7739,14.2815,23.6025,17.3517,13.3736,25.8432,18.1355,24.6234,25.1484,21.9005,24.36368715
nova-lite-v1,7.1014,4.7882,5.846,4.7061,4.3402,5.5093,4.8806,4.861,4.9134,5.3275,5.288625128
nova-pro-v1,12.4833,7.5838,7.52,5.6658,6.5254,7.3418,5.8645,7.2712,6.6838,6.7792,7.528192069
o3,70.4202,25.9427,46.4619,29.613,26.5293,42.6644,194.8085,112.9362,41.2548,46.7826,63.89621339
o4-mini,56.98,16.3976,26.7274,19.8134,21.7084,23.2641,116.3436,41.5349,28.6513,26.4233,39.05469579
phi-4,10.8373,5.9498,6.7808,6.3085,5.9981,7.1457,7.7569,12.1669,7.0431,7.4096,7.744667446
Qwen3-14B,67.7342,19.9239,31.3204,32.178,32.2363,31.2024,197.4205,132.5492,40.1656,31.5221,61.11544056
Qwen3-235B-A22B-Thinking-2507,180.1429,33.7386,65.2237,45.3004,54.6603,53.7109,122.6611,138.2941,60.427,72.9058,78.79346155
Qwen3-30B-A3B,119.9895,27.907,34.7837,25.8461,38.8109,37.0577,204.2344,157.6709,38.1969,41.6341,72.64171253