model_name,coding,creative writing,current news,general culture,grammar,history,logics,math,science,technology,Average (All Topics)
GLM-4.5,336.5838,46.1539,115.4247,61.5081,183.8738,94.3667,955.208,354.6674,147.6845,164.993,246.0464
GLM-4.5-Air,294.8103,49.5964,75.2865,47.3232,188.0317,122.7531,934.9063,326.1799,192.6793,173.4233,240.499
Kimi-K2-Instruct,328.1743,127.0559,498.215,155.9084,411.2174,374.8414,919.1317,342.6293,464.4898,283.0215,390.4685
Llama-3_3-Nemotron-Super-49B-v1,215.9947,28.1436,78.589,55.0476,69.6513,64.6465,672.344,82.6381,150.7472,96.6115,151.4413
Qwen3-14B,291.1851,58.0622,88.5603,117.7344,88.0704,115.429,952.7636,353.0349,203.2735,124.2363,239.235
Qwen3-235B-A22B-Thinking-2507,666.7428,62.6109,184.4234,141.1792,188.1752,130.8262,431.5208,488.4891,231.8124,312.6447,283.8425
Qwen3-30B-A3B,302.3182,64.1154,92.9827,77.8548,121.5989,100.3155,973.6302,352.3788,165.0523,180.4958,243.0743
claude-3.5-haiku,41.3124,14.9481,33.5752,18.5466,17.7297,36.6532,15.8231,40.0595,17.8959,16.9296,25.3473
claude-opus-4-1,411.8235,66.277,99.7769,67.8099,73.13,140.073,240.5076,85.4884,194.4168,172.176,155.1479
claude-sonnet-4,372.4893,48.2756,92.4428,50.91,54.3981,124.7965,53.7468,57.4402,181.6048,159.8677,119.5972
deepSeek-R1-0528,516.7523,74.2918,114.0859,72.9274,112.0078,127.0512,839.4571,432.4486,182.3988,184.1239,265.5545
deepSeek-V3-0324,186.8884,51.2986,88.1105,69.8525,65.7204,121.314,755.2438,202.0879,100.6333,355.9609,199.711
gemini-2.5-flash,712.3231,38.3328,103.7981,43.5623,74.993,117.5583,944.4005,122.8572,135.3642,147.9211,244.1111
gemini-2.5-flash-lite,190.8247,6.2927,19.3252,12.1569,13.2804,47.8915,602.4712,222.2597,49.1969,110.7355,127.4435
gemini-2.5-pro,240.7699,52.3828,97.3472,57.9532,71.3161,111.1321,714.4278,300.57,170.1939,177.3255,199.3419
gemma-3-27b-it,375.5933,26.3165,60.8123,46.1448,66.8804,99.0397,180.0912,228.1774,193.2272,68.8631,134.5146
gpt-4.1,373.0435,20.9727,103.6544,34.6418,45.1164,68.0267,580.148,268.4173,151.755,161.6432,180.7419
gpt-5,379.0229,104.1814,151.0171,109.2785,163.0493,141.3236,655.5064,536.5325,304.2283,232.5815,277.6722
gpt-5-mini,420.1856,55.5139,107.6471,65.403,94.1406,96.4831,710.367,304.143,221.4093,238.5054,231.3798
gpt-5-nano,452.6803,63.8721,123.3952,95.2713,98.3822,131.3145,649.7221,349.2375,145.5956,209.7373,231.9208
gpt-oss-120b,219.9099,40.9979,88.2543,59.7898,50.0398,66.35,154.4168,213.958,154.396,143.3909,119.1503
grok-3-mini,324.7266,28.8678,38.3573,27.4405,58.7259,56.7006,303.2076,79.7071,164.6423,78.6302,116.1006
grok-4,330.6722,73.9998,112.4368,75.3662,148.4834,118.3984,908.2874,484.3592,205.0356,168.1656,262.5205
llama-3_1-Nemotron-Ultra-253B-v1,299.2445,64.7177,80.6145,53.2406,111.7416,114.9641,677.2227,364.4893,179.9651,73.4696,201.967
llama-4-Scout-17B-16E-Instruct,119.4443,17.908,21.721,15.137,15.5893,21.4368,21.6394,35.6977,109.5036,18.1442,39.6221
llama-4-maverick,258.2904,12.3067,28.3245,14.1619,15.9541,23.4693,237.7464,50.0955,44.6007,26.4254,71.1375
magistral-small-2506,50.6671,23.6896,23.0028,17.8342,14.257,27.6318,461.2929,227.3066,22.4139,27.139,89.5235
mistral-large-2411,320.7227,28.5833,76.1094,50.5788,34.0307,104.2414,69.1314,52.9922,161.3657,71.1036,96.8859
nova-lite-v1,17.362,9.0387,11.6702,10.1896,8.0435,9.778,8.2672,7.7491,9.4719,11.1956,10.2766
nova-pro-v1,55.831,13.3815,14.2866,9.7714,24.3369,15.6141,14.2894,23.7601,15.2509,15.0352,20.1557
o3,370.6262,215.2039,126.7157,84.4048,96.7106,130.4733,970.1118,427.7559,179.8835,165.4601,276.7346
o4-mini,317.1998,49.1399,78.8689,45.2952,67.6997,52.9028,768.0834,246.0076,143.2397,86.9799,185.5417
phi-4,28.1176,10.3654,12.3853,13.4812,13.4604,12.3491,14.116,39.9468,13.6159,34.0316,19.1869