Run of November 28, 2025
model_name,coding,creative writing,current news,general culture,grammar,history,logics,math,science,technology,Average (All Topics),
Claude-haiku-4.5,58.7391,45.1571,33.8539,25.816,49.3062,33.4654,56.5625,39.3133,29.8127,45.016,41.7042,41.83
Claude-opus-4-1,160.936,84.6247,90.3992,92.8329,81.9117,85.4813,193.8744,174.7207,96.4217,74.0471,113.525,113.96
Claude-sonnet-4.5,89.2451,54.2669,64.1351,42.1343,54.5329,53.5087,70.5153,52.3109,43.103,53.4261,57.7178,58.05
DeepSeek-R1-0528,222.8545,45.3084,63.747,48.1999,57.561,61.3735,231.8319,225.2358,50.5751,61.3046,106.7992,107.45
Deepseek-v3.2-exp,169.0653,55.1251,56.8076,56.6134,78.9261,48.6402,215.1592,219.9205,43.0358,51.9097,99.5203,99.89
Gemini-2.5-flash,68.0633,27.6973,27.569,22.3158,33.6431,23.2926,59.9638,52.2344,47.3171,31.15,39.3246,40.03
Gemini-2.5-flash-lite,24.3631,10.261,11.3181,13.2746,12.9128,10.8387,20.9955,14.9352,12.5418,10.5412,14.1982,14.38
Gemini-2.5-pro,91.2418,38.1668,41.1249,38.6921,47.2019,41.8959,72.0911,66.6108,56.266,44.0559,53.7347,54.51
Gemini-3-pro-preview,63.677,44.6107,33.2165,30.1365,48.3047,32.3809,67.6419,63.0423,40.5995,31.455,45.5065,45.45
Gemma-3-27b-it,44.7719,50.4246,25.8422,45.4665,25.0368,32.6465,45.7737,33.2924,23.1611,34.8024,36.1218,36.21
GLM-4.6,133.1697,68.0359,61.7867,69.6313,98.0989,66.1604,184.1942,154.2589,88.9124,74.4897,99.8738,100.48
Gpt-5,196.5078,123.3507,162.4477,120.2568,140.436,153.6043,219.7226,123.2416,145.3858,125.5599,151.0513,151.90
Gpt-5.1,158.931,101.4765,119.542,88.9939,122.2953,137.4691,194.2428,158.5537,120.7889,90.0683,129.2361,129.78
Gpt-5-nano,91.1241,69.7506,53.5343,51.303,84.1889,53.8808,91.6712,77.4804,58.477,52.2229,68.3633,68.67
Gpt-oss-120b,27.4284,14.345,19.7344,14.2113,14.3535,17.0332,22.2331,13.2228,15.6451,11.4725,16.9679,17.14
Grok-3-mini,31.277,16.3344,17.4117,14.1025,20.144,14.2064,24.7938,27.0658,15.8205,16.2357,19.7392,19.90
Grok-4.1-fast,33.2591,16.3825,24.1466,18.5611,22.1003,23.044,34.6221,14.9256,23.1455,21.0239,23.1211,23.30
Grok-4.1-fast-thinking,98.0177,41.5683,32.4349,30.6874,81.1326,30.1515,134.1396,96.4524,55.0547,31.5393,63.1178,63.81
Kimi-k2-0905,72.378,28.7963,39.4441,51.217,20.7327,39.6662,44.9747,41.3452,58.0402,49.5102,44.6104,45.14
Kimi-k2-thinking,126.1738,105.8907,47.8623,41.5867,79.1883,55.1272,174.7176,107.2654,63.4829,54.6113,85.5906,85.92
Llama-3.3-70b-instruct,27.1267,17.8344,15.8348,16.7581,15.2413,21.6736,21.2501,22.0717,14.4405,10.859,18.309,18.39
Llama-3.3-nemotron-super-49b-v1.5,80.6883,26.7635,24.3885,23.7254,42.3479,24.0025,94.4969,73.9853,26.5396,21.8686,43.8807,44.39
Llama-4-maverick,17.5852,10.1771,10.6483,10.8751,16.7924,10.6112,14.4204,17.7789,11.1265,9.3529,12.9368,13.00
Magistral-medium-2506,22.3257,10.4762,14.4269,8.2297,10.1116,8.7327,41.656,40.9489,8.7825,8.6307,17.4321,17.49
Mistral-small-3.2-24b-instruct,21.76,11.1941,13.1085,8.9558,14.8763,11.242,17.1284,20.9789,13.8524,12.3487,14.5445,14.65
Nemotron-nano-9b-v2,55.4504,12.8432,10.9387,11.4131,19.706,11.8203,52.1425,45.4582,15.994,9.5441,24.5311,24.98
Nova-premier-v1,20.2938,13.4422,12.3214,12.0255,11.8058,14.1318,11.3591,10.1795,13.8182,11.1906,13.0568,13.16
Nova-pro-v1,9.3119,8.2956,7.1086,6.4015,6.0344,5.6165,4.765,4.1713,5.3871,8.9823,6.6074,6.62
Phi-4,26.3609,19.7493,17.2262,14.0146,15.2968,15.8625,15.1601,20.1686,17.2634,16.7219,17.7824,17.88
Qwen3-235b-a22b-2507,74.2713,29.1597,37.7774,23.7464,43.2758,20.5732,78.3867,80.4367,27.6402,15.5055,43.0773,43.38
Qwen3-235B-A22B-Thinking-2507,231.3204,74.9485,88.9857,68.2265,85.9875,84.8865,287.2099,264.191,115.2909,81.0289,138.2076,135.80
Qwen3-30b-a3b-instruct-2507,65.3313,20.5034,18.1601,15.8745,28.0737,19.5814,93.9773,72.204,22.0913,19.7902,37.5587,38.14