suite,model,eval,generation_ok,quality,duration_s,total_tokens,input_tokens,output_tokens,effective_input_tokens,cache_tokens,tokens_per_s,turns,tool_calls,det_failures,vlm_failures,vlm_warnings
publish,opus47,benchmark-comparison,True,100.0,150.046,397948,388331,9617,26486,361845,2652.17,19,22,0,0,0
publish,opus47,code-review,True,100.0,268.356,588373,571314,17059,73388,497926,2192.51,14,18,0,0,0
publish,opus47,implementation-plan,True,100.0,141.632,215600,206186,9414,22107,184079,1522.25,11,12,0,0,0
publish,opus47,module-explainer,True,100.0,206.748,669243,653611,15632,85438,568173,3237.0,13,19,0,0,0
publish,opus47,numeric-data,True,100.0,106.088,170203,161380,8823,20969,140411,1604.36,10,12,0,0,0
new-model-day,opus46,benchmark-comparison,True,100.0,271.957,371021,351900,19121,56694,295206,1364.26,14,18,0,0,0
new-model-day,opus46,code-review,True,100.0,237.048,540085,528342,11743,40896,487446,2278.38,17,29,0,0,0
new-model-day,opus46,implementation-plan,True,94.0,130.271,167161,159833,7328,22835,136998,1283.18,11,12,2,0,0
new-model-day,opus46,module-explainer,True,100.0,192.786,417791,406724,11067,44687,362037,2167.12,11,18,0,0,0
new-model-day,opus46,numeric-data,True,100.0,165.446,355864,346224,9640,26534,319690,2150.94,20,21,0,0,0
new-model-day,opus48,benchmark-comparison,True,100.0,258.31,704433,685790,18643,55911,629879,2727.08,21,26,0,0,0
new-model-day,opus48,code-review,True,100.0,197.043,474233,459662,14571,72302,387360,2406.75,12,15,0,0,0
new-model-day,opus48,implementation-plan,True,100.0,196.392,264333,252260,12073,39929,212331,1345.95,12,13,0,0,0
new-model-day,opus48,module-explainer,True,100.0,218.593,633137,618129,15008,72109,546020,2896.42,12,21,0,0,0
new-model-day,opus48,numeric-data,True,100.0,109.048,277984,271070,6914,27724,243346,2549.19,14,16,0,0,0
new-model-day,opus?task_budget=50000,benchmark-comparison,True,100.0,76.846,111775,105163,6612,20498,84665,1454.53,7,7,0,0,0
new-model-day,opus?task_budget=50000,code-review,True,100.0,63.323,109587,104544,5043,56128,48416,1730.6,4,5,0,0,0
new-model-day,opus?task_budget=50000,implementation-plan,True,100.0,62.202,111821,106572,5249,22221,84351,1797.71,7,7,0,0,0
new-model-day,opus?task_budget=50000,module-explainer,False,35.0,56.079,87378,82544,4834,68845,13699,1558.12,3,3,4,1,1
new-model-day,opus?task_budget=50000,numeric-data,True,100.0,66.763,95446,90085,5361,16591,73494,1429.62,7,7,0,0,0
new-model-day,opus?task_budget=200000,benchmark-comparison,True,100.0,281.111,1036764,1012407,24357,100128,912279,3688.09,22,28,0,0,0
new-model-day,opus?task_budget=200000,code-review,True,87.0,176.741,425417,411266,14151,58001,353265,2407.01,11,13,4,0,2
new-model-day,opus?task_budget=200000,implementation-plan,True,100.0,132.769,343763,332156,11607,42016,290140,2589.18,16,17,0,0,0
new-model-day,opus?task_budget=200000,module-explainer,True,100.0,460.502,1534617,1500017,34600,84706,1415311,3332.49,23,30,0,0,0
new-model-day,opus?task_budget=200000,numeric-data,True,100.0,138.509,340404,328931,11473,26642,302289,2457.63,16,17,0,0,0