Opus token and wall-time performance deep dive

Date: 2026-05-30

Scope: every report row whose model name contains "opus". That covers opus47 from the publish suite, plus opus46, opus48, and the two Opus 4.8 task-budget experiments from new-model-day.

Headline

  • Fastest wall time: opus?task_budget=50000 at 325.2s, though only 4 of 5 artifacts generated successfully.
  • Lowest total tokens: opus?task_budget=50000 at 516,007 tokens.
  • Best Opus row with 100.0 quality, ranked by quality-efficiency (QE): opus47 (872.9s; 2,041,367 tokens; QE rank 2).
  • task_budget=50000 cut wall time by 66.8% and total tokens by 78.1% vs opus48, but quality fell from 100.0 to 87.0 and one artifact failed generation.
  • task_budget=200000 was slower (+21.5%) and used more tokens (+56.4%) than plain opus48, with lower quality (97.4 vs 100.0).

Overall Opus rows

| model | suite | quality | gen | wall time | total tokens | input | output | effective input | cache % | tok/s | out tok/s | turns | tools | det | VLM | QE rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| opus47 | publish | 100.0 | 5/5 | 872.9s | 2,041,367 | 1,980,822 | 60,545 | 228,388 | 88.5% | 2338.7 | 69.4 | 67 | 83 | 0 | 0F/0W | 2 |
| opus46 | new-model-day | 98.8 | 5/5 | 997.5s | 1,851,922 | 1,793,023 | 58,899 | 191,646 | 89.3% | 1856.5 | 59.0 | 73 | 98 | 2 | 0F/0W | 7 |
| opus48 | new-model-day | 100.0 | 5/5 | 979.4s | 2,354,120 | 2,286,911 | 67,209 | 267,975 | 88.3% | 2403.7 | 68.6 | 71 | 91 | 0 | 0F/0W | 6 |
| opus?task_budget=50000 | new-model-day | 87.0 | 4/5 | 325.2s | 516,007 | 488,908 | 27,099 | 184,283 | 62.3% | 1586.7 | 83.3 | 28 | 29 | 4 | 1F/1W | 8 |
| opus?task_budget=200000 | new-model-day | 97.4 | 5/5 | 1,189.6s | 3,680,965 | 3,584,777 | 96,188 | 311,493 | 91.3% | 3094.2 | 80.9 | 88 | 105 | 4 | 0F/2W | 10 |
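
The derived columns above can be cross-checked against the raw counts. The sketch below is only a sanity check: the formulas are inferred from the numbers rather than stated in the report (total tokens = input + output, tok/s = total tokens / wall time, out tok/s = output / wall time, cache % = 1 - effective input / input), and the function name `derived_metrics` is hypothetical.

```python
def derived_metrics(wall_s, input_tok, output_tok, effective_input_tok):
    """Recompute derived columns from raw counts.

    The formulas are assumptions inferred from the table values,
    not a documented definition from the benchmark harness.
    """
    total = input_tok + output_tok
    return {
        "total_tokens": total,
        "tok_per_s": round(total / wall_s, 1),
        "out_tok_per_s": round(output_tok / wall_s, 1),
        "cache_pct": round(100 * (1 - effective_input_tok / input_tok), 1),
    }


# opus47 row: wall time, input, output, effective input
opus47 = derived_metrics(872.9, 1_980_822, 60_545, 228_388)
print(opus47)
```

Because the reported wall time is itself rounded to 0.1s, recomputed throughput can differ from the table in the last digit.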

Relative to plain opus48

| model | wall-time ratio | token ratio | output-token ratio | quality delta | notes |
|---|---|---|---|---|---|
| opus47 | 0.89× | 0.87× | 0.90× | +0.0 | clean |
| opus46 | 1.02× | 0.79× | 0.88× | -1.2 | 2 det failures |
| opus48 | 1.00× | 1.00× | 1.00× | +0.0 | clean |
| opus?task_budget=50000 | 0.33× | 0.22× | 0.40× | -13.0 | 4/5 gen, 4 det failures, 1F/1W VLM |
| opus?task_budget=200000 | 1.21× | 1.56× | 1.43× | -2.6 | 4 det failures, 0F/2W VLM |
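
The ratios and the headline percentages follow directly from the overall rows. As a minimal sketch (the `rows` dictionary and the helper `relative_to_baseline` are illustrative names, with values copied from the overall table), each metric is divided by the opus48 value, and the percent change is the ratio minus one:

```python
BASELINE = "opus48"

# (wall time in seconds, total tokens, output tokens, quality)
rows = {
    "opus47": (872.9, 2_041_367, 60_545, 100.0),
    "opus46": (997.5, 1_851_922, 58_899, 98.8),
    "opus48": (979.4, 2_354_120, 67_209, 100.0),
    "opus?task_budget=50000": (325.2, 516_007, 27_099, 87.0),
    "opus?task_budget=200000": (1189.6, 3_680_965, 96_188, 97.4),
}


def relative_to_baseline(rows, baseline=BASELINE):
    """Return ratios, percent changes, and quality delta versus the baseline row."""
    base_wall, base_total, base_out, base_quality = rows[baseline]
    result = {}
    for model, (wall, total, out, quality) in rows.items():
        result[model] = {
            "wall_ratio": round(wall / base_wall, 2),
            "token_ratio": round(total / base_total, 2),
            "output_ratio": round(out / base_out, 2),
            "wall_pct": round(100 * (wall / base_wall - 1), 1),
            "token_pct": round(100 * (total / base_total - 1), 1),
            "quality_delta": round(quality - base_quality, 1),
        }
    return result


relative = relative_to_baseline(rows)
for model, rel in relative.items():
    print(
        f"{model:26s} {rel['wall_ratio']:.2f}x {rel['token_ratio']:.2f}x "
        f"{rel['output_ratio']:.2f}x {rel['quality_delta']:+.1f}"
    )
```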

Per-artifact wall time and tokens

benchmark-comparison

| model | ok | quality | wall time | total tokens | input | output | effective input | tok/s | turns | tools | det | VLM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| opus47 | True | 100.0 | 150.0s | 397,948 | 388,331 | 9,617 | 26,486 | 2652.2 | 19 | 22 | 0 | 0F/0W |
| opus46 | True | 100.0 | 272.0s | 371,021 | 351,900 | 19,121 | 56,694 | 1364.3 | 14 | 18 | 0 | 0F/0W |
| opus48 | True | 100.0 | 258.3s | 704,433 | 685,790 | 18,643 | 55,911 | 2727.1 | 21 | 26 | 0 | 0F/0W |
| opus?task_budget=50000 | True | 100.0 | 76.8s | 111,775 | 105,163 | 6,612 | 20,498 | 1454.5 | 7 | 7 | 0 | 0F/0W |
| opus?task_budget=200000 | True | 100.0 | 281.1s | 1,036,764 | 1,012,407 | 24,357 | 100,128 | 3688.1 | 22 | 28 | 0 | 0F/0W |

code-review

| model | ok | quality | wall time | total tokens | input | output | effective input | tok/s | turns | tools | det | VLM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| opus47 | True | 100.0 | 268.4s | 588,373 | 571,314 | 17,059 | 73,388 | 2192.5 | 14 | 18 | 0 | 0F/0W |
| opus46 | True | 100.0 | 237.0s | 540,085 | 528,342 | 11,743 | 40,896 | 2278.4 | 17 | 29 | 0 | 0F/0W |
| opus48 | True | 100.0 | 197.0s | 474,233 | 459,662 | 14,571 | 72,302 | 2406.7 | 12 | 15 | 0 | 0F/0W |
| opus?task_budget=50000 | True | 100.0 | 63.3s | 109,587 | 104,544 | 5,043 | 56,128 | 1730.6 | 4 | 5 | 0 | 0F/0W |
| opus?task_budget=200000 | True | 87.0 | 176.7s | 425,417 | 411,266 | 14,151 | 58,001 | 2407.0 | 11 | 13 | 4 | 0F/2W |

implementation-plan

| model | ok | quality | wall time | total tokens | input | output | effective input | tok/s | turns | tools | det | VLM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| opus47 | True | 100.0 | 141.6s | 215,600 | 206,186 | 9,414 | 22,107 | 1522.3 | 11 | 12 | 0 | 0F/0W |
| opus46 | True | 94.0 | 130.3s | 167,161 | 159,833 | 7,328 | 22,835 | 1283.2 | 11 | 12 | 2 | 0F/0W |
| opus48 | True | 100.0 | 196.4s | 264,333 | 252,260 | 12,073 | 39,929 | 1345.9 | 12 | 13 | 0 | 0F/0W |
| opus?task_budget=50000 | True | 100.0 | 62.2s | 111,821 | 106,572 | 5,249 | 22,221 | 1797.7 | 7 | 7 | 0 | 0F/0W |
| opus?task_budget=200000 | True | 100.0 | 132.8s | 343,763 | 332,156 | 11,607 | 42,016 | 2589.2 | 16 | 17 | 0 | 0F/0W |

module-explainer

| model | ok | quality | wall time | total tokens | input | output | effective input | tok/s | turns | tools | det | VLM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| opus47 | True | 100.0 | 206.7s | 669,243 | 653,611 | 15,632 | 85,438 | 3237.0 | 13 | 19 | 0 | 0F/0W |
| opus46 | True | 100.0 | 192.8s | 417,791 | 406,724 | 11,067 | 44,687 | 2167.1 | 11 | 18 | 0 | 0F/0W |
| opus48 | True | 100.0 | 218.6s | 633,137 | 618,129 | 15,008 | 72,109 | 2896.4 | 12 | 21 | 0 | 0F/0W |
| opus?task_budget=50000 | False | 35.0 | 56.1s | 87,378 | 82,544 | 4,834 | 68,845 | 1558.1 | 3 | 3 | 4 | 1F/1W |
| opus?task_budget=200000 | True | 100.0 | 460.5s | 1,534,617 | 1,500,017 | 34,600 | 84,706 | 3332.5 | 23 | 30 | 0 | 0F/0W |

numeric-data

| model | ok | quality | wall time | total tokens | input | output | effective input | tok/s | turns | tools | det | VLM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| opus47 | True | 100.0 | 106.1s | 170,203 | 161,380 | 8,823 | 20,969 | 1604.4 | 10 | 12 | 0 | 0F/0W |
| opus46 | True | 100.0 | 165.4s | 355,864 | 346,224 | 9,640 | 26,534 | 2150.9 | 20 | 21 | 0 | 0F/0W |
| opus48 | True | 100.0 | 109.0s | 277,984 | 271,070 | 6,914 | 27,724 | 2549.2 | 14 | 16 | 0 | 0F/0W |
| opus?task_budget=50000 | True | 100.0 | 66.8s | 95,446 | 90,085 | 5,361 | 16,591 | 1429.6 | 7 | 7 | 0 | 0F/0W |
| opus?task_budget=200000 | True | 100.0 | 138.5s | 340,404 | 328,931 | 11,473 | 26,642 | 2457.6 | 16 | 17 | 0 | 0F/0W |
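
The per-artifact tables roll up to the overall rows. The sketch below checks this, assuming (inferred from the numbers, not stated in the report) that overall quality is the unweighted mean of the five per-artifact scores and that overall total tokens is their sum. The dictionary names are illustrative; values are listed in the order benchmark-comparison, code-review, implementation-plan, module-explainer, numeric-data.

```python
from statistics import mean

per_artifact_quality = {
    "opus47": [100.0, 100.0, 100.0, 100.0, 100.0],
    "opus46": [100.0, 100.0, 94.0, 100.0, 100.0],
    "opus48": [100.0, 100.0, 100.0, 100.0, 100.0],
    "opus?task_budget=50000": [100.0, 100.0, 100.0, 35.0, 100.0],
    "opus?task_budget=200000": [100.0, 87.0, 100.0, 100.0, 100.0],
}

per_artifact_tokens = {
    "opus47": [397_948, 588_373, 215_600, 669_243, 170_203],
    "opus46": [371_021, 540_085, 167_161, 417_791, 355_864],
    "opus48": [704_433, 474_233, 264_333, 633_137, 277_984],
    "opus?task_budget=50000": [111_775, 109_587, 111_821, 87_378, 95_446],
    "opus?task_budget=200000": [1_036_764, 425_417, 343_763, 1_534_617, 340_404],
}

# Assumed aggregation: mean for quality, sum for tokens.
overall_quality = {m: round(mean(q), 1) for m, q in per_artifact_quality.items()}
overall_tokens = {m: sum(t) for m, t in per_artifact_tokens.items()}

for model in per_artifact_quality:
    print(f"{model:26s} quality={overall_quality[model]:5.1f} tokens={overall_tokens[model]:,}")
```

Under these assumptions, the single failed module-explainer artifact (35.0) accounts for the entire quality drop of the task_budget=50000 row, and the code-review artifact (87.0) accounts for the drop of the task_budget=200000 row.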

Files

  • CSV summary: analysis/deep-dives/opus-performance-summary.csv
  • CSV by artifact: analysis/deep-dives/opus-performance-by-artifact.csv
  • Source metrics: analysis/data/model-summary.json, analysis/data/artifact-summary.json
