hengm3467 committed on
Commit
1678751
·
1 Parent(s): 47ec689

add benchmark chart above Pricing section

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +2 -0
  3. assets/benchmarks.png +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -30,6 +30,8 @@ Execution reliability is critical for autonomous agents. Step 3.7 Flash leads th
 
  Step 3.7 Flash is built for live engineering tasks and secured a definitive second-place finish on SWE-Bench PRO with a score of 56.3. It can independently trace multi-file repositories, isolate bugs from raw issue reports, and generate functional patches that pass automated unit tests. While evaluations like Terminal-Bench 2.1 (59.5) and GPDVal (45.8) show clear areas for future optimization compared to the absolute peak of the cohort, they establish a dependable baseline for system interactions and structured professional deliverables.
 
+ ![Step 3.7 Flash benchmark results across General Agent, Agentic Coding, and Multimodal evaluations](assets/benchmarks.png)
+
  ## 3. Pricing
 
  | Token Type | Price |
assets/benchmarks.png ADDED

Git LFS Details

  • SHA256: 0dcaeef0d844ef924ee880b18914c47a9cff30a2a38146c9df3fca6aa3345448
  • Pointer size: 131 Bytes
  • Size of remote file: 626 kB
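
The 131-byte pointer size is consistent with the Git LFS v1 pointer layout: a `version` line, an `oid sha256:` line, and a `size` line. The sketch below rebuilds such a pointer for illustration. The helper name `lfs_pointer` and the byte count `626_000` are assumptions; the page reports only a rounded 626 kB, so the real `size` value may differ.

```python
def lfs_pointer(sha256_hex: str, size_bytes: int) -> str:
    """Build Git LFS v1 pointer text for an object (illustrative helper)."""
    if len(sha256_hex) != 64:
        raise ValueError("expected a 64-character SHA-256 hex digest")
    # Keys after `version` are sorted alphabetically: oid, then size.
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{sha256_hex}\n"
        f"size {size_bytes}\n"
    )


# SHA256 taken from the Git LFS details above.
OID = "0dcaeef0d844ef924ee880b18914c47a9cff30a2a38146c9df3fca6aa3345448"
# Assumption: placeholder byte count; only "626 kB" is known.
PLACEHOLDER_SIZE = 626_000

pointer = lfs_pointer(OID, PLACEHOLDER_SIZE)
print(len(pointer.encode("ascii")))  # 131 for any six-digit size
```

Because the `version` and `oid` lines have fixed lengths (43 and 76 bytes), the pointer size depends only on the number of digits in `size`.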