XanderJC committed
Commit 24cbe07 · 1 Parent(s): 5a1a8e5
Files changed (1)
  1. README.md +20 -0
README.md CHANGED
@@ -139,6 +139,26 @@ response = client.chat.completions.create(
 
  Proxy Lite scored 72.4% on the [WebVoyager](https://huggingface.co/datasets/convergence-ai/WebVoyager2025Valid) benchmark, placing it 1st out of all available open-weights models.
 
+ A breakdown of the results by website is shown below:
+
+ | web_name        | Success Rate (%) | Finish Rate (%) | Avg. Steps |
+ |-----------------|------------------|-----------------|------------|
+ | Allrecipes      | 87.8             | 95.1            | 10.3       |
+ | Amazon          | 70.0             | 90.0            | 7.1        |
+ | Apple           | 82.1             | 89.7            | 10.7       |
+ | ArXiv           | 60.5             | 79.1            | 16.0       |
+ | BBC News        | 69.4             | 77.8            | 15.9       |
+ | Booking         | 70.0             | 85.0            | 24.8       |
+ | Cambridge Dict. | 86.0             | 97.7            | 5.7        |
+ | Coursera        | 82.5             | 97.5            | 4.7        |
+ | ESPN            | 53.8             | 87.2            | 14.9       |
+ | GitHub          | 85.0             | 92.5            | 10.0       |
+ | Google Flights  | 38.5             | 51.3            | 34.8       |
+ | Google Map      | 78.9             | 94.7            | 9.6        |
+ | Google Search   | 71.4             | 92.9            | 6.0        |
+ | Huggingface     | 68.6             | 74.3            | 18.4       |
+ | Wolfram Alpha   | 78.3             | 93.5            | 6.1        |
+
 
  ### Out-of-Scope Use
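For readers who want to sanity-check the per-website table added in this commit, the following is a minimal Python sketch that summarises it. The numbers are copied from the table; the names `RESULTS`, `macro_average`, and `extremes` are hypothetical helpers introduced here and are not part of the Proxy Lite codebase. The sketch computes an unweighted mean across sites, which need not match the headline 72.4% figure: that figure is presumably weighted by the number of tasks per site, and the task counts are not given in this diff.

```python
# Per-site results copied from the README table:
# name -> (success rate %, finish rate %, average steps).
RESULTS = {
    "Allrecipes": (87.8, 95.1, 10.3),
    "Amazon": (70.0, 90.0, 7.1),
    "Apple": (82.1, 89.7, 10.7),
    "ArXiv": (60.5, 79.1, 16.0),
    "BBC News": (69.4, 77.8, 15.9),
    "Booking": (70.0, 85.0, 24.8),
    "Cambridge Dict.": (86.0, 97.7, 5.7),
    "Coursera": (82.5, 97.5, 4.7),
    "ESPN": (53.8, 87.2, 14.9),
    "GitHub": (85.0, 92.5, 10.0),
    "Google Flights": (38.5, 51.3, 34.8),
    "Google Map": (78.9, 94.7, 9.6),
    "Google Search": (71.4, 92.9, 6.0),
    "Huggingface": (68.6, 74.3, 18.4),
    "Wolfram Alpha": (78.3, 93.5, 6.1),
}

SUCCESS, FINISH, STEPS = 0, 1, 2  # column indices into each tuple


def macro_average(results, column=SUCCESS):
    """Unweighted mean of one column across all sites."""
    values = [row[column] for row in results.values()]
    return sum(values) / len(values)


def extremes(results, column=SUCCESS):
    """Return (lowest, highest) site names for the given column."""
    ranked = sorted(results, key=lambda name: results[name][column])
    return ranked[0], ranked[-1]


if __name__ == "__main__":
    print(f"Unweighted mean success rate: {macro_average(RESULTS):.1f}%")
    worst, best = extremes(RESULTS)
    print(f"Lowest success rate:  {worst}")
    print(f"Highest success rate: {best}")
```

With the table values as given, the unweighted mean success rate comes out at roughly 72.2%, close to but not identical to the reported 72.4%. The lowest and highest success rates belong to Google Flights and Allrecipes respectively.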