Update README.md
README.md
CHANGED
|
@@ -153,6 +153,31 @@ foundation for next-generation language model agents to reason and tackle real-w
|
|
| 153 |
|
| 154 |
\* conducted on the text-only HLE subset.
|
| 155 |
| 156 |
## 3. Deployment Guide
|
| 157 |
|
| 158 |
Download the model from HuggingFace repository:
|
|
|
|
| 153 |
|
| 154 |
\* conducted on the text-only HLE subset.
|
| 155 |
|
| 156 |
+
### SWE-bench methodology
|
| 157 |
+
We report results obtained with the Agentless scaffold. Unlike the original pipeline, our methodology uses a two-stage localization process with no embedding-based retrieval: coarse-grained file localization first, followed by fine-grained localization to specific files and code elements. Scores for our models are computed on the subset of n=486 SWE-bench Verified tasks that run on our infrastructure. The 14 excluded tasks, which were incompatible with our internal infrastructure, are:
|
| 158 |
+
"astropy__astropy-7606",
|
| 159 |
+
"astropy__astropy-8707",
|
| 160 |
+
"astropy__astropy-8872",
|
| 161 |
+
"django__django-10097",
|
| 162 |
+
"matplotlib__matplotlib-20488",
|
| 163 |
+
"psf__requests-2317",
|
| 164 |
+
"psf__requests-2931",
|
| 165 |
+
"psf__requests-5414",
|
| 166 |
+
"pylint-dev__pylint-6528",
|
| 167 |
+
"pylint-dev__pylint-7277",
|
| 168 |
+
"sphinx-doc__sphinx-10435",
|
| 169 |
+
"sphinx-doc__sphinx-7985",
|
| 170 |
+
"sphinx-doc__sphinx-8269",
|
| 171 |
+
"sphinx-doc__sphinx-8475"
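
The exclusion described above can be sketched as a simple filter over instance IDs. This is a minimal illustration, not the evaluation code: the helper name `filter_supported` is hypothetical, and the total of 500 tasks is inferred from 486 + 14.

```python
# Instance IDs excluded from evaluation, copied from the list above.
EXCLUDED_IDS = frozenset({
    "astropy__astropy-7606",
    "astropy__astropy-8707",
    "astropy__astropy-8872",
    "django__django-10097",
    "matplotlib__matplotlib-20488",
    "psf__requests-2317",
    "psf__requests-2931",
    "psf__requests-5414",
    "pylint-dev__pylint-6528",
    "pylint-dev__pylint-7277",
    "sphinx-doc__sphinx-10435",
    "sphinx-doc__sphinx-7985",
    "sphinx-doc__sphinx-8269",
    "sphinx-doc__sphinx-8475",
})

# Assumption: the full task set has 500 instances, since 486 + 14 = 500.
ASSUMED_TOTAL_TASKS = 500


def filter_supported(instance_ids):
    """Hypothetical helper: drop excluded IDs while preserving input order."""
    return [iid for iid in instance_ids if iid not in EXCLUDED_IDS]


# The evaluated subset size follows from the exclusion list.
EVALUATED_TASKS = ASSUMED_TOTAL_TASKS - len(EXCLUDED_IDS)
```

Applied to the full list of instance IDs, `filter_supported` would return the n=486 tasks the scores are computed on, provided every excluded ID is present in that list.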
|
| 172 |
+
|
| 173 |
+
### TAU-bench methodology
|
| 174 |
+
We evaluate TAU-bench by the pass rate averaged over 5 samples per query, with GPT-4.1 as the user model and without any custom tools. The maximum number of interaction steps is 30.
|
| 175 |
+
We prepend the following general principle to the policy prompt.
|
| 176 |
+
#### General
|
| 177 |
+
- In each round, you need to carefully examine the tools provided to you to determine if any can be used.
|
| 178 |
+
- You must adhere to all of the policies. Pay attention to the details in the terms. Solutions for most situations can be found within these policies.
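
The TAU-bench setup above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the actual harness: the names `build_policy_prompt` and `average_pass_rate` are hypothetical, and the blank-line separator between the general principle and the policy text is assumed.

```python
# General principle, copied from the two bullets above. The exact formatting
# and the blank-line separator used below are assumptions.
GENERAL_PRINCIPLE = (
    "- In each round, you need to carefully examine the tools provided to you "
    "to determine if any can be used.\n"
    "- You must adhere to all of the policies. Pay attention to the details in "
    "the terms. Solutions for most situations can be found within these policies."
)

NUM_SAMPLES = 5  # samples drawn per query
MAX_STEPS = 30   # maximum number of interaction steps


def build_policy_prompt(policy_prompt: str) -> str:
    """Hypothetical helper: prepend the general principle to a policy prompt."""
    return f"{GENERAL_PRINCIPLE}\n\n{policy_prompt}"


def average_pass_rate(outcomes_per_query: list[list[bool]]) -> float:
    """Mean over queries of the fraction of passing samples for each query."""
    if not outcomes_per_query or any(not s for s in outcomes_per_query):
        raise ValueError("every query needs at least one sample")
    per_query = [sum(samples) / len(samples) for samples in outcomes_per_query]
    return sum(per_query) / len(per_query)
```

For example, a query that passes in all 5 samples and another that passes in 1 of 5 would average to a pass rate of 0.6.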
|
| 179 |
+
|
| 180 |
+
|
| 181 |
## 3. Deployment Guide
|
| 182 |
|
| 183 |
Download the model from HuggingFace repository:
|