Heisenburger2000 committed on
Commit
0870e48
·
verified ·
1 Parent(s): 2c1695c

Update README.md

  1. README.md +17 -11
README.md CHANGED
@@ -4,7 +4,7 @@
4
 
5
  [![arXiv](https://img.shields.io/badge/arXiv-2602.09892-b31b1b.svg)](https://arxiv.org/abs/2602.09892)
6
  [![Hugging Face Datasets](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Datasets-blue)](https://huggingface.co/collections/AweAI-Team/scale-swe)
7
- [![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-yellow)](https://huggingface.co/AweAI-Team/Scale-SWE)
8
  [![Website](https://img.shields.io/badge/%F0%9F%8C%90_Project-Website-blue.svg)](https://aweai-team.github.io/projects/scaleswe/)
9
  [![License](https://img.shields.io/badge/License-CC%20BY%204.0-green.svg)](LICENSE)
10
  <br>
@@ -13,6 +13,7 @@
13
  </div>
14
 
15
 
 
16
  ## 🔥 Highlights
17
 
18
  - Sourced from 6M+ pull requests and 23,000+ repositories.
@@ -26,8 +27,11 @@
26
  - **2026-02-26** 🚀 We released a portion of our data on [Hugging Face](https://huggingface.co/collections/AweAI-Team/scale-swe). This release includes **20,000 SWE task instances**, currently the largest **Real Executable** open-source SWE dataset available, alongside **71k distillation trajectories (3.5B)** from DeepSeek v3.2. **Much more data** will be released in the future.
27
  - **2026-02-10** 📝 Our paper [**"Immersion in the GitHub Universe: Scaling Coding Agents to Mastery"**](https://arxiv.org/abs/2602.09892) is now available on arXiv.
28
 
29
- ## 📊 Data Format
30
 
 
 
 
 
31
 
32
  | Field | Description |
33
  | :--- | :--- |
@@ -41,15 +45,21 @@
41
  | **`pr_commit`** | The commit hash of the pull request. |
42
  | **`parent_commit`** | The commit hash of the parent commit (base state). |
43
  | **`problem_statement`** | The issue description conveying the bug, provided to the model as input. |
44
- | **`f2p_patch`** | The developer-written test patch containing tests that fail before the fix (if available). |
45
- | **`f2p_script`** | The synthetic reproduction script generated by our unit-test creator agent. |
46
  | **`FAIL_TO_PASS`** | Unit tests that fail on the buggy version but pass after the fix. |
47
  | **`PASS_TO_PASS`** | Unit tests that pass in both versions (regression tests). |
48
  | **`github_url`** | The URL of the original GitHub repository. |
49
- | **`pre_commands`** | These commands must be executed immediately upon entering the container to check out the correct commit. |
50
 
51
- ## 🤖 Results
52
- We fine-tuned Qwen-30B-A3B-Instruct on our synthesized trajectories.
 
 
 
 
 
 
53
 
54
 
55
  ## 📖 Citation
@@ -66,7 +76,3 @@ If you find this project useful for your research, please consider citing our pa
66
  url={https://arxiv.org/abs/2602.09892},
67
  }
68
  ```
69
-
70
- ## 📄 License
71
-
72
- This project is licensed under the CC BY 4.0 License - see the [LICENSE](LICENSE) file for details.
 
4
 
5
  [![arXiv](https://img.shields.io/badge/arXiv-2602.09892-b31b1b.svg)](https://arxiv.org/abs/2602.09892)
6
  [![Hugging Face Datasets](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Datasets-blue)](https://huggingface.co/collections/AweAI-Team/scale-swe)
7
+ [![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-yellow)](https://huggingface.co/AweAI-Team/Scale-SWE-Agent)
8
  [![Website](https://img.shields.io/badge/%F0%9F%8C%90_Project-Website-blue.svg)](https://aweai-team.github.io/projects/scaleswe/)
9
  [![License](https://img.shields.io/badge/License-CC%20BY%204.0-green.svg)](LICENSE)
10
  <br>
 
13
  </div>
14
 
15
 
16
+
17
  ## 🔥 Highlights
18
 
19
  - Sourced from 6M+ pull requests and 23,000+ repositories.
 
27
  - **2026-02-26** 🚀 We released a portion of our data on [Hugging Face](https://huggingface.co/collections/AweAI-Team/scale-swe). This release includes **20,000 SWE task instances**, currently the largest **Real Executable** open-source SWE dataset available, alongside **71k distillation trajectories (3.5B)** from DeepSeek v3.2. **Much more data** will be released in the future.
28
  - **2026-02-10** 📝 Our paper [**"Immersion in the GitHub Universe: Scaling Coding Agents to Mastery"**](https://arxiv.org/abs/2602.09892) is now available on arXiv.
29
 
 
30
 
31
+ ## FAQ
32
+ - To evaluate on Scale-SWE-Data, you can use AweAgent; see this [evaluation script](https://github.com/AweAI-Team/AweAgent/blob/main/awe_agent/tasks/beyond_swe/evaluator.py).
33
+
34
+ ## 📊 Data Format
35
 
36
  | Field | Description |
37
  | :--- | :--- |
 
45
  | **`pr_commit`** | The commit hash of the pull request. |
46
  | **`parent_commit`** | The commit hash of the parent commit (base state). |
47
  | **`problem_statement`** | The issue description conveying the bug, provided to the model as input. |
48
+ | **`f2p_patch`** | The developer-written test patch containing tests that fail before the fix (if available). For evaluation, apply this patch before running the tests. See [this script](https://github.com/AweAI-Team/AweAgent/blob/main/awe_agent/tasks/beyond_swe/evaluator.py). |
49
+ | **`f2p_script`** | The synthetic reproduction script generated by our unit-test creator agent. Many high-quality pull requests have no author-written fail-to-pass (F2P) tests, so we synthesize them instead. Just before evaluation, save this script as `test_fail_to_pass.py` directly under the repository directory. See [this script](https://github.com/AweAI-Team/AweAgent/blob/main/awe_agent/tasks/beyond_swe/evaluator.py). |
50
  | **`FAIL_TO_PASS`** | Unit tests that fail on the buggy version but pass after the fix. |
51
  | **`PASS_TO_PASS`** | Unit tests that pass in both versions (regression tests). |
52
  | **`github_url`** | The URL of the original GitHub repository. |
53
+ | **`pre_commands`** | These commands **must** be executed immediately upon entering the container to check out the correct commit. |
54
 
55
+ ## Scale-SWE-Agent
56
+ Please use [AweAgent](https://github.com/AweAI-Team/AweAgent) to run inference with Scale-SWE-Agent. The Scale-SWE-Agent model weights are available on [Hugging Face](https://huggingface.co/AweAI-Team/Scale-SWE-Agent). Key parameters are listed below:
57
+
58
+ | Parameter | Value |
59
+ | :--- | :--- |
60
+ | Max turns | 200 |
61
+ | Max sequence length | 256k |
62
+ | Temperature | 1 |
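
For scripting, the values in the table above could be kept in one place, for example as a small settings object. The class and field names below are hypothetical and do not reflect AweAgent's actual configuration schema; the sequence length is kept as the literal `"256k"` because the table does not state an exact token count.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class InferenceSettings:
    """Values copied from the parameter table; names are illustrative only."""

    max_turns: int = 200
    max_sequence_length: str = "256k"  # kept as written; exact count not stated
    temperature: float = 1.0


settings = InferenceSettings()
print(asdict(settings))
```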
63
 
64
 
65
  ## 📖 Citation
 
76
  url={https://arxiv.org/abs/2602.09892},
77
  }
78
  ```