wizardII committed on
Commit 36e916a · verified · 1 Parent(s): 8fa0dcb

Update README.md

Files changed (1)
  1. README.md +0 -71
README.md CHANGED
@@ -55,77 +55,6 @@ We conduct evaluation on both mathematical and coding benchmarks. Due to the hig
 
  </div>
 
- <!-- Note:
- 1. Evaluation variance for the same model is typically within ±0.5 across multiple runs.
- 2. DeepCoder consistently scored around 23 in our tests - lower than its reported performance.
- 3. NVIDIA's Nemotron-Research-Reasoning-Qwen-1.5B slightly outperformed its reported score, potentially due to different parameter settings in their original evaluation. -->
-
- ## Getting Started
-
- ### Installation
-
- ```bash
- # Installing Python 3.10 Environment.
- conda create -n archer python=3.10 -y
- conda activate archer
-
- # Installing dependencies.
- pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu124
- wget -nv https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
- pip install --no-cache-dir flash_attn-2.7.3+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
-
- cd ArcherCodeR
- pip install -e .
- ```
-
- ### Data Preparation
-
- Download the training and test data from Hugging Face.
-
- ```bash
- python tools/download_datasets.py
- ```
-
- #### Initialize Ray Cluster
-
- We have provided a one-click script to initialize Ray environments on any number of machines. Run the following command on the head node:
-
- ```bash
- bash ./tools/start_ray.sh
- ```
-
- Note:
- - Please replace your_wandb_api_key in export WANDB_API_KEY=your_wandb_api_key with your actual key.
- - Hostfile locations vary across operating systems (e.g., on my machine, it's located at /etc/mpi/hostfile). Locate the file on your server and modify its content accordingly.
-
- ### Training
-
- We have currently only provided the script and data to reproduce the results of the “ArcherCodeR-1.5B-DAPO”.
-
- ```bash
- bash ./scripts/train/run_archer_qwen2.5_1.5b_code.sh
- ```
-
- ### Evaluation
-
- #### Step 1: Convert model format
-
- Run the following command to convert the model to Hugging Face format:
-
- ```bash
- bash ./tools/model_merge.sh
- ```
-
- #### Step 2: Run evaluation
-
- Execute the script below to evaluate model performance on the LiveCodeBench v5 benchmark:
-
- ```bash
- bash ./scripts/eval/run_eval.sh
- ```
-
- Note: Please update the path parameters in the scripts above as needed.
-
  ## Technical Report
 
  [Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR](https://arxiv.org/abs/2507.15778)
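
The hunk header `@@ -55,77 +55,6 @@` and the file summary `README.md +0 -71` describe the same change: the hunk spans 77 lines in the old file and 6 in the new one, a net change of -71. Because the summary reports no added lines, that net change equals the number of removed lines. The following is a minimal sketch of that arithmetic, assuming the standard unified-diff header layout; `parse_hunk_header` and `net_line_change` are hypothetical helper names used only for illustration and are not part of the repository.

```python
import re

# Unified-diff hunk header: "@@ -old_start[,old_len] +new_start[,new_len] @@ ..."
# A missing length defaults to 1 in the unified format.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(header: str) -> dict:
    """Return the start line and length of the old and new ranges."""
    match = HUNK_RE.match(header)
    if match is None:
        raise ValueError(f"not a unified-diff hunk header: {header!r}")
    old_start, old_len, new_start, new_len = match.groups()
    return {
        "old_start": int(old_start),
        "old_len": int(old_len) if old_len is not None else 1,
        "new_start": int(new_start),
        "new_len": int(new_len) if new_len is not None else 1,
    }


def net_line_change(header: str) -> int:
    """Lines in the new range minus lines in the old range."""
    info = parse_hunk_header(header)
    return info["new_len"] - info["old_len"]


if __name__ == "__main__":
    header = "@@ -55,77 +55,6 @@ We conduct evaluation on both mathematical and coding benchmarks."
    print(parse_hunk_header(header))
    print(net_line_change(header))  # -71
```

The 6 lines that remain in the new range are the unchanged context lines: three before the removed block and three after it.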