Update README.md
README.md
CHANGED
@@ -111,7 +111,7 @@ bash reward_server/launch_reward.sh {MODEL_PATH} {ANSWER_PATH} {METRIC}
 ### Start training
 
 ```bash
-bash
+bash reward_server/RLVR_train.sh {METHOD} {PRETRAIN_PATH} {DATA_PATH} {REWARD_API}
 
 # METHOD: advantage estimator, e.g., reinforce_baseline, reinforce, rloo
 # PRETRAIN_PATH: path to the pretrained model, e.g., Qwen2.5-7B