Text Classification · Transformers · Safetensors · qwen2 · text-generation · text-embeddings-inference
Commit 44c48a5 · verified · committed by sarosavo · 1 Parent(s): cb6b105

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -111,7 +111,7 @@ bash reward_server/launch_reward.sh {MODEL_PATH} {ANSWER_PATH} {METRIC}
 ### Start training
 
 ```bash
-bash train.sh {METHOD} {PRETRAIN_PATH} {DATA_PATH} {REWARD_API}
+bash reward_server/RLVR_train.sh {METHOD} {PRETRAIN_PATH} {DATA_PATH} {REWARD_API}
 
 # METHOD: advantage estimator, e.g., reinforce_baseline, reinforce, rloo
 # PRETRAIN_PATH: path to the pretrained model, e.g., Qwen2.5-7B
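The `METHOD` argument of the training command selects an advantage estimator. As a rough illustration only, the sketch below gives textbook definitions of the three named estimators for one group of responses sampled for the same prompt. It is not the implementation inside `reward_server/RLVR_train.sh`. The function names and the `ESTIMATORS` table are hypothetical, and `reinforce_baseline` is *assumed* here to subtract the group-mean reward; the actual script may normalize or batch rewards differently.

```python
from typing import Callable, Dict, List


def reinforce(rewards: List[float]) -> List[float]:
    """Plain REINFORCE: each sample's advantage is its raw reward (no baseline)."""
    return list(rewards)


def reinforce_baseline(rewards: List[float]) -> List[float]:
    """REINFORCE with a baseline.

    ASSUMPTION: the baseline is the mean reward of the group; the real
    training script may use a different baseline or extra normalization.
    """
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


def rloo(rewards: List[float]) -> List[float]:
    """REINFORCE Leave-One-Out: the baseline for sample i is the mean of the
    rewards of the *other* k - 1 samples in the group."""
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


# Hypothetical mapping from a METHOD string to an estimator function.
ESTIMATORS: Dict[str, Callable[[List[float]], List[float]]] = {
    "reinforce": reinforce,
    "reinforce_baseline": reinforce_baseline,
    "rloo": rloo,
}


if __name__ == "__main__":
    # Hypothetical verifier rewards (1 = correct, 0 = wrong) for 4 samples of one prompt.
    group = [1.0, 0.0, 0.0, 1.0]
    for name, estimator in ESTIMATORS.items():
        print(name, estimator(group))
```

Algebraically, the RLOO advantage for sample i equals (k r_i - S)/(k - 1) = k/(k - 1) · (r_i - S/k), where S is the group's total reward. So under the group-mean assumption above, the two baselined estimators differ only by the scale factor k/(k - 1). For example, that factor is 4/3 for a group of four samples.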