# PPO training stage: starts from an SFT LoRA checkpoint and uses a reward model checkpoint.
# Replace the path_to_* placeholders with real directories before running.
# Effective batch size per device: 2 (per-device batch) x 4 (gradient accumulation) = 8.
CUDA_VISIBLE_DEVICES=0 python ../src/train_ppo.py \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --dataset_dir ../data \
    --finetuning_type lora \
    --checkpoint_dir path_to_sft_checkpoint \
    --reward_model path_to_rm_checkpoint \
    --output_dir path_to_ppo_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --fp16