spark-model-QLoRA / hyperparams.txt
gabrielbo
Add PPO trained model (actor, critic, tokenizer, hyperparams) and models.py
2a347f6
lr: 5e-06
critic_lr: 5e-06
gamma: 0.99
gae_lambda: 0.95
clip_ratio: 0.2
kl_coef: 0.1
target_kl: 0.2
max_grad_norm: 0.5
value_loss_coef: 0.1
entropy_coef: 0.01
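
The values above can be read back into a dictionary and plugged into the standard PPO formulas. The following is a minimal sketch, not the code from this repository's models.py: the helper names (`parse_hyperparams`, `gae_advantages`, `clipped_policy_loss`, `total_loss`) are hypothetical, and the way the coefficients are combined in `total_loss` is the conventional PPO form, assumed here rather than confirmed by the source.

```python
import math

# Contents of hyperparams.txt, copied verbatim from the file above.
HYPERPARAMS_TEXT = """\
lr: 5e-06
critic_lr: 5e-06
gamma: 0.99
gae_lambda: 0.95
clip_ratio: 0.2
kl_coef: 0.1
target_kl: 0.2
max_grad_norm: 0.5
value_loss_coef: 0.1
entropy_coef: 0.01
"""


def parse_hyperparams(text):
    """Parse 'key: value' lines into a dict of floats (hypothetical helper)."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        params[key.strip()] = float(value)
    return params


def gae_advantages(rewards, values, gamma, gae_lambda):
    """Generalized advantage estimation over one trajectory.

    `values` must hold one more entry than `rewards`; the last entry is
    the bootstrap value of the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error, then the discounted running sum.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * gae_lambda * running
        advantages[t] = running
    return advantages


def clipped_policy_loss(log_prob_new, log_prob_old, advantage, clip_ratio):
    """PPO clipped surrogate loss for a single action (to be minimized)."""
    ratio = math.exp(log_prob_new - log_prob_old)
    clipped = max(1.0 - clip_ratio, min(1.0 + clip_ratio, ratio))
    return -min(ratio * advantage, clipped * advantage)


def total_loss(policy_loss, value_loss, entropy, kl, hp):
    """Conventional weighting of the loss terms (an assumption, see above)."""
    return (
        policy_loss
        + hp["value_loss_coef"] * value_loss
        - hp["entropy_coef"] * entropy
        + hp["kl_coef"] * kl
    )


hp = parse_hyperparams(HYPERPARAMS_TEXT)
```

In a sketch like this, `lr` and `critic_lr` would go to the actor and critic optimizers, `max_grad_norm` to gradient clipping, and `target_kl` to an early-stopping check on the policy update; those uses are also assumptions about a typical PPO loop.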