gabrielbo/spark-model-QLoRA

Text Generation

reinforcement-learning

spark-model-QLoRA / hyperparams.txt

Add PPO trained model (actor, critic, tokenizer, hyperparams) and models.py

2a347f6 7 months ago

159 Bytes

	lr: 5e-06
	critic_lr: 5e-06
	gamma: 0.99
	gae_lambda: 0.95
	clip_ratio: 0.2
	kl_coef: 0.1
	target_kl: 0.2
	max_grad_norm: 0.5
	value_loss_coef: 0.1
	entropy_coef: 0.01
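
The file above is a flat list of `name: value` pairs. As a rough illustration of how such values are commonly consumed in a PPO setup, the sketch below parses the text into a dict and plugs `clip_ratio`, `gamma`, and `gae_lambda` into the textbook clipped surrogate and GAE formulas. The helper names (`parse_hyperparams`, `clipped_surrogate`, `gae`) are hypothetical and are not taken from the repository's `models.py`; how the author's training code actually uses these settings is not shown in this file, so treat this as an assumption.

```python
# Illustrative sketch only: helper names are hypothetical, not from the repo.
HYPERPARAMS_TEXT = """\
lr: 5e-06
critic_lr: 5e-06
gamma: 0.99
gae_lambda: 0.95
clip_ratio: 0.2
kl_coef: 0.1
target_kl: 0.2
max_grad_norm: 0.5
value_loss_coef: 0.1
entropy_coef: 0.01
"""


def parse_hyperparams(text):
    """Parse 'name: value' lines into a dict of floats, skipping blank lines."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, value = line.partition(":")
        params[name.strip()] = float(value)
    return params


def clipped_surrogate(ratio, advantage, clip_ratio):
    """Textbook PPO clipped objective for a single sample (to be maximized)."""
    clipped = max(1.0 - clip_ratio, min(ratio, 1.0 + clip_ratio))
    return min(ratio * advantage, clipped * advantage)


def gae(rewards, values, gamma, lam):
    """Generalized advantage estimation over one trajectory.

    `values` must hold len(rewards) + 1 entries; the last one is the
    bootstrap value of the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


hp = parse_hyperparams(HYPERPARAMS_TEXT)
```

With these values, a probability ratio of 1.5 on a positive advantage is capped at `1 + clip_ratio`, and each step's advantage decays earlier TD errors by `gamma * gae_lambda` per step.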