Kabs9000 committed on
Commit 2236393 · verified · 1 Parent(s): 8285ae4

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -26,7 +26,7 @@ This model is a **ReMax (Reinforcement Learning with Maximization)** fine-tune o
 It was aligned using the **[HelpSteer2](https://huggingface.co/datasets/Jennny/helpsteer2-helpfulness-preference)** dataset to improve helpfulness and instruction following capabilities. Unlike standard PPO, ReMax eliminates the need for a value model (Critic) and uses a greedy baseline to reduce variance, making it highly efficient for alignment.
 The goal of the fine-tuning was to improve helpfulness/harmlessness behavior as measured by the HelpSteer2 dataset, while also enabling controlled model diffing experiments as part of the AIPlans research workflow.
 Developed by: AIPlans
-
+LoRA was not used in this finetuning.
 Funded by : AIPlans
 
 Shared by: AIPlans
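The README text above notes that ReMax replaces PPO's learned value model (critic) with the reward of the greedy response as a baseline. A minimal sketch of that variance-reduction idea is below; the function name and arguments are illustrative, not taken from the AIPlans training code.

```python
def remax_loss(token_logprobs, sampled_reward, greedy_reward):
    """REINFORCE-style loss with ReMax's greedy baseline (no critic).

    token_logprobs: log-probabilities of the sampled response tokens.
    sampled_reward: scalar reward for the sampled response.
    greedy_reward:  scalar reward for the greedy (argmax) response,
                    used as a baseline instead of a learned value model.
    """
    # Subtracting the greedy response's reward centers the signal,
    # reducing gradient variance without training a critic.
    advantage = sampled_reward - greedy_reward
    # Maximize advantage-weighted log-likelihood => minimize the negative.
    return -advantage * sum(token_logprobs)
```

In a real training loop the advantage would be treated as a constant (detached from the graph) and the loss averaged over a batch; this sketch only shows the per-sample scalar computation.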