GRATH: Gradual Self-Truthifying for Large Language Models
Paper: arXiv:2401.12292
This is a gradually self-truthified model (one iteration) proposed in the paper GRATH: Gradual Self-Truthifying for Large Language Models.
Note: DPO was applied to this model twice; in each iteration, the DPO reference model is set to the current base model.
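The per-example DPO objective that underlies this setup can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name and inputs are hypothetical, and it shows only the standard DPO loss, where the policy is regularized toward a reference model (here, the current base model of each iteration).

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    logp_*     : policy log-probabilities of the chosen/rejected responses
    ref_logp_* : reference-model log-probabilities (the current base model)
    beta       : strength of the implicit KL regularization
    """
    # Margin by which the policy prefers the chosen response
    # more strongly than the reference model does.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log sigmoid(beta * margin)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When policy and reference agree exactly, the loss is log(2);
# a positive margin pushes it below log(2).
print(dpo_loss(-1.0, -3.0, -2.0, -2.5, beta=0.1))
```

Iterating the procedure, as done here, means the model produced by one DPO round becomes both the new policy initialization and the new reference model for the next round.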
The following bitsandbytes quantization config was used during training:
PEFT 0.5.0