This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the princeton-nlp/llama3-ultrafeedback dataset. The evaluation metrics logged during training are reported in the training results table.
More information needed
The following hyperparameters were used during training:
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.2443 | 0.8550 | 400 | 1.2416 | -0.3361 | -0.4013 | 0.5915 | 0.0652 | -0.4013 | -0.3361 | 0.0031 | 0.0123 |
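The reward columns in the row above can be sanity-checked against each other. The sketch below assumes the convention common to preference-optimization trainers, where `Rewards/margins` is the mean of (chosen reward minus rejected reward) over the evaluation pairs and `Rewards/accuracies` is the fraction of pairs in which the chosen response scores higher. The card does not state these definitions, so treat them as an assumption. The `row` dictionary and the `reward_margin` helper are illustrative names, not part of any library.

```python
# Values copied from the step-400 row of the training results table.
row = {
    "rewards/chosen": -0.3361,
    "rewards/rejected": -0.4013,
    "rewards/margins": 0.0652,
    "rewards/accuracies": 0.5915,
}


def reward_margin(chosen: float, rejected: float) -> float:
    """Gap between the mean chosen reward and the mean rejected reward.

    Assumption: both means are taken over the same evaluation pairs, so the
    difference of means equals the mean per-pair margin.
    """
    return chosen - rejected


margin = reward_margin(row["rewards/chosen"], row["rewards/rejected"])

# The table is rounded to four decimals, so compare with a tolerance.
consistent = abs(margin - row["rewards/margins"]) < 1e-4

print(f"recomputed margin: {margin:.4f}, matches table: {consistent}")
```

Under that reading, the positive margin of about 0.065 together with an accuracy of about 0.59 would mean the model prefers the chosen response in roughly 59% of evaluation pairs, which is only modestly above the 50% chance level.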
Base model: meta-llama/Meta-Llama-3-8B-Instruct