This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full, trained on the princeton-nlp/mistral-instruct-ultrafeedback dataset. Its results on the evaluation set are reported in the training results table below.
More information needed
The following hyperparameters were used during training:
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.8319 | 0.8573 | 400 | 1.8911 | -0.9791 | -1.1183 | 0.6263 | 0.1393 | -1.1183 | -0.9791 | -2.7752 | -2.7881 |
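As a quick sanity check on how the reward columns relate, the sketch below recomputes the margin from the step-400 row. It assumes the common preference-optimization logging convention in which `Rewards/margins` is the mean of (chosen reward minus rejected reward); the card does not state which trainer produced these columns, so treat that definition as an assumption. The helper name `reward_margin` is hypothetical and used only for illustration.

```python
# Values copied from the step-400 row of the table above.
row = {
    "rewards/chosen": -0.9791,
    "rewards/rejected": -1.1183,
    "rewards/margins": 0.1393,
    "rewards/accuracies": 0.6263,
}


def reward_margin(chosen: float, rejected: float) -> float:
    """Reward margin, ASSUMING margin = chosen reward - rejected reward."""
    return chosen - rejected


margin = reward_margin(row["rewards/chosen"], row["rewards/rejected"])

# The table rounds every value to four decimals, so allow a small tolerance
# when comparing the recomputed margin with the reported one.
consistent = abs(margin - row["rewards/margins"]) < 5e-4

print(f"recomputed margin: {margin:.4f}")
print(f"reported margin:   {row['rewards/margins']:.4f}")
print(f"consistent within rounding: {consistent}")
```

Under the same assumption, a `Rewards/accuracies` value of 0.6263 would mean the chosen response received the higher reward in roughly 63% of evaluation pairs.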
Base model: mistralai/Mistral-7B-v0.1