This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the princeton-nlp/mistral-instruct-ultrafeedback dataset. Its evaluation-set results, logged during training, are listed in the training results table below.
Model description: More information needed.
Intended uses & limitations: More information needed.
Training and evaluation data: More information needed.
Training results:
| Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|
| -0.0905 | 0.8573 | 400 | -2.9000 | -2.8928 | -1.1346 | -1.2466 | -0.0708 | 0.5758 | -1.1346 | 0.1120 | -1.2466 |
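The reward columns are related to each other. The sketch below assumes TRL-style definitions, which the card does not state: per-pair rewards are averaged into Rewards/chosen and Rewards/rejected, Rewards/margins is the mean of (chosen minus rejected), and Rewards/accuracies is the fraction of pairs where the chosen response scores higher. The function name `preference_metrics` is hypothetical. The final check confirms only that the logged step-400 row is internally consistent: 0.1120 = -1.1346 - (-1.2466).

```python
from statistics import mean


def preference_metrics(chosen, rejected):
    """Aggregate per-pair rewards into the summary columns used in the table.

    Hypothetical helper; assumes TRL-style metric definitions.
    """
    if len(chosen) != len(rejected) or not chosen:
        raise ValueError("need equal-length, non-empty reward lists")
    pairs = list(zip(chosen, rejected))
    return {
        "rewards/chosen": mean(chosen),
        "rewards/rejected": mean(rejected),
        # Averaged per pair; by linearity this equals the difference of the two means.
        "rewards/margins": mean([c - r for c, r in pairs]),
        # Fraction of pairs where the chosen response out-scores the rejected one.
        "rewards/accuracies": mean([1.0 if c > r else 0.0 for c, r in pairs]),
    }


# Consistency check on the logged step-400 row: margins == chosen - rejected.
row = {"rewards/chosen": -1.1346, "rewards/rejected": -1.2466, "rewards/margins": 0.1120}
assert abs(row["rewards/chosen"] - row["rewards/rejected"] - row["rewards/margins"]) < 1e-4
```

Under these definitions, a Rewards/accuracies value of 0.5758 would mean the chosen response outscored the rejected one in about 58% of evaluation pairs. Also note that in this row, Rewards/chosen and Rewards/rejected numerically equal Logps/chosen and Logps/rejected.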
Base model: mistralai/Mistral-7B-v0.1