How to use jz666/simpo with Transformers:
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("jz666/simpo", dtype="auto")

This model is a fine-tuned version of google/gemma-2-9b-it on the princeton-nlp/gemma2-ultrafeedback-armorm dataset. It achieves the following results on the evaluation set:
More information needed
The following hyperparameters were used during training:

Training results:
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.7196 | 0.8594 | 400 | 2.7580 | -18.9526 | -23.9387 | 0.7705 | 4.9861 | -2.3939 | -1.8953 | -14.4321 | -14.5024 |
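
The columns in this row are related by simple arithmetic, which can serve as a sanity check when reading the table. The sketch below uses only the numbers from the step-400 row: it recomputes the reward margin as chosen minus rejected, and divides each reward by its log-probability. The ratio comes out close to 10, which would be consistent with a reward defined as a constant times the log-probability (as in a SimPO-style objective, which the repository name hints at). That constant is inferred from the numbers here and is not stated in the card, so treat it as an assumption.

```python
# Values copied from the step-400 row of the training results table.
row = {
    "rewards_chosen": -18.9526,
    "rewards_rejected": -23.9387,
    "rewards_margins": 4.9861,
    "logps_chosen": -1.8953,
    "logps_rejected": -2.3939,
}

# The margin column should equal the difference of the two reward columns.
margin = row["rewards_chosen"] - row["rewards_rejected"]

# Ratio of reward to log-probability. Under a SimPO-style reward this would be
# the scaling factor (often called beta); this is an inference, not a value
# given in the model card.
scale_chosen = row["rewards_chosen"] / row["logps_chosen"]
scale_rejected = row["rewards_rejected"] / row["logps_rejected"]

print(f"recomputed margin:        {margin:.4f}")
print(f"reported margin:          {row['rewards_margins']:.4f}")
print(f"reward / logp (chosen):   {scale_chosen:.3f}")
print(f"reward / logp (rejected): {scale_rejected:.3f}")
```

The Rewards/accuracies value of 0.7705 is not derivable from these averages; it is presumably the fraction of evaluation pairs in which the chosen response received the higher reward.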