How to use MahmoudMohamed/Reward_Model with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="MahmoudMohamed/Reward_Model")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("MahmoudMohamed/Reward_Model")
model = AutoModelForSequenceClassification.from_pretrained("MahmoudMohamed/Reward_Model")

This model is a fine-tuned version of OpenAssistant/reward-model-deberta-v3-base on the Anthropic/hh-rlhf dataset. It achieves the following results on the evaluation set:
The following hyperparameters were used during training:

Training results:
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 0.6936 | 1.0 | 13400 | 0.6931 | 1.0 |
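To help read the loss column, here is a minimal sketch of the pairwise ranking loss commonly used for reward models on chosen/rejected data. The card does not state which objective was used, so this loss and the function name `pairwise_loss` are assumptions for illustration only. Under that assumption, the loss equals ln 2 ≈ 0.6931 whenever the chosen and rejected responses receive the same score, which matches the validation loss in the table to four decimals and may be worth checking alongside the reported accuracy.

```python
import math


def pairwise_loss(chosen_score, rejected_score):
    """Pairwise ranking loss: -log(sigmoid(chosen_score - rejected_score)).

    Hypothetical helper; the training objective is not stated in the card.
    """
    margin = chosen_score - rejected_score
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Equal scores give ln 2, the value of an uninformative ranking.
equal_scores_loss = pairwise_loss(0.0, 0.0)
# A correctly ordered pair gives a lower loss, a misordered pair a higher one.
good_pair_loss = pairwise_loss(2.0, 0.0)
bad_pair_loss = pairwise_loss(0.0, 2.0)
```

A loss clearly below ln 2 would indicate that the model separates chosen from rejected responses on average.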
Base model
OpenAssistant/reward-model-deberta-v3-base
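Building on the loading snippet above, the following sketch scores a single question and answer pair. It assumes this checkpoint keeps the base model's convention of a single output logit for a (question, answer) input, with higher values meaning a more preferred answer; the card does not confirm this. The helper names `load_reward_model` and `score_pair` are hypothetical.

```python
def load_reward_model(name="MahmoudMohamed/Reward_Model"):
    """Download the tokenizer and model from the Hub (needs network access)."""
    # Imported lazily so that score_pair can be used without triggering
    # a transformers import until a model is actually loaded.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name)
    model.eval()  # disable dropout so repeated calls give the same score
    return tokenizer, model


def score_pair(question, answer, tokenizer, model):
    """Return the first logit for a (question, answer) pair as a float.

    Assumption: the checkpoint has a single-logit head where a higher
    value means a more preferred answer.
    """
    # Encode question and answer together as one sequence pair.
    inputs = tokenizer(question, answer, return_tensors="pt", truncation=True)
    outputs = model(**inputs)
    return outputs.logits[0, 0].item()
```

For example, `tokenizer, model = load_reward_model()` followed by `score_pair("How do I boil an egg?", "Place it in boiling water for about eight minutes.", tokenizer, model)` would return one float. Comparing the scores of two candidate answers to the same question is the typical use.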