Text Classification
Transformers
PyTorch
English
deberta-v2
reward-model
reward_model
RLHF
text-embeddings-inference
How to use OpenAssistant/reward-model-deberta-v3-large-v2 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="OpenAssistant/reward-model-deberta-v3-large-v2")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("OpenAssistant/reward-model-deberta-v3-large-v2")
model = AutoModelForSequenceClassification.from_pretrained("OpenAssistant/reward-model-deberta-v3-large-v2")
```
How to optimize loss function?
#1
by nidong - opened
According to the InstructGPT paper, the loss function used here is a pairwise loss, but I found that I cannot get the gap between the output scores to widen. Is there any direction for solving this problem?
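For readers unfamiliar with the pairwise loss mentioned above, here is a minimal sketch in plain Python, assuming the usual form `-log(sigmoid(r_chosen - r_rejected))` averaged over pairs. The function names `log_sigmoid` and `pairwise_loss` are made up for illustration and are not taken from this model's training code.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_loss(chosen_scores, rejected_scores) -> float:
    """Mean of -log(sigmoid(chosen - rejected)) over all pairs.

    chosen_scores[i] is the reward for the preferred answer of pair i,
    rejected_scores[i] the reward for the less preferred answer.
    """
    if len(chosen_scores) != len(rejected_scores) or not chosen_scores:
        raise ValueError("need two non-empty score lists of equal length")
    losses = [-log_sigmoid(c - r) for c, r in zip(chosen_scores, rejected_scores)]
    return sum(losses) / len(losses)


# The loss depends only on the score difference within each pair.
tied = pairwise_loss([0.0], [0.0])       # equal scores give log(2)
correct = pairwise_loss([2.0], [0.0])    # preferred answer scored higher
wrong = pairwise_loss([0.0], [2.0])      # preferred answer scored lower
```

In PyTorch the same quantity would typically be written as `-torch.nn.functional.logsigmoid(chosen - rejected).mean()` on two score tensors.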
"the output scores" you are referring to, is it this model or something you are currently facing? Cause InstructGPT did have a mean adjusting step where they make sure the average rank scores in their datasets have a zero mean