How to train the model
#1
by mike2000 - opened
Hello, may I ask how you trained the model? I tried using DeepSpeed-Chat to train gpt2-large as a reward model, but the accuracy is only about 67%.
Could you share details about the model structure and data format?
Hello, based on my experience, there are two things you can try: train the harmless and helpful reward models independently, and use LoRA instead of full fine-tuning.
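For anyone comparing numbers in this thread: the accuracy of a reward model is commonly measured as pairwise accuracy, meaning the fraction of preference pairs where the chosen response scores higher than the rejected one. The thread does not confirm that this is the exact metric or loss used here, so the following is only a minimal sketch under that assumption. The function names and the example reward values are hypothetical.

```python
import math


def pairwise_loss(chosen_reward, rejected_reward):
    """Pairwise ranking loss -log(sigmoid(r_chosen - r_rejected)).

    Assumption: this is the commonly used pairwise objective for reward
    models; the thread does not state which loss was actually used.
    """
    diff = chosen_reward - rejected_reward
    # Numerically stable form of log(1 + exp(-diff)).
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def pairwise_accuracy(chosen_rewards, rejected_rewards):
    """Fraction of pairs where the chosen response gets the higher reward."""
    if len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("reward lists must have the same length")
    if not chosen_rewards:
        raise ValueError("need at least one preference pair")
    correct = sum(c > r for c, r in zip(chosen_rewards, rejected_rewards))
    return correct / len(chosen_rewards)


# Hypothetical reward scores for three (chosen, rejected) pairs.
chosen = [1.2, 0.3, -0.5]
rejected = [0.4, 0.9, -1.0]

accuracy = pairwise_accuracy(chosen, rejected)
mean_loss = sum(pairwise_loss(c, r) for c, r in zip(chosen, rejected)) / len(chosen)
print(f"pairwise accuracy: {accuracy:.3f}, mean loss: {mean_loss:.3f}")
```

In this sketch a model that ranks pairs at random would land near 50% accuracy, which gives a baseline for reading a figure like 67%.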