Text Classification
Transformers
Safetensors
English
llama
reward-model
dir
preference-model
text-embeddings-inference
Instructions to use SilverStRock/DIR-Llama3.1-8B-RM with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use SilverStRock/DIR-Llama3.1-8B-RM with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-classification", model="SilverStRock/DIR-Llama3.1-8B-RM")# Load model directly from transformers import AutoTokenizer, AutoModelForSequenceClassification tokenizer = AutoTokenizer.from_pretrained("SilverStRock/DIR-Llama3.1-8B-RM") model = AutoModelForSequenceClassification.from_pretrained("SilverStRock/DIR-Llama3.1-8B-RM") - Notebooks
- Google Colab
- Kaggle
DIR Llama-3.1-8B Reward Model
This repository contains the debiased reward model checkpoint used in the DIR reproduction workspace.
- Base model:
Meta-Llama-3.1-8B-Instruct - Training data:
Skywork-Reward-Preference-80K-v0.2 - DIR checkpoint:
checkpoint-601 - RM-Bench during training:
eval_RMBench_total ~= 0.68493 - Original server path:
/data03/shibingkang/DIR/reward_models/my_outputs/Meta-Llama-3.1-8B-Instruct_DB_Difference-1_from-SK-v0.2_Debias-Tasklength_difference-1.0_len4096_fulltrain_2e-06_dataSkywork-Reward-Preference-80K-v0.2/checkpoint-601
The upload intentionally excludes DeepSpeed optimizer/model-state recovery files under global_step601/.
Those files are very large and are not needed for loading the reward model for evaluation or PPO reward scoring.
Associated code and reproduction notes: https://github.com/BingkangShi/DIR-reproduction
- Downloads last month
- 19
Model tree for SilverStRock/DIR-Llama3.1-8B-RM
Base model
meta-llama/Llama-3.1-8B Finetuned
meta-llama/Llama-3.1-8B-Instruct