DIR Llama-3.1-8B Reward Model

This repository contains the debiased reward model checkpoint used in the DIR reproduction workspace.

Base model: Meta-Llama-3.1-8B-Instruct
Training data: Skywork-Reward-Preference-80K-v0.2
DIR checkpoint: checkpoint-601
RM-Bench during training: eval_RMBench_total ~= 0.68493
Original server path: /data03/shibingkang/DIR/reward_models/my_outputs/Meta-Llama-3.1-8B-Instruct_DB_Difference-1_from-SK-v0.2_Debias-Tasklength_difference-1.0_len4096_fulltrain_2e-06_dataSkywork-Reward-Preference-80K-v0.2/checkpoint-601

The upload intentionally excludes DeepSpeed optimizer/model-state recovery files under global_step601/. Those files are very large and are not needed for loading the reward model for evaluation or PPO reward scoring.

Associated code and reproduction notes: https://github.com/BingkangShi/DIR-reproduction

Downloads last month: 3

Safetensors

Model size

8B params

Tensor type

BF16

Model tree for SilverStRock/DIR-Llama3.1-8B-RM

Base model

meta-llama/Llama-3.1-8B

Finetuned

meta-llama/Llama-3.1-8B-Instruct

Finetuned

(2904)

this model