# Released TraceLift Reason RMs

This directory contains two ready-to-load, full-weight reward model (RM) checkpoints:

- `code-rm-full-ce`: code-domain Reason RM.
- `math-rm-full-ce`: math-domain Reason RM.

Both checkpoints were initialized from `Qwen/Qwen2.5-7B-Instruct`, trained with LoRA, and then merged into full `Qwen2ForReasonRewardModel` weights.
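As a rough illustration of what merging LoRA into full weights means, the sketch below applies the standard LoRA update `W' = W + (alpha / r) * B @ A` to a toy matrix. It assumes the conventional `alpha / r` scaling (not rank-stabilized LoRA); with the released rank `32` and alpha `64` that scale is `2.0`. The helper names and the tiny rank-1 factors are hypothetical and chosen only for readability; this is not the actual merge script.

```python
# Toy LoRA merge in pure Python. Assumption: standard scaling alpha / rank.
# Helper names and matrices are illustrative, not taken from the release code.

def lora_scale(alpha, rank):
    """Scaling factor applied to the low-rank update."""
    return alpha / rank

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weight, lora_a, lora_b, scale):
    """Return weight + scale * (lora_b @ lora_a) as a new matrix."""
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]

scale = lora_scale(alpha=64, rank=32)   # hyperparameters from this README
weight = [[1.0, 0.0], [0.0, 1.0]]       # frozen base weight (toy 2x2)
lora_a = [[0.5, -0.5]]                  # toy factor A, shape r x in = 1 x 2
lora_b = [[1.0], [2.0]]                 # toy factor B, shape out x r = 2 x 1
merged = merge_lora(weight, lora_a, lora_b, scale)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, which is why the released checkpoints load as ordinary full weights.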

Training details:

- LoRA rank `32`, alpha `64`, dropout `0.05`.
- Five rubric classification heads, each trained with a cross-entropy (CE) loss on its rubric dimension.
- One total-score head with Huber loss on the normalized total score.
- The released checkpoints already include the backbone, rubric heads, and total head.
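The two loss terms listed above can be sketched in plain Python as follows. Several details are assumptions made for illustration and are not stated in this README: the number of classes per rubric head (5 here), averaging the CE over the five heads, a Huber `delta` of `1.0`, an equal weight between the two terms, and a total score normalized to `[0, 1]`. The function names are hypothetical.

```python
import math

def cross_entropy(logits, target):
    """CE for one rubric head, using a numerically stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def huber(pred, target, delta=1.0):
    """Huber loss: quadratic near zero, linear beyond delta (delta is assumed)."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)

def reason_rm_loss(rubric_logits, rubric_labels, total_pred, total_target,
                   total_weight=1.0):
    """Mean CE over rubric heads plus weighted Huber on the total score.

    The averaging and total_weight are assumptions, not documented settings.
    """
    if len(rubric_logits) != len(rubric_labels):
        raise ValueError("one label is required per rubric head")
    ce = sum(cross_entropy(l, y) for l, y in zip(rubric_logits, rubric_labels))
    ce /= len(rubric_labels)
    return ce + total_weight * huber(total_pred, total_target)

# Five heads with uniform logits over an assumed 5 classes each.
logits = [[0.0] * 5 for _ in range(5)]
labels = [0, 1, 2, 3, 4]
loss = reason_rm_loss(logits, labels, total_pred=0.7, total_target=0.6)
print(loss)
```

Huber is less sensitive than squared error to occasional large mismatches in the total score, which is a common reason to pair it with per-dimension classification.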

Load them directly with `reasonrm.modeling_reward.Qwen2ForReasonRewardModel.from_pretrained(...)`.
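A minimal loading sketch follows. It assumes the `reasonrm` package is installed and that `from_pretrained` accepts a local directory path in the usual Hugging Face style; `release_root`, `checkpoint_dir`, and `load_reason_rm` are hypothetical names introduced here, and only the two checkpoint directory names come from this README.

```python
from pathlib import Path

# Directory names as listed in this README; the domain keys are illustrative.
CHECKPOINTS = {
    "code": "code-rm-full-ce",
    "math": "math-rm-full-ce",
}

def checkpoint_dir(release_root, domain):
    """Resolve the checkpoint directory for a domain ('code' or 'math')."""
    if domain not in CHECKPOINTS:
        raise ValueError(f"unknown domain {domain!r}; expected one of {sorted(CHECKPOINTS)}")
    return Path(release_root) / CHECKPOINTS[domain]

def load_reason_rm(release_root, domain):
    """Load a released Reason RM (assumes reasonrm is importable)."""
    # Imported lazily so the path helper works without reasonrm installed.
    from reasonrm.modeling_reward import Qwen2ForReasonRewardModel
    return Qwen2ForReasonRewardModel.from_pretrained(str(checkpoint_dir(release_root, domain)))

print(checkpoint_dir("released", "math"))
```

Calling `load_reason_rm("released", "code")` would then return the code-domain model, provided `released` is the local path to this directory. Because the checkpoints are already merged, no separate adapter-loading step should be needed.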