# TraceLift Code Reason RM Full Checkpoint

This directory contains the released code-domain TraceLift Reason RM (reward model) checkpoint.

- Base initialization: `Qwen/Qwen2.5-7B-Instruct`
- Training: LoRA, merged into full weights
- LoRA rank: `32`
- LoRA alpha: `64`
- LoRA dropout: `0.05`
- Rubric heads: five 5-way classification heads
- Total head: one scalar regression head
- Dimension loss: cross entropy on each rubric head
- Total loss: Huber loss on the normalized total score
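
The two loss terms listed above can be illustrated with a small dependency-free sketch. It is not the training code. The way the terms are combined (`total_weight`), the Huber threshold (`delta`), the averaging over heads, and the encoding of rubric labels as class indices `0..4` are all assumptions made for illustration, since this README does not state them.

```python
import math

NUM_DIMENSIONS = 5  # five rubric heads (from the list above)
NUM_CLASSES = 5     # each head is a 5-way classifier (from the list above)


def cross_entropy(logits, target):
    """Cross entropy for one head: -log softmax(logits)[target]."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]


def huber(prediction, target, delta=1.0):
    """Huber loss: quadratic within delta of the target, linear beyond it."""
    error = abs(prediction - target)
    if error <= delta:
        return 0.5 * error * error
    return delta * (error - 0.5 * delta)


def reason_rm_loss(dim_logits, dim_labels, total_pred, total_target,
                   total_weight=1.0, delta=1.0):
    """Mean cross entropy over the rubric heads plus a weighted Huber term.

    total_weight, delta, and the mean over heads are assumptions,
    not values taken from the released checkpoint.
    """
    assert len(dim_logits) == len(dim_labels) == NUM_DIMENSIONS
    assert all(len(logits) == NUM_CLASSES for logits in dim_logits)
    dim_loss = sum(
        cross_entropy(logits, label)
        for logits, label in zip(dim_logits, dim_labels)
    ) / NUM_DIMENSIONS
    return dim_loss + total_weight * huber(total_pred, total_target, delta)
```

With uniform logits on every head and a perfect total prediction, the Huber term vanishes and the loss reduces to the cross entropy of a uniform 5-way guess, `log(5)`.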

The directory can be loaded directly with `reasonrm.modeling_reward.Qwen2ForReasonRewardModel.from_pretrained(...)`.
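
A minimal loading sketch, assuming the `reasonrm` package is installed and importable. Only the class path and `from_pretrained` come from this README; the helper name `load_reason_rm` and the example path are hypothetical.

```python
def load_reason_rm(checkpoint_dir):
    """Load the checkpoint with the class named in this README.

    checkpoint_dir is the local directory that holds this checkpoint.
    """
    # Imported lazily so the helper can be defined even where the
    # reasonrm package is not installed.
    from reasonrm.modeling_reward import Qwen2ForReasonRewardModel

    return Qwen2ForReasonRewardModel.from_pretrained(checkpoint_dir)


# Example call (hypothetical path, replace with the real directory):
# model = load_reason_rm("./tracelift-code-reason-rm")
```

Because the LoRA weights are already merged into the full weights, no separate adapter should need to be attached after loading.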