TraceLift Code Reason RM Full Checkpoint
This is the released code-domain TraceLift Reason reward model (RM).
- Base initialization: Qwen/Qwen2.5-7B-Instruct
- Training: LoRA, merged into full weights
- LoRA rank: 32
- LoRA alpha: 64
- LoRA dropout: 0.05
- Rubric heads: five 5-way classification heads
- Total head: one scalar regression head
- Dimension loss: cross entropy
- Total loss: Huber loss on the normalized total score
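The training objective listed above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the released training code. The card does not say how the total score is normalized, which Huber delta is used, or how the dimension and total losses are combined. The helper names (`normalize_total`, `reason_rm_loss`), the `[0, 1]` normalization, `delta=1.0`, averaging the five cross-entropy terms, and `total_weight=1.0` are all assumptions made for the example.

```python
import math
from typing import Sequence

NUM_DIMENSIONS = 5  # five rubric heads (from the model card)
NUM_CLASSES = 5     # each rubric head is a 5-way classifier (from the model card)


def normalize_total(labels: Sequence[int]) -> float:
    """Hypothetical normalization: sum of rubric levels (0..4) mapped to [0, 1]."""
    return sum(labels) / (NUM_DIMENSIONS * (NUM_CLASSES - 1))


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross entropy of one rubric head: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max before exp for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def huber(pred: float, target: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic near the target, linear beyond delta (assumed 1.0)."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def reason_rm_loss(
    head_logits: Sequence[Sequence[float]],
    head_labels: Sequence[int],
    total_pred: float,
    total_target: float,
    total_weight: float = 1.0,  # hypothetical weight on the total-score term
) -> float:
    """Mean cross entropy over the rubric heads plus weighted Huber on the total."""
    assert len(head_logits) == len(head_labels) == NUM_DIMENSIONS
    assert all(len(logits) == NUM_CLASSES for logits in head_logits)
    dim_loss = sum(
        cross_entropy(logits, label)
        for logits, label in zip(head_logits, head_labels)
    ) / NUM_DIMENSIONS
    return dim_loss + total_weight * huber(total_pred, total_target)
```

With uniform (all-zero) logits, every head contributes `log(5)`, so a perfect total-score prediction gives a loss of about `log(5)`. The Huber term is used on the regression head because it behaves like squared error for small errors but grows only linearly for outliers.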
The merged checkpoint directory can be loaded directly with `reasonrm.modeling_reward.Qwen2ForReasonRewardModel.from_pretrained(...)`.