# TraceLift Code Reason RM Full Checkpoint
This is the released code-domain TraceLift Reason RM.
- Base initialization: `Qwen/Qwen2.5-7B-Instruct`
- Training: LoRA, merged into full weights
- LoRA rank: `32`
- LoRA alpha: `64`
- LoRA dropout: `0.05`
- Rubric heads: five 5-way classification heads
- Total head: one scalar regression head
- Dimension loss: cross entropy
- Total loss: Huber loss on the normalized total score
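
The two loss terms listed above could be combined roughly as in the sketch below. It is a minimal pure-Python illustration, not the released training code. The helper names (`cross_entropy`, `huber`, `combined_loss`), the 0-4 label encoding, the averaging over the five heads, the Huber `delta` of 1.0, and the `total_weight` of 1.0 are all assumptions made for illustration. The model card does not state how the two terms are weighted or how the total score is normalized.

```python
import math

NUM_DIMS = 5      # five rubric heads (from the model card)
NUM_CLASSES = 5   # each head is a 5-way classifier (from the model card)


def cross_entropy(logits, target):
    """Cross entropy for one head: log-sum-exp of the logits minus the target logit."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def huber(pred, target, delta=1.0):
    """Huber loss: quadratic for small errors, linear beyond delta (delta is assumed)."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def combined_loss(dim_logits, dim_labels, total_pred, total_target, total_weight=1.0):
    """Hypothetical combination: mean per-head cross entropy plus weighted Huber term."""
    if len(dim_logits) != NUM_DIMS or len(dim_labels) != NUM_DIMS:
        raise ValueError("expected one logit vector and one label per rubric head")
    for logits in dim_logits:
        if len(logits) != NUM_CLASSES:
            raise ValueError("each rubric head must emit NUM_CLASSES logits")
    ce = sum(cross_entropy(l, y) for l, y in zip(dim_logits, dim_labels)) / NUM_DIMS
    return ce + total_weight * huber(total_pred, total_target)


if __name__ == "__main__":
    # Uniform logits on every head and an exact total prediction.
    logits = [[0.0] * NUM_CLASSES for _ in range(NUM_DIMS)]
    labels = [0, 1, 2, 3, 4]
    print(combined_loss(logits, labels, total_pred=0.5, total_target=0.5))
```

With uniform logits each head contributes a cross entropy of ln 5, and an exact total prediction makes the Huber term zero, so the example prints a value close to 1.609.
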
The directory can be loaded directly with `reasonrm.modeling_reward.Qwen2ForReasonRewardModel.from_pretrained(...)`.