# TraceLift Code Reason RM Full Checkpoint
This repository contains the released full checkpoint of the code-domain TraceLift Reason RM (reasoning reward model).
- Base initialization: `Qwen/Qwen2.5-7B-Instruct`
- Training: LoRA, merged into full weights
- LoRA rank: `32`
- LoRA alpha: `64`
- LoRA dropout: `0.05`
- Rubric heads: five 5-way classification heads
- Total head: one scalar regression head
- Dimension loss: cross entropy
- Total loss: Huber loss on the normalized total score
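
To make the two training objectives listed above concrete, here is a minimal pure-Python sketch of how a per-dimension cross-entropy term and a Huber term on the total score could be combined for a single example. It is not the released training code: the function names, the averaging over the five heads, the `total_weight` factor, and the Huber `delta` are all assumptions made for illustration, and the card does not say how the total score is normalized.

```python
import math

NUM_DIMENSIONS = 5  # five rubric heads (from the card)
NUM_CLASSES = 5     # each head is a 5-way classifier (from the card)


def cross_entropy(logits, target):
    """Cross entropy of one head given raw logits and an integer class index."""
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def huber(pred, target, delta=1.0):
    """Huber loss: quadratic for small errors, linear for large ones.

    delta=1.0 is an assumed value, not one stated in the card.
    """
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def combined_loss(dim_logits, dim_targets, total_pred, total_target,
                  total_weight=1.0, delta=1.0):
    """Hypothetical combination of the two loss terms for one example.

    dim_logits:   NUM_DIMENSIONS lists of NUM_CLASSES logits
    dim_targets:  NUM_DIMENSIONS integer class labels
    total_pred:   scalar output of the total head
    total_target: normalized total score (normalization scheme assumed)
    """
    # Assumption: the five cross-entropy terms are averaged, not summed.
    dim_loss = sum(
        cross_entropy(logits, target)
        for logits, target in zip(dim_logits, dim_targets)
    ) / len(dim_logits)
    total_loss = huber(total_pred, total_target, delta)
    # Assumption: the two terms are mixed with a single scalar weight.
    return dim_loss + total_weight * total_loss
```

With uniform logits on every head, the dimension term equals `math.log(5)` regardless of the labels, which is a quick sanity check for the classification part.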
The directory can be loaded directly with `reasonrm.modeling_reward.Qwen2ForReasonRewardModel.from_pretrained(...)`.
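
A minimal loading sketch follows. It assumes the `reasonrm` package from the TraceLift code is importable in your environment and that `from_pretrained` accepts a local checkpoint directory, as the sentence above suggests. The helper name `load_reason_rm` and the placeholder path are hypothetical, and any further keyword arguments are left out because the card does not document them.

```python
def load_reason_rm(checkpoint_dir):
    """Load the Reason RM from a local copy of this checkpoint directory."""
    # Imported lazily so the helper can be defined even when the
    # reasonrm package is not installed yet.
    from reasonrm.modeling_reward import Qwen2ForReasonRewardModel

    return Qwen2ForReasonRewardModel.from_pretrained(checkpoint_dir)


# Example call (placeholder path, replace with your local download):
# model = load_reason_rm("path/to/this/checkpoint")
```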