TraceLift Code Reason RM Full Checkpoint

This is the released TraceLift Reason reward model (Reason RM) for the code domain, provided as a full checkpoint.

  • Base initialization: Qwen/Qwen2.5-7B-Instruct
  • Training: LoRA, merged into full weights
  • LoRA rank: 32
  • LoRA alpha: 64
  • LoRA dropout: 0.05
  • Rubric heads: five 5-way classification heads
  • Total head: one scalar regression head
  • Dimension loss: cross entropy
  • Total loss: Huber loss on the normalized total score
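The way the two losses listed above could fit together is sketched below in plain, framework-free Python. This is an illustration only, not the released training code. The card does not say how the per-head and total losses are weighted, what Huber delta is used, or how the total score is normalized, so `total_weight`, `delta`, the averaging over heads, and the function name `reason_rm_loss` are all assumptions made for this example.

```python
import math

NUM_DIMENSIONS = 5  # five rubric heads (from the model card)
NUM_CLASSES = 5     # each rubric head is a 5-way classifier (from the model card)


def cross_entropy(logits, target):
    """Cross entropy for one head: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def huber(pred, target, delta=1.0):
    """Huber loss: quadratic for small errors, linear for large ones."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def reason_rm_loss(dim_logits, dim_labels, total_pred, total_target,
                   total_weight=1.0, delta=1.0):
    """Hypothetical combined loss for a single example.

    dim_logits:   five lists of five logits, one list per rubric head
    dim_labels:   five class indices in 0..4
    total_pred:   scalar output of the total head
    total_target: total score, assumed to be normalized already
    """
    assert len(dim_logits) == len(dim_labels) == NUM_DIMENSIONS
    assert all(len(row) == NUM_CLASSES for row in dim_logits)
    # Assumption: the dimension loss is the mean cross entropy over heads.
    dim_loss = sum(
        cross_entropy(row, label) for row, label in zip(dim_logits, dim_labels)
    ) / NUM_DIMENSIONS
    # Assumption: the two terms are combined as a simple weighted sum.
    return dim_loss + total_weight * huber(total_pred, total_target, delta)
```

With uniform logits on every head and a perfect total prediction, the sketch returns log(5), the cross entropy of a uniform 5-way guess.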

The directory can be loaded directly with reasonrm.modeling_reward.Qwen2ForReasonRewardModel.from_pretrained(...).
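A minimal loading sketch follows. The module path and class name come from the sentence above; everything else is an assumption: that the `reasonrm` package is installed and importable, and that `from_pretrained` takes the checkpoint directory as its first argument, following the usual Hugging Face convention. The helper name `load_reason_rm` is hypothetical.

```python
import importlib


def load_reason_rm(checkpoint_dir, **kwargs):
    """Load the Reason RM from a local checkpoint directory.

    The import is deferred so that this helper can be defined even when
    the `reasonrm` package is not installed.
    """
    module = importlib.import_module("reasonrm.modeling_reward")
    model_cls = getattr(module, "Qwen2ForReasonRewardModel")
    # Assumption: from_pretrained accepts a directory path as its first
    # argument; extra keyword arguments are passed through unchanged.
    return model_cls.from_pretrained(checkpoint_dir, **kwargs)
```

Calling `load_reason_rm("path/to/checkpoint")`, where the path is a placeholder for wherever this directory was downloaded, raises `ImportError` if `reasonrm` is not available.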