Released TraceLift Reason RMs

This directory contains two ready-to-load reward model checkpoints with fully merged weights:

  • code-rm-full-ce: code-domain Reason RM.
  • math-rm-full-ce: math-domain Reason RM.

Both checkpoints were initialized from Qwen/Qwen2.5-7B-Instruct, trained with LoRA, and then merged into full Qwen2ForReasonRewardModel weights.
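As a rough illustration of what merging a LoRA adapter into full weights means, the sketch below folds a low-rank update into a dense matrix. It assumes the standard LoRA scaling of alpha / rank; this README does not say which tool performed the merge, and `merge_lora` is a hypothetical helper, not part of the released code. With the rank 32 and alpha 64 listed under Training details, that scaling would be 2.0; the toy matrices use rank 2 and alpha 4 to get the same factor.

```python
def merge_lora(weight, lora_a, lora_b, alpha):
    """Return weight + (alpha / rank) * (lora_b @ lora_a) for nested-list matrices.

    Shapes: weight is out x in, lora_b is out x rank, lora_a is rank x in.
    Assumes the standard LoRA scaling alpha / rank (an assumption, see above).
    """
    rank = len(lora_a)
    scale = alpha / rank
    merged = [row[:] for row in weight]  # copy so the input is left untouched
    for i in range(len(weight)):
        for j in range(len(weight[0])):
            delta = sum(lora_b[i][k] * lora_a[k][j] for k in range(rank))
            merged[i][j] += scale * delta
    return merged


# Toy example: rank 2 with alpha 4 gives the same scale (2.0) as rank 32 with alpha 64.
base = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[0.5, 0.0], [0.0, 0.25]]
merged = merge_lora(base, lora_a, lora_b, alpha=4)
```

After the merge, the adapter matrices are no longer needed at inference time, which is consistent with the checkpoints being loadable without a separate LoRA step.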

Training details:

  • LoRA rank 32, alpha 64, dropout 0.05.
  • Five rubric classification heads, each trained with a cross-entropy (CE) loss on its rubric dimension.
  • One total-score head with Huber loss on the normalized total score.
  • The released checkpoints already include the backbone, rubric heads, and total head.
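
The objective described in the list above can be sketched in plain Python as follows. Several details are assumptions for illustration only, since this README does not state them: the number of classes per rubric head, the Huber threshold (`delta=1.0`), averaging the five CE terms, and the `total_weight` that balances the two parts. The function names are hypothetical and do not come from the released code.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one example: log-sum-exp of the logits minus the target logit."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def huber(pred, target, delta=1.0):
    """Huber loss: quadratic for small errors, linear beyond delta (delta is assumed)."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def reason_rm_loss(rubric_logits, rubric_labels, total_pred, total_target,
                   total_weight=1.0):
    """Mean CE over the rubric heads plus a weighted Huber term on the total score.

    The averaging and total_weight are assumptions, not documented settings.
    """
    ce_terms = [cross_entropy(l, y) for l, y in zip(rubric_logits, rubric_labels)]
    dim_loss = sum(ce_terms) / len(ce_terms)
    return dim_loss + total_weight * huber(total_pred, total_target)


# Five rubric heads with three hypothetical classes each, all logits uniform.
logits = [[0.0, 0.0, 0.0] for _ in range(5)]
labels = [0, 1, 2, 0, 1]
loss = reason_rm_loss(logits, labels, total_pred=0.5, total_target=0.0)
```

In this toy case every head is maximally uncertain, so the CE part equals log(3) and the Huber part equals 0.5 * 0.5² = 0.125.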

Load them directly with reasonrm.modeling_reward.Qwen2ForReasonRewardModel.from_pretrained(...).
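
A minimal loading sketch, under a few assumptions: the `reasonrm` package is importable in your environment, the two checkpoint directories sit under a local folder (the `root` argument is hypothetical), and `from_pretrained` accepts a local directory path as Hugging Face model classes usually do. The helper names `checkpoint_dir` and `load_reason_rm` are illustrative and not part of the released code.

```python
from pathlib import Path

# Directory names as listed in this README.
CHECKPOINTS = {"code": "code-rm-full-ce", "math": "math-rm-full-ce"}


def checkpoint_dir(root, domain):
    """Return the checkpoint directory for a domain ("code" or "math")."""
    return Path(root) / CHECKPOINTS[domain]


def load_reason_rm(root, domain):
    """Load a released Reason RM; assumes from_pretrained takes a local path."""
    # Imported lazily so this module can be imported without reasonrm installed.
    from reasonrm.modeling_reward import Qwen2ForReasonRewardModel

    return Qwen2ForReasonRewardModel.from_pretrained(str(checkpoint_dir(root, domain)))
```

For example, `load_reason_rm("released", "math")` would try to load from `released/math-rm-full-ce`. Because the backbone, rubric heads, and total head are already included, no separate adapter or head weights should need to be attached afterwards.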