# TraceLift Code Reason RM Full Checkpoint

This is the released code-domain TraceLift Reason RM.

- Base initialization: `Qwen/Qwen2.5-7B-Instruct`
- Training: LoRA, merged into full weights
- LoRA rank: `32`
- LoRA alpha: `64`
- LoRA dropout: `0.05`
- Rubric heads: five 5-way classification heads
- Total head: one scalar regression head
- Dimension loss: cross entropy
- Total loss: Huber loss on the normalized total score
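
The head and loss layout listed above can be sketched in plain Python. This is only an illustration, not the training code: the averaging over the five heads, the `dim_weight` / `total_weight` mixing factors, the Huber `delta`, and all function names are assumptions that the model card does not specify.

```python
import math
from typing import Sequence

NUM_DIMENSIONS = 5  # five rubric heads (from the model card)
NUM_CLASSES = 5     # each rubric head is a 5-way classifier (from the model card)


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross entropy for one rubric head: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def huber(pred: float, target: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic for small errors, linear beyond `delta`."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def reason_rm_loss(
    dim_logits: Sequence[Sequence[float]],  # 5 heads x 5 class logits
    dim_labels: Sequence[int],              # one class index per head
    total_pred: float,                      # scalar regression head output
    total_target: float,                    # normalized total score
    dim_weight: float = 1.0,                # hypothetical mixing weight
    total_weight: float = 1.0,              # hypothetical mixing weight
    delta: float = 1.0,                     # hypothetical Huber threshold
) -> float:
    """Combine the per-dimension cross entropy with the total-score Huber loss."""
    if len(dim_logits) != NUM_DIMENSIONS or len(dim_labels) != NUM_DIMENSIONS:
        raise ValueError("expected one logit vector and one label per rubric head")
    if any(len(head) != NUM_CLASSES for head in dim_logits):
        raise ValueError("expected 5 class logits per rubric head")
    # Assumption: the five head losses are averaged, not summed.
    ce = sum(cross_entropy(l, y) for l, y in zip(dim_logits, dim_labels)) / NUM_DIMENSIONS
    return dim_weight * ce + total_weight * huber(total_pred, total_target, delta)
```

In an actual training loop the same two terms would normally be computed on batched tensors; the scalar version here is only meant to make the structure of the objective concrete.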

The directory can be loaded directly with `reasonrm.modeling_reward.Qwen2ForReasonRewardModel.from_pretrained(...)`.
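
A minimal loading sketch, assuming the `reasonrm` package that ships the model class is importable in your environment. The helper name `load_reason_rm` and the `checkpoint_dir` argument are hypothetical; only the class path and the `from_pretrained` call come from this card.

```python
def load_reason_rm(checkpoint_dir: str):
    """Load the merged Reason RM checkpoint from a local directory.

    `checkpoint_dir` is a placeholder for wherever this directory was downloaded.
    """
    # Imported lazily so this module can be imported even when the
    # project-specific `reasonrm` package is not installed.
    from reasonrm.modeling_reward import Qwen2ForReasonRewardModel

    return Qwen2ForReasonRewardModel.from_pretrained(checkpoint_dir)
```

Because the LoRA weights are already merged, no adapter needs to be loaded separately.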