# Released TraceLift Reason RMs

This directory contains two ready-to-load full Reward Model checkpoints:

- `code-rm-full-ce`: code-domain Reason RM.
- `math-rm-full-ce`: math-domain Reason RM.

Both checkpoints were initialized from `Qwen/Qwen2.5-7B-Instruct`, trained with LoRA, and then merged into full `Qwen2ForReasonRewardModel` weights.
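
To illustrate what "merged into full weights" means, here is a minimal sketch of the standard LoRA merge, `W' = W + (alpha / rank) * B @ A`, using the released rank and alpha. It assumes the usual `alpha / rank` scaling and is not the script used to produce these checkpoints; `matmul`, `merge_lora`, and the toy matrices are hypothetical names and values chosen for illustration.

```python
# Standard LoRA merge on a toy 2x2 weight (illustrative only, not the release script).
LORA_RANK = 32   # from the training details in this README
LORA_ALPHA = 64  # from the training details in this README
LORA_SCALE = LORA_ALPHA / LORA_RANK  # assumes the usual alpha / rank scaling


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, scale=LORA_SCALE):
    """Return weight + scale * (lora_b @ lora_a) as a new list of rows."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy adapter with inner dimension 1 for readability; the real adapters use rank 32.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [0.0]]   # shape (2, 1)
lora_a = [[0.5, 0.25]]    # shape (1, 2)
merged = merge_lora(weight, lora_a, lora_b)
```

After a merge like this the adapter matrices are no longer needed at load time, which is why the released checkpoints can be loaded as ordinary full weights.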

Training details:

- LoRA rank `32`, alpha `64`, dropout `0.05`.
- Five rubric classification heads with CE dimension loss.
- One total-score head with Huber loss on the normalized total score.
- The released checkpoints already include the backbone, rubric heads, and total head.
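
The two loss terms listed above can be sketched in plain Python as follows. This is only an illustration of how cross-entropy over five rubric heads and a Huber term on the total score might be combined. The README does not state the number of score levels per rubric, the Huber `delta`, or the weighting between the terms, so the three-level logits, `delta=1.0`, averaging over heads, and `total_weight=1.0` are all assumptions, and `reason_rm_loss` is a hypothetical name.

```python
import math

NUM_RUBRICS = 5  # from the README: five rubric classification heads


def cross_entropy(logits, target):
    """Cross-entropy of one example: log-sum-exp of the logits minus the target logit."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def huber(pred, target, delta=1.0):
    """Huber loss: quadratic for small errors, linear beyond delta (delta is assumed)."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def reason_rm_loss(rubric_logits, rubric_labels, total_pred, total_target,
                   total_weight=1.0, delta=1.0):
    """Mean CE over the rubric heads plus a weighted Huber term (weighting is assumed)."""
    assert len(rubric_logits) == len(rubric_labels) == NUM_RUBRICS
    ce = sum(cross_entropy(l, y) for l, y in zip(rubric_logits, rubric_labels))
    return ce / NUM_RUBRICS + total_weight * huber(total_pred, total_target, delta)


# Toy example: uniform logits over three assumed score levels for every rubric.
logits = [[0.0, 0.0, 0.0] for _ in range(NUM_RUBRICS)]
labels = [0, 1, 2, 1, 0]
loss = reason_rm_loss(logits, labels, total_pred=0.5, total_target=0.7)
```

With uniform logits each head contributes `log(3)` regardless of its label, so the toy loss is `log(3)` plus the Huber term for an error of `0.2`.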

Load them directly with `reasonrm.modeling_reward.Qwen2ForReasonRewardModel.from_pretrained(...)`.
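
A minimal loading sketch, assuming the `reasonrm` package is importable and the checkpoints sit in subdirectories named after the two releases. The helpers `checkpoint_dir` and `load_reason_rm` and the `release_root` argument are hypothetical names introduced here. Which keyword arguments `from_pretrained` accepts is not stated in this README, so they are forwarded unchanged.

```python
from pathlib import Path

# Checkpoint names listed in this README.
RELEASED_CHECKPOINTS = ("code-rm-full-ce", "math-rm-full-ce")


def checkpoint_dir(release_root, name):
    """Return the directory of a released checkpoint, rejecting unknown names."""
    if name not in RELEASED_CHECKPOINTS:
        raise ValueError(f"unknown checkpoint {name!r}; expected one of {RELEASED_CHECKPOINTS}")
    return Path(release_root) / name


def load_reason_rm(release_root, name, **kwargs):
    """Load one released Reason RM; kwargs are passed through to from_pretrained."""
    # Imported lazily so the path helper works even without reasonrm installed.
    from reasonrm.modeling_reward import Qwen2ForReasonRewardModel

    return Qwen2ForReasonRewardModel.from_pretrained(
        str(checkpoint_dir(release_root, name)), **kwargs
    )
```

For example, `load_reason_rm(".", "math-rm-full-ce")` would load the math-domain model when run from this directory.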