# Released TraceLift Reason RMs

This directory contains two ready-to-load full Reward Model checkpoints:

- `code-rm-full-ce`: code-domain Reason RM.
- `math-rm-full-ce`: math-domain Reason RM.

Both checkpoints were initialized from `Qwen/Qwen2.5-7B-Instruct`, trained with LoRA, and then merged into full `Qwen2ForReasonRewardModel` weights.

Training details:

- LoRA rank `32`, alpha `64`, dropout `0.05`.
- Five rubric classification heads with CE dimension loss.
- One total-score head with Huber loss on the normalized total score.
- The released checkpoints already include the backbone, rubric heads, and total head.

Load them directly with `reasonrm.modeling_reward.Qwen2ForReasonRewardModel.from_pretrained(...)`.
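
To illustrate the objective listed under training details, here is a minimal sketch of how a per-example loss could combine cross-entropy over the five rubric heads with a Huber loss on the normalized total score. It is not the actual training code. The function names, the averaging over rubric heads, the `total_weight` parameter, the Huber `delta` of `1.0`, and the five classes per rubric in the usage lines are all assumptions made for illustration.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one example: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def huber(pred, target, delta=1.0):
    """Huber loss: quadratic for small errors, linear for large ones."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def reason_rm_loss(rubric_logits, rubric_labels, total_pred, total_target,
                   total_weight=1.0, delta=1.0):
    """Hypothetical combined loss for one example.

    rubric_logits: one list of class logits per rubric head.
    rubric_labels: one integer class label per rubric head.
    total_pred / total_target: normalized total score (assumed in [0, 1]).
    Averaging the CE terms and weighting the Huber term by total_weight
    are assumptions, not details taken from the released training setup.
    """
    if len(rubric_logits) != len(rubric_labels):
        raise ValueError("need exactly one label per rubric head")
    ce = sum(cross_entropy(l, y) for l, y in zip(rubric_logits, rubric_labels))
    ce /= len(rubric_logits)
    return ce + total_weight * huber(total_pred, total_target, delta)


# Usage: five rubric heads, each assumed to have five score classes.
logits = [[0.0] * 5 for _ in range(5)]   # uniform predictions
labels = [0, 1, 2, 3, 4]
print(reason_rm_loss(logits, labels, total_pred=0.5, total_target=0.7))
```

With uniform logits each rubric head contributes `log(5)` to the cross-entropy term, and the Huber term stays in its quadratic regime because the total-score error is below `delta`.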