---
pipeline_tag: text-classification
---

# Released TraceLift Reason RMs

This directory contains two ready-to-load full Reward Model checkpoints introduced in the paper [Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards](https://huggingface.co/papers/2605.03862):

- `code-rm-full-ce`: code-domain Reason RM.
- `math-rm-full-ce`: math-domain Reason RM.

TraceLift is a planner-executor training framework that treats reasoning as a consumable intermediate artifact, using executor-grounded rewards to shape reasoning traces.

- **Code:** [GitHub Repository](https://github.com/MasaiahHan/TraceLift)
- **Paper:** [arXiv:2605.03862](https://huggingface.co/papers/2605.03862)

## Training details

Both checkpoints were initialized from `Qwen/Qwen2.5-7B-Instruct`, trained with LoRA, and then merged into full `Qwen2ForReasonRewardModel` weights.

- LoRA rank `32`, alpha `64`, dropout `0.05`.
- Five rubric classification heads with CE dimension loss.
- One total-score head with Huber loss on the normalized total score.
- The released checkpoints already include the backbone, rubric heads, and total head.

## Usage

To use these models, you need the custom `reasonrm` package from the [official repository](https://github.com/MasaiahHan/TraceLift).

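
Because `reasonrm` is a custom package, it can help to confirm it is importable before loading a checkpoint. The following is a minimal sketch, assuming the repository has been cloned and made visible on the Python path; the helper name `has_package` is hypothetical and not part of the repository.

```python
import importlib.util


def has_package(name: str) -> bool:
    """Return True if a top-level package can be found on the current Python path.

    Hypothetical helper for illustration; it is not provided by `reasonrm`.
    """
    return importlib.util.find_spec(name) is not None


# Warn early instead of failing later with an ImportError during model loading.
if not has_package("reasonrm"):
    print("reasonrm not found: clone the TraceLift repository and add it to PYTHONPATH")
```
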
```python
import torch
from transformers import AutoTokenizer
from reasonrm.modeling_reward import Qwen2ForReasonRewardModel

model = Qwen2ForReasonRewardModel.from_pretrained(
    "ScottHan/TraceLift",  # or path to local subdir like math-rm-full-ce
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ScottHan/TraceLift")
```

## Citation

```bibtex
@misc{han2026correctisnotenough,
  title={Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards},
  author={Han, Tianyang and Shi, Hengyu and Hu, Junjie and Yang, Xu and Wang, Zhiling and Su, Junhao},
  year={2026},
  eprint={2605.03862},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2605.03862}
}
```