---
pipeline_tag: text-classification
---

# Released TraceLift Reason RMs

This directory contains two ready-to-load, full-weight reward model (RM) checkpoints introduced in the paper [Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards](https://huggingface.co/papers/2605.03862):

- `code-rm-full-ce`: code-domain Reason RM.
- `math-rm-full-ce`: math-domain Reason RM.

TraceLift is a planner-executor training framework. It treats reasoning as a consumable intermediate artifact and uses executor-grounded rewards to shape reasoning traces.

- **Code:** [GitHub Repository](https://github.com/MasaiahHan/TraceLift)
- **Paper:** [arXiv:2605.03862](https://arxiv.org/abs/2605.03862)

## Training details

Both checkpoints were initialized from `Qwen/Qwen2.5-7B-Instruct`, trained with LoRA, and then merged into full `Qwen2ForReasonRewardModel` weights.
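The merge step can be sketched in a framework-free way. This is a minimal illustration of standard LoRA merging, not code from the TraceLift repository. It assumes the usual `alpha / r` scaling (as opposed to rsLoRA's `alpha / sqrt(r)`): the frozen weight `W` becomes `W + (alpha / r) * B @ A`, where `A` and `B` are the trained low-rank factors. With the rank `32` and alpha `64` used for these checkpoints, the scale is `64 / 32 = 2.0`. The helper names `matmul` and `merge_lora` and the toy matrices are hypothetical and far smaller than real Qwen2.5 projections.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain nested-list matrix product (x is n x k, y is k x m)."""
    return [
        [sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float, rank: int) -> Matrix:
    """Return W + (alpha / rank) * B @ A, i.e. the merged full weight.

    Shapes follow the usual LoRA convention: W is d_out x d_in,
    B is d_out x rank, A is rank x d_in. LoRA dropout only acts on the
    adapter input during training, so it plays no role in the merge.
    """
    scale = alpha / rank  # assumed standard LoRA scaling (not rsLoRA)
    delta = matmul(b, a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Toy example with rank 1 and alpha 2, chosen so the scale (2.0) matches
# the rank-32 / alpha-64 setting while keeping the arithmetic readable.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]      # rank x d_in
B = [[0.5], [0.25]]   # d_out x rank
merged = merge_lora(W, A, B, alpha=2.0, rank=1)
print(merged)
```

After merging, inference needs only the single dense matrix, which is why the released checkpoints can be loaded without any LoRA adapter files.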

- LoRA rank `32`, alpha `64`, dropout `0.05`.
- Five rubric classification heads, each trained with a cross-entropy (CE) loss on its rubric dimension.
- One total-score head trained with a Huber loss on the normalized total score.
- The released checkpoints already include the backbone, the five rubric heads, and the total-score head, so no LoRA adapter or extra head weights need to be loaded separately.
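A minimal sketch of how the two loss terms could combine for a single training example, written in plain Python for readability. Several details are assumptions made only for illustration: the number of classes per rubric (3 in the example), normalizing the total score by dividing by a hypothetical maximum `total_max`, the Huber threshold `delta = 1.0`, averaging the five CE terms, and the weight `total_weight = 1.0` on the Huber term. Consult the TraceLift repository for the actual choices; `reward_model_loss` is a hypothetical name.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-softmax probability of the target class (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def huber(pred: float, target: float, delta: float = 1.0) -> float:
    """Quadratic for errors up to `delta`, linear beyond it."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def reward_model_loss(
    rubric_logits: List[Sequence[float]],  # one logit vector per rubric head (5 here)
    rubric_labels: List[int],              # gold class index for each rubric dimension
    total_pred: float,                     # output of the total-score head
    total_raw: float,                      # gold total score before normalization
    total_max: float,                      # hypothetical max used to normalize to [0, 1]
    total_weight: float = 1.0,             # hypothetical weight on the Huber term
) -> float:
    # Per-dimension CE losses, averaged over the rubric heads (assumed reduction).
    ce = sum(
        cross_entropy(logit_vec, label)
        for logit_vec, label in zip(rubric_logits, rubric_labels)
    )
    ce /= len(rubric_logits)
    # Huber loss on the normalized total score.
    total_target = total_raw / total_max
    return ce + total_weight * huber(total_pred, total_target)


# Five rubric heads with 3 hypothetical classes each and uninformative logits.
logits = [[0.0, 0.0, 0.0] for _ in range(5)]
labels = [2, 1, 2, 0, 2]
loss = reward_model_loss(logits, labels, total_pred=0.6, total_raw=8.0, total_max=10.0)
print(round(loss, 4))
```

With uniform logits each CE term equals `log 3`, and the total-score error of `0.2` falls in the quadratic region of the Huber loss, contributing `0.5 * 0.2**2 = 0.02`. The Huber term behaves like squared error for small deviations but grows only linearly for outliers, which keeps noisy total-score labels from dominating the rubric terms.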

## Usage

To use these models, you need the custom `reasonrm` package from the [official repository](https://github.com/MasaiahHan/TraceLift).

```python
import torch
from transformers import AutoTokenizer

from reasonrm.modeling_reward import Qwen2ForReasonRewardModel

# Pick one checkpoint: "code-rm-full-ce" or "math-rm-full-ce".
# Assumes each checkpoint (config, weights, tokenizer files) sits in its own
# subfolder of the Hub repo. To load a local copy instead, pass the path to
# that subfolder as the first argument and drop `subfolder`.
SUBFOLDER = "math-rm-full-ce"

model = Qwen2ForReasonRewardModel.from_pretrained(
    "ScottHan/TraceLift",
    subfolder=SUBFOLDER,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the `accelerate` package
)
tokenizer = AutoTokenizer.from_pretrained("ScottHan/TraceLift", subfolder=SUBFOLDER)
```

## Citation

```bibtex
@misc{han2026correctisnotenough,
  title={Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards},
  author={Han, Tianyang and Shi, Hengyu and Hu, Junjie and Yang, Xu and Wang, Zhiling and Su, Junhao},
  year={2026},
  eprint={2605.03862},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2605.03862}
}
```