metadata
pipeline_tag: text-classification
Released TraceLift Reason RMs
This directory contains two ready-to-load full reward model (RM) checkpoints introduced in the paper "Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards":
- code-rm-full-ce: code-domain Reason RM.
- math-rm-full-ce: math-domain Reason RM.
TraceLift is a planner-executor training framework that treats reasoning as a consumable intermediate artifact, using executor-grounded rewards to shape reasoning traces.
- Code: GitHub Repository
- Paper: arXiv:2605.03862
Training details
Both checkpoints were initialized from Qwen/Qwen2.5-7B-Instruct, trained with LoRA, and then merged into full Qwen2ForReasonRewardModel weights.
- LoRA rank 32, alpha 64, dropout 0.05.
- Five rubric classification heads with CE dimension loss.
- One total-score head with Huber loss on the normalized total score.
- The released checkpoints already include the backbone, rubric heads, and total head.
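The training objective described in the list above (cross-entropy on five rubric heads plus a Huber loss on the normalized total score) can be illustrated with a small pure-Python sketch. This is not the authors' training code: the number of classes per rubric (3 here), the averaging over heads, the weight on the Huber term, and the Huber `delta` are all assumptions chosen only for illustration.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one example: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

def huber(pred, target, delta=1.0):
    """Huber loss: quadratic for small errors, linear for large ones."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)

def reason_rm_loss(rubric_logits, rubric_labels, total_pred, total_target,
                   total_weight=1.0, delta=1.0):
    """Mean CE over the rubric heads plus a weighted Huber term.

    The way the two terms are combined is an assumption, not taken
    from the paper or the repository.
    """
    ce = sum(cross_entropy(l, y) for l, y in zip(rubric_logits, rubric_labels))
    ce /= len(rubric_logits)
    return ce + total_weight * huber(total_pred, total_target, delta)

# Toy example: five rubric heads, each with 3 hypothetical score classes.
rubric_logits = [[0.0, 0.0, 0.0] for _ in range(5)]  # uniform predictions
rubric_labels = [0, 1, 2, 1, 0]
loss = reason_rm_loss(rubric_logits, rubric_labels,
                      total_pred=0.5, total_target=0.7)
```

With uniform logits each head contributes log(3) to the cross-entropy term, so the example loss is dominated by the rubric heads; the Huber term only adds 0.5 * 0.2² because the total-score error is below `delta`.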
Usage
To use these models, you need the custom reasonrm package from the official repository.
import torch
from transformers import AutoTokenizer
from reasonrm.modeling_reward import Qwen2ForReasonRewardModel

model = Qwen2ForReasonRewardModel.from_pretrained(
    "ScottHan/TraceLift",  # or path to local subdir like math-rm-full-ce
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ScottHan/TraceLift")
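To load a single checkpoint from a local path, one option is to download only its subdirectory first. The sketch below uses `snapshot_download` from `huggingface_hub` with `allow_patterns`; it assumes the repository stores each checkpoint in a subdirectory named after it (for example `math-rm-full-ce`). The helper names `checkpoint_patterns` and `download_checkpoint` are hypothetical and not part of the `reasonrm` package.

```python
import os

def checkpoint_patterns(subdir):
    """Glob patterns matching only the files of one checkpoint subdirectory."""
    return [f"{subdir}/*"]

def download_checkpoint(repo_id, subdir):
    """Download one checkpoint subdirectory and return its local path.

    Assumes the checkpoint lives in `subdir` at the repository root.
    """
    # Imported lazily so checkpoint_patterns works without huggingface_hub.
    from huggingface_hub import snapshot_download

    local_root = snapshot_download(
        repo_id=repo_id,
        allow_patterns=checkpoint_patterns(subdir),
    )
    return os.path.join(local_root, subdir)
```

Calling `download_checkpoint("ScottHan/TraceLift", "math-rm-full-ce")` would return a local directory that can be passed to `Qwen2ForReasonRewardModel.from_pretrained` and `AutoTokenizer.from_pretrained` in place of the repository id, provided the subdirectory also contains the tokenizer files.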
Citation
@misc{han2026correctisnotenough,
title={Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards},
author={Han, Tianyang and Shi, Hengyu and Hu, Junjie and Yang, Xu and Wang, Zhiling and Su, Junhao},
year={2026},
eprint={2605.03862},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2605.03862}
}