pipeline_tag: text-classification

Released TraceLift Reason RMs

This directory contains two ready-to-load, fully merged reward model (RM) checkpoints introduced in the paper "Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards":

code-rm-full-ce: code-domain Reason RM.
math-rm-full-ce: math-domain Reason RM.

TraceLift is a planner-executor training framework that treats reasoning as an intermediate artifact consumed by the executor, and uses executor-grounded rewards to shape the planner's reasoning traces.

Code: GitHub Repository
Paper: arXiv:2605.03862

Training details

Both checkpoints were initialized from Qwen/Qwen2.5-7B-Instruct, trained with LoRA, and then merged into full Qwen2ForReasonRewardModel weights.
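Merging folds each LoRA adapter back into its base weight, so no adapter code is needed at load time. Assuming the standard LoRA formulation (the default in the PEFT library, without rank-stabilized scaling), the merged weight is W + (alpha / r) * B A, so the rank 32, alpha 64 setting used here gives a scale of 2. The plain-Python sketch below illustrates only that arithmetic on a toy 2x2 layer; the helper names are hypothetical and are not part of reasonrm.

```python
def lora_scale(rank: int, alpha: float) -> float:
    """Standard LoRA scaling factor alpha / r (assumed; no rsLoRA variant)."""
    return alpha / rank


def matmul(x, y):
    """Plain-Python matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(weight, lora_a, lora_b, scale):
    """Return W + scale * (B @ A): the adapter folded into a new base weight."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


scale = lora_scale(rank=32, alpha=64)  # hyperparameters from the training details

# Toy 2x2 layer with a rank-1 adapter for brevity (the real adapters use rank 32).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]      # shape (r=1, in_features=2)
B = [[1.0], [-1.0]]   # shape (out_features=2, r=1)
merged = merge_lora(W, A, B, scale)
print(scale, merged)  # 2.0 [[2.0, 1.0], [-1.0, 0.0]]
```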

LoRA rank 32, alpha 64, dropout 0.05.
Five rubric classification heads, trained with a cross-entropy (CE) loss on the rubric dimensions.
One total-score head, trained with a Huber loss on the normalized total score.
The released checkpoints already include the backbone, rubric heads, and total head.
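A minimal plain-Python sketch of how the two objectives could combine into one training loss. Several details are assumptions made for illustration because the card does not specify them: three classes per rubric head, averaging the CE over heads, a Huber delta of 1.0, the hypothetical total_weight term, and a total score normalized to [0, 1]. reasonrm's actual training code may differ.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one head: -log softmax(logits)[target], computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def huber(pred, target, delta=1.0):
    """Huber loss: quadratic for |error| <= delta, linear beyond it."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err ** 2
    return delta * (err - 0.5 * delta)


def reward_model_loss(rubric_logits, rubric_labels, total_pred, total_target, total_weight=1.0):
    """Hypothetical combined objective: mean CE over the rubric heads plus a
    weighted Huber loss on the normalized total score."""
    ce = sum(cross_entropy(head_logits, label)
             for head_logits, label in zip(rubric_logits, rubric_labels))
    ce /= len(rubric_logits)
    return ce + total_weight * huber(total_pred, total_target)


# Toy example: 5 rubric heads with 3 classes each (assumed), all predicting uniformly.
rubric_logits = [[0.0, 0.0, 0.0] for _ in range(5)]
rubric_labels = [0, 1, 2, 0, 1]
# Assumed normalization: total score scaled to [0, 1].
loss = reward_model_loss(rubric_logits, rubric_labels, total_pred=0.8, total_target=0.5)
print(round(loss, 4))  # 1.1436
```

Huber on the total keeps the regression head's gradients bounded for large score errors while behaving like squared error near the target.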

Usage

Loading these checkpoints requires the custom reasonrm package (which defines Qwen2ForReasonRewardModel) from the official GitHub repository.

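```python
import torch
from transformers import AutoTokenizer

# Custom model class from the reasonrm package (install it from the GitHub repository).
from reasonrm.modeling_reward import Qwen2ForReasonRewardModel

# Two checkpoints are released: code-rm-full-ce and math-rm-full-ce.
model = Qwen2ForReasonRewardModel.from_pretrained(
    "ScottHan/TraceLift",  # or a local checkpoint subdirectory such as math-rm-full-ce
    torch_dtype=torch.bfloat16,
    device_map="auto",  # device_map="auto" requires the accelerate package
)
tokenizer = AutoTokenizer.from_pretrained("ScottHan/TraceLift")
```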

Citation

```bibtex
@misc{han2026correctisnotenough,
  title={Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards},
  author={Han, Tianyang and Shi, Hengyu and Hu, Junjie and Yang, Xu and Wang, Zhiling and Su, Junhao},
  year={2026},
  eprint={2605.03862},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2605.03862}
}
```