metadata
pipeline_tag: text-classification

Released TraceLift Reason RMs

This directory contains two ready-to-load, fully merged reward model (RM) checkpoints introduced in the paper "Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards":

  • code-rm-full-ce: code-domain Reason RM.
  • math-rm-full-ce: math-domain Reason RM.

TraceLift is a planner-executor training framework that treats reasoning as a consumable intermediate artifact, using executor-grounded rewards to shape reasoning traces.

Training details

Both checkpoints were initialized from Qwen/Qwen2.5-7B-Instruct, trained with LoRA, and then merged into full Qwen2ForReasonRewardModel weights.

  • LoRA rank 32, alpha 64, dropout 0.05.
  • Five rubric classification heads, trained with a cross-entropy (CE) loss on each rubric dimension.
  • One total-score head with Huber loss on the normalized total score.
  • The released checkpoints already include the backbone, rubric heads, and total head.
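
The objective described by the bullets above can be sketched in plain Python. This is only an illustration under stated assumptions, not the training code from the official repository: the number of classes per rubric (three here), averaging the CE over heads, the `total_weight` factor, and the Huber `delta` are all hypothetical choices that this card does not specify.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def huber(pred, target, delta=1.0):
    """Huber loss: quadratic for |error| <= delta, linear beyond it."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def reason_rm_loss(rubric_logits, rubric_labels, total_pred, total_target,
                   total_weight=1.0, delta=1.0):
    """Mean CE over the rubric heads plus a weighted Huber term on the total score.

    Averaging over heads and the value of total_weight are assumptions.
    """
    ce = sum(cross_entropy(l, y) for l, y in zip(rubric_logits, rubric_labels))
    ce /= len(rubric_logits)
    return ce + total_weight * huber(total_pred, total_target, delta)


# Five rubric heads with three hypothetical classes each, all logits equal,
# and a total-score prediction that matches its normalized target exactly.
rubric_logits = [[0.0, 0.0, 0.0] for _ in range(5)]
rubric_labels = [0, 1, 2, 0, 1]
loss = reason_rm_loss(rubric_logits, rubric_labels, total_pred=0.5, total_target=0.5)
print(round(loss, 4))  # 1.0986, i.e. log(3): uniform CE, zero Huber term
```

The Huber term behaves like squared error near the target and like absolute error for large residuals, which makes the total-score head less sensitive to outlier labels than a plain MSE loss would be.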

Usage

Loading these models requires the custom reasonrm package from the official repository, which provides the Qwen2ForReasonRewardModel class.

import torch
from transformers import AutoTokenizer

from reasonrm.modeling_reward import Qwen2ForReasonRewardModel

model = Qwen2ForReasonRewardModel.from_pretrained(
    "ScottHan/TraceLift",  # or the path to a local checkpoint subdirectory, e.g. math-rm-full-ce
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ScottHan/TraceLift")

Citation

@misc{han2026correctisnotenough,
  title={Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards},
  author={Han, Tianyang and Shi, Hengyu and Hu, Junjie and Yang, Xu and Wang, Zhiling and Su, Junhao},
  year={2026},
  eprint={2605.03862},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2605.03862}
}