---
pipeline_tag: text-classification
---
# Released TraceLift Reason RMs
This directory contains two ready-to-load, fully merged reward model (RM) checkpoints introduced in the paper [Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards](https://huggingface.co/papers/2605.03862):
- `code-rm-full-ce`: code-domain Reason RM.
- `math-rm-full-ce`: math-domain Reason RM.
TraceLift is a planner-executor training framework that treats reasoning as a consumable intermediate artifact, using executor-grounded rewards to shape reasoning traces.
- **Code:** [GitHub Repository](https://github.com/MasaiahHan/TraceLift)
- **Paper:** [arXiv:2605.03862](https://huggingface.co/papers/2605.03862)
## Training details
Both checkpoints were initialized from `Qwen/Qwen2.5-7B-Instruct`, trained with LoRA, and then merged into full `Qwen2ForReasonRewardModel` weights.
- LoRA rank `32`, alpha `64`, dropout `0.05`.
- Five rubric classification heads, each trained with a cross-entropy (CE) loss on its rubric dimension.
- One total-score head with Huber loss on the normalized total score.
- The released checkpoints already include the backbone, rubric heads, and total head.
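The objective described above, CE on each rubric head plus a Huber term on the total score, can be sketched in plain Python as follows. This is only an illustration, not the `reasonrm` training code: the number of classes per rubric head, the averaging over heads, the `total_weight` factor, and the Huber `delta` are all assumptions made for the example, and the actual implementation presumably operates on PyTorch tensors.

```python
import math

def cross_entropy(logits, target):
    """CE for one rubric head: -log softmax(logits)[target], computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def huber(pred, target, delta=1.0):
    """Huber loss: quadratic for errors up to delta, linear beyond it."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)

def reason_rm_loss(rubric_logits, rubric_labels, total_pred, total_target,
                   total_weight=1.0, delta=1.0):
    """Mean CE over the rubric heads plus a weighted Huber term on the total score.

    total_weight and delta are hypothetical knobs, not values from the paper.
    """
    ce_terms = [cross_entropy(l, y) for l, y in zip(rubric_logits, rubric_labels)]
    ce = sum(ce_terms) / len(ce_terms)
    return ce + total_weight * huber(total_pred, total_target, delta)

# Five rubric heads with three classes each (the class count is assumed).
logits = [[0.0, 0.0, 0.0] for _ in range(5)]
labels = [0, 1, 2, 1, 0]
loss = reason_rm_loss(logits, labels, total_pred=0.5, total_target=0.7)
print(round(loss, 4))
```

The Huber term behaves like squared error for small deviations of the normalized total score and like absolute error for large ones, which makes the total-score head less sensitive to outlier labels than a plain MSE loss would be.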
## Usage
To use these models, you need the custom `reasonrm` package from the [official repository](https://github.com/MasaiahHan/TraceLift).
```python
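import torch
from transformers import AutoTokenizer

# Custom model class shipped with the TraceLift repository's `reasonrm` package.
from reasonrm.modeling_reward import Qwen2ForReasonRewardModel

# Load the merged reward model (backbone, rubric heads, and total head).
model = Qwen2ForReasonRewardModel.from_pretrained(
    "ScottHan/TraceLift",  # or a local path to a checkpoint subdirectory such as math-rm-full-ce
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ScottHan/TraceLift")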
```
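To obtain a local checkpoint subdirectory to pass to `from_pretrained`, one option is to download only that subdirectory with `huggingface_hub.snapshot_download`. The sketch below assumes the two checkpoints live in top-level folders named `code-rm-full-ce` and `math-rm-full-ce` inside the `ScottHan/TraceLift` repository; the helper `fetch_checkpoint` is a hypothetical name introduced for this example and is not part of `reasonrm`.

```python
import os

REPO_ID = "ScottHan/TraceLift"  # repository id used in the snippet above
CHECKPOINTS = ("code-rm-full-ce", "math-rm-full-ce")  # assumed top-level folder names

def fetch_checkpoint(name, repo_id=REPO_ID):
    """Download one checkpoint subdirectory and return its local path.

    Hypothetical helper: assumes each checkpoint sits in a folder named
    after it at the root of the repository.
    """
    if name not in CHECKPOINTS:
        raise ValueError(f"unknown checkpoint {name!r}; expected one of {CHECKPOINTS}")
    # Imported lazily so the module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    # allow_patterns restricts the download to files under the chosen folder.
    root = snapshot_download(repo_id=repo_id, allow_patterns=[f"{name}/*"])
    return os.path.join(root, name)

# Example (requires network access):
# local_dir = fetch_checkpoint("math-rm-full-ce")
# model = Qwen2ForReasonRewardModel.from_pretrained(local_dir, torch_dtype=torch.bfloat16)
```

Restricting the download with `allow_patterns` avoids fetching both 7B checkpoints when only one domain is needed.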
## Citation
```bibtex
@misc{han2026correctisnotenough,
title={Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards},
author={Han, Tianyang and Shi, Hengyu and Hu, Junjie and Yang, Xu and Wang, Zhiling and Su, Junhao},
year={2026},
eprint={2605.03862},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2605.03862}
}
```