	---
	pipeline_tag: text-classification
	---

	# Released TraceLift Reason RMs

	This directory contains two ready-to-load, fully merged reward model (Reason RM) checkpoints introduced in the paper [Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards](https://huggingface.co/papers/2605.03862):

	- `code-rm-full-ce`: code-domain Reason RM.
	- `math-rm-full-ce`: math-domain Reason RM.

	TraceLift is a planner-executor training framework that treats reasoning as a consumable intermediate artifact, using executor-grounded rewards to shape reasoning traces.

	- Code: [GitHub Repository](https://github.com/MasaiahHan/TraceLift)
	- Paper: [arXiv:2605.03862](https://huggingface.co/papers/2605.03862)

	## Training details

	Both checkpoints were initialized from `Qwen/Qwen2.5-7B-Instruct`, trained with LoRA, and then merged into full `Qwen2ForReasonRewardModel` weights.
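
The merge step can be illustrated with a toy example. This is a minimal sketch, assuming the standard LoRA update `W + (alpha / r) * B @ A`; only the rank (`32`) and alpha (`64`) come from this card, while the matrices, their shapes, and the helper names `matmul` and `merge_lora` are hypothetical and not taken from the TraceLift code.

```python
# Toy illustration of folding a LoRA update into a dense weight matrix.
# Assumption: standard LoRA scaling alpha / r. Matrices below are made up.

LORA_RANK = 32
LORA_ALPHA = 64
SCALING = LORA_ALPHA / LORA_RANK  # 64 / 32 = 2.0


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(weight, lora_b, lora_a, scaling):
    """Return weight + scaling * (lora_b @ lora_a)."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scaling * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Rank-1 toy factors keep the example readable (the real adapters use rank 32).
weight = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight, shape (out=2, in=2)
lora_b = [[0.5], [0.0]]            # shape (out=2, r=1)
lora_a = [[0.0, 1.0]]              # shape (r=1, in=2)

merged = merge_lora(weight, lora_b, lora_a, SCALING)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, which is why the released checkpoints load as ordinary full weights.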

	- LoRA rank `32`, alpha `64`, dropout `0.05`.
	- Five rubric classification heads, each trained with a per-dimension cross-entropy (CE) loss.
	- One total-score head with Huber loss on the normalized total score.
	- The released checkpoints already include the backbone, rubric heads, and total head.
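
The training objective described in the list above can be sketched in plain Python. This is an illustration under stated assumptions, not the repository's implementation: the number of classes per rubric head, the Huber `delta`, the averaging over heads, and the `total_weight` factor are all hypothetical, as is the function name `reason_rm_loss`.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def huber(pred, target, delta=1.0):
    """Huber loss: quadratic for small errors, linear beyond delta."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def reason_rm_loss(rubric_logits, rubric_labels, total_pred, total_target,
                   total_weight=1.0):
    """Mean CE over the rubric heads plus a weighted Huber term on the total score."""
    ce = sum(cross_entropy(l, y) for l, y in zip(rubric_logits, rubric_labels))
    ce /= len(rubric_logits)
    return ce + total_weight * huber(total_pred, total_target)


# Five rubric heads with a hypothetical 3 classes each, all logits equal.
rubric_logits = [[0.0, 0.0, 0.0] for _ in range(5)]
rubric_labels = [0, 1, 2, 1, 0]
loss = reason_rm_loss(rubric_logits, rubric_labels,
                      total_pred=0.6, total_target=0.5)
print(loss)
```

With uniform logits each head contributes `log(3)`, so the example makes it easy to see how the classification and regression terms add up.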

	## Usage

	To use these models, you need the custom `reasonrm` package from the [official repository](https://github.com/MasaiahHan/TraceLift).

	```python
	import torch
	from transformers import AutoTokenizer

	from reasonrm.modeling_reward import Qwen2ForReasonRewardModel

	model = Qwen2ForReasonRewardModel.from_pretrained(
	    "ScottHan/TraceLift",  # or path to local subdir like math-rm-full-ce
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	)
	tokenizer = AutoTokenizer.from_pretrained("ScottHan/TraceLift")
	```
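
To load one specific checkpoint from a local copy, the path has to point at the matching subdirectory. The helper below is a hedged sketch: it assumes the two subdirectories named in this card sit at the top level of the repository, and the names `CHECKPOINT_SUBDIRS`, `checkpoint_path`, and `download_checkpoint` are hypothetical. `snapshot_download` comes from the `huggingface_hub` package and is imported lazily so that the path helper works without it.

```python
import os

# Subdirectory names as listed in this card (layout is an assumption).
CHECKPOINT_SUBDIRS = {"code": "code-rm-full-ce", "math": "math-rm-full-ce"}


def checkpoint_path(root, domain):
    """Return the checkpoint subdirectory for a domain under a local root."""
    if domain not in CHECKPOINT_SUBDIRS:
        raise ValueError(
            f"unknown domain {domain!r}; expected one of {sorted(CHECKPOINT_SUBDIRS)}"
        )
    return os.path.join(root, CHECKPOINT_SUBDIRS[domain])


def download_checkpoint(domain, repo_id="ScottHan/TraceLift"):
    """Download only one checkpoint subdirectory and return its local path."""
    from huggingface_hub import snapshot_download  # requires network access

    subdir = CHECKPOINT_SUBDIRS[domain]
    root = snapshot_download(repo_id=repo_id, allow_patterns=[f"{subdir}/*"])
    return checkpoint_path(root, domain)


# Example (not executed here): pass the result to from_pretrained.
# local_dir = download_checkpoint("math")
```

The returned directory can then be passed to both `Qwen2ForReasonRewardModel.from_pretrained` and `AutoTokenizer.from_pretrained` in place of the repository ID.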

	## Citation

	```bibtex
	@misc{han2026correctisnotenough,
	  title={Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards},
	  author={Han, Tianyang and Shi, Hengyu and Hu, Junjie and Yang, Xu and Wang, Zhiling and Su, Junhao},
	  year={2026},
	  eprint={2605.03862},
	  archivePrefix={arXiv},
	  primaryClass={cs.AI},
	  url={https://arxiv.org/abs/2605.03862}
	}
	```