Improve model card: add metadata, paper information and usage snippet
#1
by nielsr HF Staff - opened
README.md CHANGED

---
pipeline_tag: text-classification
---

# Released TraceLift Reason RMs

This directory contains two ready-to-load full Reward Model checkpoints introduced in the paper [Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards](https://huggingface.co/papers/2605.03862):

- `code-rm-full-ce`: code-domain Reason RM.
- `math-rm-full-ce`: math-domain Reason RM.

TraceLift is a planner-executor training framework that treats reasoning as a consumable intermediate artifact, using executor-grounded rewards to shape reasoning traces.

- **Code:** [GitHub Repository](https://github.com/MasaiahHan/TraceLift)
- **Paper:** [arXiv:2605.03862](https://huggingface.co/papers/2605.03862)

## Training details

Both checkpoints were initialized from `Qwen/Qwen2.5-7B-Instruct`, trained with LoRA, and then merged into full `Qwen2ForReasonRewardModel` weights.

- LoRA rank `32`, alpha `64`, dropout `0.05`.
- Five rubric classification heads, each trained with a cross-entropy (CE) loss on its rubric dimension.
- One total-score head trained with a Huber loss on the normalized total score (sketched below).
- The released checkpoints already include the backbone, rubric heads, and total head.
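
For orientation, here is a minimal PyTorch sketch of that head-and-loss layout, under stated assumptions: the variable names, the number of score classes per rubric, and the pooling are illustrative and are not the actual `reasonrm` implementation. (The LoRA hyperparameters above map onto a `peft` `LoraConfig(r=32, lora_alpha=64, lora_dropout=0.05)`.)

```python
# Illustrative sketch only -- the real heads and losses live in the
# `reasonrm` package. Sizes below are assumptions, except hidden_size,
# which matches Qwen2.5-7B.
import torch
import torch.nn as nn

hidden_size = 3584   # Qwen2.5-7B hidden size
num_rubrics = 5      # five rubric dimensions, per the list above
num_classes = 5      # assumed number of score levels per rubric

rubric_heads = nn.ModuleList(
    [nn.Linear(hidden_size, num_classes) for _ in range(num_rubrics)]
)
total_head = nn.Linear(hidden_size, 1)

ce = nn.CrossEntropyLoss()
huber = nn.HuberLoss()  # robust regression loss for the total score

def reward_loss(pooled, rubric_labels, total_score_norm):
    """pooled: (batch, hidden) float; rubric_labels: (batch, num_rubrics) long;
    total_score_norm: (batch,) float, the normalized total score."""
    # Per-rubric cross-entropy: one CE term per rubric dimension.
    dim_loss = sum(
        ce(head(pooled), rubric_labels[:, i])
        for i, head in enumerate(rubric_heads)
    )
    # Huber loss on the normalized total score.
    total_loss = huber(total_head(pooled).squeeze(-1), total_score_norm)
    return dim_loss + total_loss
```

The Huber term is presumably preferred over plain MSE to keep the total-score regression robust to outlier annotations.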
|
## Usage

To use these models, you need the custom `reasonrm` package from the [official repository](https://github.com/MasaiahHan/TraceLift).

```python
import torch
from transformers import AutoTokenizer

from reasonrm.modeling_reward import Qwen2ForReasonRewardModel

model = Qwen2ForReasonRewardModel.from_pretrained(
    "ScottHan/TraceLift",  # or a path to a local subdirectory such as math-rm-full-ce
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ScottHan/TraceLift")
```
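
Once loaded, scoring follows the standard Transformers pattern. The sketch below is hedged: it assumes the forward signature mirrors standard Qwen2 models, and the layout of the returned output object is defined in `reasonrm.modeling_reward`, so inspect that module rather than relying on the field names implied here.

```python
# Hedged scoring sketch. Assumptions: the model accepts `input_ids` like a
# standard Qwen2 model, and its output object carries the rubric and
# total-score predictions; check reasonrm.modeling_reward for the real fields.
messages = [
    {"role": "user", "content": "Compute 17 * 24."},
    {"role": "assistant", "content": "17 * 24 = 408."},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model(input_ids=input_ids)

print(outputs)  # inspect: rubric predictions and the total-score prediction
```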

## Citation

```bibtex
@misc{han2026correctisnotenough,
  title={Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards},
  author={Han, Tianyang and Shi, Hengyu and Hu, Junjie and Yang, Xu and Wang, Zhiling and Su, Junhao},
  year={2026},
  eprint={2605.03862},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2605.03862}
}
```