Improve model card: add metadata, paper information and usage snippet

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +42 -4
README.md CHANGED
@@ -1,18 +1,56 @@
+ ---
+ pipeline_tag: text-classification
+ ---
+
  # Released TraceLift Reason RMs

- This directory contains two ready-to-load full Reward Model checkpoints:
+ This directory contains two ready-to-load full Reward Model checkpoints introduced in the paper [Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards](https://huggingface.co/papers/2605.03862):

  - `code-rm-full-ce`: code-domain Reason RM.
  - `math-rm-full-ce`: math-domain Reason RM.

- Both checkpoints were initialized from `Qwen/Qwen2.5-7B-Instruct`, trained with LoRA, and then merged into full `Qwen2ForReasonRewardModel` weights.
+ TraceLift is a planner-executor training framework that treats reasoning as a consumable intermediate artifact, using executor-grounded rewards to shape reasoning traces.
+
+ - **Code:** [GitHub Repository](https://github.com/MasaiahHan/TraceLift)
+ - **Paper:** [arXiv:2605.03862](https://huggingface.co/papers/2605.03862)

- Training details:
+ ## Training details
+
+ Both checkpoints were initialized from `Qwen/Qwen2.5-7B-Instruct`, trained with LoRA, and then merged into full `Qwen2ForReasonRewardModel` weights.

  - LoRA rank `32`, alpha `64`, dropout `0.05`.
  - Five rubric classification heads with CE dimension loss.
  - One total-score head with Huber loss on the normalized total score.
  - The released checkpoints already include the backbone, rubric heads, and total head.

- Load them directly with `reasonrm.modeling_reward.Qwen2ForReasonRewardModel.from_pretrained(...)`.
+ ## Usage
+
+ To use these models, you need the custom `reasonrm` package from the [official repository](https://github.com/MasaiahHan/TraceLift).
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer
+
+ from reasonrm.modeling_reward import Qwen2ForReasonRewardModel
+
+ model = Qwen2ForReasonRewardModel.from_pretrained(
+     "ScottHan/TraceLift", # or path to local subdir like math-rm-full-ce
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+ tokenizer = AutoTokenizer.from_pretrained("ScottHan/TraceLift")
+ ```
+
+ ## Citation

+ ```bibtex
+ @misc{han2026correctisnotenough,
+     title={Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards},
+     author={Han, Tianyang and Shi, Hengyu and Hu, Junjie and Yang, Xu and Wang, Zhiling and Su, Junhao},
+     year={2026},
+     eprint={2605.03862},
+     archivePrefix={arXiv},
+     primaryClass={cs.AI},
+     url={https://arxiv.org/abs/2605.03862}
+ }
+ ```
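
For readers of the "Training details" bullets, the objective they describe (cross-entropy on five rubric heads plus a Huber loss on the normalized total score) might be sketched roughly as follows. This is a framework-free illustration, not the code from the `reasonrm` package: the function names, the four classes per rubric head, the averaging over heads, the `total_weight` factor, and the Huber `delta` are all assumptions made for the example.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one classification head: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def huber(pred, target, delta=1.0):
    """Huber loss: quadratic for small errors, linear beyond delta (delta is assumed)."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def reason_rm_loss(head_logits, head_targets, total_pred, total_target,
                   total_weight=1.0):
    """Hypothetical combination: mean CE over rubric heads + weighted Huber on the total."""
    ce = sum(cross_entropy(l, t) for l, t in zip(head_logits, head_targets))
    ce /= len(head_logits)
    return ce + total_weight * huber(total_pred, total_target)


# Five rubric heads, each with 4 classes (the class count is an assumption).
uniform = [[0.0, 0.0, 0.0, 0.0]] * 5
loss = reason_rm_loss(uniform, [0, 1, 2, 3, 0], total_pred=0.5, total_target=0.5)
print(round(loss, 4))  # 1.3863, i.e. log(4): uniform logits and a perfect total prediction
```

How the released checkpoints actually weight the two terms is not stated in the model card, so `total_weight=1.0` here is only a placeholder.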