ScottHan/TraceLift

# Released TraceLift Reason RMs

This directory contains two ready-to-load, full reward model (RM) checkpoints:

- `code-rm-full-ce`: code-domain Reason RM.
- `math-rm-full-ce`: math-domain Reason RM.

Both checkpoints were initialized from `Qwen/Qwen2.5-7B-Instruct`, trained with LoRA, and then merged into full `Qwen2ForReasonRewardModel` weights.
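For readers unfamiliar with what "merged" means here, the following is a minimal sketch of folding a LoRA update into a dense weight matrix. It assumes the standard LoRA scaling of `alpha / rank`; the matrices, their sizes, and the helper names (`matmul`, `merge_lora`, `apply`) are made up for illustration and are not taken from the TraceLift code.

```python
# Toy illustration of merging a LoRA update into a dense weight matrix.
# Assumption: standard LoRA scaling alpha / rank. All values are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weight, lora_b, lora_a, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) as a new matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

def apply(weight, x):
    """Compute W @ x for a vector x given as a flat list."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

# Hypothetical 2x2 base weight with a rank-1 adapter (B is 2x1, A is 1x2).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, -0.5]]
merged = merge_lora(W, B, A, alpha=2, rank=1)

# With the released settings (rank 32, alpha 64) the scale would be 64 / 32.
release_scale = 64 / 32
```

After merging, a single matrix product gives the same output as the base weight plus the adapter path, so no adapter library should be needed at inference time.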

Training details:

- LoRA rank `32`, alpha `64`, dropout `0.05`.
- Five rubric classification heads, each trained with a cross-entropy (CE) loss on its rubric dimension.
- One total-score head trained with Huber loss on the normalized total score.
- The released checkpoints already include the backbone, the rubric heads, and the total-score head.
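The objective described in the list above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the training code: the number of classes per rubric, the Huber `delta`, the averaging over heads, and the `total_weight` factor are all hypothetical, since the README does not specify them.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one example; target is a class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def huber(pred, target, delta=1.0):
    """Huber loss: quadratic within delta of the target, linear beyond it."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)

def reason_rm_loss(rubric_logits, rubric_labels, total_pred, total_target,
                   total_weight=1.0):
    """Mean CE over the rubric heads plus a weighted Huber term.

    The way the two terms are combined is an assumption for illustration.
    """
    ce = sum(cross_entropy(l, y) for l, y in zip(rubric_logits, rubric_labels))
    ce /= len(rubric_logits)
    return ce + total_weight * huber(total_pred, total_target)

# Five rubric heads, each with 4 hypothetical classes and uninformative logits.
logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(5)]
labels = [0, 1, 2, 3, 1]
loss = reason_rm_loss(logits, labels, total_pred=0.5, total_target=0.7)
```

Huber loss is a common choice for a score-regression head because it behaves like squared error for small residuals while limiting the influence of outlier scores.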

Load either checkpoint directly with `reasonrm.modeling_reward.Qwen2ForReasonRewardModel.from_pretrained(...)`.
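A loading call might look like the sketch below. The import path and class name come from the sentence above, and the directory names from the checkpoint list. The helpers `checkpoint_dir` and `load_reason_rm`, the `root` argument, and the assumption that `from_pretrained` accepts a local directory path (as Hugging Face `PreTrainedModel` subclasses do) are illustrative and have not been checked against the `reasonrm` package.

```python
from pathlib import Path

# Directory names as listed in this README; the root path is a placeholder.
CHECKPOINTS = {"code": "code-rm-full-ce", "math": "math-rm-full-ce"}

def checkpoint_dir(root, domain):
    """Map a domain name to its checkpoint directory under `root`."""
    if domain not in CHECKPOINTS:
        raise ValueError(
            f"unknown domain {domain!r}; expected one of {sorted(CHECKPOINTS)}"
        )
    return Path(root) / CHECKPOINTS[domain]

def load_reason_rm(root, domain):
    """Load one released checkpoint; requires the `reasonrm` package."""
    # Imported lazily so the path helper works without `reasonrm` installed.
    from reasonrm.modeling_reward import Qwen2ForReasonRewardModel
    # Assumption: a local directory path is accepted, as with other
    # Hugging Face-style `from_pretrained` methods.
    return Qwen2ForReasonRewardModel.from_pretrained(
        str(checkpoint_dir(root, domain))
    )
```

For example, `load_reason_rm("path/to/TraceLift", "math")` would resolve to the `math-rm-full-ce` directory before loading.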