# TraceLift Code Reason RM Full Checkpoint

This is the released code-domain TraceLift Reason RM (reward model), published as `code-rm-full-ce` in the `ScottHan/TraceLift` repository.

- Base initialization: `Qwen/Qwen2.5-7B-Instruct`
- Training: LoRA, with the adapters merged into the full weights
- LoRA rank: `32`
- LoRA alpha: `64`
- LoRA dropout: `0.05`
- Rubric heads: five 5-way classification heads
- Total head: one scalar regression head
- Dimension loss (rubric heads): cross-entropy
- Total loss (total head): Huber loss on the normalized total score
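The two training objectives listed above can be sketched per example in plain Python. This is a minimal sketch under assumptions the card does not confirm: a Huber `delta` of 1.0, averaging the cross-entropy over the five rubric heads, and a total target that the caller has already normalized. The card does not say how the two losses are weighted or combined, so the sketch returns them separately. `reason_rm_losses`, `cross_entropy`, and `huber` are hypothetical helper names, not part of `reasonrm`.

```python
import math
from typing import Sequence, Tuple

NUM_RUBRIC_DIMS = 5  # five rubric heads (from the model card)
NUM_CLASSES = 5      # each rubric head is a 5-way classifier (from the model card)


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy for one rubric head, using log-sum-exp for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def huber(pred: float, target: float, delta: float = 1.0) -> float:
    """Huber loss; delta=1.0 is an ASSUMPTION, the card does not state it."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err           # quadratic region for small errors
    return delta * (err - 0.5 * delta)   # linear region for large errors


def reason_rm_losses(
    rubric_logits: Sequence[Sequence[float]],
    rubric_labels: Sequence[int],
    total_pred: float,
    total_target: float,  # ASSUMED already normalized by the caller
) -> Tuple[float, float]:
    """Return (dimension loss, total loss) for a single example (hypothetical helper)."""
    if len(rubric_logits) != NUM_RUBRIC_DIMS or len(rubric_labels) != NUM_RUBRIC_DIMS:
        raise ValueError("expected one logit vector and one label per rubric head")
    # ASSUMPTION: the per-head cross-entropies are averaged over the five heads.
    dim_loss = sum(
        cross_entropy(logits, label)
        for logits, label in zip(rubric_logits, rubric_labels)
    ) / NUM_RUBRIC_DIMS
    total_loss = huber(total_pred, total_target)
    return dim_loss, total_loss


if __name__ == "__main__":
    # Uniform logits give a cross-entropy of log(5) for every head.
    logits = [[0.0] * NUM_CLASSES for _ in range(NUM_RUBRIC_DIMS)]
    labels = [2, 0, 4, 1, 3]
    print(reason_rm_losses(logits, labels, total_pred=0.7, total_target=0.5))
```

In an actual training loop these would normally be batched tensor operations, for example `torch.nn.functional.cross_entropy` and `torch.nn.functional.huber_loss`; the plain-Python version only makes the per-example arithmetic explicit.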

The directory can be loaded directly with `reasonrm.modeling_reward.Qwen2ForReasonRewardModel.from_pretrained(...)`.