ScottHan/TraceLift

TraceLift Code Reason RM Full Checkpoint

This directory contains the released code-domain TraceLift Reason reward model (RM) checkpoint.

Base initialization: Qwen/Qwen2.5-7B-Instruct
Training: LoRA, merged into full weights
LoRA rank: 32
LoRA alpha: 64
LoRA dropout: 0.05
Rubric heads: five 5-way classification heads
Total head: one scalar regression head
Dimension loss: cross entropy
Total loss: Huber loss on the normalized total score
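
The two loss terms listed above can be illustrated with a small framework-free sketch. It is not the training code. The card states only the loss types, so the mean over the five heads, the Huber `delta`, the `total_weight` factor, the 0 to 4 label encoding, and all function names here are assumptions made for illustration.

```python
import math

NUM_DIMENSIONS = 5  # five rubric heads (from the card)
NUM_CLASSES = 5     # each head is a 5-way classifier (from the card)


def cross_entropy(logits, target):
    """Cross entropy for one head: -log softmax(logits)[target]."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]


def huber(pred, target, delta=1.0):
    """Huber loss: quadratic for small errors, linear beyond delta.

    delta=1.0 is an assumed value; the card does not state it.
    """
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def reason_rm_loss(dim_logits, dim_labels, total_pred, total_target,
                   total_weight=1.0):
    """Combine the dimension loss and the total loss.

    Assumed combination: mean cross entropy over the rubric heads plus
    total_weight times the Huber loss on the normalized total score.
    """
    assert len(dim_logits) == len(dim_labels) == NUM_DIMENSIONS
    assert all(len(logits) == NUM_CLASSES for logits in dim_logits)
    ce = sum(cross_entropy(logits, label)
             for logits, label in zip(dim_logits, dim_labels)) / NUM_DIMENSIONS
    return ce + total_weight * huber(total_pred, total_target)
```

With uniform logits on every head and a perfect total prediction, the sketch returns `log(5)`, the cross entropy of a uniform 5-way guess.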

The directory can be loaded directly with reasonrm.modeling_reward.Qwen2ForReasonRewardModel.from_pretrained(...).
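
A minimal loading sketch follows, assuming the `reasonrm` package is installed and importable and that this directory has been downloaded locally. The helper name `load_reason_rm` is hypothetical, and the tokenizer call assumes tokenizer files ship alongside the weights, which the card does not state.

```python
def load_reason_rm(checkpoint_dir):
    """Load the reward model and a tokenizer from a local copy of this directory.

    checkpoint_dir is a local path chosen by the caller (hypothetical here).
    """
    # Imported lazily so the helper can be defined without the dependencies.
    from transformers import AutoTokenizer
    from reasonrm.modeling_reward import Qwen2ForReasonRewardModel

    # Class path and from_pretrained entry point are taken from the card.
    model = Qwen2ForReasonRewardModel.from_pretrained(checkpoint_dir)
    # Assumption: tokenizer files are present in the same directory.
    tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
    return model, tokenizer
```

If the directory carries no tokenizer files, loading the tokenizer from the base model `Qwen/Qwen2.5-7B-Instruct` is a plausible fallback, though the card does not confirm that the two match.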