gaotang committed on
Commit 8767a55 · verified · 1 Parent(s): 8a789a5

Update README.md

Files changed (1)
  1. README.md +2 -3
README.md CHANGED
@@ -18,8 +18,7 @@ library_name: transformers
 
 # 🚀 Can we cast reward modeling as a reasoning task?
 
-**RM-R1** is a training framework for *Reasoning Reward Model* (ReasRM) that judges two candidate answers by first **thinking out loud**—generating structured rubrics or reasoning traces—then emitting its preference.
-Compared to traditional scalar or generative reward models, RM-R1 delivers **state-of-the-art performance** on public RM benchmarks while offering fully interpretable justifications.
+**RM-R1** is a training framework for *Reasoning Reward Model* (ReasRM) that judges two candidate answers by first **thinking out loud**—generating structured rubrics or reasoning traces—then emitting its preference. Compared to traditional scalar or generative reward models, RM-R1 delivers **state-of-the-art performance** on public RM benchmarks on average while offering fully interpretable justifications.
 
 
 ## 🧠 TL;DR
@@ -106,7 +105,7 @@ INSTRUCT_SINGLE_USER_PROMPT_TEMPLATE = (
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
-model_name = "gaotang/RM-R1-Qwen2.5-Instruct-32B"
+model_name = "gaotang/RM-R1-Qwen2.5-Instruct-14B"
 model = AutoModelForCausalLM.from_pretrained(
     model_name,
     torch_dtype="auto",
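
For reference (not part of this commit), here is a minimal sketch of how the updated snippet might be used end to end to judge two candidate answers. The prompt below is a hypothetical stand-in: the README's actual `INSTRUCT_SINGLE_USER_PROMPT_TEMPLATE` is only referenced in the hunk header and not shown in this diff.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The checkpoint this commit points the README at.
model_name = "gaotang/RM-R1-Qwen2.5-Instruct-14B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Hypothetical pairwise prompt (an assumption, not the README's template):
# the model is expected to reason first, then state its preference.
prompt = (
    "Question: What causes the seasons on Earth?\n\n"
    "Answer A: The Earth's changing distance from the Sun.\n"
    "Answer B: The tilt of the Earth's axis relative to its orbit.\n\n"
    "Reason step by step, then state which answer is better."
)
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=512)

# Print only the newly generated judgment, not the echoed prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```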