gaotang committed on
Commit
df58831
·
verified ·
1 Parent(s): 6cab559

Update README.md

  1. README.md +1 -1
README.md CHANGED
@@ -18,7 +18,7 @@ library_name: transformers
 
  # 🚀 Can we cast reward modeling as a reasoning task?
 
- **RM-R1** is a training framework for *Reasoning Reward Model* (ReasRM) that judges two candidate answers by first **thinking out loud**—generating structured rubrics or reasoning traces—then emitting its preference. Compared to traditional scalar or generative reward models, RM-R1 delivers **state-of-the-art performance** on public RM benchmarks while offering fully interpretable justifications.
+ **RM-R1** is a training framework for *Reasoning Reward Model* (ReasRM) that judges two candidate answers by first **thinking out loud**—generating structured rubrics or reasoning traces—then emitting its preference. Compared to traditional scalar or generative reward models, RM-R1 delivers **state-of-the-art performance** on public RM benchmarks on average while offering fully interpretable justifications.
 
 
  ## 🧠 TL;DR