ShadeCloak committed on
Commit d4c4733 · verified · 1 Parent(s): 08448cb

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -13,7 +13,7 @@ IP-GRM (Independent Principle Generative Reward Model) is a decoupled reward-mod
  | [IP-GRM](https://huggingface.co/IP-GRM/IP-GRM) | 16B generative reward model with decoupled principle-judgment pipeline |
  | [CreativeWriting-8B](https://huggingface.co/IP-GRM/CreativeWriting-8B) | 8B creative writing model trained via GRPO with IP-GRM rewards |
  | [IP-rewarding-8K](https://huggingface.co/datasets/IP-GRM/IP-rewarding-8K) | 8K decoupled reward SFT dataset (principle + judgment pairs) |
- | [Paper](https://arxiv.org/abs/2602.11111111) | arXiv preprint |
+ | [Paper](https://arxiv.org/abs/) | arXiv preprint |
  | [Code](https://github.com/ShadeCloak/IP-GRM) | Training scripts and IP-GRM process functions |
 
  ## Key Idea