Update README.md
README.md
CHANGED
@@ -13,7 +13,7 @@ IP-GRM (Independent Principle Generative Reward Model) is a decoupled reward-mod
 | [IP-GRM](https://huggingface.co/IP-GRM/IP-GRM) | 16B generative reward model with decoupled principle-judgment pipeline |
 | [CreativeWriting-8B](https://huggingface.co/IP-GRM/CreativeWriting-8B) | 8B creative writing model trained via GRPO with IP-GRM rewards |
 | [IP-rewarding-8K](https://huggingface.co/datasets/IP-GRM/IP-rewarding-8K) | 8K decoupled reward SFT dataset (principle + judgment pairs) |
-| [Paper](https://arxiv.org/abs/
+| [Paper](https://arxiv.org/abs/) | arXiv preprint |
 | [Code](https://github.com/ShadeCloak/IP-GRM) | Training scripts and IP-GRM process functions |
 
 ## Key Idea