Add pipeline tag, library name, and paper link to metadata
#1
by nielsr (HF Staff) · opened
README.md CHANGED
```diff
@@ -1,12 +1,14 @@
 ---
-license: mit
-datasets:
-- openai/summarize_from_feedback
 base_model:
 - Skywork/Skywork-Reward-Llama-3.1-8B-v0.2
+datasets:
+- openai/summarize_from_feedback
+license: mit
+pipeline_tag: text-classification
+library_name: transformers
+arxiv: 2601.18731
 ---
 
-
 # Meta Reward Modeling (MRM)
 
 ## Overview
@@ -17,16 +19,16 @@ Instead of learning a single global reward function, MRM treats each user as a s
 MRM represents user-specific rewards as adaptive combinations over shared base reward functions and optimizes this structure through a bi-level meta-learning framework.
 To improve robustness across heterogeneous users, MRM introduces a **Robust Personalization Objective (RPO)** that emphasizes hard-to-learn users during meta-training.
 
-This repository provides trained checkpoints for reward modeling and user-level preference evaluation.
+This repository provides trained checkpoints for reward modeling and user-level preference evaluation as presented in the paper [One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment](https://huggingface.co/papers/2601.18731).
 
 ---
 
 ## Links
 
-- 📄 **arXiv Paper**: https://arxiv.org/abs/2601.18731
-- 🤗 **Hugging Face Paper**: https://huggingface.co/papers/2601.18731
-- 💻 **GitHub Code**: https://github.com/ModalityDance/MRM
-- 📦 **Hugging Face Collection**: https://huggingface.co/collections/ModalityDance/mrm
+- 📄 **arXiv Paper**: [2601.18731](https://arxiv.org/abs/2601.18731)
+- 🤗 **Hugging Face Paper**: [2601.18731](https://huggingface.co/papers/2601.18731)
+- 💻 **GitHub Code**: [ModalityDance/MRM](https://github.com/ModalityDance/MRM)
+- 📦 **Hugging Face Collection**: [MRM Collection](https://huggingface.co/collections/ModalityDance/mrm)
 
 ---
 
@@ -171,4 +173,4 @@ If you use this model or code in your research, please cite:
 
 ## License
 
-This model is released under the **MIT License**.
+This model is released under the **MIT License**.
```