Add pipeline tag, library name, and paper link to metadata
Hi! I'm Niels, part of the community science team at Hugging Face.
This PR improves the model card by adding:
- `pipeline_tag: text-classification`: This helps users find the model under the correct task category (Reward Modeling is typically classified as text classification on the Hub).
- `library_name: transformers`: Based on the usage example, the model is compatible with the `transformers` library.
- `arxiv: 2601.18731`: This links the model repository to its official paper on the Hugging Face Hub.
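As a quick illustration of why `library_name: transformers` and `pipeline_tag: text-classification` fit, here is a minimal sketch of how a reward model like this is typically scored with `transformers`. It assumes the checkpoint loads as a sequence-classification head with a single scalar output (as its Skywork base model does) and that the tokenizer ships a chat template; substitute the actual repository id for `model_id`.

```python
# Sketch: scoring a conversation with a scalar reward model via transformers.
# Assumes a sequence-classification checkpoint with num_labels=1 and a chat template.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "Skywork/Skywork-Reward-Llama-3.1-8B-v0.2"  # base model from the metadata
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Summarize the article in one sentence."},
    {"role": "assistant", "content": "The article argues that ..."},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
with torch.no_grad():
    reward = model(input_ids).logits[0][0].item()  # scalar preference score
```

Tagging the repo as `text-classification` makes this loading path discoverable from the Hub UI and the inference widgets.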
Best,
Niels
README.md (CHANGED)

```diff
@@ -1,12 +1,14 @@
 ---
-license: mit
-datasets:
-- openai/summarize_from_feedback
 base_model:
 - Skywork/Skywork-Reward-Llama-3.1-8B-v0.2
+datasets:
+- openai/summarize_from_feedback
+license: mit
+pipeline_tag: text-classification
+library_name: transformers
+arxiv: 2601.18731
 ---
 
-
 # Meta Reward Modeling (MRM)
 
 ## Overview
@@ -17,16 +19,16 @@ Instead of learning a single global reward function, MRM treats each user as a s
 MRM represents user-specific rewards as adaptive combinations over shared base reward functions and optimizes this structure through a bi-level meta-learning framework.
 To improve robustness across heterogeneous users, MRM introduces a **Robust Personalization Objective (RPO)** that emphasizes hard-to-learn users during meta-training.
 
-This repository provides trained checkpoints for reward modeling and user-level preference evaluation.
+This repository provides trained checkpoints for reward modeling and user-level preference evaluation as presented in the paper [One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment](https://huggingface.co/papers/2601.18731).
 
 ---
 
 ## Links
 
-- 📄 **arXiv Paper**: https://arxiv.org/abs/2601.18731
-- 🤗 **Hugging Face Paper**: https://huggingface.co/papers/2601.18731
-- 💻 **GitHub Code**: https://github.com/ModalityDance/MRM
-- 📦 **Hugging Face Collection**: https://huggingface.co/collections/ModalityDance/mrm
+- 📄 **arXiv Paper**: [2601.18731](https://arxiv.org/abs/2601.18731)
+- 🤗 **Hugging Face Paper**: [2601.18731](https://huggingface.co/papers/2601.18731)
+- 💻 **GitHub Code**: [ModalityDance/MRM](https://github.com/ModalityDance/MRM)
+- 📦 **Hugging Face Collection**: [MRM Collection](https://huggingface.co/collections/ModalityDance/mrm)
 
 ---
 
@@ -171,4 +173,4 @@ If you use this model or code in your research, please cite:
 
 ## License
 
-This model is released under the **MIT License**.
+This model is released under the **MIT License**.
```
|