Add pipeline tag, library name, and paper link to metadata
#1
by nielsr (HF Staff) · opened
README.md CHANGED
```diff
@@ -1,12 +1,14 @@
 ---
-license: mit
-datasets:
-- openai/summarize_from_feedback
 base_model:
 - Skywork/Skywork-Reward-Llama-3.1-8B-v0.2
+datasets:
+- openai/summarize_from_feedback
+license: mit
+pipeline_tag: text-classification
+library_name: transformers
+arxiv: 2601.18731
 ---
 
-
 # Meta Reward Modeling (MRM)
 
 ## Overview
@@ -17,16 +19,16 @@ Instead of learning a single global reward function, MRM treats each user as a s
 MRM represents user-specific rewards as adaptive combinations over shared base reward functions and optimizes this structure through a bi-level meta-learning framework.
 To improve robustness across heterogeneous users, MRM introduces a **Robust Personalization Objective (RPO)** that emphasizes hard-to-learn users during meta-training.
 
-This repository provides trained checkpoints for reward modeling and user-level preference evaluation.
+This repository provides trained checkpoints for reward modeling and user-level preference evaluation as presented in the paper [One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment](https://huggingface.co/papers/2601.18731).
 
 ---
 
 ## Links
 
-- 📄 **arXiv Paper**: https://arxiv.org/abs/2601.18731
-- 🤗 **Hugging Face Paper**: https://huggingface.co/papers/2601.18731
-- 💻 **GitHub Code**: https://github.com/ModalityDance/MRM
-- 📦 **Hugging Face Collection**: https://huggingface.co/collections/ModalityDance/mrm
+- 📄 **arXiv Paper**: [2601.18731](https://arxiv.org/abs/2601.18731)
+- 🤗 **Hugging Face Paper**: [2601.18731](https://huggingface.co/papers/2601.18731)
+- 💻 **GitHub Code**: [ModalityDance/MRM](https://github.com/ModalityDance/MRM)
+- 📦 **Hugging Face Collection**: [MRM Collection](https://huggingface.co/collections/ModalityDance/mrm)
 
 ---
 
@@ -171,4 +173,4 @@ If you use this model or code in your research, please cite:
 
 ## License
 
-This model is released under the **MIT License**.
+This model is released under the **MIT License**.
```