Add pipeline_tag to model card
#1 · by nielsr (HF Staff) · opened

README.md CHANGED
@@ -1,22 +1,21 @@
 ---
-license: mit
-datasets:
-- HannahRoseKirk/prism-alignment
 base_model:
 - Skywork/Skywork-Reward-V2-Llama-3.1-8B
+datasets:
+- HannahRoseKirk/prism-alignment
+license: mit
+pipeline_tag: text-classification
 ---
 
 # Meta Reward Modeling (MRM)
 
 ## Overview
 
-**Meta Reward Modeling (MRM)** is a personalized reward modeling framework designed to adapt to diverse user preferences with limited feedback.
-Instead of learning a single global reward function, MRM treats each user as a separate learning task and applies a meta-learning approach to learn a shared initialization that enables fast, few-shot personalization.
+**Meta Reward Modeling (MRM)** is a personalized reward modeling framework designed to adapt to diverse user preferences with limited feedback. This repository provides trained checkpoints as described in the paper [One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment](https://huggingface.co/papers/2601.18731).
 
-
-To improve robustness across heterogeneous users, MRM introduces a **Robust Personalization Objective (RPO)** that emphasizes hard-to-learn users during meta-training.
+Instead of learning a single global reward function, MRM treats each user as a separate learning task and applies a meta-learning approach to learn a shared initialization that enables fast, few-shot personalization.
 
-
+MRM represents user-specific rewards as adaptive combinations over shared base reward functions and optimizes this structure through a bi-level meta-learning framework. To improve robustness across heterogeneous users, MRM introduces a **Robust Personalization Objective (RPO)** that emphasizes hard-to-learn users during meta-training.
 
 ---
 
@@ -170,4 +169,4 @@ If you use this model or code in your research, please cite:
 
 ## License
 
-This model is released under the **MIT License**.
+This model is released under the **MIT License**.
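
Since the point of this PR is the new `pipeline_tag: text-classification`, here is a minimal sketch of how a checkpoint tagged this way would typically be scored, assuming it keeps the single-logit `AutoModelForSequenceClassification` head of its Skywork-Reward-V2 base model; the repo id below is a placeholder, not this repository's actual id:

```python
# Minimal sketch: score a (prompt, response) pair with a reward model
# exposed as text-classification. Assumes a single reward logit via
# AutoModelForSequenceClassification, as in Skywork-Reward-V2 models;
# "user/mrm-checkpoint" is a placeholder repo id.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "user/mrm-checkpoint"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
)

conv = [
    {"role": "user", "content": "Explain meta-learning in one sentence."},
    {"role": "assistant", "content": "Meta-learning trains a model to adapt quickly to new tasks from few examples."},
]
inputs = tokenizer.apply_chat_template(conv, tokenize=True, return_tensors="pt")
with torch.no_grad():
    reward = model(inputs).logits[0][0].item()  # higher = preferred
print(f"reward: {reward:.4f}")
```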
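The rewritten overview also states that MRM "represents user-specific rewards as adaptive combinations over shared base reward functions". As a toy illustration of that one sentence only (not the authors' implementation; the bi-level meta-training and RPO are omitted, and every name below is hypothetical), per-user adaptation can be read as fitting combination weights on a few preference pairs:

```python
# Toy sketch of the "adaptive combination" idea from the overview:
# a user-specific reward r_u(x) = softmax(w_u) . [r_1(x), ..., r_K(x)],
# where the K base rewards are shared and only w_u is adapted per user.
# Illustrative reading of the paragraph above, not the paper's method;
# fit_user_weights and its data are hypothetical.
import torch

def user_reward(base_rewards: torch.Tensor, w_u: torch.Tensor) -> torch.Tensor:
    """base_rewards: (K,) scores from K shared base reward functions."""
    return torch.softmax(w_u, dim=0) @ base_rewards

def fit_user_weights(pairs, w_init: torch.Tensor, steps: int = 50, lr: float = 0.1):
    """Few-shot adaptation: pairs is a list of (chosen_scores, rejected_scores),
    each a (K,) tensor of base-reward scores. Bradley-Terry loss on the pairs."""
    w_u = w_init.clone().requires_grad_(True)
    opt = torch.optim.SGD([w_u], lr=lr)
    for _ in range(steps):
        loss = sum(
            -torch.nn.functional.logsigmoid(
                user_reward(c, w_u) - user_reward(r, w_u)
            )
            for c, r in pairs
        )
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w_u.detach()

# Example: K = 3 base rewards, two feedback pairs from one user.
w_meta = torch.zeros(3)  # shared initialization from meta-training (assumed)
pairs = [(torch.tensor([2.0, 0.1, -1.0]), torch.tensor([0.5, 0.3, 1.2])),
         (torch.tensor([1.5, -0.2, 0.0]), torch.tensor([0.2, 0.9, 0.4]))]
w_user = fit_user_weights(pairs, w_meta)
```

Under this reading, meta-training would supply the shared initialization `w_meta`, and few-shot personalization reduces to the inner loop above.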