
Add pipeline tag, library name, and paper link to metadata

#1 opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +12 -10
README.md CHANGED
@@ -1,12 +1,14 @@
 ---
-license: mit
-datasets:
-- openai/summarize_from_feedback
 base_model:
 - Skywork/Skywork-Reward-Llama-3.1-8B-v0.2
+datasets:
+- openai/summarize_from_feedback
+license: mit
+pipeline_tag: text-classification
+library_name: transformers
+arxiv: 2601.18731
 ---
 
-
 # Meta Reward Modeling (MRM)
 
 ## Overview
@@ -17,16 +19,16 @@ Instead of learning a single global reward function, MRM treats each user as a s
 MRM represents user-specific rewards as adaptive combinations over shared base reward functions and optimizes this structure through a bi-level meta-learning framework.
 To improve robustness across heterogeneous users, MRM introduces a **Robust Personalization Objective (RPO)** that emphasizes hard-to-learn users during meta-training.
 
-This repository provides trained checkpoints for reward modeling and user-level preference evaluation.
+This repository provides trained checkpoints for reward modeling and user-level preference evaluation as presented in the paper [One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment](https://huggingface.co/papers/2601.18731).
 
 ---
 
 ## Links
 
-- πŸ“„ **arXiv Paper**: https://arxiv.org/abs/2601.18731
-- πŸ€— **Hugging Face Paper**: https://huggingface.co/papers/2601.18731
-- πŸ’» **GitHub Code**: https://github.com/ModalityDance/MRM
-- πŸ“¦ **Hugging Face Collection**: https://huggingface.co/collections/ModalityDance/mrm
+- πŸ“„ **arXiv Paper**: [2601.18731](https://arxiv.org/abs/2601.18731)
+- πŸ€— **Hugging Face Paper**: [2601.18731](https://huggingface.co/papers/2601.18731)
+- πŸ’» **GitHub Code**: [ModalityDance/MRM](https://github.com/ModalityDance/MRM)
+- πŸ“¦ **Hugging Face Collection**: [MRM Collection](https://huggingface.co/collections/ModalityDance/mrm)
 
 ---
 
@@ -171,4 +173,4 @@ If you use this model or code in your research, please cite:
 
 ## License
 
-This model is released under the **MIT License**.
+This model is released under the **MIT License**.
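
For context on the model card being edited: the overview describes MRM scoring a response for a given user as an adaptive combination of shared base reward functions. A minimal sketch of that combination step is below. This is an illustration of the idea only, not the authors' implementation; the function name, the softmax normalization of the per-user weights, and all numeric values are assumptions made for the example.

```python
import numpy as np

def personalized_reward(base_scores: np.ndarray, user_weights: np.ndarray) -> float:
    """Combine shared base reward scores into one user-specific reward.

    base_scores:  scores r_k(x, y) from K shared base reward functions
                  for a single (prompt, response) pair.
    user_weights: one user's raw adaptive weights over the K base functions
                  (softmax-normalized here so they form a convex combination).
    """
    # Numerically stable softmax over the user's raw weights.
    w = np.exp(user_weights - user_weights.max())
    w = w / w.sum()
    # The user-specific reward is the weighted sum of base scores.
    return float(w @ base_scores)

# Hypothetical example: 3 base reward functions, one user's raw weights.
scores = np.array([0.2, -0.5, 1.1])
weights = np.array([2.0, 0.1, 0.5])
print(personalized_reward(scores, weights))
```

Because the weights form a convex combination, the personalized reward always lies between the minimum and maximum base scores; different users simply place their mass on different base functions.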