Text Classification

nielsr (HF Staff) committed · verified
Commit 7e25d96 · Parent(s): 077cbcb

Add pipeline_tag to model card

Hi! I'm Niels from the Hugging Face community team.

This PR adds the `text-classification` pipeline tag to the model card metadata. This ensures the model is correctly categorized on the Hugging Face Hub as a reward model/classifier, making it easier for users to discover.

I've also added a reference to the original paper in the overview section. The rest of the model card (links, evaluation, and usage snippets) looks great!
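Concretely, with the metadata changes in this PR applied, the model card front-matter reads:

```yaml
---
base_model:
- Skywork/Skywork-Reward-V2-Llama-3.1-8B
datasets:
- HannahRoseKirk/prism-alignment
license: mit
pipeline_tag: text-classification
---
```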

Files changed (1)

1. README.md (+8 -9)
README.md CHANGED

@@ -1,22 +1,21 @@
 ---
-license: mit
-datasets:
-- HannahRoseKirk/prism-alignment
 base_model:
 - Skywork/Skywork-Reward-V2-Llama-3.1-8B
+datasets:
+- HannahRoseKirk/prism-alignment
+license: mit
+pipeline_tag: text-classification
 ---
 
 # Meta Reward Modeling (MRM)
 
 ## Overview
 
-**Meta Reward Modeling (MRM)** is a personalized reward modeling framework designed to adapt to diverse user preferences with limited feedback.
-Instead of learning a single global reward function, MRM treats each user as a separate learning task and applies a meta-learning approach to learn a shared initialization that enables fast, few-shot personalization.
+**Meta Reward Modeling (MRM)** is a personalized reward modeling framework designed to adapt to diverse user preferences with limited feedback. This repository provides trained checkpoints as described in the paper [One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment](https://huggingface.co/papers/2601.18731).
 
-MRM represents user-specific rewards as adaptive combinations over shared base reward functions and optimizes this structure through a bi-level meta-learning framework.
-To improve robustness across heterogeneous users, MRM introduces a **Robust Personalization Objective (RPO)** that emphasizes hard-to-learn users during meta-training.
+Instead of learning a single global reward function, MRM treats each user as a separate learning task and applies a meta-learning approach to learn a shared initialization that enables fast, few-shot personalization.
 
-This repository provides trained checkpoints for reward modeling and user-level preference evaluation.
+MRM represents user-specific rewards as adaptive combinations over shared base reward functions and optimizes this structure through a bi-level meta-learning framework. To improve robustness across heterogeneous users, MRM introduces a **Robust Personalization Objective (RPO)** that emphasizes hard-to-learn users during meta-training.
 
 ---

@@ -170,4 +169,4 @@ If you use this model or code in your research, please cite:
 
 ## License
 
-This model is released under the **MIT License**.
+This model is released under the **MIT License**.
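The overview text in the diff describes user-specific rewards as adaptive combinations over shared base reward functions, fitted to each user from a few preference pairs. A minimal illustrative sketch of that idea follows; it is not the authors' implementation, and the toy base functions, the Bradley-Terry-style logistic loss, and plain gradient descent for the inner loop are all assumptions made for illustration.

```python
# Illustrative sketch of MRM's core idea (NOT the authors' code):
# a user's reward is a learned combination of shared base reward
# functions, adapted per user from a handful of preference pairs.
import numpy as np

rng = np.random.default_rng(0)
K = 4  # number of shared base reward functions (assumed)

def base_rewards(x):
    # Stand-in for the K shared base reward functions over features x.
    # In the real model these would be heads on a Llama-3.1-8B backbone.
    W = np.arange(1, K + 1)[:, None]        # fixed toy projections
    return np.tanh(W @ x[None, :]).sum(axis=1)  # shape (K,)

def user_reward(x, w):
    # Adaptive combination over the shared bases.
    return w @ base_rewards(x)

def adapt(w0, pairs, lr=0.02, steps=100):
    # Inner loop: few-shot personalization from (chosen, rejected)
    # pairs via a Bradley-Terry-style logistic loss.
    w = w0.copy()
    for _ in range(steps):
        grad = np.zeros_like(w)
        for xc, xr in pairs:
            diff = base_rewards(xc) - base_rewards(xr)
            p = 1.0 / (1.0 + np.exp(-(w @ diff)))  # P(chosen > rejected)
            grad += -(1.0 - p) * diff              # d(-log p)/dw
        w -= lr * grad / len(pairs)
    return w

def pref_loss(pairs, w):
    # Average negative log-likelihood of the user's preferences.
    ps = [1.0 / (1.0 + np.exp(-(w @ (base_rewards(xc) - base_rewards(xr)))))
          for xc, xr in pairs]
    return -float(np.mean(np.log(ps)))

# A toy user who prefers inputs with a larger first feature.
pairs = [(rng.normal(size=3) + np.array([2.0, 0.0, 0.0]), rng.normal(size=3))
         for _ in range(8)]

w_meta = np.zeros(K)            # shared initialization from meta-training
w_user = adapt(w_meta, pairs)   # fast per-user adaptation
loss_before = pref_loss(pairs, w_meta)
loss_after = pref_loss(pairs, w_user)
```

The bi-level structure in the paper would meta-train `w_meta` (and the base functions) in an outer loop so that this inner adaptation works well across many users; the RPO objective would additionally upweight hard-to-learn users during that outer loop.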