---
datasets:
- HannahRoseKirk/prism-alignment
base_model:
- Skywork/Skywork-Reward-Llama-3.1-8B-v0.2
---
# Meta Reward Modeling (MRM)

## Overview

**Meta Reward Modeling (MRM)** is a personalized reward modeling framework designed to adapt to diverse user preferences from limited feedback.
Instead of learning a single global reward function, MRM treats each user as a separate learning task and applies a meta-learning approach to learn a shared initialization that enables fast, few-shot personalization.

MRM represents user-specific rewards as adaptive combinations of shared base reward functions and optimizes this structure through a bi-level meta-learning framework.
To improve robustness across heterogeneous users, MRM introduces a **Robust Personalization Objective (RPO)** that emphasizes hard-to-learn users during meta-training.

This repository provides trained checkpoints for reward modeling and user-level preference evaluation.
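The bi-level structure can be sketched in plain NumPy. Everything below is illustrative, not the paper's actual implementation: the softmax combination of base reward scores, the Bradley-Terry preference loss, the finite-difference gradients, the first-order meta-update, and the softmax weighting over per-user losses (a stand-in for RPO, with an assumed temperature `tau`) are all assumptions.

```python
import numpy as np

def combine(w, base_scores):
    """User reward = softmax(w)-weighted sum of shared base reward scores."""
    a = np.exp(w - w.max())
    return base_scores @ (a / a.sum())

def bt_loss(w, chosen, rejected):
    """Bradley-Terry loss: the chosen response should outscore the rejected one."""
    margin = combine(w, chosen) - combine(w, rejected)
    return np.mean(np.log1p(np.exp(-margin)))

def grad(f, w, eps=1e-5):
    """Finite-difference gradient, to keep the sketch dependency-free."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

def inner_adapt(w0, chosen, rejected, lr=0.5, steps=5):
    """Inner loop: few-shot personalization from the shared initialization."""
    w = w0.copy()
    for _ in range(steps):
        w = w - lr * grad(lambda v: bt_loss(v, chosen, rejected), w)
    return w

def meta_train(users, w0, meta_lr=0.2, epochs=10, tau=1.0):
    """Outer loop: first-order meta-update of the shared initialization.
    An RPO-style softmax over post-adaptation losses upweights hard users."""
    w0 = w0.copy()
    for _ in range(epochs):
        losses, grads = [], []
        for chosen, rejected in users:
            w_user = inner_adapt(w0, chosen, rejected)
            losses.append(bt_loss(w_user, chosen, rejected))
            # First-order approximation: evaluate the gradient at the
            # adapted weights and apply it to the shared initialization.
            grads.append(grad(lambda v: bt_loss(v, chosen, rejected), w_user))
        p = np.exp(np.array(losses) / tau)
        p /= p.sum()
        w0 -= meta_lr * sum(pi * gi for pi, gi in zip(p, grads))
    return w0

# Demo on synthetic data: user k prefers responses scoring high on base reward k.
rng = np.random.default_rng(0)
users = []
for k in range(3):
    c = rng.normal(size=(8, 3))
    c[:, k] += 2.0
    users.append((c, rng.normal(size=(8, 3))))
w_meta = meta_train(users, np.zeros(3))
chosen0, rejected0 = users[0]
w_user0 = inner_adapt(w_meta, chosen0, rejected0)
```

The point of the sketch is the division of labor: the inner loop personalizes cheaply from a shared starting point, while the outer loop moves that starting point so adaptation works across users, with harder users weighted more.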
---

## Links

- 📄 **arXiv Paper**: https://arxiv.org/abs/XXXX.XXXXX
- 🤗 **Hugging Face Paper**: https://huggingface.co/papers/XXXX.XXXXX
- 💻 **GitHub Code**: https://github.com/ModalityDance/MRM
- 📦 **Hugging Face Collection**: https://huggingface.co/collections/ModalityDance/mrm

---
## Evaluation

The model is evaluated by user-level preference accuracy under few-shot personalization.
Inference follows the same adaptation procedure used during training: for each user, the reward weights are initialized from the meta-learned initialization and updated with a small number of gradient steps on that user's preference data.
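This protocol can be illustrated with a minimal NumPy sketch: adapt on part of a user's preference pairs, then score the held-out pairs. The softmax reward combination, the finite-difference gradient steps, and the reading of `val_ratio` as the held-out fraction are assumptions for illustration only; see `inference.py` in the GitHub repository for the actual implementation.

```python
import numpy as np

def combine(w, base_scores):
    """Softmax-weighted combination of shared base reward scores."""
    a = np.exp(w - w.max())
    return base_scores @ (a / a.sum())

def bt_loss(w, chosen, rejected):
    """Bradley-Terry preference loss over (chosen, rejected) score pairs."""
    margin = combine(w, chosen) - combine(w, rejected)
    return np.mean(np.log1p(np.exp(-margin)))

def adapt(w_meta, chosen, rejected, lr=0.5, steps=5, eps=1e-5):
    """A few finite-difference gradient steps from the meta-learned init."""
    w = w_meta.copy()
    for _ in range(steps):
        g = np.zeros_like(w)
        for i in range(w.size):
            e = np.zeros_like(w)
            e[i] = eps
            g[i] = (bt_loss(w + e, chosen, rejected)
                    - bt_loss(w - e, chosen, rejected)) / (2 * eps)
        w -= lr * g
    return w

def user_accuracy(w_meta, chosen, rejected, val_ratio=0.5):
    """Adapt on the first part of a user's pairs; report the fraction of
    held-out pairs where the chosen response outscores the rejected one."""
    n_val = int(len(chosen) * val_ratio)
    w = adapt(w_meta, chosen[:-n_val], rejected[:-n_val])
    held_margin = combine(w, chosen[-n_val:]) - combine(w, rejected[-n_val:])
    return float(np.mean(held_margin > 0))

# Demo: a synthetic user who prefers responses scoring high on base reward 1.
rng = np.random.default_rng(1)
chosen = rng.normal(size=(20, 3))
chosen[:, 1] += 2.0
rejected = rng.normal(size=(20, 3))
acc = user_accuracy(np.zeros(3), chosen, rejected)
```

User-level accuracy (one number per user, then aggregated) rather than pooled pair accuracy is what makes the metric sensitive to personalization quality for minority preference profiles.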
### Example evaluation script

```bash
python inference.py \
    --embed_pt data/emb/prism/V1.pt \
    --meta_json data/emb/prism/V1.json \
    --ckpt path/to/checkpoint.pt \
    --dataset PRISM \
    --seen_train_limit -1 \
    --unseen_train_limit -1 \
    --hidden_layers 2 \
    --inner_lr 1e-3 \
    --eval_inner_epochs 1 \
    --val_ratio 0.9 \
    --score_threshold -1 \
    --seed 42 \
    --device cuda:0
```
---

## Citation

If you use this model or code in your research, please cite:

```bibtex
@article{mrm2025,
  title   = {Meta Reward Modeling for Personalized Alignment},
  author  = {Author Names},
  journal = {arXiv preprint arXiv:XXXX.XXXXX},
  year    = {2025}
}
```

---

## License

This model is released under the **MIT License**.