Add pipeline_tag, library_name and paper metadata

#1 opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +17 -10
README.md CHANGED
@@ -1,25 +1,32 @@
  ---
- license: mit
  base_model:
  - CodeGoat24/UnifiedReward-Think-qwen3vl-2b
  datasets:
  - CodeGoat24/UnifiedReward-Flex-SFT-90K
  ---

- # Model Summary
- **UnifiedReward-Flex-qwen3vl-2b** is a **unified personalized reward model for vision generation** that couples reward modeling with flexible and context-adaptive reasoning!!

- 🚀 The inference code is available at [Github](https://github.com/CodeGoat24/UnifiedReward/tree/main/UnifiedReward-Flex).

- For further details, please refer to the following resources:
- - 📰 Paper: https://arxiv.org/abs/2602.02380
- - 🪐 Project Page: https://codegoat24.github.io/UnifiedReward/flex
- - 🤗 Model Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-flex
- - 🤗 Dataset: https://huggingface.co/datasets/CodeGoat24/UnifiedReward-Flex-SFT-90K
- - 👋 Point of Contact: [Yibin Wang](https://codegoat24.github.io)

  ## Citation

  ---
  base_model:
  - CodeGoat24/UnifiedReward-Think-qwen3vl-2b
  datasets:
  - CodeGoat24/UnifiedReward-Flex-SFT-90K
+ license: mit
+ pipeline_tag: image-text-to-text
+ library_name: transformers
  ---

+ # UnifiedReward-Flex-qwen3vl-2b
+
+ **UnifiedReward-Flex-qwen3vl-2b** is a unified personalized reward model for vision generation that couples reward modeling with flexible and context-adaptive reasoning.

+ The model was introduced in the paper [Unified Personalized Reward Model for Vision Generation](https://huggingface.co/papers/2602.02380).

+ ## Model Summary

+ UnifiedReward-Flex addresses the limitations of traditional "one-size-fits-all" reward models by dynamically constructing hierarchical assessments based on content-specific visual cues. It follows a two-stage training process:
+ 1. **SFT**: Distilling structured, high-quality reasoning traces from advanced closed-source VLMs to bootstrap supervised fine-tuning, equipping the model with flexible and context-adaptive reasoning.
+ 2. **DPO**: Performing Direct Preference Optimization on carefully curated preference pairs to further strengthen reasoning fidelity and discriminative alignment.
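
The card's `library_name: transformers` and `pipeline_tag: image-text-to-text` tags suggest the checkpoint can be queried through the generic transformers auto-classes. The sketch below builds a pairwise image-comparison request and decodes the model's reasoning; it is an illustration under those assumptions, not the repository's official inference code (see the GitHub link in the resources), and the prompt wording, `build_pairwise_messages`, and `judge` helpers are hypothetical.

```python
# Minimal pairwise-scoring sketch. Assumptions (not from the model card):
# the checkpoint loads via AutoProcessor / AutoModelForImageTextToText,
# and the comparison prompt wording is illustrative only.
MODEL_ID = "CodeGoat24/UnifiedReward-Flex-qwen3vl-2b"


def build_pairwise_messages(image_a: str, image_b: str, caption: str) -> list:
    """Build a chat-format request asking the reward model to compare two images."""
    question = (
        f"Given the caption '{caption}', which image is better? "
        "Reason step by step before giving a final judgement."
    )
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_a},
                {"type": "image", "image": image_b},
                {"type": "text", "text": question},
            ],
        }
    ]


def judge(image_a: str, image_b: str, caption: str, max_new_tokens: int = 512) -> str:
    # Deferred import so the prompt builder above is usable without transformers.
    from transformers import AutoModelForImageTextToText, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForImageTextToText.from_pretrained(MODEL_ID, device_map="auto")
    inputs = processor.apply_chat_template(
        build_pairwise_messages(image_a, image_b, caption),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and decode only the generated judgement.
    return processor.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```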
 
 
 
+ ## Resources

+ - **📰 Paper:** [Unified Personalized Reward Model for Vision Generation](https://huggingface.co/papers/2602.02380)
+ - **🪐 Project Page:** [https://codegoat24.github.io/UnifiedReward/flex](https://codegoat24.github.io/UnifiedReward/flex)
+ - **🚀 Code:** [GitHub Repository](https://github.com/CodeGoat24/UnifiedReward/tree/main/UnifiedReward-Flex)
+ - **🤗 Model Collections:** [UnifiedReward-Flex Collection](https://huggingface.co/collections/CodeGoat24/unifiedreward-flex)
+ - **🤗 Dataset:** [UnifiedReward-Flex-SFT-90K](https://huggingface.co/datasets/CodeGoat24/UnifiedReward-Flex-SFT-90K)

  ## Citation