ssssmark committed on
Commit 29cea69 · verified · 1 Parent(s): e73400d

Update README.md

Files changed (1)
  1. README.md +107 -1
README.md CHANGED
@@ -8,4 +8,110 @@ metrics:
  base_model:
  - Qwen/Qwen2.5-VL-7B-Instruct
  pipeline_tag: reinforcement-learning
- ---
+ ---
+
+
+ <div align="center">
+
+ # Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization
+ <a href="https://arxiv.org/pdf/2509.21871" target="_blank">
+ <img alt="arXiv" src="https://img.shields.io/badge/arXiv-Aes--R1-red?logo=arxiv" height="25" />
+ </a>
+ <a href="https://huggingface.co/ssssmark/Aes-R1" target="_blank">
+ <img alt="HF Model: Aes-R1" src="https://img.shields.io/badge/%F0%9F%A4%97%20Model-Aes--R1-ffc107" height="25" />
+ </a>
+ <a href="https://huggingface.co/TianheWu/VisualQuality-R1-7B-preview" target="_blank">
+ <img alt="HF Dataset : Aes-CoT" src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-Aes--CoT-ffc107" height="25" />
+ </a>
+ </div>
+
+
+ > A novel and effective reinforcement learning framework designed for Image Aesthetic Assessment and general open-ended preference evaluation.
+
+
+ # 🖥️ Training
+ ## Preparation
+ 1. Download the IAA datasets (AVA, TAD66K, AADB, PARA, ...) and place them all in a single folder.
+ 2. Construct your image-score dataset so that each record has the following format:
+ ```json
+ {
+   "messages": [
+     {
+       "content": "prompt here",
+       "role": "user"
+     },
+     {
+       "content": "response here",
+       "role": "assistant"
+     }
+   ],
+   "images": "image_path_1"
+ }
+ ```
+ We provide an example dataset in the `AesR1/data` folder.
+ 3. Download the pre-trained model weights from [here](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) and place them in `AesR1/models`.
+
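The record layout from step 2 can be generated with a short script. This is a minimal sketch, not the authors' tooling: it assumes the records are stored as one JSON list, that the assistant response is the plain score, and that you supply your own prompt text. The function names, the prompt, and the file paths are all hypothetical.

```python
import json
from pathlib import Path


def build_record(image_path, score, prompt):
    """Build one record in the messages/images layout from step 2."""
    return {
        "messages": [
            {"content": prompt, "role": "user"},
            # Assumption: the response is just the score as text.
            {"content": str(score), "role": "assistant"},
        ],
        "images": image_path,
    }


def write_dataset(rows, out_path, prompt):
    """Write (image_path, score) pairs to out_path as a JSON list of records."""
    records = [build_record(path, score, prompt) for path, score in rows]
    Path(out_path).write_text(
        json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return records


if __name__ == "__main__":
    # Hypothetical image paths, scores, and prompt, for illustration only.
    rows = [("images/0001.jpg", 5.4), ("images/0002.jpg", 7.1)]
    write_dataset(rows, "aes_scores.json", "Rate the aesthetics of this image.")
```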
+ ## Cold-start
+ We use [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) to train the SFT model.
+
+ 1. Clone the [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) repository and install the dependencies.
+
+ ```bash
+ git clone https://github.com/hiyouga/LLaMA-Factory.git
+ conda create -n coldstart python=3.11.10
+ conda activate coldstart
+ cd LLaMA-Factory
+ pip install -e ".[torch,metrics]"
+ ```
+ 2. Add your CoT dataset entry to `LLaMA-Factory/data/dataset_info.json` and move `qwen_aescot.yaml` into `LLaMA-Factory/examples/train_full`.
+ 3. Run the following command to train the SFT model.
+
+ ```bash
+ llamafactory-cli train examples/train_full/qwen_aescot.yaml
+ ```
+
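Registering the dataset in step 2 means adding one entry to `dataset_info.json`. The sketch below shows one way to do this, assuming LLaMA-Factory's ShareGPT-style entry with `role`/`content` tags, which matches the record layout used in this README. The dataset name `aes_cot` and file name `aes_cot.json` are hypothetical; use the name that your `qwen_aescot.yaml` refers to.

```python
import json
from pathlib import Path


def register_dataset(info_path, name, file_name):
    """Add (or overwrite) one dataset entry in a dataset_info.json file."""
    path = Path(info_path)
    info = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    info[name] = {
        "file_name": file_name,
        "formatting": "sharegpt",
        # Map the dataset's own keys to the columns LLaMA-Factory reads.
        "columns": {"messages": "messages", "images": "images"},
        "tags": {
            "role_tag": "role",
            "content_tag": "content",
            "user_tag": "user",
            "assistant_tag": "assistant",
        },
    }
    path.write_text(
        json.dumps(info, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return info[name]


if __name__ == "__main__":
    # Hypothetical dataset name and file, for illustration only.
    register_dataset("LLaMA-Factory/data/dataset_info.json", "aes_cot", "aes_cot.json")
```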
+ ## RAPO
+ First, set up the environment for RAPO training.
+ ```bash
+ conda create -n rapo python=3.11.10
+ conda activate rapo
+ bash setup.sh
+ ```
+ After modifying the training scripts for your setup, run one of the following commands to train the RAPO model.
+ ```bash
+ # For single-node training
+ bash train/rapo/src/open-r1-multimodal/run_scripts/Aes/aes_onenode.sh
+
+ # For multi-node training
+ bash train/rapo/src/open-r1-multimodal/run_scripts/Aes/aes_multinode.sh
+ ```
+
+ # Inference
+ After training, you can run inference with the scripts in LLaMA-Factory.
+
+ ```bash
+ # Install vLLM
+ pip install vllm
+
+ # Run inference
+ python scripts/vllm_infer.py \
+     --model_name_or_path [path/to/your/model] \
+     --dataset [dataset_name] \
+     --template qwen2_vl \
+     --save_name result.jsonl \
+     --temperature 0.6
+ ```
+
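To compare predictions with ground-truth scores, the saved `result.jsonl` can be post-processed. The following is a rough sketch under two assumptions that are not confirmed by this README: each line is a JSON object with `predict` and `label` text fields, and the score is the last number appearing in each field. Adjust the field names and the extraction rule to your actual output.

```python
import json
import re

# Matches non-negative integers or decimals such as "7" or "6.25".
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def last_number(text):
    """Return the last number found in text as a float, or None if absent."""
    matches = _NUMBER.findall(text)
    return float(matches[-1]) if matches else None


def load_scores(jsonl_path):
    """Collect (predicted, label) score pairs from a JSONL results file."""
    pairs = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            row = json.loads(line)
            # Assumed field names; rows without a parsable score are skipped.
            pred = last_number(row.get("predict", ""))
            gold = last_number(row.get("label", ""))
            if pred is not None and gold is not None:
                pairs.append((pred, gold))
    return pairs
```

The resulting pairs can then be passed to whichever correlation or error metric you use for evaluation.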
+ # 📚 Citation
+ If you find this repo useful, please consider citing our paper as follows:
+ ```
+ @misc{liu2025unlockingessencebeautyadvanced,
+   title={Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization},
+   author={Boyang Liu and Yifan Hu and Senjie Jin and Shihan Dou and Gonglei Shi and Jie Shao and Tao Gui and Xuanjing Huang},
+   year={2025},
+   eprint={2509.21871},
+   archivePrefix={arXiv},
+   primaryClass={cs.CV},
+   url={https://arxiv.org/abs/2509.21871},
+ }
+ ```