Improve model card: Add pipeline tag, update library_name, and fix abstract typo
This PR enhances the model card by:
- Adding the `pipeline_tag: text-to-image` to improve model discoverability under the relevant pipeline filter on the Hub.
- Updating the `library_name` from `SRPO` to `diffusers`, as evidenced by the `from diffusers import FluxPipeline` import in the provided inference snippet. This will enable the automated "how to use" widget on the Hub (a hedged loading sketch is included after this list for reference).
- Fixing a minor typo in the `## Abstract` section header.
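
For reference, a rough sketch of what a `diffusers`-based load of these weights could look like is shown below. This is not the card's actual snippet: only the `from diffusers import FluxPipeline` import is confirmed by the card, while the repo id, checkpoint filename, use of `FluxTransformer2DModel.from_single_file`, and the `FLUX.1-dev` base pipeline are assumptions.

```python
# Hypothetical loading sketch (NOT the model card's snippet): assumes the repo
# ships a single-file FLUX transformer checkpoint; names below are placeholders.
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel
from huggingface_hub import hf_hub_download

# Placeholder repo/filename -- substitute the actual bf16 or fp8 safetensors file.
ckpt = hf_hub_download("tencent/SRPO", "diffusion_pytorch_model.safetensors")

# Load the fine-tuned transformer, then plug it into the stock FLUX.1-dev pipeline.
transformer = FluxTransformer2DModel.from_single_file(ckpt, torch_dtype=torch.bfloat16)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("a photorealistic portrait in soft evening light").images[0]
image.save("srpo_sample.png")
```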
README.md CHANGED

````diff
@@ -1,15 +1,16 @@
 ---
-library_name: SRPO
+base_model:
+- tencent/SRPO
+library_name: diffusers
 license: other
 license_name: tencent-hunyuan-community
 license_link: https://github.com/Tencent-Hunyuan/SRPO/blob/main/LICENSE.txt
-base_model:
-- tencent/SRPO
+pipeline_tag: text-to-image
 ---
 
 ## bf16 and fp8 versions of SRPO from Tencent
 
-<div align
+<div align="center" style="font-family: charter;">
 <h1 align="center">Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference </h1>
 <div align="center">
 <a href='https://arxiv.org/abs/2509.06942'><img src='https://img.shields.io/badge/ArXiv-red?logo=arxiv'></a>
@@ -41,7 +42,7 @@ base_model:
 
 
 
-##
+## Abstract
 Recent studies have demonstrated the effectiveness of directly aligning diffusion models with human preferences using differentiable reward. However, they exhibit two primary challenges: (1) they rely on multistep denoising with gradient computation for reward scoring, which is computationally expensive, thus restricting optimization to only a few diffusion steps; (2) they often need continuous offline adaptation of reward models in order to achieve desired aesthetic quality, such as photorealism or precise lighting effects. To address the limitation of multistep denoising, we propose Direct-Align, a method that predefines a noise prior to effectively recover original images from any time steps via interpolation, leveraging the equation that diffusion states are interpolations between noise and target images, which effectively avoids over-optimization in late timesteps. Furthermore, we introduce Semantic Relative Preference Optimization (SRPO), in which rewards are formulated as text-conditioned signals. This approach enables online adjustment of rewards in response to positive and negative prompt augmentation, thereby reducing the reliance on offline reward fine-tuning. By fine-tuning the FLUX.1.dev model with optimized denoising and online reward adjustment, we improve its human-evaluated realism and aesthetic quality by over 3x.
 
 ## Quick Started
@@ -81,7 +82,7 @@ If you use SRPO for your research, please cite our paper:
 year={2025},
 eprint={2509.06942},
 archivePrefix={arXiv},
-primaryClass={cs.AI}
+primaryClass={cs.AI},
 url={https://arxiv.org/abs/2509.06942},
 }
 ```
````
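
As background for the abstract's statement that diffusion states are interpolations between noise and target images (the property Direct-Align exploits to recover an image from any timestep), here is a minimal illustrative sketch of that relation. It is a toy restatement of the equation as described, not the paper's implementation; the schedule value `sigma_t` and the function names are assumptions.

```python
import torch

# Assumed interpolation form: the noisy state x_t is a convex mix of the clean
# image x0 and Gaussian noise eps, weighted by a schedule value sigma_t in (0, 1).
def noisy_state(x0: torch.Tensor, eps: torch.Tensor, sigma_t: float) -> torch.Tensor:
    return (1.0 - sigma_t) * x0 + sigma_t * eps

# With a predefined noise prior eps (as the abstract describes), the clean image
# can be recovered from any timestep by inverting that interpolation.
def recover_image(x_t: torch.Tensor, eps: torch.Tensor, sigma_t: float) -> torch.Tensor:
    return (x_t - sigma_t * eps) / (1.0 - sigma_t)

x0 = torch.randn(1, 3, 64, 64)   # stand-in "image"
eps = torch.randn_like(x0)       # predefined noise prior
x_t = noisy_state(x0, eps, sigma_t=0.7)
assert torch.allclose(recover_image(x_t, eps, sigma_t=0.7), x0, atol=1e-5)
```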