---
library_name: diffusers
pipeline_tag: text-to-image
---

# TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward

<div align="center">
<a href="https://luo-yihong.github.io/TDM-R1-Page/"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Github&color=blue&logo=github-pages"></a>
<a href="https://arxiv.org/abs/2603.07700"><img src="https://img.shields.io/static/v1?label=Paper&message=Arxiv:TDM-R1&color=red&logo=arxiv"></a>
<a href="https://github.com/Luo-Yihong/TDM-R1"><img src="https://img.shields.io/static/v1?label=Code&message=Github&color=green&logo=github"></a>
</div>

This is the official repository for "[TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward](https://arxiv.org/abs/2603.07700)" by *Yihong Luo, Tianyang Hu, Weijian Luo, Jing Tang*.

<div align="center">
<img src="teaser_git.png" width="100%">
</div>

<p align="center">
Samples generated by <b>TDM-R1</b> using only <b>4 NFEs</b>, obtained by reinforcing the recent Z-Image model.
</p>

## Description
TDM-R1 is a reinforcement learning (RL) paradigm for few-step generative models. It decouples the learning process into surrogate reward learning and generator learning, which allows training with non-differentiable rewards (e.g., human preference, object counts). This repository contains the reinforced version of the [Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) model.

## Pre-trained Model

- [TDM-R1-ZImage](https://huggingface.co/Luo-Yihong/TDM-R1)

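The checkpoint can also be fetched programmatically. The following is a minimal sketch, not part of the official code: `fetch_checkpoint` is a hypothetical helper, and it assumes the file is stored in the `Luo-Yihong/TDM-R1` repository under the name `tdmr1_zimage_ema.ckpt` (the filename that the usage example loads). Check the repository's file list if the download fails.

```python
from pathlib import Path

# Assumptions (not confirmed by this README): the checkpoint lives in this
# repository under the same filename that the usage example loads.
REPO_ID = "Luo-Yihong/TDM-R1"
CKPT_FILENAME = "tdmr1_zimage_ema.ckpt"


def fetch_checkpoint(local_dir=".", repo_id=REPO_ID, filename=CKPT_FILENAME):
    """Return a local path to the checkpoint, downloading it only if it is missing."""
    target = Path(local_dir) / filename
    if target.is_file():
        # Already present locally: skip the network entirely.
        return str(target)
    # Imported lazily so the helper can be defined without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_dir)


# Example call (downloads on first use):
# lora_path = fetch_checkpoint(".")
```

The returned path can be passed as `lora_path` when loading the EMA weights.
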
## Usage

You can use this model with `diffusers` and `peft`. The example below wraps the Z-Image-Turbo transformer in a LoRA adapter and copies the EMA weights from a locally downloaded checkpoint into it.

```python
import os

# Optional: route Hub downloads through a mirror. Remove this line if you can
# reach huggingface.co directly. It must be set before importing diffusers.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

import torch
from diffusers import ZImagePipeline
from peft import LoraConfig, get_peft_model


def load_ema(pipeline, lora_path, adapter_name="default"):
    """Load EMA weights into the pipeline's transformer adapter."""
    pipeline.transformer.set_adapter(adapter_name)
    # Collect the adapter's trainable (LoRA) parameters in definition order.
    trainable_params = [
        p for n, p in pipeline.transformer.named_parameters()
        if adapter_name in n and p.requires_grad
    ]
    state_dict = torch.load(lora_path, map_location=pipeline.transformer.device)
    ema_params = state_dict["ema_parameters"]
    assert len(trainable_params) == len(ema_params), \
        f"Parameter count mismatch: {len(trainable_params)} vs {len(ema_params)}"
    # EMA tensors are matched to adapter parameters by position.
    for param, ema_param in zip(trainable_params, ema_params):
        param.data.copy_(ema_param.to(param.device))
    print(f"Loaded EMA weights for adapter '{adapter_name}' from {lora_path}")


pipeline = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)

# The LoRA configuration must match the one the checkpoint was saved with.
transformer_lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    init_lora_weights="gaussian",
    target_modules=["to_q", "to_k", "to_v", "to_out.0", "add_k_proj", "add_v_proj"],
)
pipeline.transformer = get_peft_model(
    pipeline.transformer,
    transformer_lora_config,
    adapter_name="tdmr1",
)

# Ensure the checkpoint file is downloaded locally
load_ema(
    pipeline,
    lora_path="./tdmr1_zimage_ema.ckpt",
    adapter_name="tdmr1",
)

pipeline = pipeline.to("cuda")
image = pipeline(
    prompt="A high quality photo of a cat",
    height=1024,
    width=1024,
    num_inference_steps=5,  # This actually results in 4 DiT forwards
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
image  # Displays the result in a notebook; in a script, call image.save(...) instead
```
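
`load_ema` above pairs the adapter parameters with the stored EMA tensors purely by position and only asserts that the two counts match. A stricter check, sketched here as a hypothetical helper that is not part of the TDM-R1 code, also compares shapes before anything is copied. It would catch, for example, a LoRA rank that differs from the one the checkpoint was saved with.

```python
def check_ema_shapes(trainable_params, ema_params):
    """Raise ValueError if the two lists differ in length or in any tensor shape.

    Works with any objects exposing a `.shape` attribute, including torch tensors.
    """
    if len(trainable_params) != len(ema_params):
        raise ValueError(
            f"Parameter count mismatch: {len(trainable_params)} vs {len(ema_params)}"
        )
    for index, (param, ema_param) in enumerate(zip(trainable_params, ema_params)):
        if tuple(param.shape) != tuple(ema_param.shape):
            raise ValueError(
                f"Shape mismatch at index {index}: "
                f"{tuple(param.shape)} vs {tuple(ema_param.shape)}"
            )
```

One way to use it would be to call `check_ema_shapes(trainable_params, ema_params)` inside `load_ema` just before the copy loop.
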

## Contact

Please contact Yihong Luo (yluocg@connect.ust.hk) if you have any questions about this work.

## BibTeX
```bibtex
@misc{luo2025tdmr1,
  title={TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward},
  author={Yihong Luo and Tianyang Hu and Weijian Luo and Jing Tang},
  year={2025},
  eprint={2603.07700},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```