# TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward

<div align="center">
<a href="https://luo-yihong.github.io/TDM-R1-Page/"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Github&color=blue&logo=github-pages"></a>
<a href="https://arxiv.org/abs/2603.07700"><img src="https://img.shields.io/static/v1?label=Paper&message=Arxiv:TDM-R1&color=red&logo=arxiv"></a>
</div>

This is the official repository for "[TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward](https://arxiv.org/abs/2603.07700)" by *Yihong Luo, Tianyang Hu, Weijian Luo, and Jing Tang*.

<div align="center">
<img src="teaser_git.png" width="100%">
</div>

<p align="center">
Samples generated by <b>TDM-R1</b> using only <b>4 NFEs</b> (number of function evaluations), obtained by reinforcing the Z-Image model.
</p>

## Pre-trained Model

- [TDM-R1-ZImage](https://huggingface.co/Luo-Yihong/TDM-R1)
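
The usage snippet in this README loads the checkpoint from a local file, `./tdmr1_zimage_ema.ckpt`. A minimal sketch of fetching that file from the model page with `huggingface_hub` could look like the following. The helper name `resolve_checkpoint` is hypothetical, and the `filename` default is an assumption taken from that local path; check the file listing on the model page for the actual name.

```python
from pathlib import Path


def resolve_checkpoint(local_path,
                       repo_id="Luo-Yihong/TDM-R1",
                       filename="tdmr1_zimage_ema.ckpt"):
    """Return a local checkpoint path, downloading the file only if it is missing.

    NOTE: `filename` is an assumption based on the path used in the usage
    snippet; it may differ from the file actually hosted in the repository.
    """
    path = Path(local_path)
    if path.is_file():
        # Already on disk: nothing to download.
        return str(path)
    # Imported lazily so the helper also works offline when the file exists.
    from huggingface_hub import hf_hub_download
    # Downloads into the parent directory of `local_path` and returns the file path.
    return hf_hub_download(repo_id=repo_id, filename=filename,
                           local_dir=str(path.parent))
```

The returned path can then be passed as `lora_path` when loading the EMA weights.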

## Usage

```python
import os

# Optional: route Hugging Face downloads through a mirror. Set this before
# importing diffusers/peft; remove it if you can reach huggingface.co directly.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

import torch
from diffusers import ZImagePipeline
from peft import LoraConfig, get_peft_model


def load_ema(pipeline, lora_path, adapter_name="default"):
    """Load EMA weights into the pipeline's transformer adapter."""
    pipeline.transformer.set_adapter(adapter_name)
    # Trainable parameters of this adapter, in named_parameters() order.
    trainable_params = [
        p for n, p in pipeline.transformer.named_parameters()
        if adapter_name in n and p.requires_grad
    ]
    state_dict = torch.load(lora_path, map_location=pipeline.transformer.device)
    ema_params = state_dict["ema_parameters"]
    assert len(trainable_params) == len(ema_params), \
        f"Parameter count mismatch: {len(trainable_params)} vs {len(ema_params)}"
    # EMA tensors are matched to parameters by position, so copy them in order.
    for param, ema_param in zip(trainable_params, ema_params):
        param.data.copy_(ema_param.to(param.device))
    print(f"Loaded EMA weights for adapter '{adapter_name}' from {lora_path}")


# Load the base Z-Image-Turbo pipeline.
pipeline = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)

# Wrap the transformer with a LoRA adapter that receives the EMA weights.
transformer_lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    init_lora_weights="gaussian",
    target_modules=["to_q", "to_k", "to_v", "to_out.0", "add_k_proj", "add_v_proj"],
)
pipeline.transformer = get_peft_model(
    pipeline.transformer,
    transformer_lora_config,
    adapter_name="tdmr1",
)

# Overwrite the freshly initialized adapter weights with the released EMA weights.
load_ema(
    pipeline,
    lora_path="./tdmr1_zimage_ema.ckpt",
    adapter_name="tdmr1",
)
pipeline = pipeline.to("cuda")

prompt = "a photo of a corgi"  # placeholder; replace with your own prompt
seed = 0  # placeholder; any integer seed works
image = pipeline(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=5,  # This actually results in 4 DiT forwards
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(seed),
).images[0]
image  # displays in a notebook; in a script, call image.save("output.png")
```
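
The `LoraConfig` in the snippet above uses `r=32` and `lora_alpha=64`. Under PEFT's default LoRA scaling (`lora_alpha / r`, assuming `use_rslora` is left off), the adapter adds `2.0 * (B @ A)` to each targeted weight. The sketch below illustrates that arithmetic on a toy matrix with plain Python lists; `lora_scale` and `merge_lora` are illustrative helpers, not part of PEFT or this repository.

```python
def lora_scale(r, lora_alpha):
    """Default LoRA scaling factor: lora_alpha / r."""
    return lora_alpha / r


def merge_lora(W, A, B, scale):
    """Return W + scale * (B @ A) for list-of-lists matrices.

    W is (d_out x d_in), B is (d_out x rank), A is (rank x d_in).
    """
    rank = len(A)
    merged = [row[:] for row in W]  # copy so W is left untouched
    for i in range(len(W)):
        for j in range(len(W[0])):
            delta = sum(B[i][k] * A[k][j] for k in range(rank))
            merged[i][j] += scale * delta
    return merged


# Values from the README's LoraConfig: r=32, lora_alpha=64.
scale = lora_scale(32, 64)
print(scale)  # 2.0

# Toy rank-1 example (shapes are illustrative only).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]
B = [[0.5], [0.0]]
print(merge_lora(W, A, B, scale))  # [[2.0, 2.0], [0.0, 1.0]]
```

A freshly created adapter typically starts with `B` at zero, so it leaves the base weights unchanged until the EMA weights are loaded.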

## Contact

Please contact Yihong Luo (yluocg@connect.ust.hk) if you have any questions about this work.

## BibTeX

```
@misc{luo2025tdmr1,
  title={TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward},
  author={Yihong Luo and Tianyang Hu and Weijian Luo and Jing Tang},
  year={2025},
  eprint={2603.07700},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```