---
library_name: diffusers
pipeline_tag: text-to-image
---

# TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward

This is the official repository of "[TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward](https://arxiv.org/abs/2603.07700)", by *Yihong Luo, Tianyang Hu, Weijian Luo, Jing Tang*.

Samples generated by TDM-R1 with only 4 NFEs (number of function evaluations), obtained by reinforcing the recent, powerful Z-Image model.

## Description

TDM-R1 is a reinforcement learning (RL) paradigm for few-step generative models. It decouples the learning process into surrogate reward learning and generator learning, which allows the use of non-differentiable rewards (e.g., human preference, object counts). This repository contains the reinforced version of the [Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) model.

## Pre-trained Model

- [TDM-R1-ZImage](https://huggingface.co/Luo-Yihong/TDM-R1)

## Usage

You can use this model with `diffusers` and `peft`. Below is an example of how to load the weights as a LoRA adapter.

```python
import os

# Optional: route Hugging Face downloads through a mirror.
# Remove this line to use the default endpoint. It must be set before diffusers is imported.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

import torch
from diffusers import ZImagePipeline
from peft import LoraConfig, get_peft_model


def load_ema(pipeline, lora_path, adapter_name="default"):
    """Load EMA weights into the pipeline's transformer adapter."""
    pipeline.transformer.set_adapter(adapter_name)
    # Collect the trainable LoRA parameters of this adapter, in registration order.
    trainable_params = [
        p
        for n, p in pipeline.transformer.named_parameters()
        if adapter_name in n and p.requires_grad
    ]
    state_dict = torch.load(lora_path, map_location=pipeline.transformer.device)
    ema_params = state_dict["ema_parameters"]
    assert len(trainable_params) == len(ema_params), (
        f"Parameter count mismatch: {len(trainable_params)} vs {len(ema_params)}"
    )
    # EMA tensors are stored positionally, so they are copied in the same order.
    for param, ema_param in zip(trainable_params, ema_params):
        param.data.copy_(ema_param.to(param.device))
    print(f"Loaded EMA weights for adapter '{adapter_name}' from {lora_path}")


# Load the base Z-Image-Turbo pipeline.
pipeline = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)

# Wrap the transformer with a LoRA adapter. The configuration must match the one
# used to produce the checkpoint, otherwise the parameter counts will not line up.
transformer_lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    init_lora_weights="gaussian",
    target_modules=["to_q", "to_k", "to_v", "to_out.0", "add_k_proj", "add_v_proj"],
)
pipeline.transformer = get_peft_model(
    pipeline.transformer,
    transformer_lora_config,
    adapter_name="tdmr1",
)

# Ensure the checkpoint file has been downloaded locally before this step.
load_ema(
    pipeline,
    lora_path="./tdmr1_zimage_ema.ckpt",
    adapter_name="tdmr1",
)

pipeline = pipeline.to("cuda")

image = pipeline(
    prompt="A high quality photo of a cat",
    height=1024,
    width=1024,
    num_inference_steps=5,  # This actually results in 4 DiT forward passes
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
image  # Displays the image in a notebook; in a script, call image.save(...) instead
```

## Contact

Please contact Yihong Luo (yluocg@connect.ust.hk) if you have any questions about this work.

## Bibtex

```bibtex
@misc{luo2025tdmr1,
  title={TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward},
  author={Yihong Luo and Tianyang Hu and Weijian Luo and Jing Tang},
  year={2025},
  eprint={2603.07700},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```