# TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward
<div align="center">
<a href="https://luo-yihong.github.io/TDM-R1-Page/"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Github&color=blue&logo=github-pages"></a> &ensp;
<a href="https://arxiv.org/abs/2603.07700"><img src="https://img.shields.io/static/v1?label=Paper&message=Arxiv:TDM-R1&color=red&logo=arxiv"></a> &ensp;
</div>


This is the official repository of "[TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward](https://arxiv.org/abs/2603.07700)" by Yihong Luo, Tianyang Hu, Weijian Luo, and Jing Tang.

<div align="center">
<img src="teaser_git.png" width="100%">
</div>

<p align="center">
Samples generated by <b>TDM-R1</b> with only <b>4 NFEs</b> (number of function evaluations). TDM-R1 is obtained by reinforcing the recent, powerful Z-Image model.
</p>
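Each NFE is one forward pass of the denoising network, so sampling cost scales with the number of steps. As a toy illustration only, the sketch below counts network calls in a generic Euler sampler for a velocity-predicting model; the linear time grid and the stand-in `velocity` function are hypothetical and do not reproduce Z-Image's actual scheduler.

```python
def euler_sample(x, velocity, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=1 (noise) to t=0 (data) with Euler steps.

    Generic toy sampler (not Z-Image's scheduler). Returns the final sample
    and the number of function evaluations (NFEs) spent.
    """
    ts = [1.0 - i / num_steps for i in range(num_steps + 1)]  # 1.0 -> 0.0
    nfe = 0
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        v = velocity(x, t_cur)  # one forward pass of the (stand-in) network
        nfe += 1
        x = x + (t_next - t_cur) * v
    return x, nfe


# Hypothetical stand-in "network": constant velocity for the straight path
# from noise 0.0 at t=1 to data 2.0 at t=0.
sample, nfe = euler_sample(x=0.0, velocity=lambda x, t: -2.0, num_steps=4)
print(sample, nfe)  # 2.0 4
```

With four steps the sampler calls the network exactly four times; a distilled few-step model is trained so that such a short trajectory still lands on a high-quality sample.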


## Pre-trained Model

- [TDM-R1-ZImage](https://huggingface.co/Luo-Yihong/TDM-R1)

## Usage

```python
import os

# Optional: route Hugging Face downloads through a mirror. Remove this line to
# use the default endpoint. It must be set before diffusers is imported.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

import torch
from diffusers import ZImagePipeline
from peft import LoraConfig, get_peft_model


def load_ema(pipeline, lora_path, adapter_name="default"):
    """Load EMA weights into the pipeline's transformer adapter."""
    pipeline.transformer.set_adapter(adapter_name)
    # Trainable LoRA parameters of this adapter, in registration order.
    # The checkpoint stores its EMA tensors as a list in the same order.
    trainable_params = [
        p for n, p in pipeline.transformer.named_parameters()
        if adapter_name in n and p.requires_grad
    ]
    state_dict = torch.load(lora_path, map_location=pipeline.transformer.device)
    ema_params = state_dict["ema_parameters"]
    assert len(trainable_params) == len(ema_params), \
        f"Parameter count mismatch: {len(trainable_params)} vs {len(ema_params)}"
    for param, ema_param in zip(trainable_params, ema_params):
        param.data.copy_(ema_param.to(param.device))
    print(f"Loaded EMA weights for adapter '{adapter_name}' from {lora_path}")


# Base model.
pipeline = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)

# LoRA adapter configuration; it must match the one used to train the checkpoint.
transformer_lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    init_lora_weights="gaussian",
    target_modules=["to_q", "to_k", "to_v", "to_out.0", "add_k_proj", "add_v_proj"],
)
pipeline.transformer = get_peft_model(
    pipeline.transformer,
    transformer_lora_config,
    adapter_name="tdmr1",
)

# Path to the downloaded TDM-R1 EMA checkpoint (adjust to your local copy).
load_ema(
    pipeline,
    lora_path="./tdmr1_zimage_ema.ckpt",
    adapter_name="tdmr1",
)
pipeline = pipeline.to("cuda")

prompt = "A cinematic photo of a lighthouse at dusk"  # placeholder: use your own prompt
seed = 0  # placeholder: any integer seed

image = pipeline(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=5,  # yields 4 DiT forward passes, i.e. 4 NFEs
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(seed),
).images[0]
image.save("tdmr1_sample.png")  # in a notebook, evaluating `image` displays it instead
```
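The `load_ema` helper copies an exponential moving average (EMA) of the LoRA parameters into the adapter instead of the raw trained weights. As a rough sketch only, assuming the standard EMA update and a hypothetical decay (the decay used to train TDM-R1 is not stated here), such averaged weights are typically maintained like this, with plain Python floats standing in for tensors:

```python
def ema_update(ema_params, params, decay=0.99):
    """Standard EMA update: ema <- decay * ema + (1 - decay) * param.

    Lists of floats stand in for tensors; the default decay is a
    hypothetical placeholder, not TDM-R1's training value.
    """
    if len(ema_params) != len(params):
        raise ValueError("parameter count mismatch")
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]


# Toy example: one "parameter" that jumps from 0.0 to 1.0 and stays there.
ema = [0.0]
for _ in range(3):
    ema = ema_update(ema, [1.0], decay=0.5)
print(ema)  # [0.875]
```

Because the EMA values are stored as a positional list, they only line up with the adapter if its parameters are enumerated in the same order as during training, which is why `load_ema` asserts that the two counts match before copying.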

## Contact

Please contact Yihong Luo (yluocg@connect.ust.hk) if you have any questions about this work.

## BibTeX
```
@misc{luo2025tdmr1,
  title={TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward},
  author={Yihong Luo and Tianyang Hu and Weijian Luo and Jing Tang},
  year={2025},
  eprint={2603.07700},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```