Distribution Matching Distillation
This distilled model is trained following Alibaba's Decoupled DMD ("CFG Augmentation as the Spear, Distribution Matching as the Shield"), which decouples the DMD loss into a CFG augmentation loss computed at two independent timesteps and a new distribution matching loss that serves as regularization.
Experiments were conducted on the DMD2 codebase.
This model also uses the backward simulation designed in DMD2, adapted here to SDv1.5 (the original code supported only SDXL).
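The decoupling described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: `cfg_aug_loss`, `dm_loss`, and `reg_weight` are hypothetical stand-ins for the actual loss terms and weighting defined in the Decoupled DMD paper.

```python
import random

def decoupled_dmd_loss(cfg_aug_loss, dm_loss, num_timesteps=1000, reg_weight=1.0):
    # Sample two independent timesteps: one for the CFG augmentation term,
    # one for the distribution matching regularizer.
    t_cfg = random.randrange(num_timesteps)
    t_dm = random.randrange(num_timesteps)
    # CFG augmentation drives the update (the "spear"); distribution
    # matching acts as regularization (the "shield").
    return cfg_aug_loss(t_cfg) + reg_weight * dm_loss(t_dm)
```

The key point the sketch captures is that the two terms are evaluated at independently sampled timesteps rather than sharing one, which is what distinguishes the decoupled formulation from the original DMD loss.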
Training used the following parameters:

```shell
--real_guidance_scale 3.0 \
--fake_guidance_scale 1.0 \
--max_grad_norm 10.0 \
--use_fp16 \
--log_loss \
--dfake_gen_update_ratio 5 \
--fsdp \
--denoising \
--num_denoising_step 4 \
--denoising_timestep 1000 \
--backward_simulation \
--use_decoupled_dmd \
--min_step_percent 0.0 \
--max_step_percent 1.0
```
The checkpoint obtained at 2000 steps (2000 generator updates, 10000 guidance updates, real_guidance_scale = 3.0) achieved a CLIP score of 0.325.
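For context, CLIP score is the mean cosine similarity between CLIP embeddings of each generated image and its prompt. A minimal sketch of the metric follows, assuming the unscaled cosine-similarity definition (which the 0.325 value suggests); the embeddings themselves would come from a CLIP model and are not computed here.

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def clip_score(image_embs, text_embs):
    # Mean image-text cosine similarity over the evaluation set.
    pairs = list(zip(image_embs, text_embs))
    return sum(cosine_similarity(i, t) for i, t in pairs) / len(pairs)
```

Note that some implementations scale this value (e.g. by 100) or clamp it at zero, so scores from different evaluation scripts are not directly comparable.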
Base model
stable-diffusion-v1-5/stable-diffusion-v1-5