---
license: apache-2.0
---
# 🚀 SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation
[📄 arXiv](https://arxiv.org/abs/2506.00523)
[💻 GitHub](https://github.com/XingtongGe/SenseFlow)
[🤗 HuggingFace Model](https://huggingface.co/domiso/SenseFlow)
[Xingtong Ge](https://xingtongge.github.io/)<sup>1,2</sup>, Xin Zhang<sup>2</sup>, [Tongda Xu](https://tongdaxu.github.io/)<sup>3</sup>, [Yi Zhang](https://zhangyi-3.github.io/)<sup>4</sup>, [Xinjie Zhang](https://xinjie-q.github.io/)<sup>1</sup>, [Yan Wang](https://yanwang202199.github.io/)<sup>3</sup>, [Jun Zhang](https://eejzhang.people.ust.hk/)<sup>1</sup>
<sup>1</sup>HKUST, <sup>2</sup>SenseTime Research, <sup>3</sup>Tsinghua University, <sup>4</sup>CUHK MMLab
## Abstract
The Distribution Matching Distillation (DMD) has been successfully applied to text-to-image diffusion models such as Stable Diffusion (SD) 1.5. However, vanilla DMD suffers from convergence difficulties on large-scale flow-based text-to-image models, such as SD 3.5 and FLUX. In this paper, we first analyze the issues when applying vanilla DMD on large-scale models. Then, to overcome the scalability challenge, we propose implicit distribution alignment (IDA) to constrain the divergence between the generator and the fake distribution. Furthermore, we propose intra-segment guidance (ISG) to relocate the timestep denoising importance from the teacher model. With IDA alone, DMD converges for SD 3.5; employing both IDA and ISG, DMD converges for SD 3.5 and FLUX.1 dev. Together with a scaled VFM-based discriminator, our final model, dubbed **SenseFlow**, achieves superior performance in distillation for both diffusion based text-to-image models such as SDXL, and flow-matching models such as SD 3.5 Large and FLUX.1 dev.
## SenseFlow-FLUX.1 dev (supports 4–8-step generation)
* `SenseFlow-FLUX/diffusion_pytorch_model.safetensors`: the distilled DiT checkpoint.
* `SenseFlow-FLUX/config.json`: the DiT config used in our model.
### Usage
1. Prepare the base FLUX.1 dev checkpoint at `Path/to/FLUX`.
2. Replace the transformer folder `Path/to/FLUX/transformer` with `SenseFlow-FLUX`, obtaining `Path/to/SenseFlow-FLUX`.
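A minimal sketch of these two steps, assuming FLUX.1 dev has already been downloaded and this repository's `SenseFlow-FLUX` folder sits in the current directory (all `Path/to/...` paths are placeholders):

```shell
# Start from a copy of the base FLUX.1 dev pipeline (placeholder path)
cp -r Path/to/FLUX Path/to/SenseFlow-FLUX
# Swap in the distilled DiT weights and config from this repository
rm -rf Path/to/SenseFlow-FLUX/transformer
cp -r SenseFlow-FLUX Path/to/SenseFlow-FLUX/transformer
```

The resulting `Path/to/SenseFlow-FLUX` directory can then be passed to `FluxPipeline.from_pretrained` as in the examples below.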
#### Using the Euler sampler
```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("Path/to/SenseFlow-FLUX", torch_dtype=torch.bfloat16).to("cuda")
prompt = "A cat sleeping on a windowsill with white curtains fluttering in the breeze"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    num_inference_steps=4,
    max_sequence_length=512,
).images[0]
image.save("output.png")
```
#### Using the x0 sampler (similar to the LCMScheduler in diffusers)
```python
import torch
from diffusers import FluxPipeline
from diffusers import FlowMatchEulerDiscreteScheduler
from diffusers.schedulers.scheduling_flow_match_euler_discrete import FlowMatchEulerDiscreteSchedulerOutput
from typing import Union, Tuple, Optional


class FlowMatchEulerX0Scheduler(FlowMatchEulerDiscreteScheduler):
    def step(
        self,
        model_output: torch.FloatTensor,
        timestep: Union[float, torch.FloatTensor],
        sample: torch.FloatTensor,
        generator: Optional[torch.Generator] = None,
        return_dict: bool = True,
    ) -> Union[FlowMatchEulerDiscreteSchedulerOutput, Tuple]:
        if self.step_index is None:
            self._init_step_index(timestep)
        sample = sample.to(torch.float32)  # ensure precision
        sigma = self.sigmas[self.step_index]
        sigma_next = self.sigmas[self.step_index + 1]
        # 1. Compute x0 from the model output (the model predicts the velocity v,
        #    so x0 = x_t - sigma * v under the flow-matching parameterization)
        x0 = sample - sigma * model_output
        # 2. Re-noise x0 to the next noise level to get the sample for the next step
        noise = torch.randn_like(sample)
        prev_sample = (1 - sigma_next) * x0 + sigma_next * noise
        prev_sample = prev_sample.to(model_output.dtype)  # convert back to original dtype
        self._step_index += 1  # move to the next step
        if not return_dict:
            return (prev_sample,)
        return FlowMatchEulerDiscreteSchedulerOutput(prev_sample=prev_sample)


pipe = FluxPipeline.from_pretrained("Path/to/SenseFlow-FLUX", torch_dtype=torch.bfloat16).to("cuda")
pipe.scheduler = FlowMatchEulerX0Scheduler.from_config(pipe.scheduler.config)
prompt = "A cat sleeping on a windowsill with white curtains fluttering in the breeze"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    num_inference_steps=4,
    max_sequence_length=512,
).images[0]
image.save("output.png")
```
## DanceGRPO-SenseFlow (supports 4–8-step generation)
Coming soon!