
Velocity Direction Caution

#3
by JStyborski - opened

I think the velocity convention for this model is the opposite of the one used by other models. If you are writing code that uses the rectified flow model here, be careful: its velocity prediction is reversed.

  • For regular diffusion models, the signal-to-noise ratio decreases as the timestep increases (i.e., at t=0, x_t is the input image; at t=T, x_t is noise).
  • Rectified flow models tend to reverse this trend by setting x_t = t * x_0 + (1-t) * noise, which has an increasing signal-to-noise ratio as t increases. The corresponding velocity is v_t = dx_t/dt = x_0 - noise.
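To make the second bullet concrete, here is a minimal sketch in plain Python that checks the velocity numerically. The function names and the toy values are hypothetical and chosen only for illustration; the interpolation formula is the one stated in the bullet above.

```python
def interpolate_rf(x0, noise, t):
    """Rectified flow interpolation from the bullet above:
    x_t = t * x0 + (1 - t) * noise, with t in [0, 1]."""
    return [t * a + (1.0 - t) * n for a, n in zip(x0, noise)]


def finite_difference_velocity(x0, noise, t, eps=1e-6):
    """Central finite-difference estimate of dx_t/dt."""
    hi = interpolate_rf(x0, noise, t + eps)
    lo = interpolate_rf(x0, noise, t - eps)
    return [(h - l) / (2.0 * eps) for h, l in zip(hi, lo)]


# Toy "image" and "noise" vectors (made-up values, not real data).
x0 = [0.5, -1.0, 2.0]
noise = [1.5, 0.25, -0.75]

v_numeric = finite_difference_velocity(x0, noise, t=0.3)
v_closed = [a - n for a, n in zip(x0, noise)]  # x_0 - noise

# Endpoints: pure noise at t=0, clean image at t=1.
start = interpolate_rf(x0, noise, 0.0)
end = interpolate_rf(x0, noise, 1.0)
```

Because x_t is linear in t, the velocity is the same at every t and points from the noise sample toward the image.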

The issue is that the InstaFlow paper seems to keep the timestep trend from diffusion models while using the rectified flow velocity. The rectified flow model here is trained to predict x_0 - noise. However, the sampling/denoising method (in pipeline_rf.py) uses decreasing timesteps, going from noise at t=1000 to the image at t=0. To compensate, it sets the step size (dt) to a positive constant, so that the Euler step always moves in the direction of the velocity (from noise to image). This is the opposite of something like Stable Diffusion 3.5, which uses a target of (noise - x_0) during training and a negative step size during sampling.
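The difference can be sketched with two toy Euler samplers in plain Python. This is not the code from pipeline_rf.py or from any Stable Diffusion implementation: the function names, the "oracle" velocity functions, the step count, and the way timestep labels are computed are all assumptions made for illustration, following only the description above.

```python
def sample_positive_dt(noise, velocity_fn, num_steps=4):
    """Convention described for this model: timestep labels decrease
    (1000 toward 0), but dt is a positive constant."""
    dt = 1.0 / num_steps
    x = list(noise)
    for i in range(num_steps):
        t_label = 1000.0 * (1.0 - i / num_steps)  # illustrative label only
        v = velocity_fn(x, t_label)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def sample_negative_dt(noise, velocity_fn, num_steps=4):
    """Convention described for Stable Diffusion 3.5: the noise level goes
    from 1 to 0, so dt = next - current is negative."""
    levels = [1.0 - i / num_steps for i in range(num_steps + 1)]
    x = list(noise)
    for s, s_next in zip(levels[:-1], levels[1:]):
        dt = s_next - s  # negative step
        v = velocity_fn(x, s)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy data and oracle "models" that return the exact training target.
x0 = [0.5, -1.0, 2.0]
noise = [1.5, 0.25, -0.75]


def predicts_x0_minus_noise(x, t):
    return [a - n for a, n in zip(x0, noise)]  # target described for this model


def predicts_noise_minus_x0(x, t):
    return [n - a for a, n in zip(x0, noise)]  # target described for SD 3.5


ok_a = sample_positive_dt(noise, predicts_x0_minus_noise)   # reaches x0
ok_b = sample_negative_dt(noise, predicts_noise_minus_x0)   # reaches x0
wrong = sample_negative_dt(noise, predicts_x0_minus_noise)  # moves away from x0
```

Each matched pairing recovers x0 from the noise sample. Plugging the x_0 - noise predictor into the negative-dt loop instead steps away from the image and ends at 2 * noise - x_0, which is the failure mode to watch for when porting the model to another sampler.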

In summary, the rectified flow model here is inconsistent in its timestepping convention, which requires reversing the step direction. Please keep this in mind if you implement this model in frameworks other than the custom scripts in the InstaFlow GitHub repository.
