Velocity Direction Caution

by JStyborski - opened Jun 12, 2025

I think the velocity convention for this model is the opposite of the one other models use. I caution other users to be careful when designing programs that use the rectified flow model here, as its velocity prediction is reversed relative to its timestep direction.

For regular diffusion models, the signal-to-noise ratio decreases as the timestep increases (i.e., at t=0, x_t is the input image; at t=T, x_t is noise).
Rectified flow models tend to reverse this trend by setting x_t = t * x_0 + (1-t) * noise, which has an increasing signal-to-noise ratio as t increases (at t=0, x_t is noise; at t=1, x_t is the image x_0). The corresponding velocity is the time derivative of x_t: v_t = x_0 - noise.
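To make the convention concrete, here is a minimal NumPy sketch of the interpolation above. The function name `interpolate_rf` and the toy vectors are mine, for illustration only; nothing here is taken from the InstaFlow code. It checks the endpoints and confirms by finite differences that the velocity is x_0 - noise.

```python
import numpy as np

def interpolate_rf(x0, noise, t):
    """Rectified flow interpolation: x_t = t * x0 + (1 - t) * noise.

    t = 0 gives pure noise, t = 1 gives the clean sample x0.
    """
    return t * x0 + (1.0 - t) * noise

rng = np.random.default_rng(0)
x0 = rng.normal(size=8)      # stand-in for a clean image (hypothetical toy data)
noise = rng.normal(size=8)   # stand-in for Gaussian noise

# Endpoints of the path.
x_at_0 = interpolate_rf(x0, noise, 0.0)  # equals noise
x_at_1 = interpolate_rf(x0, noise, 1.0)  # equals x0

# Velocity as the time derivative of x_t, estimated by central differences.
t, eps = 0.3, 1e-6
v_fd = (interpolate_rf(x0, noise, t + eps) - interpolate_rf(x0, noise, t - eps)) / (2 * eps)
v_analytic = x0 - noise
```

Because the path is linear in t, the velocity is the same at every t, which is why a fixed-size Euler step works in the first place.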

The issue is that the InstaFlow paper seems to keep the timestep trend from diffusion models but pairs it with the rectified flow velocity. The rectified flow model here is trained to predict x_0 - noise. However, the sampling/denoising method (in pipeline_rf.py) uses decreasing timesteps, going from noise at t=1000 to image at t=0. To remedy this, they set the step size (dt) to a positive constant, so that the Euler step is always in the direction of the velocity (from noise to image). This is the opposite of what something like StableDiffusion3.5 does, which uses a target of (noise - x_0) during training and a negative step size during sampling.
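The sign issue can be sketched with a toy Euler sampler. This is not the code from pipeline_rf.py: `euler_sample` and the "oracle" velocity functions are hypothetical stand-ins for a trained model, and time is normalized to [0, 1] instead of 0 to 1000. Both conventions recover x_0 only when the sign of dt matches the sign of the predicted velocity.

```python
import numpy as np

def euler_sample(velocity_fn, x_start, timesteps, dt):
    """Plain Euler integration: x <- x + dt * v(x, t) for each timestep."""
    x = x_start.copy()
    for t in timesteps:
        x = x + dt * velocity_fn(x, t)
    return x

rng = np.random.default_rng(0)
x0 = rng.normal(size=4)      # toy "image" (hypothetical data)
noise = rng.normal(size=4)   # starting point of sampling
num_steps = 10

# Decreasing timesteps in both cases: 1.0, 0.9, ..., 0.1
timesteps = np.linspace(1.0, 0.0, num_steps, endpoint=False)

# Oracle velocities standing in for trained networks (assumption: perfect predictions).
v_this_model = lambda x, t: x0 - noise   # target described for the model here
v_sd3_style = lambda x, t: noise - x0    # target described for SD3.5-style models

# Convention described in the post: velocity x0 - noise with a POSITIVE dt.
out_this_model = euler_sample(v_this_model, noise, timesteps, dt=+1.0 / num_steps)

# SD3.5-style convention: velocity noise - x0 with a NEGATIVE dt.
out_sd3_style = euler_sample(v_sd3_style, noise, timesteps, dt=-1.0 / num_steps)

# Mixing conventions: this model's velocity with a negative dt moves away from x0.
out_mixed = euler_sample(v_this_model, noise, timesteps, dt=-1.0 / num_steps)
```

In the mixed case the sampler ends at 2 * noise - x_0 instead of x_0, which is the kind of failure to expect if this model is plugged into a scheduler that assumes the (noise - x_0) target.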

In summary, the rectified flow model here is inconsistent in its timestepping: it combines diffusion-style decreasing timesteps with a velocity that points from noise to image, which requires reversing the step direction. Please be aware of this if you decide to implement this model in frameworks other than their custom scripts in the InstaFlow GitHub repository.
