arxiv:2605.12964

Asymmetric Flow Models

Published on May 13

· Submitted by

Hansheng Chen on May 14

Stanford University

Authors:

Hansheng Chen ,

Jan Ackermann ,

Abstract

Asymmetric Flow Modeling enables efficient high-dimensional flow-based generation by restricting noise prediction to low-rank subspaces while maintaining full-dimensional data prediction, achieving superior performance in pixel-space text-to-image generation through effective fine-tuning from latent models.

AI-generated summary

Flow-based generation in high-dimensional spaces is difficult because velocity prediction requires modeling high-dimensional noise, even when data has strong low-rank structure. We present Asymmetric Flow Modeling (AsymFlow), a rank-asymmetric velocity parameterization that restricts noise prediction to a low-rank subspace while keeping data prediction full-dimensional. From this asymmetric prediction, AsymFlow analytically recovers the full-dimensional velocity without changing the network architecture or training/sampling procedures. On ImageNet 256×256, AsymFlow achieves a leading 1.57 FID, outperforming prior DiT/JiT-like pixel diffusion models by a large margin. AsymFlow also provides the first-ever route for finetuning pretrained latent flow models into pixel-space models: aligning the low-rank pixel subspace to the latent space gives a seamless initialization that preserves the latent model's high-level semantics and structure, so finetuning mainly improves low-level mismatches rather than relearning pixel generation. We show that the pixel AsymFlow model finetuned from FLUX.2 klein 9B establishes a new state of the art for pixel-space text-to-image generation, beating its latent base on HPSv3, DPG-Bench, and GenEval while qualitatively showing substantially improved visual realism.

Community

Lakonik

Paper author Paper submitter about 13 hours ago

JiT x0-prediction is not enough for pixel generation. AsymFlow introduces rank-asymmetric flow parameterization for scalable pixel generation.

Core Method

Velocity prediction has a data term and a noise term. AsymFlow makes them rank-asymmetric:

Data term is full-dimensional
Noise term is in a low-rank subspace

The full-dimensional velocity is recovered analytically for flow matching training and sampling.
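
One way to picture the analytic recovery is the NumPy sketch below. It is an illustration under stated assumptions, not the paper's implementation: it assumes the linear interpolation x_t = (1 - t) * x0 + t * eps with velocity v = eps - x0, an orthogonal projector P onto a random low-rank subspace, and a model that outputs a full-dimensional data estimate plus a noise estimate that lives only inside that subspace. The names `low_rank_projector` and `recover_velocity` are hypothetical. Inside the subspace the velocity uses the predicted noise directly; outside it, the noise is implied by x_t and the data estimate.

```python
import numpy as np


def low_rank_projector(dim, rank, seed=0):
    """Orthogonal projector onto a random rank-`rank` subspace (assumed for illustration)."""
    rng = np.random.default_rng(seed)
    basis, _ = np.linalg.qr(rng.standard_normal((dim, rank)))  # orthonormal columns, shape (dim, rank)
    return basis @ basis.T


def recover_velocity(x_t, t, x0_pred, eps_lowrank_pred, projector):
    """Rebuild the full-dimensional velocity from an asymmetric prediction.

    Assumes x_t = (1 - t) * x0 + t * eps and v = eps - x0, with t > 0.
    """
    # Inside the subspace: use the low-rank noise prediction directly.
    in_subspace = eps_lowrank_pred - projector @ x0_pred
    # Outside the subspace: the noise is implied by x_t and the data prediction,
    # since (x_t - x0) / t = eps - x0 under the assumed interpolation.
    residual = (x_t - x0_pred) / t
    off_subspace = residual - projector @ residual
    return in_subspace + off_subspace


# Sanity check with oracle predictions: the recovered velocity matches eps - x0.
rng = np.random.default_rng(1)
dim, rank, t = 16, 4, 0.3
x0 = rng.standard_normal(dim)
eps = rng.standard_normal(dim)
x_t = (1 - t) * x0 + t * eps
projector = low_rank_projector(dim, rank)
velocity = recover_velocity(x_t, t, x0, projector @ eps, projector)
```

With exact predictions the two pieces add up to eps - x0, so under these assumptions the flow matching loss and sampler can keep operating on a full-dimensional velocity. The paper's actual parameterization and choice of subspace may differ.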

State-of-the-Art Results

1.57 FID on ImageNet 256×256 (best pixel flow model)
Finetunes FLUX.2 klein 9B into pixel space and beats the original latent model on HPSv3, DPG-Bench, and GenEval (#1 overall on HPSv3)