Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation
Paper: arXiv:2603.23491
import torch
from diffusers import DiffusionPipeline
# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("fill-in-base-model", dtype=torch.bfloat16, device_map="cuda")
pipe.load_lora_weights("bchao1/foveated_diffusion")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]

LoRA weights for Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation. Foveated Diffusion is a biologically-inspired diffusion framework that employs spatially adaptive tokenization to concentrate compute on selected regions, achieving up to 4× speedups in image and video synthesis.
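To give a feel for the idea, spatially adaptive tokenization can be sketched as a keep-mask over a grid of latent patches: every patch inside a foveal radius around the gaze point is kept at full resolution, while peripheral patches are randomly subsampled. This is an illustrative toy, not the paper's actual tokenizer; the function name and parameters are hypothetical.

```python
# Illustrative sketch only (not the paper's implementation): build a boolean
# keep-mask over an (h, w) grid of latent patches, keeping all tokens inside
# a foveal radius around the gaze point and a random fraction elsewhere.
import torch


def foveation_keep_mask(h, w, gaze_yx, fovea_radius, keep_prob_periphery=0.25):
    """Return a boolean (h, w) mask: True = keep the token at that patch."""
    ys = torch.arange(h, dtype=torch.float32).unsqueeze(1).expand(h, w)
    xs = torch.arange(w, dtype=torch.float32).unsqueeze(0).expand(h, w)
    dist = torch.sqrt((ys - gaze_yx[0]) ** 2 + (xs - gaze_yx[1]) ** 2)
    in_fovea = dist <= fovea_radius
    # Outside the fovea, keep each token independently with low probability.
    keep_periphery = torch.rand(h, w) < keep_prob_periphery
    return in_fovea | keep_periphery


mask = foveation_keep_mask(32, 32, gaze_yx=(16, 16), fovea_radius=8)
tokens_kept = int(mask.sum())  # far fewer than the full 32 * 32 = 1024 tokens
```

Concentrating the transformer's token budget near the gaze point like this is what lets the denoiser spend most of its compute where detail matters.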
foveated_diffusion/
├── image/
│   ├── no_fov.safetensors        # finetuned baseline, no foveation conditioning
│   ├── fov_random.safetensors    # foveation conditioning at random gaze locations
│   ├── fov_saliency.safetensors  # foveation conditioning driven by saliency
│   └── fov_bbox.safetensors      # foveation conditioning driven by bounding boxes
└── video/                        # (coming soon)
All image checkpoints are rank-32 LoRA adapters saved as safetensors.
The image LoRAs are trained on top of black-forest-labs/FLUX.2-klein-base-4B and are loaded into the foveated FLUX.2 pipeline that ships with the project codebase (built on DiffSynth-Studio).
import torch
from huggingface_hub import hf_hub_download
from diffsynth.pipelines.flux2_image import ModelConfig
from src.diffsynth_fov import Flux2FoveatedImagePipeline
MODEL_ID = "black-forest-labs/FLUX.2-klein-base-4B"
pipe = Flux2FoveatedImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id=MODEL_ID, origin_file_pattern="transformer/*.safetensors"),
        ModelConfig(model_id=MODEL_ID, origin_file_pattern="text_encoder/*.safetensors"),
        ModelConfig(model_id=MODEL_ID, origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id=MODEL_ID, origin_file_pattern="tokenizer/"),
)
lora_path = hf_hub_download(
    repo_id="bchao1/foveated_diffusion",
    filename="image/fov_saliency.safetensors",
)
pipe.load_lora(pipe.dit, lora_path)
Or run the project's inference.py directly:
python inference.py \
    --experiment ours \
    --lora_checkpoint /path/to/fov_saliency.safetensors
See the project page for the full inference pipeline (gaze handling, foveation transform, decode modes, etc.).
@misc{chao2026foveateddiffusion,
  title={Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation},
  author={Brian Chao and Lior Yariv and Howard Xiao and Gordon Wetzstein},
  year={2026},
  eprint={2603.23491},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2603.23491},
}