Foveated Diffusion

LoRA weights for Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation. Foveated Diffusion is a biologically-inspired diffusion framework that employs spatially adaptive tokenization to concentrate compute on selected regions, achieving up to 4× speedups in image and video synthesis.

Repository structure

foveated_diffusion/
├── image/
│   ├── no_fov.safetensors        # finetuned baseline, no foveation conditioning
│   ├── fov_random.safetensors    # foveation conditioning at random gaze locations
│   ├── fov_saliency.safetensors  # foveation conditioning driven by saliency
│   └── fov_bbox.safetensors      # foveation conditioning driven by bounding boxes
└── video/                        # (coming soon)

All image checkpoints are rank-32 LoRA adapters saved as safetensors.

Usage

The image LoRAs are trained on top of black-forest-labs/FLUX.2-klein-base-4B and are loaded into the foveated FLUX.2 pipeline that ships with the project codebase (built on DiffSynth-Studio).

import torch
from huggingface_hub import hf_hub_download
from diffsynth.pipelines.flux2_image import ModelConfig
from src.diffsynth_fov import Flux2FoveatedImagePipeline

MODEL_ID = "black-forest-labs/FLUX.2-klein-base-4B"

# Assemble the foveated FLUX.2 pipeline from the base model components.
pipe = Flux2FoveatedImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id=MODEL_ID, origin_file_pattern="transformer/*.safetensors"),
        ModelConfig(model_id=MODEL_ID, origin_file_pattern="text_encoder/*.safetensors"),
        ModelConfig(model_id=MODEL_ID, origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id=MODEL_ID, origin_file_pattern="tokenizer/"),
)

# Download one of the LoRA adapters and apply it to the DiT backbone.
lora_path = hf_hub_download(
    repo_id="bchao1/foveated_diffusion",
    filename="image/fov_saliency.safetensors",
)
pipe.load_lora(pipe.dit, lora_path)

Or run the project's inference.py directly:

python inference.py \
    --experiment ours \
    --lora_checkpoint /path/to/fov_saliency.safetensors

See the project page for the full inference pipeline (gaze handling, foveation transform, decode modes, etc.).

Citation

@misc{chao2026foveateddiffusion,
      title={Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation}, 
      author={Brian Chao and Lior Yariv and Howard Xiao and Gordon Wetzstein},
      year={2026},
      eprint={2603.23491},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.23491}, 
}