---
license: apache-2.0
library_name: diffusers
tags:
  - lora
  - diffusion
  - foveated-rendering
  - text-to-image
  - text-to-video
---

# Foveated Diffusion

LoRA weights for [**Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation**](https://bchao1.github.io/foveated-diffusion/). Foveated Diffusion is a biologically inspired diffusion framework that uses spatially adaptive tokenization to concentrate compute on selected regions, achieving up to 4× speedups in image and video synthesis.

- Project page: https://bchao1.github.io/foveated-diffusion/
- Paper: https://arxiv.org/abs/2603.23491

## Repository structure

```
foveated_diffusion/
├── image/
│   ├── no_fov.safetensors        # finetuned baseline, no foveation conditioning
│   ├── fov_random.safetensors    # foveation conditioning at random gaze locations
│   ├── fov_saliency.safetensors  # foveation conditioning driven by saliency
│   └── fov_bbox.safetensors      # foveation conditioning driven by bounding boxes
└── video/                        # (coming soon)
```

All image checkpoints are rank-32 LoRA adapters saved as `safetensors`.
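
The rank can be spot-checked from tensor shapes alone, without loading any weights. The following is a minimal sketch, not part of the project codebase: `read_shapes` and `infer_lora_ranks` are hypothetical helper names, and it assumes each adapter is stored as a pair of 2-D low-rank factors whose tensor names contain `lora` (the actual key naming in these checkpoints is not confirmed here).

```python
from collections import Counter


def read_shapes(path):
    """Return {tensor_name: shape} for a safetensors file without loading the weights."""
    # Imported lazily so the rank logic below can be used without safetensors installed.
    from safetensors import safe_open

    with safe_open(path, framework="pt") as f:
        return {key: tuple(f.get_slice(key).get_shape()) for key in f.keys()}


def infer_lora_ranks(shapes):
    """Count inferred ranks over all 2-D tensors whose name contains 'lora'.

    Assumption: a low-rank factor has shape (rank, in_dim) or (out_dim, rank),
    with rank smaller than the layer width, so the rank is the smaller dimension.
    """
    ranks = Counter()
    for name, shape in shapes.items():
        if "lora" in name.lower() and len(shape) == 2:
            ranks[min(shape)] += 1
    return ranks


# Toy shapes with made-up tensor names, only to illustrate the expected result.
toy_shapes = {
    "blocks.0.attn.to_q.lora_A.weight": (32, 3072),
    "blocks.0.attn.to_q.lora_B.weight": (3072, 32),
}
print(infer_lora_ranks(toy_shapes))  # Counter({32: 2})
```

For a downloaded checkpoint, `infer_lora_ranks(read_shapes("/path/to/fov_saliency.safetensors"))` should report a single rank of 32 if the naming assumption holds.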

## Usage

The image LoRAs are trained on top of `black-forest-labs/FLUX.2-klein-base-4B` and are loaded into the foveated FLUX.2 pipeline that ships with the [project codebase](https://bchao1.github.io/foveated-diffusion/) (built on [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio)).

```python
import torch
from huggingface_hub import hf_hub_download
from diffsynth.pipelines.flux2_image import ModelConfig
from src.diffsynth_fov import Flux2FoveatedImagePipeline

MODEL_ID = "black-forest-labs/FLUX.2-klein-base-4B"

# Build the foveated pipeline from the base model's transformer, text encoder, and VAE.
pipe = Flux2FoveatedImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id=MODEL_ID, origin_file_pattern="transformer/*.safetensors"),
        ModelConfig(model_id=MODEL_ID, origin_file_pattern="text_encoder/*.safetensors"),
        ModelConfig(model_id=MODEL_ID, origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id=MODEL_ID, origin_file_pattern="tokenizer/"),
)

# Download one of the LoRA checkpoints from this repository (cached locally after the first call).
lora_path = hf_hub_download(
    repo_id="bchao1/foveated_diffusion",
    filename="image/fov_saliency.safetensors",
)
# Attach the LoRA to the pipeline's diffusion transformer (`pipe.dit`).
pipe.load_lora(pipe.dit, lora_path)
```
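
To switch between the four image checkpoints, it can help to resolve the file name from a short variant key. This is a small sketch under stated assumptions: `LORA_FILES`, `lora_filename`, and `download_lora` are hypothetical names introduced here for illustration, while the repo ID and file paths are the ones listed in the repository structure.

```python
REPO_ID = "bchao1/foveated_diffusion"

# Variant key -> file path inside the repository (taken from the structure listing).
LORA_FILES = {
    "no_fov": "image/no_fov.safetensors",
    "fov_random": "image/fov_random.safetensors",
    "fov_saliency": "image/fov_saliency.safetensors",
    "fov_bbox": "image/fov_bbox.safetensors",
}


def lora_filename(variant):
    """Map a variant key to its checkpoint path, failing loudly on unknown keys."""
    try:
        return LORA_FILES[variant]
    except KeyError:
        valid = ", ".join(sorted(LORA_FILES))
        raise ValueError(f"unknown variant {variant!r}; expected one of: {valid}") from None


def download_lora(variant):
    """Download the chosen checkpoint and return its local path."""
    # Imported lazily so the lookup above works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=lora_filename(variant))
```

The returned path can then be passed to the loading call in place of `lora_path`, for example `download_lora("fov_bbox")`.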

Or run the project's `inference.py` directly:

```bash
python inference.py \
    --experiment ours \
    --lora_checkpoint /path/to/fov_saliency.safetensors
```

See the [project page](https://bchao1.github.io/foveated-diffusion/) for the full inference pipeline (gaze handling, foveation transform, decode modes, etc.).

## Citation

```bibtex
@misc{chao2026foveateddiffusion,
      title={Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation}, 
      author={Brian Chao and Lior Yariv and Howard Xiao and Gordon Wetzstein},
      year={2026},
      eprint={2603.23491},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.23491}, 
}
```