---
license: cc-by-4.0
language:
- en
base_model:
- ByteDance/sd2.1-base-zsnr-laionaes5
pipeline_tag: image-text-to-image
tags:
- SPAD
- Photons
- Generative
- ISP
datasets:
- aRy4n/eXtreme-Deformable
- aRy4n/real-color-SPAD-indoor6
metrics:
- type: accuracy
  split: test
  task:
    type: video-to-image
    name: Burst Reconstruction
---

## gQIR: Generative Quanta Image Reconstruction

[Aryan Garg](https://aryan-garg.github.io/)<sup>1</sup>, [Sizhuo Ma](https://sizhuoma.netlify.app/)<sup>2</sup>, [Mohit Gupta](https://wisionlab.com/people/mohit-gupta/)<sup>1</sup>

<sup>1</sup> University of Wisconsin-Madison <sup>2</sup> Snap Inc.<br>

![color_spads](assets/README_teaser_color_SPAD.png)


## Model Weights

All model weights are now available in this repository.

| Model (Color) | Stage | Bit Depth | 🤗 Download Link |
|:---|:---:|:---:|:---|
| qVAE | Stage 1 | 1-bit | [1965000.pt](https://huggingface.co/aRy4n/gQIR/resolve/main/1-bit/1965000.pt) |
| Adversarial Diffusion LoRA-UNet | Stage 2 | 1-bit | [state_dict.pth](https://huggingface.co/aRy4n/gQIR/resolve/main/1-bit/state_dict.pth) |
| qVAE | Stage 1 | 3-bit | [0105000.pt](https://huggingface.co/aRy4n/gQIR/resolve/main/0105000.pt) |
| Adversarial Diffusion LoRA-UNet | Stage 2 | 3-bit | [state_dict.pth](https://huggingface.co/aRy4n/gQIR/resolve/main/state_dict.pth) |
| FusionViT | Stage 3 | 3-bit | [fusion_vit_0050000.pt](https://huggingface.co/aRy4n/gQIR/resolve/main/fusion_vit_0050000.pt) |

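The checkpoints above can also be fetched programmatically. The following is a minimal sketch, assuming the `huggingface_hub` package is installed; `hf_hub_download` is its standard download function, and the filenames are copied from the table. The `CHECKPOINTS` dictionary, the short `"LoRA-UNet"` label, and the helpers `checkpoints_for` and `download` are hypothetical illustrations, not part of the gQIR codebase. How each file is loaded into the models is defined in the GitHub repository and is not assumed here.

```python
# Hypothetical helper (not part of gQIR): map the checkpoints listed in the
# table to their paths in the aRy4n/gQIR Hugging Face repo and download them.
from typing import Dict, List, Tuple

REPO_ID = "aRy4n/gQIR"

# (model label, stage, bit depth) -> filename inside the repo, copied from the table.
# "LoRA-UNet" is a hypothetical short label for the Adversarial Diffusion LoRA-UNet.
CHECKPOINTS: Dict[Tuple[str, int, int], str] = {
    ("qVAE", 1, 1): "1-bit/1965000.pt",
    ("LoRA-UNet", 2, 1): "1-bit/state_dict.pth",
    ("qVAE", 1, 3): "0105000.pt",
    ("LoRA-UNet", 2, 3): "state_dict.pth",
    ("FusionViT", 3, 3): "fusion_vit_0050000.pt",
}


def checkpoints_for(bit_depth: int) -> List[Tuple[str, int, str]]:
    """Return (model, stage, filename) for every checkpoint at this bit depth, ordered by stage."""
    rows = [
        (model, stage, fname)
        for (model, stage, bits), fname in CHECKPOINTS.items()
        if bits == bit_depth
    ]
    if not rows:
        raise ValueError(f"no checkpoints listed for {bit_depth}-bit input")
    return sorted(rows, key=lambda row: row[1])


def download(bit_depth: int) -> Dict[str, str]:
    """Download every checkpoint for `bit_depth` and return {model label: local path}."""
    # Imported lazily so the lookup helper above works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return {
        model: hf_hub_download(repo_id=REPO_ID, filename=fname)
        for model, _stage, fname in checkpoints_for(bit_depth)
    }
```

For example, `download(3)` would fetch the three 3-bit checkpoints (Stages 1 to 3) into the local Hugging Face cache and return their paths, while `download(1)` fetches only Stages 1 and 2, since the table lists no 1-bit FusionViT.
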
Code: [github.com/Aryan-Garg/gQIR](https://github.com/Aryan-Garg/gQIR)

arXiv: [arxiv.org/abs/2602.20417](https://arxiv.org/abs/2602.20417)

#### Cite Us
```bibtex
@InProceedings{garg_2026_gqir,
    author    = {Garg, Aryan and Ma, Sizhuo and Gupta, Mohit},
    title     = {gQIR: Generative Quanta Image Reconstruction},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
}
```