license: apache-2.0

UNet

A lightweight UNet with single-block levels and sliding window attention.
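As a rough illustration of what sliding window attention means, the sketch below builds a boolean mask in which each query position may attend only to keys within a fixed distance. It is not taken from this repository: the function names, the 1D token layout, and the symmetric window are all assumptions made for the example, and the actual model may use a different window shape (for instance, 2D windows over image patches).

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[q][k] == True when query q may attend to key k.

    Hypothetical helper: assumes a 1D sequence and a symmetric window,
    so query q sees keys in the range [q - window, q + window].
    """
    if seq_len < 0 or window < 0:
        raise ValueError("seq_len and window must be non-negative")
    return [
        [abs(q - k) <= window for k in range(seq_len)]
        for q in range(seq_len)
    ]


def attended_pairs(mask: list[list[bool]]) -> int:
    """Count the query/key pairs that the mask leaves enabled."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    mask = sliding_window_mask(seq_len=6, window=1)
    # Print the mask as a grid: 'x' marks an allowed query/key pair.
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))
    print("allowed pairs:", attended_pairs(mask), "of", 6 * 6)
```

The number of allowed pairs grows roughly linearly with sequence length for a fixed window, rather than quadratically as in full attention, which is the usual motivation for this kind of masking in a lightweight model.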

  • Pixel-space model in CIELAB color space
  • LAB input, RGB output
  • Input images are decomposed into their frequency-domain components
  • Docling as text encoder
  • Token-efficient visual text inputs
  • Attention-head configuration varies across layers
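For readers unfamiliar with CIELAB, the following sketch shows the standard conversion of a single sRGB pixel to L\*a\*b\* under a D65 white point. It only illustrates the color space named in the list above; the function name is hypothetical, and the repository's own preprocessing (value ranges, normalization, white point) is not documented here and may differ.

```python
def srgb_to_lab(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Convert one sRGB pixel (components in [0, 1]) to CIELAB.

    Hypothetical helper using the standard sRGB transfer function and the
    D65 reference white; not taken from this repository.
    """

    def to_linear(c: float) -> float:
        # Undo the sRGB gamma encoding.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)

    # Linear sRGB to CIE XYZ (D65).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    # Normalize by the D65 reference white.
    xn, yn, zn = 0.95047, 1.0, 1.08883
    delta = 6.0 / 29.0

    def f(t: float) -> float:
        # Cube root above the threshold, linear segment below it.
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3.0 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    lightness = 116.0 * fy - 16.0
    a_star = 500.0 * (fx - fy)
    b_star = 200.0 * (fy - fz)
    return lightness, a_star, b_star


if __name__ == "__main__":
    for name, rgb in [("white", (1.0, 1.0, 1.0)), ("red", (1.0, 0.0, 0.0))]:
        print(name, tuple(round(v, 2) for v in srgb_to_lab(*rgb)))
```

CIELAB separates lightness (L\*) from the two chromatic axes (a\*, b\*), which is one common reason to feed a model LAB input even when the output stays in RGB.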

Retrospection

Reconstruction quality, from best to worst:

  • U-Docling (this repo)
  • U-DAE
  • U-DAE-NLL
  • EQ-SAE-CIELAB
  • EQ-SAE-CIELAB-c8
  • VAE-f16-c4-kv
  • VAE-f16-c4
  • VAE-f16-c8

References

  • 2411.17459
  • 2503.11576
  • 2510.17800
  • 2510.18279