Masked model
U-shaped transformer model in CIELAB color space. The model reconstructs the input image.
- LAB input, RGB output
- 8-channel latent
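For reference, the LAB input can be derived from sRGB pixels with the standard CIELAB conversion. The sketch below is a minimal, per-pixel version that assumes sRGB components in [0, 1] and a D65 white point. The function name `srgb_to_lab` is illustrative, not taken from the project, and the actual preprocessing may differ.

```python
def srgb_to_lab(r, g, b):
    """Convert one sRGB pixel (components in [0, 1]) to CIELAB, D65 white point."""

    def to_linear(c):
        # Undo the sRGB transfer curve (gamma).
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)

    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    # Reference white for D65.
    xn, yn, zn = 0.95047, 1.0, 1.08883
    delta = 6 / 29

    def f(t):
        # Cube root with a linear segment near zero, as in the CIELAB definition.
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    lightness = 116 * fy - 16
    a = 500 * (fx - fy)
    b_axis = 200 * (fy - fz)
    return lightness, a, b_axis
```

With this definition, pure white maps to roughly L = 100 with a and b near 0, and pure black maps to L = 0.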
As in the image restoration model, the model is trained on only a random subset of the image patches, which shortens training time.
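The random patch selection can be sketched as follows. This is a minimal illustration, not the project's code: the function name `random_patch_mask`, the patch size of 16, and the keep ratio of 0.25 are all assumptions, and the 1024-pixel image size is only inferred from the dataset name.

```python
import random


def random_patch_mask(height, width, patch_size, keep_ratio, seed=None):
    """Return a 2D grid of booleans; True marks a patch used for training."""
    rows, cols = height // patch_size, width // patch_size
    total = rows * cols
    keep = round(total * keep_ratio)

    # Draw `keep` distinct patch indices uniformly at random.
    rng = random.Random(seed)
    kept = set(rng.sample(range(total), keep))

    return [[(r * cols + c) in kept for c in range(cols)] for r in range(rows)]


# Hypothetical settings: 1024x1024 image, 16x16 patches, 25% of patches kept.
mask = random_patch_mask(1024, 1024, 16, 0.25, seed=0)
kept_count = sum(sum(row) for row in mask)
```

Only the patches marked True would be passed through the model in a given step, so the cost per step scales roughly with the keep ratio.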
The upsampling layers generate images at different resolutions:
- heatmap from labels (as in CLIP retrieval)
- RGB image
- edge detection (optional)
The model prioritizes color accuracy.
Datasets
- Pixiv_1024