---
license: other
pipeline_tag: image-to-image
---

# StableSR Model Card

This model card focuses on the models associated with StableSR, available [here](https://github.com/IceClear/StableSR).

## Model Details

- **Developed by:** Jianyi Wang
- **Model type:** Diffusion-based image super-resolution model
- **Language(s):** English
- **License:** [S-Lab License 1.0](https://github.com/IceClear/StableSR/blob/main/LICENSE.txt)
- **Model Description:** This is the model used in the [paper](https://arxiv.org/abs/2305.07015).
- **Resources for more information:** [GitHub Repository](https://github.com/IceClear/StableSR)
- **Cite as:**

  ```bibtex
  @InProceedings{wang2023exploiting,
    author    = {Wang, Jianyi and Yue, Zongsheng and Zhou, Shangchen and Chan, Kelvin CK and Loy, Chen Change},
    title     = {Exploiting Diffusion Prior for Real-World Image Super-Resolution},
    booktitle = {arXiv preprint arXiv:2305.07015},
    year      = {2023},
  }
  ```

# Uses

Please refer to the [S-Lab License 1.0](https://github.com/IceClear/StableSR/blob/main/LICENSE.txt).

## Limitations and Bias

### Limitations

- TBD

### Bias

While our model is based on a pre-trained Stable Diffusion model, we currently do not observe obvious bias in the generated results. We conjecture that the main reason is that our model is conditioned on low-resolution images rather than text prompts, and such a strong condition makes the results less likely to be affected.

## Training

**Training Data**

The model developers used the following data to train the model:

- Our diffusion model is finetuned on the DF2K (DIV2K and Flickr2K) + OST datasets, available [here](https://github.com/xinntao/Real-ESRGAN/blob/master/docs/Training.md).
- We further generate 100k synthetic LR-HR pairs on DF2K_OST using the finetuned diffusion model for training the CFW module.
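For intuition, an LR-HR pair is a low-resolution input aligned with its high-resolution target. The synthetic pairs above are produced with the finetuned diffusion model, not by plain downsampling; the toy sketch below (a hypothetical helper, treating a grayscale image as nested lists) only illustrates the shape relationship between the two halves of a pair:

```python
def downsample(hr, factor=4):
    """Average-pool a grayscale image (list of rows) by `factor` to form an LR counterpart."""
    h, w = len(hr), len(hr[0])
    return [
        [
            sum(hr[y + dy][x + dx] for dy in range(factor) for dx in range(factor)) / factor**2
            for x in range(0, w, factor)
        ]
        for y in range(0, h, factor)
    ]

hr = [[float(x + y) for x in range(8)] for y in range(8)]  # toy 8x8 "HR" image
lr = downsample(hr)                                        # 2x2 LR counterpart
```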

**Training Procedure**

StableSR is an image super-resolution model finetuned from [Stable Diffusion](https://github.com/Stability-AI/stablediffusion), further equipped with a time-aware encoder and a controllable feature wrapping (CFW) module.

- Following Stable Diffusion, images are encoded through the fixed VQGAN encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of f = 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4.
- The latent representations are fed to the time-aware encoder as guidance.
- The loss is the same as in Stable Diffusion.
- After finetuning the diffusion model, we further train the CFW module on data generated by the finetuned diffusion model.
  - The VQGAN model is fixed and only CFW is trainable.
  - The loss is similar to that used to train a VQGAN, except that we use a fixed adversarial loss weight of 0.025 rather than a self-adjustable one.
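Two of the concrete numbers above, the f = 8 latent geometry and the fixed 0.025 adversarial weight, can be sketched as follows (an illustrative reduction; `cfw_loss` collapses the full VQGAN objective to a reconstruction term plus the fixed adversarial term):

```python
def latent_shape(h, w, f=8, z_channels=4):
    # An H x W x 3 image maps to an H/f x W/f x 4 latent, with downsampling factor f = 8
    assert h % f == 0 and w % f == 0, "H and W must be divisible by f"
    return (h // f, w // f, z_channels)

def cfw_loss(rec_loss, adv_loss, adv_weight=0.025):
    # Fixed adversarial weight (0.025) instead of VQGAN's self-adjusted weight
    return rec_loss + adv_weight * adv_loss

print(latent_shape(512, 512))  # (64, 64, 4)
```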

We currently provide the following checkpoints:

- `stablesr_000117.ckpt`: diffusion model finetuned on the DF2K_OST dataset for 117 epochs.
- `vqgan_cfw_00011.ckpt`: CFW module (with fixed VQGAN) trained on synthetic paired data for 11 epochs.
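Both files are ordinary PyTorch checkpoints. A minimal loading sketch, assuming the common convention of weights stored under a `"state_dict"` key (an assumption of this example, not something the card specifies):

```python
import torch

def load_state_dict(ckpt_path):
    # Load a checkpoint onto CPU; fall back to the raw object if there
    # is no "state_dict" key (that key layout is an assumption here).
    ckpt = torch.load(ckpt_path, map_location="cpu")
    return ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt

# diffusion_sd = load_state_dict("stablesr_000117.ckpt")
# cfw_sd = load_state_dict("vqgan_cfw_00011.ckpt")
```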

## Evaluation Results

See the [paper](https://arxiv.org/abs/2305.07015) for details.