---
license: other
pipeline_tag: image-to-image
---

# StableSR Model Card

This model card focuses on the models associated with StableSR, available [here](https://github.com/IceClear/StableSR).

## Model Details

- **Developed by:** Jianyi Wang
- **Model type:** Diffusion-based image super-resolution model
- **Language(s):** English
- **License:** [S-Lab License 1.0](https://github.com/IceClear/StableSR/blob/main/LICENSE.txt)
- **Model Description:** This is the model used in the [paper](https://arxiv.org/abs/2305.07015).
- **Resources for more information:** [GitHub Repository](https://github.com/IceClear/StableSR)
- **Cite as:**

  ```bibtex
  @InProceedings{wang2023exploiting,
    author    = {Wang, Jianyi and Yue, Zongsheng and Zhou, Shangchen and Chan, Kelvin CK and Loy, Chen Change},
    title     = {Exploiting Diffusion Prior for Real-World Image Super-Resolution},
    booktitle = {arXiv preprint arXiv:2305.07015},
    year      = {2023},
  }
  ```

# Uses

Please refer to the [S-Lab License 1.0](https://github.com/IceClear/StableSR/blob/main/LICENSE.txt).

## Limitations and Bias

### Limitations

- TBD

### Bias

While our model is based on a pre-trained Stable Diffusion model, we currently do not observe obvious bias in the generated results. We conjecture that the main reason is that our model is conditioned on low-resolution images rather than text prompts, and such a strong condition makes the results less likely to be affected.

## Training

**Training Data**

The model developers used the following data to train the model:

- Our diffusion model is finetuned on the DF2K (DIV2K and Flickr2K) + OST datasets, available [here](https://github.com/xinntao/Real-ESRGAN/blob/master/docs/Training.md).
- We further generate 100k synthetic LR-HR pairs on DF2K_OST using the finetuned diffusion model for training the CFW module.
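For intuition, an LR-HR pair is a low-resolution input aligned with its high-resolution target. The synthetic pairs above are produced with the finetuned diffusion model, not by plain downsampling; the toy sketch below (a hypothetical helper, treating a grayscale image as nested lists) only illustrates the shape relationship between the two halves of a pair:

```python
def downsample(hr, factor=4):
    """Average-pool a grayscale image (list of rows) by `factor` to form an LR counterpart."""
    h, w = len(hr), len(hr[0])
    return [
        [
            sum(hr[y + dy][x + dx] for dy in range(factor) for dx in range(factor)) / factor**2
            for x in range(0, w, factor)
        ]
        for y in range(0, h, factor)
    ]

hr = [[float(x + y) for x in range(8)] for y in range(8)]  # toy 8x8 "HR" image
lr = downsample(hr)                                        # 2x2 LR counterpart
```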

**Training Procedure**

StableSR is an image super-resolution model finetuned from [Stable Diffusion](https://github.com/Stability-AI/stablediffusion), further equipped with a time-aware encoder and a controllable feature wrapping (CFW) module.

- Following Stable Diffusion, images are encoded through the fixed VQGAN encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of f = 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4.
- The latent representations are fed to the time-aware encoder as guidance.
- The loss is the same as in Stable Diffusion.
- After finetuning the diffusion model, we further train the CFW module on data generated by the finetuned diffusion model.
  - The VQGAN model is fixed and only CFW is trainable.
  - The loss is similar to that used to train a VQGAN, except that we use a fixed adversarial loss weight of 0.025 rather than a self-adjustable one.
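Two of the concrete numbers above, the f = 8 latent geometry and the fixed 0.025 adversarial weight, can be sketched as follows (an illustrative reduction; `cfw_loss` collapses the full VQGAN objective to a reconstruction term plus the fixed adversarial term):

```python
def latent_shape(h, w, f=8, z_channels=4):
    # An H x W x 3 image maps to an H/f x W/f x 4 latent, with downsampling factor f = 8
    assert h % f == 0 and w % f == 0, "H and W must be divisible by f"
    return (h // f, w // f, z_channels)

def cfw_loss(rec_loss, adv_loss, adv_weight=0.025):
    # Fixed adversarial weight (0.025) instead of VQGAN's self-adjusted weight
    return rec_loss + adv_weight * adv_loss

print(latent_shape(512, 512))  # (64, 64, 4)
```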

We currently provide the following checkpoints:

- `stablesr_000117.ckpt`: diffusion model finetuned on the DF2K_OST dataset for 117 epochs.
- `vqgan_cfw_00011.ckpt`: CFW module (with fixed VQGAN) trained on synthetic paired data for 11 epochs.
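Both files are ordinary PyTorch checkpoints. A minimal loading sketch, assuming the common convention of weights stored under a `"state_dict"` key (an assumption of this example, not something the card specifies):

```python
import torch

def load_state_dict(ckpt_path):
    # Load a checkpoint onto CPU; fall back to the raw object if there
    # is no "state_dict" key (that key layout is an assumption here).
    ckpt = torch.load(ckpt_path, map_location="cpu")
    return ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt

# diffusion_sd = load_state_dict("stablesr_000117.ckpt")
# cfw_sd = load_state_dict("vqgan_cfw_00011.ckpt")
```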

## Evaluation Results

See the [paper](https://arxiv.org/abs/2305.07015) for details.