| | --- |
| | license: other |
| | pipeline_tag: image-to-image |
| | --- |
| | # StableSR Model Card |
| | This model card focuses on the models associated with StableSR, available [here](https://github.com/IceClear/StableSR). |
| |
|
| | ## Model Details |
| | - **Developed by:** Jianyi Wang |
| | - **Model type:** Diffusion-based image super-resolution model |
| | - **License:** [S-Lab License 1.0](https://github.com/IceClear/StableSR/blob/main/LICENSE.txt) |
| | - **Model Description:** This is the model used in the paper [Exploiting Diffusion Prior for Real-World Image Super-Resolution](https://arxiv.org/abs/2305.07015). |
| | - **Resources for more information:** [GitHub Repository](https://github.com/IceClear/StableSR). |
| | - **Cite as:** |
| |
|
| |     @InProceedings{wang2023exploiting, |
| |         author = {Wang, Jianyi and Yue, Zongsheng and Zhou, Shangchen and Chan, Kelvin CK and Loy, Chen Change}, |
| |         title = {Exploiting Diffusion Prior for Real-World Image Super-Resolution}, |
| |         booktitle = {arXiv preprint arXiv:2305.07015}, |
| |         year = {2023}, |
| |     } |
| | |
| | ## Uses |
| | Please refer to the [S-Lab License 1.0](https://github.com/IceClear/StableSR/blob/main/LICENSE.txt) for the permitted uses of this model. |
| |
|
| | ## Limitations and Bias |
| |
|
| | ### Limitations |
| |
|
| | - StableSR still requires multiple sampling steps to generate an image, which makes it much slower than GAN-based approaches, especially for large images beyond 512 or 768 pixels per side. |
| | - StableSR cannot always fully preserve fidelity to the input, due to its generative nature. |
| | - StableSR cannot always generate perfect details in complex real-world scenarios. |
| |
|
| | ### Bias |
| | Although our model is based on a pre-trained Stable Diffusion model, we have not so far observed obvious bias in the generated results. |
| | We conjecture that the main reason is that our model is conditioned on low-resolution images rather than on text prompts. |
| | Such strong conditioning makes our model less likely to be affected by biases in the pre-trained model. |
| |
|
| |
|
| | ## Training |
| |
|
| | **Training Data** |
| | The model developer used the following datasets for training the model: |
| |
|
| | - Our diffusion model is finetuned on the DF2K (DIV2K and Flickr2K) and OST datasets, available [here](https://github.com/xinntao/Real-ESRGAN/blob/master/docs/Training.md). |
| | - We further generate 100k synthetic low-resolution/high-resolution (LR-HR) pairs from DF2K_OST using the finetuned diffusion model, and use them to train the CFW module. |
| | |
| | **Training Procedure** |
| | StableSR is an image super-resolution model finetuned from [Stable Diffusion](https://github.com/Stability-AI/stablediffusion) and further equipped with a time-aware encoder and a controllable feature wrapping (CFW) module. |
| | |
| | - Following Stable Diffusion, images are encoded by the fixed autoencoder, which turns them into latent representations. The autoencoder uses a relative downsampling factor of f = 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4. |
| | - The latent representations are fed to the time-aware encoder as guidance. |
| | - The loss for finetuning the diffusion model is the same as that of Stable Diffusion. |
| | - After finetuning the diffusion model, we further train the CFW module using the data generated by the finetuned diffusion model. |
| | - During CFW training, the autoencoder is fixed and only the CFW module is trainable. |
| | - The CFW training loss is similar to the one used for training an autoencoder, except that we use a fixed adversarial loss weight of 0.025 rather than a self-adjustable one. |
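The latent shape arithmetic described above can be checked with a small helper. This is only an illustrative sketch: the function name `latent_shape` is hypothetical and not part of the StableSR code base, and the requirement that both image sides be divisible by the factor is an assumption made here to keep the arithmetic exact.

```python
def latent_shape(height: int, width: int, factor: int = 8, channels: int = 4) -> tuple:
    """Return the latent shape (H/f, W/f, C) for an H x W x 3 input image.

    `factor` is the autoencoder's downsampling factor f (8 per the model card)
    and `channels` is the number of latent channels (4 per the model card).
    """
    # Assumption for this sketch: both sides must be divisible by the factor.
    if height % factor != 0 or width % factor != 0:
        raise ValueError(f"height and width must be multiples of {factor}")
    return (height // factor, width // factor, channels)


if __name__ == "__main__":
    # Resolutions matching the two base models listed in this card.
    print(latent_shape(512, 512))
    print(latent_shape(768, 768))
```

With f = 8, each latent position covers an 8 x 8 pixel patch, so the latent has 64 times fewer spatial positions than the input image.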
| | |
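To illustrate the difference between a fixed and a self-adjustable adversarial weight in the CFW training loss, here is a minimal sketch on plain floats. It assumes that "self-adjustable" refers to the gradient-norm-ratio heuristic commonly used in VQGAN-style autoencoder training; the source does not spell this out. Both function names are hypothetical, and the real training code operates on tensors rather than scalars.

```python
def fixed_weight_loss(rec_loss: float, gen_loss: float, adv_weight: float = 0.025) -> float:
    """Combine reconstruction and generator losses with a fixed adversarial weight.

    The default of 0.025 is the value stated in the model card.
    """
    return rec_loss + adv_weight * gen_loss


def adaptive_weight(rec_grad_norm: float, gen_grad_norm: float,
                    eps: float = 1e-4, max_weight: float = 1e4) -> float:
    """Assumed form of a self-adjustable weight, shown for contrast only.

    It is the ratio of the gradient norms of the two loss terms, clamped to
    [0, max_weight]; `eps` guards against division by zero.
    """
    ratio = rec_grad_norm / (gen_grad_norm + eps)
    return min(max(ratio, 0.0), max_weight)


if __name__ == "__main__":
    # Fixed weighting: the adversarial term always contributes 2.5% of its value.
    print(fixed_weight_loss(rec_loss=1.0, gen_loss=2.0))
    # Adaptive weighting: the weight changes with the gradient magnitudes.
    print(adaptive_weight(rec_grad_norm=2.0, gen_grad_norm=4.0))
```

A fixed weight keeps the balance between the two terms constant throughout training, whereas the adaptive variant rescales the adversarial term at every step.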
| | We currently provide the following checkpoints: |
| | |
| | - [stablesr_000117.ckpt](https://huggingface.co/Iceclear/StableSR/resolve/main/stablesr_000117.ckpt): Diffusion model finetuned from [SD2.1-512base](https://huggingface.co/stabilityai/stable-diffusion-2-1-base) on the DF2K_OST dataset for 117 epochs. |
| | - [vqgan_cfw_00011.ckpt](https://huggingface.co/Iceclear/StableSR/resolve/main/vqgan_cfw_00011.ckpt): CFW module with a fixed autoencoder, trained on synthetic paired data for 11 epochs. |
| | - [stablesr_768v_000139.ckpt](https://huggingface.co/Iceclear/StableSR/blob/main/stablesr_768v_000139.ckpt): Diffusion model finetuned from [SD2.1-768v](https://huggingface.co/stabilityai/stable-diffusion-2-1) on the DF2K_OST dataset for 139 epochs. |
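The checkpoints listed above can be fetched by direct URL. The sketch below uses only the Python standard library and builds the `resolve/main` download links that appear in this card. The dictionary keys and function names are hypothetical labels chosen for illustration. `download_checkpoint` is defined but not called, since the files are large.

```python
import os
from urllib.request import urlretrieve

# Repository URL taken from the checkpoint links in this model card.
REPO_URL = "https://huggingface.co/Iceclear/StableSR"

# Keys are illustrative labels, not official names; values are the file names listed above.
CHECKPOINTS = {
    "sd21_512base": "stablesr_000117.ckpt",
    "cfw": "vqgan_cfw_00011.ckpt",
    "sd21_768v": "stablesr_768v_000139.ckpt",
}


def checkpoint_url(name: str) -> str:
    """Return the direct-download URL for one of the listed checkpoints."""
    return f"{REPO_URL}/resolve/main/{CHECKPOINTS[name]}"


def download_checkpoint(name: str, dest_dir: str = ".") -> str:
    """Download a checkpoint into `dest_dir` and return the local file path."""
    os.makedirs(dest_dir, exist_ok=True)
    dest_path = os.path.join(dest_dir, CHECKPOINTS[name])
    urlretrieve(checkpoint_url(name), dest_path)
    return dest_path


if __name__ == "__main__":
    # Print the URLs only; call download_checkpoint(...) explicitly to fetch a file.
    for label in CHECKPOINTS:
        print(label, checkpoint_url(label))
```

How the downloaded files are loaded for inference is covered in the GitHub repository and is not assumed here.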
| | |
| | ## Evaluation Results |
| | See the [paper](https://arxiv.org/abs/2305.07015) for details. |