Add model files
Browse files- LICENSE +14 -0
- README.md +59 -0
- v1-1-to-v1-5.png +0 -0
- v1-variants-scores.jpg +0 -0
LICENSE
ADDED
|
@@ -0,0 +1,14 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
All rights reserved by the authors.
|
| 2 |
+
You must not distribute the weights provided to you directly or indirectly without explicit consent of the authors.
|
| 3 |
+
You must not distribute harmful, offensive, dehumanizing content or otherwise harmful representations of people or their environments, cultures, religions, etc. produced with the model weights
|
| 4 |
+
or other generated content described in the "Misuse and Malicious Use" section in the model card.
|
| 5 |
+
The model weights are provided for research purposes only.
|
| 6 |
+
|
| 7 |
+
|
| 8 |
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
| 9 |
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
| 10 |
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
| 11 |
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
| 12 |
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
| 13 |
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
| 14 |
+
SOFTWARE.
|
README.md
ADDED
|
@@ -0,0 +1,59 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
license: creativeml-openrail-m
|
| 3 |
+
tags:
|
| 4 |
+
- stable-diffusion
|
| 5 |
+
- text-to-image
|
| 6 |
+
inference: false
|
| 7 |
+
---
|
| 8 |
+
# Stable Diffusion
|
| 9 |
+
|
| 10 |
+
Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input.
|
| 11 |
+
This model card gives an overview of all available model checkpoints. For more in-detail model cards, please have a look at the model repositories listed under [Model Access](#model-access).
|
| 12 |
+
|
| 13 |
+
## Stable Diffusion Version 1
|
| 14 |
+
|
| 15 |
+
For the first version 4 model checkpoints are released.
|
| 16 |
+
*Higher* versions have been trained for longer and are thus usually better in terms of image generation quality then *lower* versions. More specifically:
|
| 17 |
+
|
| 18 |
+
- **stable-diffusion-v1-1**: The checkpoint is randomly initialized and has been trained on 237,000 steps at resolution `256x256` on [laion2B-en](https://huggingface.co/datasets/laion/laion2B-en).
|
| 19 |
+
194,000 steps at resolution `512x512` on [laion-high-resolution](https://huggingface.co/datasets/laion/laion-high-resolution) (170M examples from LAION-5B with resolution `>= 1024x1024`).
|
| 20 |
+
- **stable-diffusion-v1-2**: The checkpoint resumed training from `stable-diffusion-v1-1`.
|
| 21 |
+
515,000 steps at resolution `512x512` on "laion-improved-aesthetics" (a subset of laion2B-en,
|
| 22 |
+
filtered to images with an original size `>= 512x512`, estimated aesthetics score `> 5.0`, and an estimated watermark probability `< 0.5`. The watermark estimate is from the LAION-5B metadata, the aesthetics score is estimated using an [improved aesthetics estimator](https://github.com/christophschuhmann/improved-aesthetic-predictor)).
|
| 23 |
+
- **stable-diffusion-v1-3**: The checkpoint resumed training from `stable-diffusion-v1-2`. 195,000 steps at resolution `512x512` on "laion-improved-aesthetics" and 10 % dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598)
|
| 24 |
+
- **stable-diffusion-v1-4**: The checkpoint resumed training from `stable-diffusion-v1-2`. 195,000 steps at resolution `512x512` on "laion-improved-aesthetics" and 10 % dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
|
| 25 |
+
- [**`stable-diffusion-v1-4`**](https://huggingface.co/CompVis/stable-diffusion-v1-4) Resumed from `stable-diffusion-v1-2`.225,000 steps at resolution `512x512` on "laion-aesthetics v2 5+" and 10 % dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
|
| 26 |
+
|
| 27 |
+
### Model Access
|
| 28 |
+
|
| 29 |
+
Each checkpoint can be used both with Hugging Face's [ 🧨 Diffusers library](https://github.com/huggingface/diffusers) or the original [Stable Diffusion GitHub repository](https://github.com/CompVis/stable-diffusion). Note that you have to *"click-request"* them on each respective model repository.
|
| 30 |
+
|
| 31 |
+
| **[🤗's 🧨 Diffusers library](https://github.com/huggingface/diffusers)** | **[Stable Diffusion GitHub repository](https://github.com/CompVis/stable-diffusion)** |
|
| 32 |
+
| ----------- | ----------- |
|
| 33 |
+
| [`stable-diffusion-v1-1`](https://huggingface.co/CompVis/stable-diffusion-v1-1) | [`stable-diffusion-v-1-1-original`](https://huggingface.co/CompVis/stable-diffusion-v-1-1-original) |
|
| 34 |
+
| [`stable-diffusion-v1-2`](https://huggingface.co/CompVis/stable-diffusion-v1-2) | [`stable-diffusion-v-1-2-original`](https://huggingface.co/CompVis/stable-diffusion-v-1-2-original) |
|
| 35 |
+
| [`stable-diffusion-v1-3`](https://huggingface.co/CompVis/stable-diffusion-v1-3) | [`stable-diffusion-v-1-3-original`](https://huggingface.co/CompVis/stable-diffusion-v-1-3-original) |
|
| 36 |
+
| [`stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4) | [`stable-diffusion-v-1-4-original`](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original) |
|
| 37 |
+
|
| 38 |
+
### Demo
|
| 39 |
+
|
| 40 |
+
To quickly try out the model, you can try out the [Stable Diffusion Space](https://huggingface.co/spaces/stabilityai/stable-diffusion).
|
| 41 |
+
|
| 42 |
+
### License
|
| 43 |
+
|
| 44 |
+
[The CreativeML OpenRAIL M license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) is an [Open RAIL M license](https://www.licenses.ai/blog/2022/8/18/naming-convention-of-responsible-ai-licenses), adapted from the work that [BigScience](https://bigscience.huggingface.co/) and [the RAIL Initiative](https://www.licenses.ai/) are jointly carrying in the area of responsible AI licensing. See also [the article about the BLOOM Open RAIL license](https://bigscience.huggingface.co/blog/the-bigscience-rail-license) on which our license is based.
|
| 45 |
+
|
| 46 |
+
## Citation
|
| 47 |
+
|
| 48 |
+
```bibtex
|
| 49 |
+
@InProceedings{Rombach_2022_CVPR,
|
| 50 |
+
author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
|
| 51 |
+
title = {High-Resolution Image Synthesis With Latent Diffusion Models},
|
| 52 |
+
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
|
| 53 |
+
month = {June},
|
| 54 |
+
year = {2022},
|
| 55 |
+
pages = {10684-10695}
|
| 56 |
+
}
|
| 57 |
+
```
|
| 58 |
+
|
| 59 |
+
*This model card was written by: Robin Rombach and Patrick Esser and is based on the [DALL-E Mini model card](https://huggingface.co/dalle-mini/dalle-mini).*
|
v1-1-to-v1-5.png
ADDED
|
v1-variants-scores.jpg
ADDED
|