---
license: creativeml-openrail-m
library_name: diffusers
pipeline_tag: text-to-image
---
# SD 1.5 Big G (alpha)
This is a Stable Diffusion 1.5 model, but it uses the [CLIP Big G](https://huggingface.co/laion/CLIP-ViT-bigG-14-laion2B-39B-b160k) text encoder instead of the original [CLIP-L](https://huggingface.co/openai/clip-vit-large-patch14) text encoder.
It is a knowledge-transfer pre-train, intended to preserve the knowledge the original model already has.
It was trained only with student/teacher (distillation) training, using my [SD 1.5 fine-tune, Objective Reality v2](https://huggingface.co/ostris/objective-reality) as the teacher.
To realize the full potential of the much larger text encoder, the model would need further fine-tuning on a large dataset.
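
Swapping the text encoder changes the width of the embeddings the UNet attends to: CLIP-L produces 768-dimensional token embeddings, while CLIP bigG produces 1280-dimensional ones. The following is a minimal sketch, assuming the standard SD 1.5 UNet layout and assuming the UNet consumes the bigG hidden states directly (this card does not say whether any extra projection is used). It lists the shape of every cross-attention key/value projection and counts their parameters for both widths.

```python
# Channel width of each cross-attention (attn2) layer in the standard SD 1.5
# UNet: block_out_channels = (320, 640, 1280, 1280), where the last down block
# and the first up block have no attention layers. Assumed layout.
SD15_CROSS_ATTN_CHANNELS = (
    [320] * 2 + [640] * 2 + [1280] * 2      # down blocks 0-2, 2 layers each
    + [1280]                                # mid block, 1 layer
    + [1280] * 3 + [640] * 3 + [320] * 3    # up blocks 1-3, 3 layers each
)

CLIP_L_WIDTH = 768       # text hidden size of openai/clip-vit-large-patch14
CLIP_BIG_G_WIDTH = 1280  # text hidden size of CLIP-ViT-bigG-14


def kv_projection_shapes(context_dim: int) -> list[tuple[int, int]]:
    """Weight shapes (out_features, in_features) of every to_k and to_v.

    Both projections map the text embedding width (context_dim) to the
    layer's channel width, so only their input dimension depends on the
    text encoder. In SD 1.5 these projections have no bias.
    """
    shapes = []
    for channels in SD15_CROSS_ATTN_CHANNELS:
        shapes.append((channels, context_dim))  # to_k
        shapes.append((channels, context_dim))  # to_v
    return shapes


def kv_param_count(context_dim: int) -> int:
    """Total number of weights in all cross-attention K/V projections."""
    return sum(out_f * in_f for out_f, in_f in kv_projection_shapes(context_dim))


if __name__ == "__main__":
    for name, width in [("CLIP-L", CLIP_L_WIDTH), ("CLIP bigG", CLIP_BIG_G_WIDTH)]:
        print(f"{name}: {kv_param_count(width):,} K/V weights, "
              f"first to_k shape {kv_projection_shapes(width)[0]}")
```

Under these assumptions the key/value projections grow from 19,169,280 to 31,948,800 weights. The query and output projections act on the UNet's own channels, so their shapes do not depend on the text encoder.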
# Examples
Coming soon
# Usage
With diffusers, you can use it like any other Stable Diffusion model:
```python
from diffusers import StableDiffusionPipeline
import torch
model_id = "ostris/sd15-big-g-alpha"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
```
It will not work out of the box with ComfyUI or AUTOMATIC1111; special code would be needed to load it. If there is interest in this model, I may work on compatibility.
Overall, it won't be hard to add: the only architectural changes are the text encoder and the cross-attention weights.
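
As a rough idea of what such loading code might check, the hypothetical helper below tells model variants apart by the input width of the cross-attention key projections. It is not code from ComfyUI or AUTOMATIC1111. It takes a plain mapping from state-dict key to weight shape, so it runs without torch; in a real loader the shapes would come from the loaded checkpoint tensors. Both diffusers-style keys and original-checkpoint keys end in `attn2.to_k.weight`. The 1280 entry assumes the UNet attends to the bigG hidden states directly.

```python
from collections.abc import Mapping, Sequence

# Cross-attention context width -> label. 768 and 1024 are the text widths of
# SD 1.x (CLIP-L) and SD 2.x (OpenCLIP ViT-H); 1280 for this model is an
# assumption based on the bigG text hidden size.
KNOWN_CONTEXT_DIMS = {
    768: "SD 1.x (CLIP-L)",
    1024: "SD 2.x (OpenCLIP ViT-H)",
    1280: "SD 1.5 Big G (CLIP bigG)",
}


def cross_attention_context_dim(shapes: Mapping[str, Sequence[int]]) -> int:
    """Return the in_features shared by all cross-attention key projections.

    Self-attention (attn1) keys are ignored. Raises ValueError if no
    cross-attention keys are present or if their widths disagree.
    """
    dims = {shape[1] for key, shape in shapes.items()
            if key.endswith("attn2.to_k.weight")}
    if not dims:
        raise ValueError("no cross-attention key projections found")
    if len(dims) > 1:
        raise ValueError(f"inconsistent cross-attention widths: {sorted(dims)}")
    return dims.pop()


def identify_variant(shapes: Mapping[str, Sequence[int]]) -> str:
    """Map the detected context width to a human-readable model family."""
    dim = cross_attention_context_dim(shapes)
    return KNOWN_CONTEXT_DIMS.get(dim, f"unknown (context dim {dim})")


if __name__ == "__main__":
    # Hypothetical shapes for a few keys of a diffusers-format UNet.
    example = {
        "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_k.weight": (320, 320),
        "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight": (320, 1280),
        "mid_block.attentions.0.transformer_blocks.0.attn2.to_k.weight": (1280, 1280),
    }
    print(identify_variant(example))
```

Keying on the weight shape rather than on a file name or metadata field means the check still works for merged or renamed checkpoints.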
# Alpha
This is only a pre-trained alpha. Some concepts did not seem to transfer, and it really needs proper training on a large dataset. Anyone is welcome to take this on; I do not plan to at this time.
# Why make this?
In the words of George Mallory, "Because it's there."
# Training Method
As mentioned above, it was trained with student/teacher training only. This was an iterative process over the course of a few months, and I did not keep track of all the exact numbers, so the following are best estimates.
The cross-attention layers were trained for 1-2 million steps with a batch size of 8 on a single RTX 4090 GPU. Then the full UNet was trained for around 100k steps with the same settings.
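
The two-phase schedule can be illustrated with a toy student/teacher loop. This is not the author's training code: two tiny linear functions stand in for the UNets, `xattn` and `rest` are hypothetical parameter groups standing in for the cross-attention layers and the rest of the UNet, and the data, learning rate and step counts are made up. In the real setup the teacher would presumably be Objective Reality v2 conditioned on CLIP-L embeddings and the student this model conditioned on bigG embeddings of the same prompt, both predicting noise for the same noisy latent; that is an assumption. The sketch only shows the structure: a frozen teacher, a mean-squared-error loss between teacher and student outputs on the same inputs, and a first phase that updates only the cross-attention stand-in before the whole student is unfrozen.

```python
def teacher(x: float) -> float:
    """Frozen teacher (toy stand-in for the fine-tuned SD 1.5 UNet)."""
    return 3.0 * x + 1.0


def student(params: dict[str, float], x: float) -> float:
    """Student: 'xattn' stands in for the new cross-attention weights,
    'rest' for the remainder of the UNet (both hypothetical)."""
    return params["xattn"] * x + params["rest"]


def distill_loss(params: dict[str, float], xs: list[float]) -> float:
    """Mean squared error between student and teacher outputs."""
    return sum((student(params, x) - teacher(x)) ** 2 for x in xs) / len(xs)


def train(params: dict[str, float], xs: list[float], trainable: list[str],
          steps: int, lr: float) -> dict[str, float]:
    """Plain gradient descent on the distillation loss.

    Only parameters named in `trainable` are updated; the others stay
    frozen. The teacher is never updated.
    """
    for _ in range(steps):
        # Analytic gradients of the MSE for this linear toy student.
        grads = {"xattn": 0.0, "rest": 0.0}
        for x in xs:
            err = student(params, x) - teacher(x)
            grads["xattn"] += 2.0 * err * x / len(xs)
            grads["rest"] += 2.0 * err / len(xs)
        for name in trainable:
            params[name] -= lr * grads[name]
    return params


if __name__ == "__main__":
    xs = [-1.0, -0.5, 0.0, 0.5, 1.0]          # toy inputs
    params = {"xattn": 0.0, "rest": 0.5}      # toy initial student

    # Phase 1: train only the cross-attention stand-in.
    train(params, xs, trainable=["xattn"], steps=200, lr=0.1)
    print("after phase 1:", params, "loss", round(distill_loss(params, xs), 4))

    # Phase 2: unfreeze the whole student.
    train(params, xs, trainable=["xattn", "rest"], steps=200, lr=0.1)
    print("after phase 2:", params, "loss", round(distill_loss(params, xs), 6))
```

In phase 1 the student can only fit the slope, so a loss of 0.25 remains from the frozen offset; phase 2 removes it. The toy says nothing about the real model's step counts or loss values.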