prasadsachin committed on
Commit 5365165 · verified · 1 Parent(s): b35dc7a

Update README.md with new model card content

Files changed (1):
  1. README.md +282 -23
README.md CHANGED
@@ -1,26 +1,285 @@
  ---
  library_name: keras-hub
  ---
- This is a [`StableDiffusion3` model](https://keras.io/api/keras_hub/models/stable_diffusion3) uploaded using the KerasHub library and can be used with JAX, TensorFlow, and PyTorch backends.
- Model config:
- * **name:** stable_diffusion_3.5_medium_backbone
- * **trainable:** True
- * **dtype:** {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'bfloat16'}, 'registered_name': None}
- * **mmdit_patch_size:** 2
- * **mmdit_hidden_dim:** 1536
- * **mmdit_num_layers:** 24
- * **mmdit_num_heads:** 24
- * **mmdit_position_size:** 384
- * **mmdit_qk_norm:** rms_norm
- * **mmdit_dual_attention_indices:** [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
- * **vae:** {'module': 'keras_hub.src.models.vae.vae_backbone', 'class_name': 'VAEBackbone', 'config': {'name': 'vae', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'bfloat16'}, 'registered_name': None}, 'encoder_num_filters': [128, 256, 512, 512], 'encoder_num_blocks': [2, 2, 2, 2], 'decoder_num_filters': [512, 512, 256, 128], 'decoder_num_blocks': [3, 3, 3, 3], 'sampler_method': 'sample', 'input_channels': 3, 'sample_channels': 32, 'output_channels': 3, 'scale': 1.5305, 'shift': 0.0609}, 'registered_name': 'VAEBackbone'}
- * **clip_l:** {'module': 'keras_hub.src.models.clip.clip_text_encoder', 'class_name': 'CLIPTextEncoder', 'config': {'name': 'clip_l', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float16'}, 'registered_name': None}, 'vocabulary_size': 49408, 'embedding_dim': 768, 'hidden_dim': 768, 'num_layers': 12, 'num_heads': 12, 'intermediate_dim': 3072, 'intermediate_activation': 'quick_gelu', 'intermediate_output_index': 10, 'max_sequence_length': 77}, 'registered_name': 'keras_hub>CLIPTextEncoder'}
- * **clip_g:** {'module': 'keras_hub.src.models.clip.clip_text_encoder', 'class_name': 'CLIPTextEncoder', 'config': {'name': 'clip_g', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float16'}, 'registered_name': None}, 'vocabulary_size': 49408, 'embedding_dim': 1280, 'hidden_dim': 1280, 'num_layers': 32, 'num_heads': 20, 'intermediate_dim': 5120, 'intermediate_activation': 'gelu', 'intermediate_output_index': 30, 'max_sequence_length': 77}, 'registered_name': 'keras_hub>CLIPTextEncoder'}
- * **t5:** None
- * **latent_channels:** 16
- * **output_channels:** 3
- * **num_train_timesteps:** 1000
- * **shift:** 3.0
- * **image_shape:** [1024, 1024, 3]
-
- This model card has been generated automatically and should be completed by the model author. See [Model Cards documentation](https://huggingface.co/docs/hub/model-cards) for more information.
  ---
  library_name: keras-hub
  ---
+ ### Model Overview
+ [Stable Diffusion 3.5](https://stability.ai/learning-hub/stable-diffusion-3-5-prompt-guide) is a Multimodal Diffusion Transformer (MMDiT) text-to-image model with greatly improved image quality, typography, complex-prompt understanding, and resource efficiency.
+
+ For more technical details, please refer to the [research paper](https://stability.ai/news/stable-diffusion-3-research-paper).
+
+ Please note: this model is released under the Stability Community License. For an Enterprise License, visit Stability.ai or [contact us](https://stability.ai/enterprise) for commercial licensing details.
+
+ ## Links
+
+ * [SD3.5 Quickstart Notebook](https://colab.sandbox.google.com/gist/laxmareddyp/55daf77f87730c3b3f498318672f70b3/stablediffusion3_5-quckstart-notebook.ipynb)
+ * [SD3.5 API Documentation](https://keras.io/keras_hub/api/models/stable_diffusion_3/)
+ * [SD3.5 Model Card](https://huggingface.co/stabilityai/stable-diffusion-3.5-large)
+ * [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
+ * [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)
+
+ ## Presets
+
+ The following model checkpoints are provided by the Keras team. Full code examples for each are available below.
+
+ | Preset name | Parameters | Description |
+ |-------------|------------|-------------|
+ | stable_diffusion_3.5_large | 9.05B | 9.05 billion parameters, including CLIP L and CLIP G text encoders, MMDiT generative model, and VAE autoencoder. Developed by Stability AI. |
+ | stable_diffusion_3.5_large_turbo | 9.05B | 9.05 billion parameters, including CLIP L and CLIP G text encoders, MMDiT generative model, and VAE autoencoder. A timestep-distilled version that eliminates classifier-free guidance and uses fewer steps for generation. Developed by Stability AI. |
+
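The turbo preset's note about eliminating classifier-free guidance, and the `guidance_scale` argument used in the examples below, refer to the same mechanism: at each denoising step, an unconditional prediction and a prompt-conditioned prediction are combined. A minimal NumPy sketch of that combination (illustrative only, not the exact KerasHub internals):

```python
import numpy as np

def apply_cfg(uncond_pred, cond_pred, guidance_scale):
    """Classifier-free guidance: move the prediction away from the
    unconditional output, in the direction of the prompt-conditioned one."""
    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)

# Toy latent predictions.
uncond = np.zeros((1, 4))
cond = np.ones((1, 4))
print(apply_cfg(uncond, cond, 1.0))  # scale 1.0 recovers the conditional prediction
print(apply_cfg(uncond, cond, 7.0))  # larger scales follow the prompt more aggressively
```

A timestep-distilled model bakes this guidance into the network itself, which is why the turbo preset needs neither the second (unconditional) forward pass nor a `guidance_scale`.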
+ ### Model Description
+
+ - **Developed by:** Stability AI
+ - **Model type:** MMDiT text-to-image generative model
+ - **Model Description:** This model generates images based on text prompts. It is a [Multimodal Diffusion Transformer](https://arxiv.org/abs/2403.03206) that uses three fixed, pretrained text encoders (OpenCLIP-ViT/G, CLIP-ViT/L and T5-xxl), and QK-normalization to improve training stability.
+
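The MMDiT operates on a patchified latent: with a patch size of 2 (the `mmdit_patch_size` used in the configs below), each 2×2 patch of the VAE latent becomes one transformer token. A rough shape sketch, assuming the standard SD3 VAE (8× downsampling, 16 latent channels) rather than anything read from this preset:

```python
import numpy as np

# For a 1024x1024 RGB input, an 8x-downsampling VAE with 16 latent
# channels yields a 128x128x16 latent.
latent = np.zeros((128, 128, 16))
p = 2  # mmdit_patch_size

# Split into non-overlapping p x p patches, then flatten each patch
# into one token vector.
h, w, c = latent.shape
patches = latent.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
tokens = patches.reshape((h // p) * (w // p), p * p * c)
print(tokens.shape)  # (4096, 64)
```

Each 64-dimensional patch vector is then projected up to the transformer width (`mmdit_hidden_dim`), so larger images cost quadratically more attention over a longer token sequence.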
+ ## Example Usage
+ ```python
+ !pip install -U keras-hub
+ !pip install -U keras
+ ```
+
+ ```python
+ import keras_hub
+ import numpy as np
+
+ # Pretrained Stable Diffusion 3 model.
+ model = keras_hub.models.StableDiffusion3Backbone.from_preset(
+     "stable_diffusion_3.5_medium"
+ )
+
+ # Randomly initialized Stable Diffusion 3 model with custom config.
+ vae = keras_hub.models.VAEBackbone(...)
+ clip_l = keras_hub.models.CLIPTextEncoder(...)
+ clip_g = keras_hub.models.CLIPTextEncoder(...)
+ model = keras_hub.models.StableDiffusion3Backbone(
+     mmdit_patch_size=2,
+     mmdit_num_heads=4,
+     mmdit_hidden_dim=256,
+     mmdit_num_layers=4,
+     mmdit_position_size=192,
+     vae=vae,
+     clip_l=clip_l,
+     clip_g=clip_g,
+ )
+
+ # Image-to-image example.
+ image_to_image = keras_hub.models.StableDiffusion3ImageToImage.from_preset(
+     "stable_diffusion_3.5_medium", height=512, width=512
+ )
+ image_to_image.generate(
+     {
+         "images": np.ones((512, 512, 3), dtype="float32"),
+         "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+     }
+ )
+
+ # Generate with batched prompts.
+ image_to_image.generate(
+     {
+         "images": np.ones((2, 512, 512, 3), dtype="float32"),
+         "prompts": ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
+     }
+ )
+
+ # Generate with different `num_steps`, `guidance_scale` and `strength`.
+ image_to_image.generate(
+     {
+         "images": np.ones((512, 512, 3), dtype="float32"),
+         "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+     },
+     num_steps=50,
+     guidance_scale=5.0,
+     strength=0.6,
+ )
+
+ # Generate with `negative_prompts`.
+ image_to_image.generate(
+     {
+         "images": np.ones((512, 512, 3), dtype="float32"),
+         "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+         "negative_prompts": "green color",
+     }
+ )
+
+ # Inpainting example.
+ reference_image = np.ones((1024, 1024, 3), dtype="float32")
+ reference_mask = np.ones((1024, 1024), dtype="float32")
+ inpaint = keras_hub.models.StableDiffusion3Inpaint.from_preset(
+     "stable_diffusion_3.5_medium", height=512, width=512
+ )
+ inpaint.generate(
+     reference_image,
+     reference_mask,
+     "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+ )
+
+ # Generate with batched prompts.
+ reference_images = np.ones((2, 512, 512, 3), dtype="float32")
+ reference_masks = np.ones((2, 512, 512), dtype="float32")
+ inpaint.generate(
+     reference_images,
+     reference_masks,
+     ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
+ )
+
+ # Generate with different `num_steps`, `guidance_scale` and `strength`.
+ inpaint.generate(
+     reference_image,
+     reference_mask,
+     "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+     num_steps=50,
+     guidance_scale=5.0,
+     strength=0.6,
+ )
+
+ # Text-to-image example.
+ text_to_image = keras_hub.models.StableDiffusion3TextToImage.from_preset(
+     "stable_diffusion_3.5_medium", height=512, width=512
+ )
+ text_to_image.generate(
+     "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
+ )
+
+ # Generate with batched prompts.
+ text_to_image.generate(
+     ["cute wallpaper art of a cat", "cute wallpaper art of a dog"]
+ )
+
+ # Generate with different `num_steps` and `guidance_scale`.
+ text_to_image.generate(
+     "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+     num_steps=50,
+     guidance_scale=5.0,
+ )
+
+ # Generate with `negative_prompts`.
+ text_to_image.generate(
+     {
+         "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+         "negative_prompts": "green color",
+     }
+ )
+ ```
+
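In the image-to-image and inpainting calls above, `strength` controls how strongly the reference image constrains the result. A common convention in diffusion pipelines (an assumption here; check the KerasHub docs for the exact behavior) is that `strength` decides how far into the noise schedule the reference is pushed, and therefore how many of the requested denoising steps actually run:

```python
# Hypothetical sketch of the usual strength-to-steps mapping.
num_steps = 50
strength = 0.6

# Higher strength = more noise added to the reference = more denoising
# steps run, so the output departs further from the reference image.
steps_to_run = int(num_steps * strength)
print(steps_to_run)  # 30
```

At `strength=0.0` the reference passes through nearly unchanged; at `strength=1.0` the reference is fully noised and the call behaves much like text-to-image.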
+ ## Example Usage with Hugging Face URI
+
+ ```python
+ !pip install -U keras-hub
+ !pip install -U keras
+ ```
+
+ ```python
+ import keras_hub
+ import numpy as np
+
+ # Pretrained Stable Diffusion 3 model.
+ model = keras_hub.models.StableDiffusion3Backbone.from_preset(
+     "hf://keras/stable_diffusion_3.5_medium"
+ )
+
+ # Randomly initialized Stable Diffusion 3 model with custom config.
+ vae = keras_hub.models.VAEBackbone(...)
+ clip_l = keras_hub.models.CLIPTextEncoder(...)
+ clip_g = keras_hub.models.CLIPTextEncoder(...)
+ model = keras_hub.models.StableDiffusion3Backbone(
+     mmdit_patch_size=2,
+     mmdit_num_heads=4,
+     mmdit_hidden_dim=256,
+     mmdit_num_layers=4,
+     mmdit_position_size=192,
+     vae=vae,
+     clip_l=clip_l,
+     clip_g=clip_g,
+ )
+
+ # Image-to-image example.
+ image_to_image = keras_hub.models.StableDiffusion3ImageToImage.from_preset(
+     "hf://keras/stable_diffusion_3.5_medium", height=512, width=512
+ )
+ image_to_image.generate(
+     {
+         "images": np.ones((512, 512, 3), dtype="float32"),
+         "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+     }
+ )
+
+ # Generate with batched prompts.
+ image_to_image.generate(
+     {
+         "images": np.ones((2, 512, 512, 3), dtype="float32"),
+         "prompts": ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
+     }
+ )
+
+ # Generate with different `num_steps`, `guidance_scale` and `strength`.
+ image_to_image.generate(
+     {
+         "images": np.ones((512, 512, 3), dtype="float32"),
+         "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+     },
+     num_steps=50,
+     guidance_scale=5.0,
+     strength=0.6,
+ )
+
+ # Generate with `negative_prompts`.
+ image_to_image.generate(
+     {
+         "images": np.ones((512, 512, 3), dtype="float32"),
+         "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+         "negative_prompts": "green color",
+     }
+ )
+
+ # Inpainting example.
+ reference_image = np.ones((1024, 1024, 3), dtype="float32")
+ reference_mask = np.ones((1024, 1024), dtype="float32")
+ inpaint = keras_hub.models.StableDiffusion3Inpaint.from_preset(
+     "hf://keras/stable_diffusion_3.5_medium", height=512, width=512
+ )
+ inpaint.generate(
+     reference_image,
+     reference_mask,
+     "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+ )
+
+ # Generate with batched prompts.
+ reference_images = np.ones((2, 512, 512, 3), dtype="float32")
+ reference_masks = np.ones((2, 512, 512), dtype="float32")
+ inpaint.generate(
+     reference_images,
+     reference_masks,
+     ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
+ )
+
+ # Generate with different `num_steps`, `guidance_scale` and `strength`.
+ inpaint.generate(
+     reference_image,
+     reference_mask,
+     "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+     num_steps=50,
+     guidance_scale=5.0,
+     strength=0.6,
+ )
+
+ # Text-to-image example.
+ text_to_image = keras_hub.models.StableDiffusion3TextToImage.from_preset(
+     "hf://keras/stable_diffusion_3.5_medium", height=512, width=512
+ )
+ text_to_image.generate(
+     "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
+ )
+
+ # Generate with batched prompts.
+ text_to_image.generate(
+     ["cute wallpaper art of a cat", "cute wallpaper art of a dog"]
+ )
+
+ # Generate with different `num_steps` and `guidance_scale`.
+ text_to_image.generate(
+     "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+     num_steps=50,
+     guidance_scale=5.0,
+ )
+
+ # Generate with `negative_prompts`.
+ text_to_image.generate(
+     {
+         "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+         "negative_prompts": "green color",
+     }
+ )
+ ```