wjh committed on
Commit 8bd2ac6 · 1 Parent(s): 42983cb

megactor-sigma alpha

weights/PixArt_XL_2_512/.gitattributes ADDED
@@ -0,0 +1,38 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ asset/images/more-samples.png filter=lfs diff=lfs merge=lfs -text
+ asset/images/more-samples1.png filter=lfs diff=lfs merge=lfs -text
+ asset/images/teaser.png filter=lfs diff=lfs merge=lfs -text
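Note on the rules above: each line routes matching paths through Git LFS. The sketch below approximates that check for a handful of the slash-free patterns. It assumes that matching the file's basename with Python's `fnmatchcase` is close enough for these simple globs; Git's own matcher treats `saved_model/**/*` and the path-anchored `asset/images/...` entries differently. `LFS_PATTERNS` and `is_lfs_tracked` are hypothetical names, not part of Git or Git LFS.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the slash-free patterns from the .gitattributes diff.
LFS_PATTERNS = ["*.bin", "*.ckpt", "*.pt", "*.pth", "*.safetensors", "*tfevents*"]


def is_lfs_tracked(path, patterns=LFS_PATTERNS):
    """Approximate check: does the basename match any slash-free pattern?"""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for candidate in (
        "weights/megactor-sigma.ckpt",
        "weights/PixArt_XL_2_512/model_index.json",
    ):
        print(candidate, is_lfs_tracked(candidate))
```

Under this approximation the checkpoint files in this commit are stored as LFS pointers, while the JSON configs are committed as plain text.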
weights/PixArt_XL_2_512/README.md ADDED
@@ -0,0 +1,154 @@
+ ---
+ license: openrail++
+ tags:
+ - text-to-image
+ - Pixart-α
+ ---
+
+ <p align="center">
+ <img src="asset/logo.png" height=120>
+ </p>
+
+ <div style="display:flex;justify-content: center">
+ <a href="https://huggingface.co/spaces/PixArt-alpha/PixArt-alpha"><img src="https://img.shields.io/static/v1?label=Demo&message=Huggingface&color=yellow"></a> &ensp;
+ <a href="https://pixart-alpha.github.io/"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Github&color=blue&logo=github-pages"></a> &ensp;
+ <a href="https://arxiv.org/abs/2310.00426"><img src="https://img.shields.io/static/v1?label=Paper&message=Arxiv&color=red&logo=arxiv"></a> &ensp;
+ <a href="https://colab.research.google.com/drive/1jZ5UZXk7tcpTfVwnX33dDuefNMcnW9ME?usp=sharing"><img src="https://img.shields.io/static/v1?label=Free%20Trial&message=Google%20Colab&logo=google&color=orange"></a> &ensp;
+ <a href="https://github.com/orgs/PixArt-alpha/discussions"><img src="https://img.shields.io/static/v1?label=Discussion&message=Github&color=green&logo=github"></a> &ensp;
+ </div>
+
+ # 🐱 Pixart-α Model Card
+ ![row01](asset/images/teaser.png)
+
+ ## Model
+ ![pipeline](asset/images/model.png)
+
+ [Pixart-α](https://arxiv.org/abs/2310.00426)
+ consists of pure transformer blocks for latent diffusion.
+ It can directly generate 1024px images from text prompts within a single sampling process.
+
+ Source code is available at https://github.com/PixArt-alpha/PixArt-alpha.
+
+ ### Model Description
+
+ - **Developed by:** Pixart-α
+ - **Model type:** Diffusion-Transformer-based text-to-image generative model
+ - **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
+ - **Model Description:** This is a model that can be used to generate and modify images based on text prompts.
+ It is a [Transformer Latent Diffusion Model](https://arxiv.org/abs/2310.00426) that uses one fixed, pretrained text encoder ([T5](https://huggingface.co/DeepFloyd/t5-v1_1-xxl))
+ and one latent feature encoder ([VAE](https://arxiv.org/abs/2112.10752)).
+ - **Resources for more information:** Check out our [GitHub Repository](https://github.com/PixArt-alpha/PixArt-alpha) and the [Pixart-α report on arXiv](https://arxiv.org/abs/2310.00426).
+
+ ### Model Sources
+
+ For research purposes, we recommend our `PixArt-alpha` GitHub repository (https://github.com/PixArt-alpha/PixArt-alpha),
+ which is better suited to both training and inference, and to which more advanced diffusion samplers such as [SA-Solver](https://arxiv.org/abs/2309.05019) will be added over time.
+ [Hugging Face](https://huggingface.co/spaces/PixArt-alpha/PixArt-alpha) provides free Pixart-α inference.
+ - **Repository:** https://github.com/PixArt-alpha/PixArt-alpha
+ - **Demo:** https://huggingface.co/spaces/PixArt-alpha/PixArt-alpha
+
+ # 🔥🔥🔥 Why PixArt-α?
+ ## Training Efficiency
+ PixArt-α takes only 10.8% of Stable Diffusion v1.5's training time (675 vs. 6,250 A100 GPU days), saving nearly $300,000 ($26,000 vs. $320,000) and reducing CO2 emissions by 90%. Moreover, its training cost is merely about 1% of that of RAPHAEL, a larger SOTA model.
+ ![Training Efficiency.](asset/images/efficiency.svg)
+
+ | Method    | Type | #Params | #Images | A100 GPU days |
+ |-----------|------|---------|---------|---------------|
+ | DALL·E    | Diff | 12.0B   | 1.54B   |               |
+ | GLIDE     | Diff | 5.0B    | 5.94B   |               |
+ | LDM       | Diff | 1.4B    | 0.27B   |               |
+ | DALL·E 2  | Diff | 6.5B    | 5.63B   | 41,667        |
+ | SDv1.5    | Diff | 0.9B    | 3.16B   | 6,250         |
+ | GigaGAN   | GAN  | 0.9B    | 0.98B   | 4,783         |
+ | Imagen    | Diff | 3.0B    | 15.36B  | 7,132         |
+ | RAPHAEL   | Diff | 3.0B    | 5.0B    | 60,000        |
+ | PixArt-α  | Diff | 0.6B    | 0.025B  | 675           |
+
+
+ ## Evaluation
+ ![comparison](asset/images/user-study.png)
+ The chart above evaluates user preference for Pixart-α over SDXL 0.9, Stable Diffusion 2, DALLE-2 and DeepFloyd.
+ The Pixart-α base model performs comparably to, or even better than, the existing state-of-the-art models.
+
+
+
+ ### 🧨 Diffusers
+
+ Make sure to upgrade diffusers to >= 0.22.0:
+ ```
+ pip install -U diffusers
+ ```
+
+ In addition, make sure to install `transformers`, `safetensors`, `sentencepiece`, and `accelerate`:
+ ```
+ pip install transformers accelerate safetensors sentencepiece
+ ```
+
+ To just use the base model, you can run:
+
+
+ ```py
+ from diffusers import PixArtAlphaPipeline
+ import torch
+
+ pipe = PixArtAlphaPipeline.from_pretrained("PixArt-alpha/PixArt-XL-2-512x512", torch_dtype=torch.float16)
+ pipe = pipe.to("cuda")
+
+ # if using torch < 2.0
+ # pipe.enable_xformers_memory_efficient_attention()
+
+ prompt = "An astronaut riding a green horse"
+ image = pipe(prompt=prompt).images[0]
+ ```
+
+ When using `torch >= 2.0`, you can improve the inference speed by 20-30% with `torch.compile`. Simply wrap the transformer with `torch.compile` before running the pipeline:
+ ```py
+ pipe.transformer = torch.compile(pipe.transformer, mode="reduce-overhead", fullgraph=True)
+ ```
+
+ If you are limited by GPU VRAM, you can enable *cpu offloading* by calling `pipe.enable_model_cpu_offload()`
+ instead of `.to("cuda")`:
+
+ ```diff
+ - pipe.to("cuda")
+ + pipe.enable_model_cpu_offload()
+ ```
+
+ For more information on how to use Pixart-α with `diffusers`, please have a look at [the Pixart-α Docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/pixart).
+
+ ### Free Google Colab
+ You can use Google Colab to generate images from PixArt-α free of charge. Click [here](https://colab.research.google.com/drive/1jZ5UZXk7tcpTfVwnX33dDuefNMcnW9ME?usp=sharing) to try it.
+
+ ## Uses
+
+ ### Direct Use
+
+ The model is intended for research purposes only. Possible research areas and tasks include:
+
+ - Generation of artworks and use in design and other artistic processes.
+ - Applications in educational or creative tools.
+ - Research on generative models.
+ - Safe deployment of models which have the potential to generate harmful content.
+ - Probing and understanding the limitations and biases of generative models.
+
+ Excluded uses are described below.
+
+ ### Out-of-Scope Use
+
+ The model was not trained to produce factual or true representations of people or events, so using it to generate such content is outside its abilities.
+
+ ## Limitations and Bias
+
+ ### Limitations
+
+
+ - The model does not achieve perfect photorealism.
+ - The model cannot render legible text.
+ - The model struggles with more difficult tasks that involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”.
+ - Fingers, etc., may not be generated properly in general.
+ - The autoencoding part of the model is lossy.
+
+ ### Bias
+ While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.
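Note on the README's Training Efficiency section: the percentages quoted there follow directly from the GPU-day and cost figures in the same section. The sketch below reproduces that arithmetic; the dictionary and variable names are hypothetical, and the numbers are copied from the README, not measured independently.

```python
# Figures quoted in the README's Training Efficiency section.
GPU_DAYS = {"SDv1.5": 6250, "RAPHAEL": 60000, "PixArt-alpha": 675}
COST_USD = {"SDv1.5": 320_000, "PixArt-alpha": 26_000}

# Share of Stable Diffusion v1.5's training time (A100 GPU days).
time_fraction = GPU_DAYS["PixArt-alpha"] / GPU_DAYS["SDv1.5"]

# Dollar savings relative to Stable Diffusion v1.5.
savings_usd = COST_USD["SDv1.5"] - COST_USD["PixArt-alpha"]

# Share of RAPHAEL's training time.
raphael_fraction = GPU_DAYS["PixArt-alpha"] / GPU_DAYS["RAPHAEL"]

if __name__ == "__main__":
    print(f"vs SDv1.5:  {time_fraction:.1%} of the GPU days")
    print(f"savings:    ${savings_usd:,}")
    print(f"vs RAPHAEL: {raphael_fraction:.2%} of the GPU days")
```

The 10.8% figure and the "nearly $300,000" saving both check out against the table, and the RAPHAEL ratio comes to slightly above 1%.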
weights/PixArt_XL_2_512/easyanimate_v1_mm.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:64ca41b7856971c434049c14b81adc804aadd306112ebd1f82ff9a9b5a480c91
+ size 4329791280
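Note on the pointer file above: what is committed is a three-line Git LFS pointer (spec version, SHA-256 object id, byte size), not the weights themselves. A minimal sketch, assuming the pointer text is available without the diff's leading `+ ` markers, of how such a pointer could be parsed and a downloaded blob checked against it. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not part of any Git LFS tooling.

```python
import hashlib

# Pointer text as committed for easyanimate_v1_mm.safetensors.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:64ca41b7856971c434049c14b81adc804aadd306112ebd1f82ff9a9b5a480c91
size 4329791280
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line and break the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """True if the bytes have the size and hash recorded in the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


if __name__ == "__main__":
    print(parse_lfs_pointer(POINTER))
```

For multi-gigabyte files like this one, hashing in chunks from disk would be preferable to loading the whole file into memory; the in-memory version is kept here for brevity.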
weights/PixArt_XL_2_512/model_index.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "_class_name": "PixArtAlphaPipeline",
+   "_diffusers_version": "0.22.0.dev0",
+   "_name_or_path": "diffusers/PixArt-XL-2-512x512",
+   "scheduler": [
+     "diffusers",
+     "DPMSolverMultistepScheduler"
+   ],
+   "text_encoder": [
+     "transformers",
+     "T5EncoderModel"
+   ],
+   "tokenizer": [
+     "transformers",
+     "T5Tokenizer"
+   ],
+   "transformer": [
+     "diffusers",
+     "Transformer2DModel"
+   ],
+   "vae": [
+     "diffusers",
+     "AutoencoderKL"
+   ]
+ }
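Note on `model_index.json`: keys beginning with an underscore are pipeline metadata, and every other key names a component together with the library and class that implement it. The sketch below lists the components using only the standard `json` module; `list_components` is a hypothetical helper, and the underscore convention is inferred from the file shown above.

```python
import json

# Contents of model_index.json as added in this commit.
MODEL_INDEX = """
{
  "_class_name": "PixArtAlphaPipeline",
  "_diffusers_version": "0.22.0.dev0",
  "_name_or_path": "diffusers/PixArt-XL-2-512x512",
  "scheduler": ["diffusers", "DPMSolverMultistepScheduler"],
  "text_encoder": ["transformers", "T5EncoderModel"],
  "tokenizer": ["transformers", "T5Tokenizer"],
  "transformer": ["diffusers", "Transformer2DModel"],
  "vae": ["diffusers", "AutoencoderKL"]
}
"""


def list_components(model_index_text):
    """Map each component name to its (library, class) pair, skipping metadata."""
    index = json.loads(model_index_text)
    return {
        name: tuple(spec)
        for name, spec in index.items()
        if not name.startswith("_")
    }


if __name__ == "__main__":
    for name, (library, cls) in list_components(MODEL_INDEX).items():
        print(f"{name}: {library}.{cls}")
```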
weights/PixArt_XL_2_512/negative_prompt_attention_mask.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2155dd6be49776f17b01728505aef1c104eb21cedc14b8e3d3165f8e0b57fcde
+ size 1840
weights/PixArt_XL_2_512/negative_prompt_embeds.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b15d540c1a66225bda3e54d383d884b83ab507f0c8a51a722109a0abc722d1f9
+ size 983896
weights/PixArt_XL_2_512/sd-vae-ft-ema/config.json ADDED
@@ -0,0 +1,29 @@
+ {
+   "_class_name": "AutoencoderKL",
+   "_diffusers_version": "0.4.2",
+   "act_fn": "silu",
+   "block_out_channels": [
+     128,
+     256,
+     512,
+     512
+   ],
+   "down_block_types": [
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D"
+   ],
+   "in_channels": 3,
+   "latent_channels": 4,
+   "layers_per_block": 2,
+   "norm_num_groups": 32,
+   "out_channels": 3,
+   "sample_size": 256,
+   "up_block_types": [
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D"
+   ]
+ }
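Note on the VAE config: the number of encoder blocks determines how much the autoencoder shrinks an image spatially. The sketch below estimates the latent shape for a given image size, assuming the spatial size is halved between consecutive blocks, so four blocks give a factor of 2^3 = 8. That rule is an assumption about how this `AutoencoderKL` layout behaves, and `latent_shape` is a hypothetical helper.

```python
# Relevant fields from sd-vae-ft-ema/config.json.
VAE_CONFIG = {
    "block_out_channels": [128, 256, 512, 512],
    "in_channels": 3,
    "latent_channels": 4,
}


def latent_shape(config, height, width):
    """Return (channels, height, width) of the latent for an image of the given size.

    Assumes one 2x downsampling between consecutive encoder blocks.
    """
    scale = 2 ** (len(config["block_out_channels"]) - 1)
    if height % scale or width % scale:
        raise ValueError(f"image size must be a multiple of {scale}")
    return (config["latent_channels"], height // scale, width // scale)


if __name__ == "__main__":
    print(latent_shape(VAE_CONFIG, 512, 512))
```

Under this assumption a 512x512 image maps to a 4x64x64 latent, which lines up with the `in_channels` of 4 and `sample_size` of 64 in the transformer config.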
weights/PixArt_XL_2_512/sd-vae-ft-ema/diffusion_pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7c98ebcd7ca5cb69d47b2ae287feba0695689fbf2c8fead2fab05fd3e0c28303
+ size 334707217
weights/PixArt_XL_2_512/transformer/config.json ADDED
@@ -0,0 +1,27 @@
+ {
+   "_class_name": "Transformer2DModel",
+   "_diffusers_version": "0.22.0.dev0",
+   "activation_fn": "gelu-approximate",
+   "attention_bias": true,
+   "attention_head_dim": 72,
+   "attention_type": "default",
+   "caption_channels": 4096,
+   "cross_attention_dim": 1152,
+   "double_self_attention": false,
+   "dropout": 0.0,
+   "in_channels": 4,
+   "norm_elementwise_affine": false,
+   "norm_eps": 1e-06,
+   "norm_num_groups": 32,
+   "norm_type": "ada_norm_single",
+   "num_attention_heads": 16,
+   "num_embeds_ada_norm": 1000,
+   "num_layers": 28,
+   "num_vector_embeds": null,
+   "only_cross_attention": false,
+   "out_channels": 8,
+   "patch_size": 2,
+   "sample_size": 64,
+   "upcast_attention": false,
+   "use_linear_projection": false
+ }
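Note on the transformer config: several of its values are tied together by simple arithmetic. The sketch below derives the hidden width, the token count per image, and the implied pixel resolution. It assumes the latent is split into non-overlapping `patch_size` x `patch_size` patches and that the VAE downsamples by a factor of 8; `derived_sizes` and its `vae_scale_factor` parameter are hypothetical and do not come from the diff.

```python
# Relevant fields from transformer/config.json.
TRANSFORMER_CONFIG = {
    "attention_head_dim": 72,
    "num_attention_heads": 16,
    "cross_attention_dim": 1152,
    "in_channels": 4,
    "out_channels": 8,
    "patch_size": 2,
    "sample_size": 64,
}


def derived_sizes(cfg, vae_scale_factor=8):
    """Compute quantities implied by the config (vae_scale_factor is assumed)."""
    patches_per_side = cfg["sample_size"] // cfg["patch_size"]
    return {
        # Width of each token: heads times per-head dimension.
        "hidden_size": cfg["num_attention_heads"] * cfg["attention_head_dim"],
        # Number of latent patches, i.e. tokens per image.
        "num_tokens": patches_per_side ** 2,
        # Values packed into one patch before projection.
        "values_per_patch": cfg["patch_size"] ** 2 * cfg["in_channels"],
        # Image side length in pixels, if the VAE downsamples by 8.
        "pixel_size": cfg["sample_size"] * vae_scale_factor,
    }


if __name__ == "__main__":
    print(derived_sizes(TRANSFORMER_CONFIG))
```

The hidden size equals the configured `cross_attention_dim` of 1152, and the implied resolution matches the 512 in the directory name. The `out_channels` value of 8, twice `in_channels`, would be consistent with a model that predicts a variance term alongside the noise, but the diff itself does not say so.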
weights/PixArt_XL_2_512/transformer/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:13105448f416eb9d80206c0941cc48c1df4bd962ece461960a70866f62286ace
+ size 2445458680
weights/dac/v1.0_vae_dac_44100_87hz_64dim.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:391f5374eb93e461ef23ddb4f4ebb4a91eb7602004dc7e6a4829011d48dd112a
+ size 307578440
weights/megactor-sigma.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:faae39f269ed0d61e1b07f8855d0c40d2d5008bfd7b577cb00b8eefb9d8d3355
+ size 4579181697
weights/sd15_empty_str_embedding.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:41b4b057ac7b49ec882239bb8300fd726165c65b2eb54350db5b901be3d0e73d
+ size 119134
weights/swint.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7b2d1c5493dbadf87a6ebdd5ade070d192aecacc52ef69e96aaa865b7846ad60
+ size 1387910541
weights/tiny.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b3590e7c92d84ded80f3a274b95664efafd8f3ddab8d1ccc3bcb98bedd756dfc
+ size 4980230
weights/vae_dac.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a88eed82a7024ccc1facdb1e605c4c2f99281c8118c22c9895ffa846d8fb61aa
+ size 306717287
weights/whisper_tiny.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9
+ size 75572083
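Note on download size: the `size` fields of the LFS pointers in this commit give a quick estimate of how much data a full checkout would pull. The sketch below totals them; the paths and byte counts are copied from the pointer files above, while `LFS_SIZES` and `summarize` are hypothetical names.

```python
# Byte sizes taken from the LFS pointer files added in this commit.
LFS_SIZES = {
    "weights/PixArt_XL_2_512/easyanimate_v1_mm.safetensors": 4329791280,
    "weights/PixArt_XL_2_512/negative_prompt_attention_mask.pt": 1840,
    "weights/PixArt_XL_2_512/negative_prompt_embeds.pt": 983896,
    "weights/PixArt_XL_2_512/sd-vae-ft-ema/diffusion_pytorch_model.bin": 334707217,
    "weights/PixArt_XL_2_512/transformer/diffusion_pytorch_model.safetensors": 2445458680,
    "weights/dac/v1.0_vae_dac_44100_87hz_64dim.pth": 307578440,
    "weights/megactor-sigma.ckpt": 4579181697,
    "weights/sd15_empty_str_embedding.pt": 119134,
    "weights/swint.pth": 1387910541,
    "weights/tiny.pt": 4980230,
    "weights/vae_dac.pth": 306717287,
    "weights/whisper_tiny.pt": 75572083,
}


def summarize(sizes):
    """Return the total byte count and the path of the largest object."""
    total = sum(sizes.values())
    largest = max(sizes, key=sizes.get)
    return total, largest


if __name__ == "__main__":
    total, largest = summarize(LFS_SIZES)
    print(f"total: {total / 1e9:.1f} GB across {len(LFS_SIZES)} objects")
    print(f"largest: {largest}")
```

The twelve objects add up to roughly 13.8 GB (decimal), with `weights/megactor-sigma.ckpt` the largest single file.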