dany0407 committed on
Commit
9ee9f5a
·
verified ·
1 Parent(s): 67eb693

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -2,34 +2,26 @@
  *.arrow filter=lfs diff=lfs merge=lfs -text
  *.bin filter=lfs diff=lfs merge=lfs -text
  *.bz2 filter=lfs diff=lfs merge=lfs -text
- *.ckpt filter=lfs diff=lfs merge=lfs -text
  *.ftz filter=lfs diff=lfs merge=lfs -text
  *.gz filter=lfs diff=lfs merge=lfs -text
  *.h5 filter=lfs diff=lfs merge=lfs -text
  *.joblib filter=lfs diff=lfs merge=lfs -text
  *.lfs.* filter=lfs diff=lfs merge=lfs -text
- *.mlmodel filter=lfs diff=lfs merge=lfs -text
  *.model filter=lfs diff=lfs merge=lfs -text
  *.msgpack filter=lfs diff=lfs merge=lfs -text
- *.npy filter=lfs diff=lfs merge=lfs -text
- *.npz filter=lfs diff=lfs merge=lfs -text
  *.onnx filter=lfs diff=lfs merge=lfs -text
  *.ot filter=lfs diff=lfs merge=lfs -text
  *.parquet filter=lfs diff=lfs merge=lfs -text
  *.pb filter=lfs diff=lfs merge=lfs -text
- *.pickle filter=lfs diff=lfs merge=lfs -text
- *.pkl filter=lfs diff=lfs merge=lfs -text
  *.pt filter=lfs diff=lfs merge=lfs -text
  *.pth filter=lfs diff=lfs merge=lfs -text
  *.rar filter=lfs diff=lfs merge=lfs -text
- *.safetensors filter=lfs diff=lfs merge=lfs -text
  saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
  *.tflite filter=lfs diff=lfs merge=lfs -text
  *.tgz filter=lfs diff=lfs merge=lfs -text
  *.wasm filter=lfs diff=lfs merge=lfs -text
  *.xz filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
- *.zst filter=lfs diff=lfs merge=lfs -text
+ *.zstandard filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
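The attribute lines above follow the pattern Git LFS writes into `.gitattributes` when a path is tracked. As a sketch (run inside a repository; the pattern here mirrors the `*.zstandard` rule added in this commit), the same line `git lfs track "*.zstandard"` would produce can be appended directly:

```shell
# Append an LFS tracking rule, equivalent to `git lfs track "*.zstandard"`
echo '*.zstandard filter=lfs diff=lfs merge=lfs -text' >> .gitattributes

# Confirm the rule is present
grep 'zstandard' .gitattributes
```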
README.md ADDED
@@ -0,0 +1,128 @@
---
tags:
- pytorch
license: mit
---

# min(DALL·E)

[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kuprel/min-dalle/blob/main/min_dalle.ipynb)
[![Discord](https://img.shields.io/discord/823813159592001537?color=5865F2&logo=discord&logoColor=white)](https://discord.com/channels/823813159592001537/912729332311556136)

**[GitHub](https://github.com/kuprel/min-dalle)**

This is a fast, minimal port of Boris Dayma's [DALL·E Mini](https://github.com/borisdayma/dalle-mini) (with mega weights). It has been stripped down for inference and converted to PyTorch. The only third-party dependencies are NumPy, Requests, Pillow, and PyTorch.

Generating a 4x4 grid of DALL·E Mega images takes:
- 89 sec with a T4 in Colab
- 48 sec with a P100 in Colab
- 13 sec with an A100 on Replicate

Here's a more detailed breakdown of performance on an A100. Credit to [@technobird22](https://github.com/technobird22) and his [NeoGen](https://github.com/technobird22/NeoGen) Discord bot for the graph.
<br />
<img src="https://github.com/kuprel/min-dalle/raw/main/performance.png" alt="min-dalle" width="450"/>
<br />

The Flax model and the code for converting it to PyTorch can be found [here](https://github.com/kuprel/min-dalle-flax).

## Install

```bash
$ pip install min-dalle
```
## Usage

Load the model parameters once and reuse the model to generate multiple images.

```python
import torch
from min_dalle import MinDalle

model = MinDalle(
    models_root='./pretrained',
    dtype=torch.float32,
    device='cuda',
    is_mega=True,
    is_reusable=True
)
```

The required models will be downloaded to `models_root` if they are not already there. Set `dtype` to `torch.float16` to save GPU memory; if you have an Ampere-architecture GPU, you can use `torch.bfloat16`. Set `device` to either "cuda" or "cpu". Once everything has finished initializing, call `generate_image` with some text as many times as you want. Use a positive `seed` for reproducible results. Higher values of `supercondition_factor` produce images that agree more closely with the text, at the cost of a narrower variety. Every image token is sampled from the `top_k` most probable tokens. The largest logit is subtracted from the logits to avoid infs, and the logits are then divided by `temperature`. If `is_seamless` is true, the image grid is tiled in token space rather than pixel space.
```python
image = model.generate_image(
    text='Nuclear explosion broccoli',
    seed=-1,
    grid_size=4,
    is_seamless=False,
    temperature=1,
    top_k=256,
    supercondition_factor=32,
    is_verbose=False
)

display(image)
```
<img src="https://github.com/kuprel/min-dalle/raw/main/examples/nuclear_broccoli.jpg" alt="min-dalle" width="400"/>

Credit to [@hardmaru](https://twitter.com/hardmaru) for the [example](https://twitter.com/hardmaru/status/1544354119527596034).
### Saving Individual Images

The images can also be generated as a `FloatTensor` in case you want to process them manually.

```python
images = model.generate_images(
    text='Nuclear explosion broccoli',
    seed=-1,
    grid_size=3,
    is_seamless=False,
    temperature=1,
    top_k=256,
    supercondition_factor=16,
    is_verbose=False
)
```

To get an image into PIL format, first move the images to the CPU and convert the tensor to a NumPy array.
```python
images = images.to('cpu').numpy()
```
Then image $i$ can be converted to a `PIL.Image` and saved:
```python
from PIL import Image

image = Image.fromarray(images[i])
image.save('image_{}.png'.format(i))
```
### Progressive Outputs

If the model is being used interactively (e.g. in a notebook), `generate_image_stream` can be used to generate a stream of images as the model decodes. The detokenizer adds a slight delay for each image. Set `progressive_outputs` to `True` to enable this. An example is implemented in the Colab.

```python
image_stream = model.generate_image_stream(
    text='Dali painting of WALL·E',
    seed=-1,
    grid_size=3,
    progressive_outputs=True,
    is_seamless=False,
    temperature=1,
    top_k=256,
    supercondition_factor=16,
    is_verbose=False
)

for image in image_stream:
    display(image)
```
<img src="https://github.com/kuprel/min-dalle/raw/main/examples/dali_walle_animated.gif" alt="min-dalle" width="300"/>
### Command Line

Use `image_from_text.py` to generate images from the command line.

```bash
$ python image_from_text.py --text='artificial intelligence' --no-mega
```
<img src="https://github.com/kuprel/min-dalle/raw/main/examples/artificial_intelligence.jpg" alt="min-dalle" width="200"/>

**[❤️ Sponsor](https://github.com/sponsors/kuprel)**
config.json ADDED
@@ -0,0 +1,14 @@
{
    "d_model": 2048,
    "decoder_attention_heads": 32,
    "decoder_ffn_dim": 4096,
    "decoder_layers": 24,
    "decoder_start_token_id": 16384,
    "encoder_attention_heads": 32,
    "encoder_ffn_dim": 4096,
    "encoder_layers": 24,
    "encoder_vocab_size": 50272,
    "image_length": 256,
    "image_vocab_size": 16415,
    "max_text_length": 64
}
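Both config files are plain JSON, so they can be read with the standard `json` module. A minimal sketch using the mega values above; note that an `image_length` of 256 corresponds to a 16x16 grid of image tokens:

```python
import json

# config.json contents for the mega variant, as listed above
config_text = """
{
    "d_model": 2048,
    "decoder_attention_heads": 32,
    "decoder_ffn_dim": 4096,
    "decoder_layers": 24,
    "decoder_start_token_id": 16384,
    "encoder_attention_heads": 32,
    "encoder_ffn_dim": 4096,
    "encoder_layers": 24,
    "encoder_vocab_size": 50272,
    "image_length": 256,
    "image_vocab_size": 16415,
    "max_text_length": 64
}
"""

config = json.loads(config_text)

# 256 image tokens form a 16x16 token grid per generated image
grid_side = int(config["image_length"] ** 0.5)
print(grid_side)          # 16
print(config["d_model"])  # 2048
```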
config_mini.json ADDED
@@ -0,0 +1,14 @@
{
    "d_model": 1024,
    "decoder_attention_heads": 16,
    "decoder_ffn_dim": 2730,
    "decoder_layers": 12,
    "decoder_start_token_id": 16384,
    "encoder_attention_heads": 16,
    "encoder_ffn_dim": 2730,
    "encoder_layers": 12,
    "encoder_vocab_size": 50264,
    "image_length": 256,
    "image_vocab_size": 16384,
    "max_text_length": 64
}
dalle_bart_mega/decoder.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fc94a4dff25b97575239eb21e1ae3d988beff95470fec9903d5dcad521ab9efa
size 2955054938
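The `.pt` entries in this commit are Git LFS pointer files rather than the weights themselves: three `key value` lines giving the spec version, the SHA-256 of the real file, and its size in bytes. A small sketch parsing one, using the decoder pointer above (the helper name is made up for illustration):

```python
# Parse a Git LFS pointer file into a dict of its key/value lines
# (parse_lfs_pointer is a hypothetical helper, not part of min-dalle)
def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:fc94a4dff25b97575239eb21e1ae3d988beff95470fec9903d5dcad521ab9efa
size 2955054938
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["oid"])        # starts with "sha256:"
print(int(pointer["size"]))  # 2955054938
```

A downloaded file can then be checked against the pointer by comparing its `hashlib.sha256` digest with the `oid` field.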
dalle_bart_mega/encoder.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:76e7797b19625122e21138556e06b78cc84686ed5f5e5c42aa53c81ffb2f4bb8
size 2220079602
dalle_bart_mega/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
dalle_bart_mega/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
decoder.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fc94a4dff25b97575239eb21e1ae3d988beff95470fec9903d5dcad521ab9efa
size 2955054938
decoder_mini.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f8343c2bf982cb60d463417facc4debeeaabc9f238a1f7115ac5bdf778ea3901
size 470515674
decoder_v26.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:78ce1ebd38848e6233d7fe6088d56a6754f8eeb400926854db694fbe633f6ae1
size 2955054938
detoker.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7e1ea14cb2a45661d4599d897779673ff93e2ad4202e311edc0c837e0d86933c
size 186899831
encoder.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:76e7797b19625122e21138556e06b78cc84686ed5f5e5c42aa53c81ffb2f4bb8
size 2220079602
encoder_mini.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:158e1ff118b6e85c7b03577178758f94a24b6e1625e8f8ee6415fbde020b938b
size 405201458
encoder_v26.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b673011bce7410da947e97a9725369f677ff79a1a0a61611a8fc818dde64ed22
size 2220079602
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
merges_mini.txt ADDED
The diff for this file is too large to render. See raw diff
 
vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
vocab_mini.json ADDED
The diff for this file is too large to render. See raw diff
 
vqgan/detoker.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7e1ea14cb2a45661d4599d897779673ff93e2ad4202e311edc0c837e0d86933c
size 186899831