orrzohar committed · Commit ae15b76 · verified · 1 Parent(s): 918f558

Upload README.md with huggingface_hub

Files changed (1): README.md (+137 −0, new file)
---
language:
- en
---


# Emu2-Gen

[Paper](https://arxiv.org/abs/2312.13286) | [🤗HF Demo](https://huggingface.co/spaces/BAAI/Emu2) | [Demo](https://emu.ssi.plus) | [Project Page](https://baaivision.github.io/emu2/) | [Github](https://github.com/baaivision/Emu)


## Model Weights

| Model name    | Weight                                              |
| ------------- | --------------------------------------------------- |
| **Emu2**      | [🤗 HF link](https://huggingface.co/BAAI/Emu2)      |
| **Emu2-Chat** | [🤗 HF link](https://huggingface.co/BAAI/Emu2-Chat) |
| **Emu2-Gen**  | [🤗 HF link](https://huggingface.co/BAAI/Emu2-Gen)  |

## Inference (Hugging Face Version)

### Emu2-Gen

```python
import cv2
from diffusers import DiffusionPipeline
import numpy as np
from PIL import Image
import requests
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# On first use, download the Hugging Face repo "BAAI/Emu2-Gen" to a local
# directory and point `path` at it.
path = "path to local BAAI/Emu2-Gen"
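# One way to fetch the repo (an assumption, not part of the original
# instructions) is huggingface_hub's snapshot_download, which downloads
# the repo into the local cache and returns its path:
#   from huggingface_hub import snapshot_download
#   path = snapshot_download("BAAI/Emu2-Gen")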

# load the multimodal encoder and its tokenizer from the local checkpoint
multimodal_encoder = AutoModelForCausalLM.from_pretrained(
    f"{path}/multimodal_encoder",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
    variant="bf16",
)
tokenizer = AutoTokenizer.from_pretrained(f"{path}/tokenizer")

pipe = DiffusionPipeline.from_pretrained(
    path,
    custom_pipeline="pipeline_emu2_gen",
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
    variant="bf16",
    multimodal_encoder=multimodal_encoder,
    tokenizer=tokenizer,
)

# On subsequent runs, the pipeline can be initialized directly
# (the sub-modules above are then loaded automatically):
pipe = DiffusionPipeline.from_pretrained(
    path,
    custom_pipeline="pipeline_emu2_gen",
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
    variant="bf16",
)

pipe.to("cuda")

# text-to-image
prompt = "impressionist painting of an astronaut in a jungle"
ret = pipe(prompt)
ret.image.save("astronaut.png")

# image editing
image = Image.open(requests.get(
    "https://github.com/baaivision/Emu/blob/main/Emu2/examples/dog.jpg?raw=true",
    stream=True,
).raw).convert("RGB")
prompt = [image, "wearing a red hat on the beach."]
ret = pipe(prompt)
ret.image.save("dog_hat_beach.png")

# grounding generation
def draw_box(left, top, right, bottom):
    # a white box outline on a black 448x448 canvas marks where the
    # corresponding object should appear in the generated image
    mask = np.zeros((448, 448, 3), dtype=np.uint8)
    mask = cv2.rectangle(mask, (left, top), (right, bottom), (255, 255, 255), 3)
    mask = Image.fromarray(mask)
    return mask

base = "https://github.com/baaivision/Emu/blob/main/Emu2/examples"
dog1 = Image.open(requests.get(f"{base}/dog1.jpg?raw=true", stream=True).raw).convert("RGB")
dog2 = Image.open(requests.get(f"{base}/dog2.jpg?raw=true", stream=True).raw).convert("RGB")
dog3 = Image.open(requests.get(f"{base}/dog3.jpg?raw=true", stream=True).raw).convert("RGB")
dog1_mask = draw_box( 22,  14, 224, 224)
dog2_mask = draw_box(224,  10, 448, 224)
dog3_mask = draw_box(120, 264, 320, 438)

# note: each <phrase>...</phrase> and the following "<object>" are adjacent
# string literals, so Python concatenates them into a single text element
prompt = [
    "<grounding>",
    "An oil painting of three dogs,",
    "<phrase>the first dog</phrase>"
    "<object>",
    dog1_mask,
    "</object>",
    dog1,
    "<phrase>the second dog</phrase>"
    "<object>",
    dog2_mask,
    "</object>",
    dog2,
    "<phrase>the third dog</phrase>"
    "<object>",
    dog3_mask,
    "</object>",
    dog3,
]
ret = pipe(prompt)
ret.image.save("three_dogs.png")

# autoencoding
# The autoencoding mode is enabled only when the prompt is exactly one image.
# To have the model generate a new image instead, pass an extra empty string
# alongside the image:
#   autoencoding mode: prompt = image        or  prompt = [image]
#   generation mode:   prompt = ["", image]  or  prompt = [image, ""]
prompt = Image.open("./examples/doodle.jpg").convert("RGB")
ret = pipe(prompt)
ret.image.save("doodle_ae.png")
```
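
For contrast with the autoencoding call above, here is a minimal sketch of the generation-mode variant described in the comments, assuming the same `pipe` and the same local `./examples/doodle.jpg` file:

```python
# the extra empty string switches the pipeline from reconstructing the
# input to generating a new image conditioned on it
image = Image.open("./examples/doodle.jpg").convert("RGB")
ret = pipe(["", image])
ret.image.save("doodle_gen.png")
```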

## Citation

If you find Emu2 useful for your research and applications, please consider starring this repository and citing:

```
@article{Emu2,
  title={Generative Multimodal Models are In-Context Learners},
  author={Quan Sun and Yufeng Cui and Xiaosong Zhang and Fan Zhang and Qiying Yu and Zhengxiong Luo and Yueze Wang and Yongming Rao and Jingjing Liu and Tiejun Huang and Xinlong Wang},
  publisher={arXiv preprint arXiv:2312.13286},
  year={2023},
}
```