OzzyGT (HF Staff) committed
Commit 6ef09aa · Parent(s): 4c1b191
Files changed (2):
  1. README.md +273 -0
  2. block.py +4 -0
README.md CHANGED
@@ -12,6 +12,48 @@ The node can be used with the default installation of Mellon using the `Dynamic
 
 ## Using it with code
 
+### Captioning
+
+```python
+import torch
+
+from diffusers.modular_pipelines import ModularPipeline
+from diffusers.utils import load_image
+
+
+pipe = ModularPipeline.from_pretrained("OzzyGT/florence-2-block", trust_remote_code=True)
+pipe.load_components(torch_dtype=torch.float16)
+pipe.to("cuda")
+
+image = load_image(
+    "https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/white_board_people.png"
+)
+
+annotation_task = "<CAPTION>"  # can also be <DETAILED_CAPTION> or <MORE_DETAILED_CAPTION>
+annotation_prompt = ""
+
+output = pipe(image=image, annotation_task=annotation_task, annotation_prompt=annotation_prompt).annotations[0]
+print(output)
+```
+
+#### Caption
+
+```
+A man and a woman writing on a white board.
+```
+
+#### Detailed Caption
+
+```
+In this image we can see a man and a woman holding markers in their hands. We can also see a board with some text on it.
+```
+
+#### More Detailed Caption
+
+```
+A man and a woman are standing in front of a whiteboard. The woman is writing on a black marker. The man is wearing a blue shirt. The whiteboard has writing on it. The writing on the whiteboard is black. The people are looking at each other. There is writing in black marker on the board. There are drawings on whiteboard behind the people.
+```
+
 ### Object Detection
 
 ```python
@@ -44,3 +86,234 @@ output.save("output.png")
 | Input | Output |
 | ------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------- |
 | ![Input](https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/white_board_people.png) | ![Output](https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/object_detection.png) |
+
+### Dense Region Caption
+
+```python
+import torch
+
+from diffusers.modular_pipelines import ModularPipeline
+from diffusers.utils import load_image
+
+
+pipe = ModularPipeline.from_pretrained("OzzyGT/florence-2-block", trust_remote_code=True)
+pipe.load_components(torch_dtype=torch.float16)
+pipe.to("cuda")
+
+image = load_image(
+    "https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/white_board_people.png"
+)
+
+annotation_task = "<DENSE_REGION_CAPTION>"
+annotation_prompt = ""
+
+output = pipe(
+    image=image,
+    annotation_task=annotation_task,
+    annotation_prompt=annotation_prompt,
+    annotation_output_type="bounding_box",
+).images[0]
+output.save("output.png")
+```
+
+| Input | Output |
+| ------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------- |
+| ![Input](https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/white_board_people.png) | ![Output](https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/dense_region_caption.png) |
+
+### Region Proposal
+
+```python
+import torch
+
+from diffusers.modular_pipelines import ModularPipeline
+from diffusers.utils import load_image
+
+
+pipe = ModularPipeline.from_pretrained("OzzyGT/florence-2-block", trust_remote_code=True)
+pipe.load_components(torch_dtype=torch.float16)
+pipe.to("cuda")
+
+image = load_image(
+    "https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/white_board_people.png"
+)
+
+annotation_task = "<REGION_PROPOSAL>"
+annotation_prompt = ""
+
+output = pipe(
+    image=image,
+    annotation_task=annotation_task,
+    annotation_prompt=annotation_prompt,
+    annotation_output_type="bounding_box",
+).images[0]
+output.save("output.png")
+```
+
+| Input | Output |
+| ------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------- |
+| ![Input](https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/white_board_people.png) | ![Output](https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/region_proposal.png) |
+
+### Phrase Grounding
+
+```python
+import torch
+
+from diffusers.modular_pipelines import ModularPipeline
+from diffusers.utils import load_image
+
+
+pipe = ModularPipeline.from_pretrained("OzzyGT/florence-2-block", trust_remote_code=True)
+pipe.load_components(torch_dtype=torch.float16)
+pipe.to("cuda")
+
+image = load_image(
+    "https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/white_board_people.png"
+)
+
+annotation_task = "<CAPTION_TO_PHRASE_GROUNDING>"
+annotation_prompt = "man"
+
+output = pipe(
+    image=image,
+    annotation_task=annotation_task,
+    annotation_prompt=annotation_prompt,
+    annotation_output_type="bounding_box",  # can also use `mask_image` and `mask_overlay`
+).images[0]
+output.save("output.png")
+```
+
+| Input | Bounding Box | Mask Image | Mask Overlay |
+| ------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------ |
+| ![Input](https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/white_board_people.png) | ![Bounding Box](https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/phrase_grounding_bbox.png) | ![Mask Image](https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/phrase_grounding_mask.png) | ![Mask Overlay](https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/phrase_grounding_overlay.png) |
+
+### Referring Expression Segmentation
+
+```python
+import torch
+
+from diffusers.modular_pipelines import ModularPipeline
+from diffusers.utils import load_image
+
+
+pipe = ModularPipeline.from_pretrained("OzzyGT/florence-2-block", trust_remote_code=True)
+pipe.load_components(torch_dtype=torch.float16)
+pipe.to("cuda")
+
+image = load_image(
+    "https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/white_board_people.png"
+)
+
+annotation_task = "<REFERRING_EXPRESSION_SEGMENTATION>"
+annotation_prompt = "man"
+
+output = pipe(
+    image=image,
+    annotation_task=annotation_task,
+    annotation_prompt=annotation_prompt,
+    annotation_output_type="mask_image",  # can also use `mask_overlay`
+).images[0]
+output.save("output.png")
+```
+
+| Input | Mask Image | Mask Overlay |
+| ------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------- |
+| ![Input](https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/white_board_people.png) | ![Mask Image](https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/ref_exp_seg_mask.png) | ![Mask Overlay](https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/ref_exp_seg_overlay.png) |
+
+### Open Vocabulary Detection
+
+```python
+import torch
+
+from diffusers.modular_pipelines import ModularPipeline
+from diffusers.utils import load_image
+
+
+pipe = ModularPipeline.from_pretrained("OzzyGT/florence-2-block", trust_remote_code=True)
+pipe.load_components(torch_dtype=torch.float16)
+pipe.to("cuda")
+
+image = load_image(
+    "https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/white_board_people.png"
+)
+
+annotation_task = "<OPEN_VOCABULARY_DETECTION>"
+annotation_prompt = "man with a beard"
+
+output = pipe(
+    image=image,
+    annotation_task=annotation_task,
+    annotation_prompt=annotation_prompt,
+    annotation_output_type="bounding_box",
+).images[0]
+output.save("output.png")
+```
+
+| Input | Output |
+| ------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------- |
+| ![Input](https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/white_board_people.png) | ![Output](https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/open_vocabulary.png) |
+
+### OCR
+
+```python
+import torch
+
+from diffusers.modular_pipelines import ModularPipeline
+from diffusers.utils import load_image
+
+
+pipe = ModularPipeline.from_pretrained("OzzyGT/florence-2-block", trust_remote_code=True)
+pipe.load_components(torch_dtype=torch.float16)
+pipe.to("cuda")
+
+image = load_image(
+    "https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/white_board_people.png"
+)
+
+annotation_task = "<OCR>"
+annotation_prompt = ""
+
+output = pipe(
+    image=image,
+    annotation_task=annotation_task,
+    annotation_prompt=annotation_prompt,
+    annotation_output_type="bounding_box",
+).annotations[0]
+print(output)
+```
+
+```
+The Diffuser's library byHugging Face makes it easyfor developers to run imagegeneration and influenceusing state-of-the-astdiffusion models withjust a few lines of codehuman eou
+```
+
+### OCR with region
+
+```python
+import torch
+
+from diffusers.modular_pipelines import ModularPipeline
+from diffusers.utils import load_image
+
+
+pipe = ModularPipeline.from_pretrained("OzzyGT/florence-2-block", trust_remote_code=True)
+pipe.load_components(torch_dtype=torch.float16)
+pipe.to("cuda")
+
+image = load_image(
+    "https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/white_board_people.png"
+)
+
+annotation_task = "<OCR_WITH_REGION>"
+annotation_prompt = ""
+
+output = pipe(
+    image=image,
+    annotation_task=annotation_task,
+    annotation_prompt=annotation_prompt,
+    annotation_output_type="bounding_box",
+).images[0]
+output.save("output.png")
+```
+
+| Input | Output |
+| ------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------- |
+| ![Input](https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/white_board_people.png) | ![Output](https://huggingface.co/datasets/OzzyGT/diffusers-examples/resolve/main/florence-2/ocr_region.png) |
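Every section added to the README repeats the same pipeline setup and call shape; only the task token, prompt, and output type change. A small wrapper makes that shared pattern explicit. This is a hypothetical helper, not part of the block's API; the argument names simply mirror the keyword arguments used in the examples above.

```python
def run_annotation(pipe, image, task, prompt="", output_type="bounding_box"):
    """Run one Florence-2 annotation task on an already-loaded pipeline.

    `task` is a Florence-2 task token such as "<OCR>" or
    "<DENSE_REGION_CAPTION>"; `output_type` matches the
    `annotation_output_type` values used in the examples above.
    """
    return pipe(
        image=image,
        annotation_task=task,
        annotation_prompt=prompt,
        annotation_output_type=output_type,
    )

# With `pipe` and `image` loaded as in the examples above, e.g.:
# run_annotation(pipe, image, "<DENSE_REGION_CAPTION>").images[0].save("output.png")
# run_annotation(pipe, image, "<CAPTION_TO_PHRASE_GROUNDING>", prompt="man",
#                output_type="mask_overlay").images[0].save("output.png")
```

Loading the model once and reusing it this way avoids repeating the `from_pretrained` / `load_components` setup for each task.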
block.py CHANGED
@@ -270,6 +270,10 @@ class Florence2ImageAnnotatorBlock(ModularPipelineBlocks):
         # Standard axis-aligned boxes
         bboxes = _annotation.get("bboxes", [])
         labels = _annotation.get("labels", [])
+
+        if len(labels) == 0:
+            labels = _annotation.get("bboxes_labels", [])
+
         for i, bbox in enumerate(bboxes):
             flat = np.array(bbox).flatten().tolist()
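The four added lines in `block.py` handle tasks whose annotations store box labels under `bboxes_labels` instead of `labels` (Florence-2's open vocabulary detection output uses that key). A standalone sketch of the fallback, with hand-written annotation dicts for illustration; `extract_boxes` is a hypothetical helper that mirrors the changed lines, not the block's actual method:

```python
import numpy as np


def extract_boxes(annotation):
    """Return (flat_bbox, label) pairs, falling back to the alternate
    label key some Florence-2 tasks emit (mirrors the block.py change)."""
    bboxes = annotation.get("bboxes", [])
    labels = annotation.get("labels", [])
    if len(labels) == 0:
        labels = annotation.get("bboxes_labels", [])
    results = []
    for i, bbox in enumerate(bboxes):
        flat = np.array(bbox).flatten().tolist()
        label = labels[i] if i < len(labels) else ""
        results.append((flat, label))
    return results


# Object-detection style annotation: labels under "labels"
print(extract_boxes({"bboxes": [[0, 0, 10, 10]], "labels": ["person"]}))
# Open-vocabulary style annotation: labels under "bboxes_labels";
# without the fallback, every box here would get an empty label
print(extract_boxes({"bboxes": [[0, 0, 10, 10]], "bboxes_labels": ["man with a beard"]}))
```

Without the fallback, tasks that emit `bboxes_labels` would draw their bounding boxes with empty label strings.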