Commit 6d8a836 (verified, parent ffb3067) by YiYiXu (HF Staff): Update README.md
---
library_name: diffusers
---

# Florence-2 Image Annotator

A custom [Modular Diffusers](https://huggingface.co/docs/diffusers/modular_diffusers/overview) block that uses [Florence-2](https://huggingface.co/docs/transformers/model_doc/florence2) for image annotation tasks such as segmentation, object detection, and captioning.
## Usage

### Basic Usage

```python
import torch
from diffusers import ModularPipeline
from diffusers.utils import load_image

# Load the block
image_annotator = ModularPipeline.from_pretrained(
    "diffusers/Florence2-image-Annotator",
    trust_remote_code=True,
)
image_annotator.load_components(torch_dtype=torch.bfloat16)
image_annotator.to("cuda")

# Load an image
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg")
image = image.resize((1024, 1024))

# Generate a segmentation mask
output = image_annotator(
    image=image,
    annotation_task="<REFERRING_EXPRESSION_SEGMENTATION>",
    annotation_prompt="the car",
    annotation_output_type="mask_image",
)
output.mask_image[0].save("car-mask.png")
```
### Compose with Inpainting Pipeline

```python
import torch
from diffusers import ModularPipeline

# Load the annotator
image_annotator = ModularPipeline.from_pretrained(
    "diffusers/Florence2-image-Annotator",
    trust_remote_code=True,
)

# Get an inpainting workflow and insert the annotator as the first block
# repo_id = ..  # any pipeline that supports inpainting (SDXL, Flux, Qwen, ...)
inpaint_blocks = ModularPipeline.from_pretrained(repo_id).blocks.get_workflow("inpainting")
inpaint_blocks.sub_blocks.insert("image_annotator", image_annotator.blocks, 0)

# Initialize the combined pipeline
pipe = inpaint_blocks.init_pipeline()
pipe.load_components(torch_dtype=torch.float16, device="cuda")

# Inpaint with automatic mask generation
# (`prompt` and `image` are assumed to be defined, e.g. as in Basic Usage above)
output = pipe(
    prompt=prompt,
    image=image,
    annotation_task="<REFERRING_EXPRESSION_SEGMENTATION>",
    annotation_prompt="the car",
    annotation_output_type="mask_image",
    num_inference_steps=30,
    output="images",
)
output[0].save("inpainted-car.png")
```

## Supported Tasks

| Task | Description |
|------|-------------|
| `<OD>` | Object detection |
| `<REFERRING_EXPRESSION_SEGMENTATION>` | Segment specific objects based on text |
| `<CAPTION>` | Generate image caption |
| `<DETAILED_CAPTION>` | Generate detailed caption |
| `<MORE_DETAILED_CAPTION>` | Generate very detailed caption |
| `<DENSE_REGION_CAPTION>` | Caption different regions |
| `<CAPTION_TO_PHRASE_GROUNDING>` | Ground phrases to regions |
| `<OPEN_VOCABULARY_DETECTION>` | Detect objects from open vocabulary |
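Not every output type is meaningful for every task (a caption, for instance, has no mask). As a hypothetical convenience, a small lookup can guard calls before invoking the block; the pairings below are assumptions inferred from the task descriptions, not documented behavior of the block:

```python
# Hypothetical task -> plausible output types map (assumed, not documented)
TASK_OUTPUT_TYPES = {
    "<OD>": {"bounding_box"},
    "<REFERRING_EXPRESSION_SEGMENTATION>": {"mask_image", "mask_overlay"},
    "<CAPTION_TO_PHRASE_GROUNDING>": {"bounding_box"},
    "<OPEN_VOCABULARY_DETECTION>": {"bounding_box"},
}

def check_output_type(task: str, output_type: str) -> bool:
    """Return True if `output_type` is a known sensible pairing for `task`."""
    return output_type in TASK_OUTPUT_TYPES.get(task, set())
```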
## Output Types

| Type | Description |
|------|-------------|
| `mask_image` | Black and white mask image |
| `mask_overlay` | Mask overlaid on original image |
| `bounding_box` | Bounding boxes drawn on image |
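For intuition, an overlay of the `mask_overlay` kind can be reproduced with plain PIL. This is a sketch of the general technique, not the block's actual implementation; the function name and color/alpha defaults are made up:

```python
from PIL import Image

def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.5):
    """Blend a binary mask onto an image as a translucent colored overlay."""
    image = image.convert("RGB")
    mask = mask.convert("L").resize(image.size)
    solid = Image.new("RGB", image.size, color)
    blended = Image.blend(image, solid, alpha)
    # Keep the blended color only where the mask is white
    return Image.composite(blended, image, mask)

# Tiny synthetic demo: blue 4x4 image, mask covering the left half
img = Image.new("RGB", (4, 4), (0, 0, 255))
mask = Image.new("L", (4, 4), 0)
for y in range(4):
    for x in range(2):
        mask.putpixel((x, y), 255)
result = overlay_mask(img, mask)
```

Unmasked pixels pass through unchanged; masked pixels pick up the overlay color.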
## Inputs

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `image` | `PIL.Image` | Yes | - | Image to annotate |
| `annotation_task` | `str` | No | `<REFERRING_EXPRESSION_SEGMENTATION>` | Task to perform |
| `annotation_prompt` | `str` | Yes | - | Text prompt for the task |
| `annotation_output_type` | `str` | No | `mask_image` | Output format |
## Outputs

| Parameter | Type | Description |
|-----------|------|-------------|
| `mask_image` | `PIL.Image` | Generated mask (when output type is `mask_image`) |
| `image` | `PIL.Image` | Annotated image (when output type is `mask_overlay` or `bounding_box`) |
| `annotations` | `dict` | Raw annotation predictions |
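For detection tasks, upstream Florence-2 post-processing returns a dict keyed by the task token with parallel `bboxes` and `labels` lists; assuming `annotations` follows that same shape (an assumption about this block, based on the underlying model), the raw predictions can be turned into per-object crops:

```python
from PIL import Image

def crop_detections(image, annotations, task="<OD>"):
    """Crop each detected region out of the image.

    Assumes Florence-2-style output: annotations[task] holds parallel
    "bboxes" ([x1, y1, x2, y2] floats) and "labels" lists.
    """
    result = annotations[task]
    crops = []
    for (x1, y1, x2, y2), label in zip(result["bboxes"], result["labels"]):
        crops.append((label, image.crop((int(x1), int(y1), int(x2), int(y2)))))
    return crops

# Synthetic demo with a fabricated annotations dict
img = Image.new("RGB", (100, 100), "white")
fake = {"<OD>": {"bboxes": [[10.0, 10.0, 50.0, 60.0]], "labels": ["car"]}}
crops = crop_detections(img, fake)
```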
## Components

This block uses the following models from [florence-community/Florence-2-base-ft](https://huggingface.co/florence-community/Florence-2-base-ft):

- `image_annotator`: `Florence2ForConditionalGeneration`
- `image_annotator_processor`: `AutoProcessor`
## Learn More

- [Building Custom Blocks Guide](https://huggingface.co/docs/diffusers/modular_diffusers/custom_blocks)
- [Modular Diffusers Overview](https://huggingface.co/docs/diffusers/modular_diffusers/overview)
- [Modular Diffusers Custom Blocks Collection](https://huggingface.co/collections/diffusers/modular-diffusers-custom-blocks)