Commit 6d8a836 (verified, parent ffb3067) by YiYiXu (HF Staff): Update README.md
---
library_name: diffusers
---

# Florence-2 Image Annotator

A custom [Modular Diffusers](https://huggingface.co/docs/diffusers/modular_diffusers/overview) block that uses [Florence-2](https://huggingface.co/docs/transformers/model_doc/florence2) for image annotation tasks such as segmentation, object detection, and captioning.
## Usage

### Basic Usage

```python
import torch
from diffusers import ModularPipeline
from diffusers.utils import load_image

# Load the block
image_annotator = ModularPipeline.from_pretrained(
    "diffusers/Florence2-image-Annotator",
    trust_remote_code=True,
)
image_annotator.load_components(torch_dtype=torch.bfloat16)
image_annotator.to("cuda")

# Load an image
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg")
image = image.resize((1024, 1024))

# Generate a segmentation mask
output = image_annotator(
    image=image,
    annotation_task="<REFERRING_EXPRESSION_SEGMENTATION>",
    annotation_prompt="the car",
    annotation_output_type="mask_image",
)
output.mask_image[0].save("car-mask.png")
```
### Compose with Inpainting Pipeline

```python
import torch
from diffusers import ModularPipeline

# Load the annotator
image_annotator = ModularPipeline.from_pretrained(
    "diffusers/Florence2-image-Annotator",
    trust_remote_code=True,
)

# Get an inpainting workflow and insert the annotator as the first block
# repo_id = ..  # any pipeline that supports inpainting (SDXL, Flux, Qwen, ...)
inpaint_blocks = ModularPipeline.from_pretrained(repo_id).blocks.get_workflow("inpainting")
inpaint_blocks.sub_blocks.insert("image_annotator", image_annotator.blocks, 0)

# Initialize the combined pipeline
pipe = inpaint_blocks.init_pipeline()
pipe.load_components(torch_dtype=torch.float16, device="cuda")

# Inpaint with automatic mask generation
# (`prompt` and `image` are assumed to be defined, e.g. as in Basic Usage above)
output = pipe(
    prompt=prompt,
    image=image,
    annotation_task="<REFERRING_EXPRESSION_SEGMENTATION>",
    annotation_prompt="the car",
    annotation_output_type="mask_image",
    num_inference_steps=30,
    output="images",
)
output[0].save("inpainted-car.png")
```

## Supported Tasks

| Task | Description |
|------|-------------|
| `<OD>` | Object detection |
| `<REFERRING_EXPRESSION_SEGMENTATION>` | Segment specific objects based on text |
| `<CAPTION>` | Generate image caption |
| `<DETAILED_CAPTION>` | Generate detailed caption |
| `<MORE_DETAILED_CAPTION>` | Generate very detailed caption |
| `<DENSE_REGION_CAPTION>` | Caption different regions |
| `<CAPTION_TO_PHRASE_GROUNDING>` | Ground phrases to regions |
| `<OPEN_VOCABULARY_DETECTION>` | Detect objects from open vocabulary |
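Not every output type is meaningful for every task (a caption, for instance, has no mask). As a hypothetical convenience, a small lookup can guard calls before invoking the block; the pairings below are assumptions inferred from the task descriptions, not documented behavior of the block:

```python
# Hypothetical task -> plausible output types map (assumed, not documented)
TASK_OUTPUT_TYPES = {
    "<OD>": {"bounding_box"},
    "<REFERRING_EXPRESSION_SEGMENTATION>": {"mask_image", "mask_overlay"},
    "<CAPTION_TO_PHRASE_GROUNDING>": {"bounding_box"},
    "<OPEN_VOCABULARY_DETECTION>": {"bounding_box"},
}

def check_output_type(task: str, output_type: str) -> bool:
    """Return True if `output_type` is a known sensible pairing for `task`."""
    return output_type in TASK_OUTPUT_TYPES.get(task, set())
```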
## Output Types

| Type | Description |
|------|-------------|
| `mask_image` | Black and white mask image |
| `mask_overlay` | Mask overlaid on original image |
| `bounding_box` | Bounding boxes drawn on image |
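For intuition, an overlay of the `mask_overlay` kind can be reproduced with plain PIL. This is a sketch of the general technique, not the block's actual implementation; the function name and color/alpha defaults are made up:

```python
from PIL import Image

def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.5):
    """Blend a binary mask onto an image as a translucent colored overlay."""
    image = image.convert("RGB")
    mask = mask.convert("L").resize(image.size)
    solid = Image.new("RGB", image.size, color)
    blended = Image.blend(image, solid, alpha)
    # Keep the blended color only where the mask is white
    return Image.composite(blended, image, mask)

# Tiny synthetic demo: blue 4x4 image, mask covering the left half
img = Image.new("RGB", (4, 4), (0, 0, 255))
mask = Image.new("L", (4, 4), 0)
for y in range(4):
    for x in range(2):
        mask.putpixel((x, y), 255)
result = overlay_mask(img, mask)
```

Unmasked pixels pass through unchanged; masked pixels pick up the overlay color.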
## Inputs

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `image` | `PIL.Image` | Yes | - | Image to annotate |
| `annotation_task` | `str` | No | `<REFERRING_EXPRESSION_SEGMENTATION>` | Task to perform |
| `annotation_prompt` | `str` | Yes | - | Text prompt for the task |
| `annotation_output_type` | `str` | No | `mask_image` | Output format |
## Outputs

| Parameter | Type | Description |
|-----------|------|-------------|
| `mask_image` | `PIL.Image` | Generated mask (when output type is `mask_image`) |
| `image` | `PIL.Image` | Annotated image (when output type is `mask_overlay` or `bounding_box`) |
| `annotations` | `dict` | Raw annotation predictions |
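For detection tasks, upstream Florence-2 post-processing returns a dict keyed by the task token with parallel `bboxes` and `labels` lists; assuming `annotations` follows that same shape (an assumption about this block, based on the underlying model), the raw predictions can be turned into per-object crops:

```python
from PIL import Image

def crop_detections(image, annotations, task="<OD>"):
    """Crop each detected region out of the image.

    Assumes Florence-2-style output: annotations[task] holds parallel
    "bboxes" ([x1, y1, x2, y2] floats) and "labels" lists.
    """
    result = annotations[task]
    crops = []
    for (x1, y1, x2, y2), label in zip(result["bboxes"], result["labels"]):
        crops.append((label, image.crop((int(x1), int(y1), int(x2), int(y2)))))
    return crops

# Synthetic demo with a fabricated annotations dict
img = Image.new("RGB", (100, 100), "white")
fake = {"<OD>": {"bboxes": [[10.0, 10.0, 50.0, 60.0]], "labels": ["car"]}}
crops = crop_detections(img, fake)
```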
## Components

This block uses the following models from [florence-community/Florence-2-base-ft](https://huggingface.co/florence-community/Florence-2-base-ft):

- `image_annotator`: `Florence2ForConditionalGeneration`
- `image_annotator_processor`: `AutoProcessor`
## Learn More

- [Building Custom Blocks Guide](https://huggingface.co/docs/diffusers/modular_diffusers/custom_blocks)
- [Modular Diffusers Overview](https://huggingface.co/docs/diffusers/modular_diffusers/overview)
- [Modular Diffusers Custom Blocks Collection](https://huggingface.co/collections/diffusers/modular-diffusers-custom-blocks)