---
library_name: diffusers
---

# Florence-2 Image Annotator

A custom [Modular Diffusers](https://huggingface.co/docs/diffusers/modular_diffusers/overview) block that uses [Florence-2](https://huggingface.co/docs/transformers/model_doc/florence2) for image annotation tasks such as segmentation, object detection, and captioning.

## Usage

### Basic Usage

```python
import torch
from diffusers import ModularPipeline
from diffusers.utils import load_image

# Load the block
image_annotator = ModularPipeline.from_pretrained(
    "diffusers/Florence2-image-Annotator",
    trust_remote_code=True
)
image_annotator.load_components(torch_dtype=torch.bfloat16)
image_annotator.to("cuda")

# Load an image
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg")
image = image.resize((1024, 1024))

# Generate a segmentation mask
output = image_annotator(
    image=image,
    annotation_task="<REFERRING_EXPRESSION_SEGMENTATION>",
    annotation_prompt="the car",
    annotation_output_type="mask_image",
)
output.mask_image[0].save("car-mask.png")
```

### Compose with an Inpainting Pipeline

```python
import torch
from diffusers import ModularPipeline

# Load the annotator
image_annotator = ModularPipeline.from_pretrained(
    "diffusers/Florence2-image-Annotator",
    trust_remote_code=True
)

# Get an inpainting workflow and insert the annotator
# repo_id = ..
# You can use any pipeline that supports inpainting (e.g. SDXL, Flux, Qwen)
inpaint_blocks = ModularPipeline.from_pretrained(repo_id).blocks.get_workflow("inpainting")
inpaint_blocks.sub_blocks.insert("image_annotator", image_annotator.blocks, 0)

# Initialize the combined pipeline
pipe = inpaint_blocks.init_pipeline()
pipe.load_components(torch_dtype=torch.float16, device="cuda")

# Inpaint with automatic mask generation
output = pipe(
    prompt=prompt,
    image=image,
    annotation_task="<REFERRING_EXPRESSION_SEGMENTATION>",
    annotation_prompt="the car",
    annotation_output_type="mask_image",
    num_inference_steps=30,
    output="images"
)
output[0].save("inpainted-car.png")
```

## Supported Tasks

| Task | Description |
|------|-------------|
| `<OD>` | Object detection |
| `<REFERRING_EXPRESSION_SEGMENTATION>` | Segment specific objects based on a text prompt |
| `<CAPTION>` | Generate an image caption |
| `<DETAILED_CAPTION>` | Generate a detailed caption |
| `<MORE_DETAILED_CAPTION>` | Generate a very detailed caption |
| `<DENSE_REGION_CAPTION>` | Caption individual regions |
| `<CAPTION_TO_PHRASE_GROUNDING>` | Ground phrases to regions |
| `<OPEN_VOCABULARY_DETECTION>` | Detect objects from an open vocabulary |

## Output Types

| Type | Description |
|------|-------------|
| `mask_image` | Black-and-white mask image |
| `mask_overlay` | Mask overlaid on the original image |
| `bounding_box` | Bounding boxes drawn on the image |

## Inputs

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `image` | `PIL.Image` | Yes | - | Image to annotate |
| `annotation_task` | `str` | No | `<REFERRING_EXPRESSION_SEGMENTATION>` | Task to perform |
| `annotation_prompt` | `str` | Yes | - | Text prompt for the task |
| `annotation_output_type` | `str` | No | `mask_image` | Output format |

## Outputs

| Parameter | Type | Description |
|-----------|------|-------------|
| `mask_image` | `PIL.Image` | Generated mask (when the output type is `mask_image`) |
| `image` | `PIL.Image` | Annotated image (when the output type is `mask_overlay` or `bounding_box`) |
| `annotations` | `dict` | Raw annotation predictions |

## Components

This block uses the following models from
[florence-community/Florence-2-base-ft](https://huggingface.co/florence-community/Florence-2-base-ft):

- `image_annotator`: `Florence2ForConditionalGeneration`
- `image_annotator_processor`: `AutoProcessor`

## Learn More

- [Building Custom Blocks Guide](https://huggingface.co/docs/diffusers/modular_diffusers/custom_blocks)
- [Modular Diffusers Overview](https://huggingface.co/docs/diffusers/modular_diffusers/overview)
- [Modular Diffusers Custom Blocks Collection](https://huggingface.co/collections/diffusers/modular-diffusers-custom-blocks)
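For segmentation tasks, Florence-2 returns object outlines as polygon vertex lists, and a `mask_image`-style output is produced by rasterizing those polygons into a black-and-white image. The sketch below shows roughly how that conversion works with Pillow; the helper name and the flat `[x0, y0, x1, y1, ...]` annotation shape are illustrative assumptions, not the block's actual implementation.

```python
from PIL import Image, ImageDraw

def polygons_to_mask(polygons, size):
    # Hypothetical helper: rasterize polygon annotations into a binary mask.
    # Each polygon is assumed to be a flat [x0, y0, x1, y1, ...] vertex list.
    mask = Image.new("L", size, 0)  # single-channel image, all black
    draw = ImageDraw.Draw(mask)
    for poly in polygons:
        points = list(zip(poly[::2], poly[1::2]))  # pair up (x, y) coordinates
        if len(points) >= 3:  # need at least a triangle to fill
            draw.polygon(points, fill=255)
    return mask

# Toy annotation: one square region from (10, 10) to (100, 100)
polygons = [[10, 10, 100, 10, 100, 100, 10, 100]]
mask = polygons_to_mask(polygons, (128, 128))
# Pixels inside the polygon are white (255); everything else stays black (0)
```

A mask built this way can be passed straight to an inpainting pipeline's `mask_image` argument, which is what composing this block with an inpainting workflow automates.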