---
library_name: diffusers
---

# Florence-2 Image Annotator

A custom [Modular Diffusers](https://huggingface.co/docs/diffusers/modular_diffusers/overview) block that uses [Florence-2](https://huggingface.co/docs/transformers/model_doc/florence2) for image annotation tasks such as segmentation, object detection, and captioning.

## Usage

### Basic Usage

```python
import torch
from diffusers import ModularPipeline
from diffusers.utils import load_image

# Load the block
image_annotator = ModularPipeline.from_pretrained(
    "diffusers/Florence2-image-Annotator",
    trust_remote_code=True,
)
image_annotator.load_components(torch_dtype=torch.bfloat16)
image_annotator.to("cuda")

# Load an image
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg")
image = image.resize((1024, 1024))

# Generate a segmentation mask
output = image_annotator(
    image=image,
    annotation_task="<REFERRING_EXPRESSION_SEGMENTATION>",
    annotation_prompt="the car",
    annotation_output_type="mask_image",
)
output.mask_image[0].save("car-mask.png")
```

### Compose with Inpainting Pipeline

```python
import torch
from diffusers import ModularPipeline

# Load the annotator
image_annotator = ModularPipeline.from_pretrained(
    "diffusers/Florence2-image-Annotator",
    trust_remote_code=True,
)

# Get an inpainting workflow and insert the annotator as its first block
# repo_id = ..  # any pipeline that supports inpainting (e.g. SDXL, Flux, Qwen)
inpaint_blocks = ModularPipeline.from_pretrained(repo_id).blocks.get_workflow("inpainting")
inpaint_blocks.sub_blocks.insert("image_annotator", image_annotator.blocks, 0)

# Initialize the combined pipeline
pipe = inpaint_blocks.init_pipeline()
pipe.load_components(torch_dtype=torch.float16, device="cuda")

# Inpaint with automatic mask generation
# `image` is the input PIL image (loaded as in Basic Usage); `prompt` is your inpainting prompt
output = pipe(
    prompt=prompt,
    image=image,
    annotation_task="<REFERRING_EXPRESSION_SEGMENTATION>",
    annotation_prompt="the car",
    annotation_output_type="mask_image",
    num_inference_steps=30,
    output="images",
)
output[0].save("inpainted-car.png")
```

## Supported Tasks

| Task | Description |
|------|-------------|
| `<OD>` | Object detection |
| `<REFERRING_EXPRESSION_SEGMENTATION>` | Segment specific objects based on text |
| `<CAPTION>` | Generate image caption |
| `<DETAILED_CAPTION>` | Generate detailed caption |
| `<MORE_DETAILED_CAPTION>` | Generate very detailed caption |
| `<DENSE_REGION_CAPTION>` | Caption different regions |
| `<CAPTION_TO_PHRASE_GROUNDING>` | Ground phrases to regions |
| `<OPEN_VOCABULARY_DETECTION>` | Detect objects from open vocabulary |

## Output Types

| Type | Description |
|------|-------------|
| `mask_image` | Black and white mask image |
| `mask_overlay` | Mask overlaid on original image |
| `bounding_box` | Bounding boxes drawn on image |

## Inputs

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `image` | `PIL.Image` | Yes | - | Image to annotate |
| `annotation_task` | `str` | No | `<REFERRING_EXPRESSION_SEGMENTATION>` | Task to perform |
| `annotation_prompt` | `str` | Yes | - | Text prompt for the task |
| `annotation_output_type` | `str` | No | `mask_image` | Output format |
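
Because the task and output type are passed as plain strings, a typo only surfaces once the pipeline runs. The sketch below checks the arguments up front against the values listed in this README and fills in the documented defaults. `build_annotation_kwargs` is a hypothetical helper written for illustration; it is not part of the block or of Diffusers.

```python
# Values copied from the Supported Tasks and Output Types tables in this README.
SUPPORTED_TASKS = {
    "<OD>",
    "<REFERRING_EXPRESSION_SEGMENTATION>",
    "<CAPTION>",
    "<DETAILED_CAPTION>",
    "<MORE_DETAILED_CAPTION>",
    "<DENSE_REGION_CAPTION>",
    "<CAPTION_TO_PHRASE_GROUNDING>",
    "<OPEN_VOCABULARY_DETECTION>",
}
OUTPUT_TYPES = {"mask_image", "mask_overlay", "bounding_box"}


def build_annotation_kwargs(
    annotation_prompt,
    annotation_task="<REFERRING_EXPRESSION_SEGMENTATION>",
    annotation_output_type="mask_image",
):
    """Hypothetical helper: validate the annotation inputs and return them as kwargs.

    The defaults mirror the Inputs table; the checks are an assumption about
    what is worth validating, not the block's own behavior.
    """
    if not isinstance(annotation_prompt, str):
        raise TypeError("annotation_prompt must be a str")
    if annotation_task not in SUPPORTED_TASKS:
        raise ValueError(f"Unsupported annotation_task: {annotation_task!r}")
    if annotation_output_type not in OUTPUT_TYPES:
        raise ValueError(f"Unsupported annotation_output_type: {annotation_output_type!r}")
    return {
        "annotation_task": annotation_task,
        "annotation_prompt": annotation_prompt,
        "annotation_output_type": annotation_output_type,
    }
```

The returned dict can be unpacked into the call from Basic Usage, for example `image_annotator(image=image, **build_annotation_kwargs("the car"))`.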

## Outputs

| Parameter | Type | Description |
|-----------|------|-------------|
| `mask_image` | `PIL.Image` | Generated mask (when output type is `mask_image`) |
| `image` | `PIL.Image` | Annotated image (when output type is `mask_overlay` or `bounding_box`) |
| `annotations` | `dict` | Raw annotation predictions |
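
Which output field holds the result depends on the requested output type, so calling code may want a single lookup. The sketch below encodes the mapping from the table above. `get_annotated_result` is a hypothetical helper, and the `SimpleNamespace` object only stands in for a real pipeline output, which is assumed here to expose its fields as attributes (as `output.mask_image` does in Basic Usage).

```python
from types import SimpleNamespace

# Output type -> name of the output field that holds the result,
# following the Outputs table in this README.
FIELD_FOR_OUTPUT_TYPE = {
    "mask_image": "mask_image",
    "mask_overlay": "image",
    "bounding_box": "image",
}


def get_annotated_result(output, annotation_output_type="mask_image"):
    """Hypothetical helper: return the field matching the requested output type."""
    field = FIELD_FOR_OUTPUT_TYPE.get(annotation_output_type)
    if field is None:
        raise ValueError(f"Unknown annotation_output_type: {annotation_output_type!r}")
    return getattr(output, field)


# Stand-in for a real pipeline output, for illustration only.
fake_output = SimpleNamespace(
    mask_image=["<mask>"],
    image=["<annotated>"],
    annotations={},
)
print(get_annotated_result(fake_output, "bounding_box"))
```

With a real output object, the same call would return whatever the block stored under `image` or `mask_image`.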

## Components

This block uses the following models from [florence-community/Florence-2-base-ft](https://huggingface.co/florence-community/Florence-2-base-ft):

- `image_annotator`: `Florence2ForConditionalGeneration`
- `image_annotator_processor`: `AutoProcessor`

## Learn More

- [Building Custom Blocks Guide](https://huggingface.co/docs/diffusers/modular_diffusers/custom_blocks)
- [Modular Diffusers Overview](https://huggingface.co/docs/diffusers/modular_diffusers/overview)
- [Modular Diffusers Custom Blocks Collection](https://huggingface.co/collections/diffusers/modular-diffusers-custom-blocks)