---
title: Florence-2 Document & Image Analyzer
emoji: 📄
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false
license: apache-2.0
short_description: Analyze images and PDFs with the Florence-2 vision model
tags:
- computer-vision
- florence-2
- document-analysis
- pdf-processing
- image-analysis
- object-detection
---

# Florence-2 Document & Image Analyzer

An interactive Hugging Face Space that uses Microsoft's Florence-2 vision model to analyze uploaded images and PDF documents. It runs object detection, detailed captioning, and OCR, and shows the results as bounding-box overlays next to the original image.

## Features

- **Multi-format Support**: Upload PNG, JPG, JPEG images or PDF documents
- **PDF Processing**: Automatically converts PDF pages to images for analysis
- **Florence-2 Integration**: Uses the Florence-2 model for:
  - Object detection with bounding boxes
  - Dense captioning
  - OCR text detection
  - Visual question answering
- **Interactive Overlays**: View original and annotated versions side-by-side
- **Batch Processing**: Handle multi-page PDFs efficiently
- **User-Friendly Interface**: Clean Gradio interface with clear instructions
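
Multi-page PDFs can be rendered in bounded batches instead of all at once, which keeps memory use predictable. The sketch below assumes that approach; it is not taken from `app.py`. The helper `page_batches` is hypothetical and yields 1-indexed, inclusive page ranges of the kind pdf2image's `convert_from_path` accepts through its `first_page` and `last_page` parameters.

```python
from typing import Iterator, Tuple


def page_batches(num_pages: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (first_page, last_page) pairs, 1-indexed and inclusive.

    Hypothetical helper: each pair could be passed to pdf2image's
    convert_from_path(path, first_page=..., last_page=...) so that only
    one batch of rendered pages is held in memory at a time.
    """
    if num_pages < 0:
        raise ValueError("num_pages must be non-negative")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for first in range(1, num_pages + 1, batch_size):
        # The final batch may be shorter than batch_size.
        last = min(first + batch_size - 1, num_pages)
        yield first, last


if __name__ == "__main__":
    # A 7-page PDF processed 3 pages at a time.
    print(list(page_batches(7, 3)))  # [(1, 3), (4, 6), (7, 7)]
```

The batch size is a trade-off: larger batches mean fewer calls to the PDF renderer, smaller ones mean fewer page images in memory at once.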

## How to Use

1. **Upload a file**: Choose an image (PNG/JPG/JPEG) or PDF document
2. **Select analysis type**: Choose from various Florence-2 tasks
3. **View results**: See original and annotated versions with overlays
4. **Download results**: Save processed images with annotations
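
Step 1 accepts either an image or a PDF, and the two follow different paths because PDF pages must be rendered to images first. A minimal sketch of that dispatch, assuming the decision is made from the file extension; `classify_upload` and the suffix sets are hypothetical names, not the Space's actual code:

```python
from pathlib import Path

# Extensions listed in this README; the dispatch itself is a hypothetical sketch.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}
PDF_SUFFIXES = {".pdf"}


def classify_upload(filename: str) -> str:
    """Return "image" or "pdf" for an uploaded file name (case-insensitive)."""
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        return "image"
    if suffix in PDF_SUFFIXES:
        return "pdf"
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")


if __name__ == "__main__":
    print(classify_upload("scan.PDF"))  # pdf
```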

## Model Information

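This Space uses Microsoft's Florence-2 model, a vision foundation model that handles many computer-vision and vision-language tasks (captioning, object detection, OCR, and more) with a single sequence-to-sequence architecture. The task is selected by a text prompt rather than by a separate model head.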
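
Florence-2 selects its task through a special prompt token that is passed to the processor together with the image. The mapping below is a hypothetical sketch of how a UI choice might become such a prompt: the task tokens are the ones shown on the Florence-2 model card, while the UI labels and `build_prompt` are assumptions. Visual question answering is not mapped here; the sketch covers only the captioning, detection, and OCR tokens.

```python
# Florence-2 task tokens (from the model card); the UI labels are hypothetical.
TASK_PROMPTS = {
    "Caption": "<CAPTION>",
    "Detailed Caption": "<MORE_DETAILED_CAPTION>",
    "Object Detection": "<OD>",
    "Dense Captioning": "<DENSE_REGION_CAPTION>",
    "OCR": "<OCR>",
    "OCR with Regions": "<OCR_WITH_REGION>",
}


def build_prompt(analysis_type: str) -> str:
    """Return the Florence-2 task prompt for a UI analysis type."""
    try:
        return TASK_PROMPTS[analysis_type]
    except KeyError:
        raise ValueError(f"Unknown analysis type: {analysis_type!r}") from None


if __name__ == "__main__":
    print(build_prompt("Object Detection"))  # <OD>
```

In the model card's examples, this string is passed as `text=` to the processor along with the image, and the same token is later given as `task=` to `processor.post_process_generation` to parse the generated output.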

## Technical Details

- **Framework**: Gradio 4.44.0
- **Model**: Microsoft Florence-2 (microsoft/Florence-2-large)
- **PDF Processing**: pdf2image for page-by-page conversion
- **Visualization**: PIL and OpenCV for overlay rendering
- **Hardware**: Runs inference on either CPU or GPU
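
For object detection, the model card shows `processor.post_process_generation` returning a dictionary shaped like `{"<OD>": {"bboxes": [[x1, y1, x2, y2], ...], "labels": [...]}}`, with coordinates in pixels of the original image. A minimal PIL sketch of rendering that result as an overlay, assuming this output shape; `draw_detections` is a hypothetical helper, and the Space may use OpenCV for parts of this step instead.

```python
from PIL import Image, ImageDraw


def draw_detections(image: Image.Image, result: dict, task: str = "<OD>") -> Image.Image:
    """Return a copy of `image` with labelled boxes from a Florence-2 result.

    Assumes result[task] holds parallel "bboxes" and "labels" lists.
    """
    parsed = result[task]
    boxes, labels = parsed["bboxes"], parsed["labels"]
    if len(boxes) != len(labels):
        raise ValueError("bboxes and labels must have the same length")

    # convert() returns a new image, so the original stays unannotated
    # and can be shown side by side with the overlay.
    annotated = image.convert("RGB")
    draw = ImageDraw.Draw(annotated)
    for (x1, y1, x2, y2), label in zip(boxes, labels):
        draw.rectangle([x1, y1, x2, y2], outline=(255, 0, 0), width=3)
        # Place the label just above the box, clamped to the top of the image.
        draw.text((x1 + 2, max(0, y1 - 12)), label, fill=(255, 0, 0))
    return annotated
```

Returning a copy rather than drawing in place is what allows the original and annotated versions to be displayed together.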

## Examples

Upload any document or image to see Florence-2 in action:
- **Documents**: Analyze layouts, detect text regions, identify tables
- **Photos**: Object detection, scene understanding, detailed captions
- **Screenshots**: UI element detection, text extraction
- **Technical diagrams**: Component identification and labeling
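
For text extraction with locations, the model card's `<OCR_WITH_REGION>` task returns quadrilaterals rather than rectangles: `{"<OCR_WITH_REGION>": {"quad_boxes": [[x1, y1, x2, y2, x3, y3, x4, y4], ...], "labels": [...]}}`. A sketch, assuming that shape, of reducing each quad to the axis-aligned box that encloses it, so a rectangle-based overlay can reuse it; `quad_to_box` and `quads_to_text_regions` are hypothetical helpers.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]


def quad_to_box(quad: List[float]) -> Box:
    """Return the (x_min, y_min, x_max, y_max) box enclosing a 4-point quad."""
    if len(quad) != 8:
        raise ValueError("a quad box needs exactly 8 coordinates")
    xs, ys = quad[0::2], quad[1::2]  # even indices are x, odd indices are y
    return min(xs), min(ys), max(xs), max(ys)


def quads_to_text_regions(result: dict) -> List[Tuple[str, Box]]:
    """Pair each OCR label (whitespace-stripped) with the box of its quad."""
    parsed = result["<OCR_WITH_REGION>"]
    quads, labels = parsed["quad_boxes"], parsed["labels"]
    if len(quads) != len(labels):
        raise ValueError("quad_boxes and labels must have the same length")
    return [(label.strip(), quad_to_box(quad)) for quad, label in zip(quads, labels)]
```

The enclosing box loses the rotation of slanted text, which is usually acceptable for highlighting regions but not for precise cropping.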

The model is cached after it is first loaded, which speeds up subsequent analyses.

Upload an image or PDF and select an analysis type to get started!