Felipe Meres
Convert Florence-2 space from Streamlit to Gradio
1ddb064
---
title: Florence-2 Document & Image Analyzer
emoji: 📄
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false
license: apache-2.0
short_description: Analyze images and PDFs with Florence-2 vision model
tags:
- computer-vision
- florence-2
- document-analysis
- pdf-processing
- image-analysis
- object-detection
---
# Florence-2 Document & Image Analyzer
An interactive Hugging Face Space that uses Microsoft's Florence-2 vision model to analyze uploaded images and PDF documents. Results include detected objects drawn as bounding-box overlays, OCR text regions, and detailed captions.
## Features
- **Multi-format Support**: Upload PNG, JPG, JPEG images or PDF documents
- **PDF Processing**: Automatically converts PDF pages to images for analysis
- **Florence-2 Integration**: Uses the powerful Florence-2 model for:
  - Object detection with bounding boxes
  - Dense captioning
  - OCR text detection
  - Visual question answering
- **Interactive Overlays**: View original and annotated versions side-by-side
- **Batch Processing**: Handle multi-page PDFs efficiently
- **User-Friendly Interface**: Clean Gradio interface with clear instructions
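
Florence-2 selects a task through a special prompt string, and the post-processed result is a dictionary keyed by that prompt. The sketch below shows one way the tasks listed above could map to prompts and how the result could be flattened into `(label, box)` pairs for drawing. The prompt strings follow the Florence-2 model card; the UI labels and the helper name `to_detections` are hypothetical and not taken from `app.py`. Visual question answering is left out because this README does not say which prompt the app uses for it.

```python
# Task prompts as documented on the Florence-2 model card.
# The dictionary keys (UI labels) are illustrative assumptions.
TASK_PROMPTS = {
    "Object Detection": "<OD>",
    "Dense Captioning": "<DENSE_REGION_CAPTION>",
    "OCR with Regions": "<OCR_WITH_REGION>",
    "Detailed Caption": "<DETAILED_CAPTION>",
}


def to_detections(parsed, prompt):
    """Flatten a post-processed Florence-2 result into (label, box) pairs.

    Region tasks return {"bboxes": [[x1, y1, x2, y2], ...], "labels": [...]}.
    OCR with regions returns {"quad_boxes": [[x1, y1, ..., x4, y4], ...],
    "labels": [...]}; each quad is reduced to its axis-aligned bounding box.
    """
    result = parsed[prompt]
    labels = result["labels"]
    if "quad_boxes" in result:
        boxes = []
        for quad in result["quad_boxes"]:
            xs, ys = quad[0::2], quad[1::2]  # even indices are x, odd are y
            boxes.append((min(xs), min(ys), max(xs), max(ys)))
    else:
        boxes = [tuple(box) for box in result["bboxes"]]
    return list(zip(labels, boxes))


# Example with a hand-written result in the documented shape.
sample = {"<OD>": {"bboxes": [[10.0, 20.0, 110.0, 220.0]], "labels": ["car"]}}
print(to_detections(sample, TASK_PROMPTS["Object Detection"]))
```

Caption-only tasks such as `<DETAILED_CAPTION>` return plain text rather than boxes, so they would bypass `to_detections` entirely.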
## How to Use
1. **Upload a file**: Choose an image (PNG/JPG/JPEG) or PDF document
2. **Select analysis type**: Choose from various Florence-2 tasks
3. **View results**: See original and annotated versions with overlays
4. **Download results**: Save processed images with annotations
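
The steps above could be wired up in Gradio roughly as follows. This is a minimal sketch, not the Space's actual `app.py`: the names `validate_upload`, `analyze`, and `build_demo`, the task labels, and the status textbox are assumptions made for illustration, and `analyze` is a placeholder where the real app would run Florence-2 and return annotated images.

```python
from pathlib import Path

SUPPORTED = {".png", ".jpg", ".jpeg", ".pdf"}  # formats listed in this README
TASKS = ["Object Detection", "Dense Captioning", "OCR"]  # hypothetical labels


def validate_upload(path):
    """Return 'pdf' or 'image'; raise ValueError for unsupported extensions."""
    ext = Path(path).suffix.lower()
    if ext not in SUPPORTED:
        raise ValueError(f"Unsupported file type: {ext or 'none'}")
    return "pdf" if ext == ".pdf" else "image"


def analyze(path, task):
    """Placeholder callback: the real app would run the model here."""
    if not path:
        return "Please upload a file first."
    kind = validate_upload(path)
    return f"Received {kind} for task: {task}"


def build_demo():
    """Build the UI. Gradio is imported lazily so the helpers work without it."""
    import gradio as gr

    with gr.Blocks(title="Florence-2 Document & Image Analyzer") as demo:
        upload = gr.File(label="Image or PDF", file_types=sorted(SUPPORTED))
        task = gr.Dropdown(choices=TASKS, value=TASKS[0], label="Analysis type")
        run = gr.Button("Analyze")
        status = gr.Textbox(label="Status")
        # gr.File passes the uploaded file's path to the callback by default.
        run.click(analyze, inputs=[upload, task], outputs=status)
    return demo


# To serve the interface locally: build_demo().launch()
```

A fuller version would likely swap the textbox for an image or gallery output so the original and annotated pages can be viewed and downloaded.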
## Model Information
This Space uses Microsoft's Florence-2 model, a foundation vision model that can handle various computer vision and vision-language tasks with a single model architecture.
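
For readers who want to see what a single-model, multi-task call looks like, here is a hedged sketch that follows the loading and generation pattern shown on the Florence-2 model card. The helper names `pick_device`, `load_model`, and `run_task` are hypothetical, the generation settings are the model card's example values rather than this Space's configuration, and caching with `functools.lru_cache` is one possible approach, not necessarily what `app.py` does.

```python
from functools import lru_cache

MODEL_ID = "microsoft/Florence-2-large"


def pick_device(cuda_available):
    """Return (device, dtype name): half precision on GPU, full on CPU."""
    return ("cuda", "float16") if cuda_available else ("cpu", "float32")


@lru_cache(maxsize=1)
def load_model():
    """Load the model and processor once; later calls reuse the cached pair."""
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor

    device, dtype_name = pick_device(torch.cuda.is_available())
    dtype = getattr(torch, dtype_name)
    # Florence-2 ships custom modeling code, hence trust_remote_code=True.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=dtype, trust_remote_code=True
    ).to(device)
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
    return model, processor, device, dtype


def run_task(image, prompt):
    """Run one task prompt (for example "<OD>") on a PIL image."""
    model, processor, device, dtype = load_model()
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device, dtype)
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
        do_sample=False,
    )
    text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    # Converts the raw token string into boxes/labels or plain text.
    return processor.post_process_generation(
        text, task=prompt, image_size=(image.width, image.height)
    )
```

Switching tasks only changes the `prompt` argument; the same loaded weights serve detection, captioning, and OCR.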
## Technical Details
- **Framework**: Gradio 4.44.0
- **Model**: Microsoft Florence-2 (microsoft/Florence-2-large)
- **PDF Processing**: pdf2image for page-by-page conversion
- **Visualization**: PIL and OpenCV for overlay rendering
- **Hardware**: Runs on CPU or GPU
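
The PDF and visualization steps listed above can be sketched as follows. This is an illustration under assumptions: it uses `pdf2image.convert_from_path` (which requires the Poppler utilities on the host) and draws with PIL only, leaving out OpenCV. The function names, the 150 DPI default, and the red outline are hypothetical choices, not values taken from `app.py`.

```python
def clamp_box(box, width, height):
    """Clamp an (x1, y1, x2, y2) box to the image and truncate to ints."""
    x1, y1, x2, y2 = box
    x1, x2 = sorted((x1, x2))
    y1, y2 = sorted((y1, y2))

    def clamp(value, upper):
        return int(min(max(value, 0), upper))

    return (
        clamp(x1, width - 1),
        clamp(y1, height - 1),
        clamp(x2, width - 1),
        clamp(y2, height - 1),
    )


def pdf_to_images(pdf_path, dpi=150):
    """Render each PDF page to a PIL image (one list entry per page)."""
    from pdf2image import convert_from_path  # needs Poppler installed

    return convert_from_path(pdf_path, dpi=dpi)


def draw_boxes(image, detections, color="red"):
    """Return an annotated copy of a PIL image; the original is unchanged.

    `detections` is an iterable of (label, (x1, y1, x2, y2)) pairs.
    """
    from PIL import ImageDraw

    annotated = image.convert("RGB")  # convert() returns a new image
    draw = ImageDraw.Draw(annotated)
    for label, box in detections:
        x1, y1, x2, y2 = clamp_box(box, annotated.width, annotated.height)
        draw.rectangle([x1, y1, x2, y2], outline=color, width=2)
        draw.text((x1 + 3, y1 + 3), label, fill=color)
    return annotated
```

Keeping the original image untouched makes it straightforward to show the original and annotated versions side by side, page by page.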
## Examples
Upload any document or image to see Florence-2 in action:
- **Documents**: Analyze layouts, detect text regions, identify tables
- **Photos**: Object detection, scene understanding, detailed captions
- **Screenshots**: UI element detection, text extraction
- **Technical diagrams**: Component identification and labeling
## Performance
- **Model caching**: The Florence-2 model is cached after it loads, so it is not reloaded on every request
Upload an image and select an analysis type to get started!