metadata
title: Florence-2 Document & Image Analyzer
emoji: 📄
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false
license: apache-2.0
short_description: Analyze images and PDFs with Florence-2 vision model
tags:
- computer-vision
- florence-2
- document-analysis
- pdf-processing
- image-analysis
- object-detection
Florence-2 Document & Image Analyzer
An interactive Hugging Face Space that uses Microsoft's Florence-2 vision model to analyze uploaded images and PDF documents. The application provides comprehensive visual analysis with bounding box overlays, object detection, and detailed captions.
Features
- Multi-format Support: Upload PNG, JPG, JPEG images or PDF documents
- PDF Processing: Automatically converts PDF pages to images for analysis
- Florence-2 Integration: Uses the powerful Florence-2 model for:
  - Object detection with bounding boxes
  - Dense captioning
  - OCR text detection
  - Visual question answering
- Interactive Overlays: View original and annotated versions side-by-side
- Batch Processing: Handle multi-page PDFs efficiently
- User-Friendly Interface: Clean Gradio interface with clear instructions
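The PDF and batch-processing features above could be sketched roughly as follows. This is an illustration, not the Space's actual code: `page_batches` and `iter_pdf_pages` are hypothetical helper names, the batch size and DPI are assumed values, and pdf2image needs the Poppler utilities installed on the host.

```python
from typing import Iterator, Tuple


def page_batches(n_pages: int, batch_size: int = 4) -> Iterator[Tuple[int, int]]:
    """Yield 1-indexed, inclusive (first_page, last_page) ranges covering n_pages."""
    if n_pages < 0 or batch_size < 1:
        raise ValueError("n_pages must be >= 0 and batch_size must be >= 1")
    for first in range(1, n_pages + 1, batch_size):
        yield first, min(first + batch_size - 1, n_pages)


def iter_pdf_pages(pdf_path: str, dpi: int = 150, batch_size: int = 4):
    """Yield PIL images for each page, rendering only one batch at a time."""
    # Imported lazily so page_batches can be used without pdf2image installed.
    from pdf2image import convert_from_path, pdfinfo_from_path

    n_pages = pdfinfo_from_path(pdf_path)["Pages"]
    for first, last in page_batches(n_pages, batch_size):
        # first_page/last_page limit rendering to the current batch (assumed
        # here to keep memory use bounded for long documents).
        for page in convert_from_path(pdf_path, dpi=dpi, first_page=first, last_page=last):
            yield page
```

Rendering in small page ranges rather than the whole file at once is one way to keep memory use predictable on multi-page PDFs.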
How to Use
- Upload a file: Choose an image (PNG/JPG/JPEG) or PDF document
- Select analysis type: Choose from various Florence-2 tasks
- View results: See original and annotated versions with overlays
- Download results: Save processed images with annotations
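The first step above implies routing each upload by file type. A minimal sketch of that check, assuming the decision is made on the file extension alone (`classify_upload` and `IMAGE_EXTENSIONS` are hypothetical names, not taken from app.py):

```python
from pathlib import Path

# Extensions listed in this README; the real app may accept more.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def classify_upload(path: str) -> str:
    """Return "image" or "pdf" for a supported upload, else raise ValueError."""
    suffix = Path(path).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix == ".pdf":
        return "pdf"
    raise ValueError(f"Unsupported file type: {suffix or '(no extension)'}")
```

An image goes straight to analysis, while a PDF would first be converted to page images.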
Model Information
This Space uses Microsoft's Florence-2 model, a foundation vision model that can handle various computer vision and vision-language tasks with a single model architecture.
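Florence-2 selects a task through a special prompt token rather than through separate model heads, which is how one model can cover several analysis types. The sketch below maps UI choices to those tokens. The task tokens follow the Florence-2 model card; the menu labels and the `build_prompt` helper are assumptions made for illustration.

```python
# Hypothetical menu labels mapped to Florence-2 task tokens (from the model card).
TASK_PROMPTS = {
    "Object Detection": "<OD>",
    "Dense Captioning": "<DENSE_REGION_CAPTION>",
    "OCR": "<OCR>",
    "OCR with Regions": "<OCR_WITH_REGION>",
    "Caption": "<CAPTION>",
    "Detailed Caption": "<DETAILED_CAPTION>",
    "Phrase Grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
}


def build_prompt(task_label: str, text_input: str = "") -> str:
    """Return the prompt string for a task, appending optional text input."""
    try:
        token = TASK_PROMPTS[task_label]
    except KeyError:
        raise ValueError(f"Unknown analysis type: {task_label!r}") from None
    # Tasks that take extra text (e.g. phrase grounding) concatenate it
    # directly after the task token.
    return token + text_input
```

In the model card's usage example, the resulting string is passed to the processor together with the image, and the generated text is then parsed with the processor's `post_process_generation` method for the same task token.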
Technical Details
- Framework: Gradio 4.44.0
- Model: Microsoft Florence-2 (microsoft/Florence-2-large)
- PDF Processing: pdf2image for page-by-page conversion
- Visualization: PIL and OpenCV for overlay rendering
- Hardware: Optimized for CPU and GPU inference
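Before overlays are rendered, the parsed detection output has to be turned into drawable boxes. The sketch below assumes the result layout shown in the Florence-2 model card for `<OD>` (a dict with `bboxes` as `[x1, y1, x2, y2]` lists and a parallel `labels` list); `boxes_for_overlay` is a hypothetical helper, and clamping to the image bounds is an added safeguard rather than something the source describes.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]


def boxes_for_overlay(
    parsed: Dict, task: str, image_size: Tuple[int, int]
) -> List[Tuple[str, Box]]:
    """Convert a parsed Florence-2 result into (label, pixel box) pairs."""
    width, height = image_size
    result = parsed.get(task, {})
    overlay = []
    for (x1, y1, x2, y2), label in zip(result.get("bboxes", []), result.get("labels", [])):
        # Round to whole pixels and clamp each coordinate inside the image.
        left = max(0, min(round(x1), width - 1))
        top = max(0, min(round(y1), height - 1))
        right = max(0, min(round(x2), width - 1))
        bottom = max(0, min(round(y2), height - 1))
        # Skip boxes that collapse to zero area after clamping.
        if right > left and bottom > top:
            overlay.append((label, (left, top, right, bottom)))
    return overlay
```

Each returned box can be passed to PIL's `ImageDraw.Draw(image).rectangle(box, outline=...)`, with the label drawn next to it.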
Examples
Upload any document or image to see Florence-2 in action:
- Documents: Analyze layouts, detect text regions, identify tables
- Photos: Object detection, scene understanding, detailed captions
- Screenshots: UI element detection, text extraction
- Technical diagrams: Component identification and labeling
This Space uses Gradio to provide an interactive interface for Microsoft's Florence-2 vision model.
Features
- Object Detection with bounding boxes
- Detailed image captioning
- OCR text extraction
- Interactive Gradio interface
- Model caching for performance
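One common way to get the model caching mentioned above is to memoize the loader so the weights are loaded once per process. This is a sketch under that assumption, not necessarily how the Space does it; `cached_loader`, `load_florence`, and `get_florence` are hypothetical names.

```python
from functools import lru_cache
from typing import Callable, TypeVar

T = TypeVar("T")


def cached_loader(load_fn: Callable[[str], T]) -> Callable[[str], T]:
    """Wrap a loader so repeated calls with the same model id reuse the result."""
    # maxsize=1 keeps only the most recently requested model in memory.
    return lru_cache(maxsize=1)(load_fn)


def load_florence(model_id: str):
    """Load the model and processor as in the Florence-2 model card example."""
    # Imported lazily so the caching helper can be used without transformers.
    from transformers import AutoModelForCausalLM, AutoProcessor

    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model, processor


get_florence = cached_loader(load_florence)
```

Calling `get_florence("microsoft/Florence-2-large")` on every request would then pay the loading cost only on the first call.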
Upload an image and select an analysis type to get started!