---
title: Florence-2 Document & Image Analyzer
emoji: 📄
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false
license: apache-2.0
short_description: Analyze images and PDFs with Florence-2 vision model
tags:
  - computer-vision
  - florence-2
  - document-analysis
  - pdf-processing
  - image-analysis
  - object-detection
---

# Florence-2 Document & Image Analyzer

An interactive Hugging Face Space that uses Microsoft's Florence-2 vision model to analyze uploaded images and PDF documents. The application provides comprehensive visual analysis with bounding box overlays, object detection, and detailed captions.

## Features

- **Multi-format Support**: Upload PNG, JPG, JPEG images or PDF documents
- **PDF Processing**: Automatically converts PDF pages to images for analysis
- **Florence-2 Integration**: Uses the powerful Florence-2 model for:
  - Object detection with bounding boxes
  - Dense captioning
  - OCR text detection
  - Visual question answering
- **Interactive Overlays**: View original and annotated versions side-by-side
- **Batch Processing**: Handle multi-page PDFs efficiently
- **User-Friendly Interface**: Clean Gradio interface with clear instructions

## How to Use

1. **Upload a file**: Choose an image (PNG/JPG/JPEG) or PDF document
2. **Select analysis type**: Choose from various Florence-2 tasks
3. **View results**: See original and annotated versions with overlays
4. **Download results**: Save processed images with annotations

## Model Information

This Space uses Microsoft's Florence-2 model, a foundation vision model that can handle various computer vision and vision-language tasks with a single model architecture.

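
Florence-2 switches between tasks through a special prompt token rather than a separate model per task. The snippet below is a minimal sketch of how an analysis type chosen in the interface could be routed to such a token. The menu labels, `TASK_PROMPTS`, and `build_prompt` are hypothetical names used for illustration and are not taken from `app.py`; the tokens themselves are the ones listed on the Florence-2 model card.

```python
# Hypothetical mapping from interface labels to Florence-2 task prompt tokens.
# The labels are illustrative; the tokens come from the Florence-2 model card.
TASK_PROMPTS = {
    "Object Detection": "<OD>",
    "Dense Captioning": "<DENSE_REGION_CAPTION>",
    "OCR with Regions": "<OCR_WITH_REGION>",
    "Detailed Caption": "<MORE_DETAILED_CAPTION>",
}


def build_prompt(analysis_type: str) -> str:
    """Return the task token for a given analysis type.

    Raises ValueError for labels that are not in TASK_PROMPTS so that an
    unknown selection fails early instead of reaching the model.
    """
    try:
        return TASK_PROMPTS[analysis_type]
    except KeyError:
        known = ", ".join(sorted(TASK_PROMPTS))
        raise ValueError(
            f"Unknown analysis type {analysis_type!r}; expected one of: {known}"
        ) from None


if __name__ == "__main__":
    print(build_prompt("Object Detection"))
```

Keeping the mapping in one dictionary means a new task only needs one new entry, and the same dictionary can supply the choices for the analysis-type selector.
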
## Technical Details

- **Framework**: Gradio 4.44.0
- **Model**: Microsoft Florence-2 (microsoft/Florence-2-large)
- **PDF Processing**: pdf2image for page-by-page conversion
- **Visualization**: PIL and OpenCV for overlay rendering
- **Hardware**: Supports CPU and GPU inference
- **Performance**: Model caching

## Examples

Upload any document or image to see Florence-2 in action:

- **Documents**: Analyze layouts, detect text regions, identify tables
- **Photos**: Object detection, scene understanding, detailed captions
- **Screenshots**: UI element detection, text extraction
- **Technical diagrams**: Component identification and labeling

Upload an image or PDF and select an analysis type to get started!
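
To draw the overlays described under Technical Details, each detection has to be turned into a label plus a pixel rectangle. The sketch below assumes the parsed result has the shape shown on the Florence-2 model card for `<OD>`: a dict keyed by the task token, holding parallel `bboxes` and `labels` lists. `to_overlay_boxes` is a hypothetical helper, not the Space's actual code. It rounds coordinates to integers and clamps them so every box stays inside the image.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def to_overlay_boxes(
    result: Dict, task: str, width: int, height: int
) -> List[Tuple[str, Box]]:
    """Convert a parsed detection result into drawable (label, box) pairs.

    Assumes result[task] looks like {"bboxes": [[x1, y1, x2, y2], ...],
    "labels": [...]}. Boxes that collapse to zero area after clamping
    are skipped.
    """
    parsed = result.get(task, {})
    boxes = parsed.get("bboxes", [])
    labels = parsed.get("labels", [])

    overlay = []
    for (x1, y1, x2, y2), label in zip(boxes, labels):
        # Round to whole pixels, then clamp to the valid pixel range.
        left = min(max(round(x1), 0), width - 1)
        top = min(max(round(y1), 0), height - 1)
        right = min(max(round(x2), 0), width - 1)
        bottom = min(max(round(y2), 0), height - 1)
        if right > left and bottom > top:
            overlay.append((label, (left, top, right, bottom)))
    return overlay


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = {
        "<OD>": {
            "bboxes": [[10.4, 20.6, 200.2, 150.9], [-5.0, 0.0, 700.0, 90.0]],
            "labels": ["car", "sign"],
        }
    }
    for label, box in to_overlay_boxes(sample, "<OD>", width=640, height=480):
        print(label, box)
```

Each returned pair can be passed to a drawing routine, for example PIL's `ImageDraw.rectangle` for the box and `ImageDraw.text` for the label.
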