Felipe Meres
Convert Florence-2 space from Streamlit to Gradio
1ddb064

metadata
title: Florence-2 Document & Image Analyzer
emoji: 📄
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false
license: apache-2.0
short_description: Analyze images and PDFs with Florence-2 vision model
tags:
  - computer-vision
  - florence-2
  - document-analysis
  - pdf-processing
  - image-analysis
  - object-detection

Florence-2 Document & Image Analyzer

An interactive Hugging Face Space that uses Microsoft's Florence-2 vision model to analyze uploaded images and PDF documents. For each image or PDF page, the application returns object detections, bounding-box overlays, and detailed captions.

Features

  • Multi-format Support: Upload PNG, JPG, JPEG images or PDF documents
  • PDF Processing: Automatically converts PDF pages to images for analysis
  • Florence-2 Integration: Runs the following Florence-2 tasks:
    • Object detection with bounding boxes
    • Dense captioning
    • OCR text detection
    • Visual question answering
  • Interactive Overlays: View original and annotated versions side-by-side
  • Batch Processing: Analyzes each page of a multi-page PDF in a single run
  • User-Friendly Interface: Clean Gradio interface with clear instructions
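
Florence-2 selects a task through a prompt token passed alongside the image. The following is a minimal sketch of that pattern, modeled on the usage shown in the Florence-2 model card. The UI labels in `TASK_PROMPTS` and the `run_task` helper are hypothetical names chosen for illustration and are not taken from this Space's `app.py`.

```python
# Hypothetical UI labels mapped to Florence-2 task prompt tokens.
TASK_PROMPTS = {
    "Object detection": "<OD>",
    "Dense captioning": "<DENSE_REGION_CAPTION>",
    "OCR with regions": "<OCR_WITH_REGION>",
    "Detailed caption": "<MORE_DETAILED_CAPTION>",
}


def run_task(image, task_label, model, processor, device=None, max_new_tokens=1024):
    """Run one Florence-2 task on a PIL image and return the parsed result.

    `model` and `processor` are assumed to be loaded with
    AutoModelForCausalLM / AutoProcessor and trust_remote_code=True.
    """
    prompt = TASK_PROMPTS[task_label]
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    if device is not None:
        # Move the input tensors to the same device as the model.
        inputs = inputs.to(device)
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=max_new_tokens,
        num_beams=3,
        do_sample=False,
    )
    # Keep special tokens: the post-processor needs the location tokens.
    raw_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    parsed = processor.post_process_generation(
        raw_text, task=prompt, image_size=(image.width, image.height)
    )
    # The parsed result is keyed by the task token, e.g. parsed["<OD>"].
    return parsed[prompt]
```

For `<OD>` the returned dictionary holds `bboxes` and `labels`; for `<OCR_WITH_REGION>` it holds `quad_boxes` and `labels`.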

How to Use

  1. Upload a file: Choose an image (PNG/JPG/JPEG) or PDF document
  2. Select analysis type: Choose from various Florence-2 tasks
  3. View results: See original and annotated versions with overlays
  4. Download results: Save processed images with annotations
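
Step 1 implies that the app distinguishes images from PDFs before analysis. One way this could look is sketched below, assuming `pdf2image` (which requires Poppler on the system) and Pillow are installed. The helper names `detect_kind` and `load_pages` and the default `dpi` value are illustrative assumptions, not the Space's actual code.

```python
from pathlib import Path

# Extensions listed in the README as supported image formats.
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}


def detect_kind(path):
    """Classify an upload as 'pdf' or 'image' by file extension."""
    ext = Path(path).suffix.lower()
    if ext == ".pdf":
        return "pdf"
    if ext in IMAGE_EXTS:
        return "image"
    raise ValueError(f"Unsupported file type: {ext or 'none'}")


def load_pages(path, dpi=200):
    """Return a list of RGB PIL images: one per PDF page, or one for an image."""
    if detect_kind(path) == "pdf":
        # Imported lazily so image-only use does not require pdf2image.
        from pdf2image import convert_from_path

        return [page.convert("RGB") for page in convert_from_path(str(path), dpi=dpi)]
    from PIL import Image

    return [Image.open(path).convert("RGB")]
```

Returning a list in both cases lets the rest of the pipeline treat a single image as a one-page document.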

Model Information

This Space uses Microsoft's Florence-2 model, a foundation vision model that can handle various computer vision and vision-language tasks with a single model architecture.

Technical Details

  • Framework: Gradio 4.44.0
  • Model: Microsoft Florence-2 (microsoft/Florence-2-large)
  • PDF Processing: pdf2image for page-by-page conversion
  • Visualization: PIL and OpenCV for overlay rendering
  • Hardware: Runs on either CPU or GPU
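
As an illustration of the overlay step, the sketch below draws Florence-2 style boxes with Pillow only; the OpenCV part of the rendering is not shown. It assumes boxes arrive as `[x1, y1, x2, y2]` in pixel coordinates and that OCR regions arrive as eight-number quadrilaterals, which matches the post-processed output format described in the Florence-2 model card. The function names, color, and line width are hypothetical.

```python
def quad_to_rect(quad):
    """Convert an 8-number quad [x1, y1, ..., x4, y4] to [xmin, ymin, xmax, ymax]."""
    xs, ys = quad[0::2], quad[1::2]
    return [min(xs), min(ys), max(xs), max(ys)]


def draw_boxes(image, bboxes, labels, color="red", width=3):
    """Return a copy of a PIL image with labeled rectangles drawn on it."""
    # Imported lazily so quad_to_rect can be used without Pillow installed.
    from PIL import ImageDraw

    annotated = image.copy()  # leave the original untouched for side-by-side view
    draw = ImageDraw.Draw(annotated)
    for (x1, y1, x2, y2), label in zip(bboxes, labels):
        draw.rectangle([x1, y1, x2, y2], outline=color, width=width)
        draw.text((x1 + 4, y1 + 4), label, fill=color)
    return annotated
```

Copying the image before drawing keeps the original available, which is what a side-by-side original/annotated view needs.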

Examples

Upload any document or image to see Florence-2 in action:

  • Documents: Analyze layouts, detect text regions, identify tables
  • Photos: Object detection, scene understanding, detailed captions
  • Screenshots: UI element detection, text extraction
  • Technical diagrams: Component identification and labeling

Florence-2 Document & Image Analyzer

This Space uses Gradio to provide an interactive interface for Microsoft's Florence-2 vision model.

Features

  • Object Detection with bounding boxes
  • Detailed image captioning
  • OCR text extraction
  • Interactive Gradio interface
  • Model caching for performance

Upload an image and select an analysis type to get started!
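
Model caching can be achieved by loading the model once per process and reusing it across requests. The sketch below uses `functools.lru_cache` for that purpose; whether the Space does it this way is an assumption, and `load_florence` is a hypothetical helper name. The model ID is the one listed under Technical Details.

```python
import functools

MODEL_ID = "microsoft/Florence-2-large"


@functools.lru_cache(maxsize=1)
def load_florence(model_id=MODEL_ID):
    """Load Florence-2 once and reuse it on later calls with the same model_id."""
    # Heavy imports are deferred until the model is first requested.
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    model = model.to(device).eval()
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model, processor, device
```

The first call pays the download and load cost; later calls with the same `model_id` return the cached tuple immediately.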