
Last commit by Felipe Meres: "Convert Florence-2 space from Streamlit to Gradio" (1ddb064, 4 months ago)


---
title: Florence-2 Document & Image Analyzer
emoji: 📄
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false
license: apache-2.0
short_description: Analyze images and PDFs with Florence-2 vision model
tags:
  - computer-vision
  - florence-2
  - document-analysis
  - pdf-processing
  - image-analysis
  - object-detection
---

Florence-2 Document & Image Analyzer

An interactive Hugging Face Space that uses Microsoft's Florence-2 vision model to analyze uploaded images and PDF documents. It detects objects, draws their bounding boxes as overlays on the image, and generates detailed captions.

Features

Multi-format Support: Upload PNG or JPEG (.jpg/.jpeg) images, or PDF documents
PDF Processing: Automatically converts each PDF page to an image for analysis
Florence-2 Integration: Uses the Florence-2 model for:
- Object detection with bounding boxes
- Dense captioning
- OCR text detection
- Visual question answering
Interactive Overlays: View the original and annotated versions side by side
Batch Processing: Processes all pages of a multi-page PDF in a single run
User-Friendly Interface: A clean Gradio interface with clear instructions
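Florence-2 chooses its task from a special prompt token rather than from separate model heads, so the task list above reduces to a lookup table. The sketch below shows one way to map UI choices to tokens. The UI labels, the helper names task_for and has_overlay, and the choice of <OCR_WITH_REGION> over plain <OCR> are assumptions; the tokens themselves are the ones listed on the Florence-2 model card. Visual question answering is omitted because this sketch only uses tokens from that list.

```python
# Hypothetical UI labels mapped to Florence-2 task prompt tokens.
TASK_PROMPTS = {
    "Object detection": "<OD>",
    "Dense captioning": "<DENSE_REGION_CAPTION>",
    "OCR (with regions)": "<OCR_WITH_REGION>",
    "Detailed caption": "<MORE_DETAILED_CAPTION>",
}

# Tasks whose parsed output contains boxes that can be drawn as an overlay.
REGION_TASKS = {"<OD>", "<DENSE_REGION_CAPTION>", "<OCR_WITH_REGION>"}


def task_for(label: str) -> str:
    """Translate a UI label into the task token Florence-2 expects."""
    try:
        return TASK_PROMPTS[label]
    except KeyError:
        raise ValueError(f"Unknown analysis type: {label!r}") from None


def has_overlay(task: str) -> bool:
    """True if the task returns regions worth drawing on the image."""
    return task in REGION_TASKS
```

Keeping the table separate from the model code means adding a task (for example <CAPTION>) is a one-line change.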

How to Use

Upload a file: Choose an image (PNG/JPG/JPEG) or PDF document
Select analysis type: Choose one of the Florence-2 tasks listed under Features
View results: See original and annotated versions with overlays
Download results: Save processed images with annotations
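Step 1 accepts two kinds of input, so the upload has to be routed before analysis. A minimal sketch, assuming routing by file extension and pdf2image's convert_from_path for PDFs (pdf2image also needs the poppler utilities installed). classify_upload and load_pages are hypothetical names, not the Space's actual code:

```python
from pathlib import Path
from typing import Any, List

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def classify_upload(filename: str) -> str:
    """Return "pdf" or "image" for a supported upload; reject anything else."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    raise ValueError(f"Unsupported file type: {suffix or '(no extension)'}")


def load_pages(path: str, dpi: int = 200) -> List[Any]:
    """Load an upload as a list of RGB PIL images (one per PDF page)."""
    # Lazy imports: Pillow and pdf2image are only needed when a file is loaded.
    if classify_upload(path) == "pdf":
        from pdf2image import convert_from_path

        pages = convert_from_path(path, dpi=dpi)  # one PIL image per page
    else:
        from PIL import Image

        pages = [Image.open(path)]
    # Normalize color mode so every page can be fed to the model the same way.
    return [page.convert("RGB") for page in pages]
```

Normalizing every page to RGB means the rest of the pipeline only ever sees a list of images, whether the upload was a single photo or a multi-page PDF.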

Model Information

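This Space uses Microsoft's Florence-2 model, a foundation vision model that handles a range of computer vision and vision-language tasks (captioning, object detection, OCR, and more) with a single set of weights. The task is selected by a text prompt token such as <OD> or <OCR>.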
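Running a task follows the pattern on the Florence-2 model card: build the prompt, generate, then parse the output with the processor's post_process_generation. A sketch under those assumptions; run_task and load_model are hypothetical names, and functools.lru_cache is just one simple way to keep the model loaded between calls, not necessarily how this Space caches it:

```python
from functools import lru_cache
from typing import Any, Optional

MODEL_ID = "microsoft/Florence-2-large"  # checkpoint listed under Technical Details


def build_prompt(task: str, text_input: Optional[str] = None) -> str:
    """Florence-2 picks the task from a prompt token; some tasks append free text."""
    return task if text_input is None else task + text_input


@lru_cache(maxsize=1)
def load_model(model_id: str = MODEL_ID):
    """Load the model and processor once, then reuse them on later calls."""
    # Lazy imports so this module can be imported without torch/transformers.
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Florence-2 ships custom modeling code on the Hub, hence trust_remote_code=True.
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    model = model.to(device).eval()
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model, processor, device


def run_task(image: Any, task: str, text_input: Optional[str] = None) -> dict:
    """Run one Florence-2 task on an RGB PIL image; returns {task: parsed_output}."""
    model, processor, device = load_model()
    inputs = processor(
        text=build_prompt(task, text_input), images=image, return_tensors="pt"
    ).to(device)
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    # Keep special tokens: post_process_generation uses them to parse locations.
    text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        text, task=task, image_size=(image.width, image.height)
    )
```

For a detection task, the returned dict is keyed by the task token, with the parsed boxes and labels underneath, e.g. {"<OD>": {"bboxes": [...], "labels": [...]}}.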

Technical Details

Framework: Gradio 4.44.0
Model: Microsoft Florence-2 (microsoft/Florence-2-large)
PDF Processing: pdf2image for page-by-page conversion
Visualization: PIL and OpenCV for overlay rendering
Hardware: Supports inference on both CPU and GPU
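Drawing the overlays means converting Florence-2's parsed output into rectangles. A minimal sketch using only PIL (the Space also lists OpenCV, which this sketch does not use). It assumes the output shapes shown on the Florence-2 model card, where <OD> and <DENSE_REGION_CAPTION> return "bboxes" and "labels" and <OCR_WITH_REGION> returns four-point "quad_boxes". The function names are hypothetical:

```python
from typing import Any, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def quad_to_box(quad: Sequence[float]) -> Box:
    """Collapse a 4-point quad [x1, y1, ..., x4, y4] to its axis-aligned box."""
    xs, ys = quad[0::2], quad[1::2]
    return (min(xs), min(ys), max(xs), max(ys))


def to_annotations(result: dict, task: str) -> List[Tuple[str, Box]]:
    """Turn a post-processed Florence-2 result into (label, box) pairs."""
    parsed = result[task]
    if isinstance(parsed, str):  # captions / plain OCR: text only, nothing to draw
        return []
    if "bboxes" in parsed:  # <OD>, <DENSE_REGION_CAPTION>
        boxes = [tuple(b) for b in parsed["bboxes"]]
    else:  # <OCR_WITH_REGION> returns quadrilaterals
        boxes = [quad_to_box(q) for q in parsed["quad_boxes"]]
    return list(zip(parsed["labels"], boxes))


def draw_overlay(image: Any, annotations: List[Tuple[str, Box]]) -> Any:
    """Return an annotated copy of a PIL image; the original stays untouched."""
    from PIL import ImageDraw  # lazy import: only needed when rendering

    annotated = image.copy()
    draw = ImageDraw.Draw(annotated)
    for label, (x1, y1, x2, y2) in annotations:
        draw.rectangle((x1, y1, x2, y2), outline="red", width=3)
        draw.text((x1, y1), label, fill="red")
    return annotated
```

Because draw_overlay annotates a copy, the untouched original can be shown next to the annotated version.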

Examples

Upload any document or image to see Florence-2 in action:

Documents: Analyze layouts, detect text regions, identify tables
Photos: Object detection, scene understanding, detailed captions
Screenshots: UI element detection, text extraction
Technical diagrams: Component identification and labeling

Model caching: The Florence-2 model is cached to improve performance.

Upload an image or PDF and select an analysis type to get started!