florence-2-document-analyzer / README.md

Felipe Meres

Convert Florence-2 space from Streamlit to Gradio

1ddb064 4 months ago


	---
	title: Florence-2 Document & Image Analyzer
	emoji: 📄
	colorFrom: blue
	colorTo: purple
	sdk: gradio

	app_file: app.py
	pinned: false
	license: apache-2.0
	short_description: Analyze images and PDFs with Florence-2 vision model
	tags:
	- computer-vision
	- florence-2
	- document-analysis
	- pdf-processing
	- image-analysis
	- object-detection
	---

	# Florence-2 Document & Image Analyzer

	An interactive Hugging Face Space that uses Microsoft's Florence-2 vision model to analyze uploaded images and PDF documents. For each upload, the application returns object detections, bounding box overlays, and detailed captions.

	## Features

	- Multi-format Support: Upload PNG, JPG, JPEG images or PDF documents
	- PDF Processing: Automatically converts PDF pages to images for analysis
	- Florence-2 Integration: Uses the powerful Florence-2 model for:
	  - Object detection with bounding boxes
	  - Dense captioning
	  - OCR text detection
	  - Visual question answering
	- Interactive Overlays: View original and annotated versions side-by-side
	- Batch Processing: Handle multi-page PDFs efficiently
	- User-Friendly Interface: Clean Gradio interface with clear instructions
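
As a rough illustration of the bounding box overlays listed above, the sketch below turns a detection result into an annotated copy of an image. It assumes the result follows the `{'bboxes': [...], 'labels': [...]}` layout that Florence-2's object-detection post-processing returns. The helper names `to_detections` and `draw_overlay` are hypothetical and are not taken from this Space's `app.py`.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]


def to_detections(result: Dict[str, list]) -> List[Tuple[str, Box]]:
    """Pair each label with its box, rounding coordinates to whole pixels."""
    detections = []
    for label, bbox in zip(result.get("labels", []), result.get("bboxes", [])):
        x1, y1, x2, y2 = (int(round(v)) for v in bbox)
        detections.append((label, (x1, y1, x2, y2)))
    return detections


def draw_overlay(image, detections, color="red"):
    """Return a copy of a PIL image with labelled rectangles drawn on it."""
    # Imported lazily so to_detections works without Pillow installed.
    from PIL import ImageDraw

    annotated = image.copy()  # leave the original untouched for side-by-side display
    draw = ImageDraw.Draw(annotated)
    for label, (x1, y1, x2, y2) in detections:
        draw.rectangle([x1, y1, x2, y2], outline=color, width=2)
        draw.text((x1 + 3, y1 + 3), label, fill=color)
    return annotated
```

Drawing on a copy keeps the original image available, which is what a side-by-side view needs.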

	## How to Use

	1. Upload a file: Choose an image (PNG/JPG/JPEG) or PDF document
	2. Select analysis type: Choose from various Florence-2 tasks
	3. View results: See original and annotated versions with overlays
	4. Download results: Save processed images with annotations

	## Model Information

	This Space uses Microsoft's Florence-2 model, a foundation vision model that can handle various computer vision and vision-language tasks with a single model architecture.
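
A single Florence-2 checkpoint switches between tasks through a special prompt token rather than a separate model per task. The sketch below illustrates that idea using task tokens and the call sequence documented on the `microsoft/Florence-2-large` model card. The labels in `TASK_PROMPTS` and the helpers `task_token` and `run_task` are hypothetical; how this Space's `app.py` wires its analysis types is not shown in this README.

```python
# Task tokens as listed on the Florence-2 model card. The dictionary keys
# are illustrative labels, not names taken from this Space.
TASK_PROMPTS = {
    "Object detection": "<OD>",
    "Dense captioning": "<DENSE_REGION_CAPTION>",
    "OCR with regions": "<OCR_WITH_REGION>",
    "Detailed caption": "<MORE_DETAILED_CAPTION>",
}


def task_token(label: str) -> str:
    """Look up the prompt token for a label, raising a readable error."""
    try:
        return TASK_PROMPTS[label]
    except KeyError:
        raise ValueError(f"Unknown analysis type: {label!r}") from None


def run_task(model, processor, image, label: str) -> dict:
    """Run one Florence-2 task on a PIL image and return the parsed result."""
    prompt = task_token(label)
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    # Convert the raw token string into boxes, labels, or text for this task.
    return processor.post_process_generation(
        text, task=prompt, image_size=(image.width, image.height)
    )
```

`run_task` leaves the inputs on the CPU. When the model is on a GPU, the inputs would need to be moved to the same device before calling `generate`.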

	## Technical Details

	- Framework: Gradio 4.44.0
	- Model: Microsoft Florence-2 (microsoft/Florence-2-large)
	- PDF Processing: pdf2image for page-by-page conversion
	- Visualization: PIL and OpenCV for overlay rendering
	- Hardware: Supports both CPU and GPU inference
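
The page-by-page PDF conversion and the CPU/GPU choice from the list above might look like the following. This is a minimal sketch under stated assumptions: `classify_upload`, `load_pages`, and `pick_device` are hypothetical helpers, the 200 DPI default is an arbitrary choice, and `pdf2image` needs Poppler installed on the host.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def classify_upload(path: str) -> str:
    """Return 'pdf' or 'image' based on the file extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_SUFFIXES:
        return "image"
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")


def load_pages(path: str, dpi: int = 200) -> list:
    """Load an upload as a list of RGB PIL images, one per page."""
    if classify_upload(path) == "pdf":
        # Imported lazily; pdf2image shells out to Poppler to rasterize pages.
        from pdf2image import convert_from_path

        return [page.convert("RGB") for page in convert_from_path(path, dpi=dpi)]

    from PIL import Image

    with Image.open(path) as img:
        return [img.convert("RGB")]  # a single image is treated as one page


def pick_device() -> str:
    """Prefer CUDA when PyTorch can see a GPU, otherwise fall back to CPU."""
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"
```

Returning a list in both branches lets the rest of the pipeline treat a single image and a multi-page PDF the same way.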

	## Examples

	Upload any document or image to see Florence-2 in action:
	- Documents: Analyze layouts, detect text regions, identify tables
	- Photos: Object detection, scene understanding, detailed captions
	- Screenshots: UI element detection, text extraction
	- Technical diagrams: Component identification and labeling

	The Space caches the loaded model for performance. Upload an image and select an analysis type to get started!
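
The README does not describe how the model caching mentioned above is implemented. One common approach, sketched here as an assumption, is to memoize the loader so the weights are read only once per process. `make_cached_loader`, `load_florence`, and `get_model` are hypothetical names.

```python
from functools import lru_cache
from typing import Callable, Tuple

MODEL_ID = "microsoft/Florence-2-large"  # model named under Technical Details


def load_florence(model_id: str) -> Tuple[object, object]:
    """Load the model and processor from the Hugging Face Hub."""
    from transformers import AutoModelForCausalLM, AutoProcessor

    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model.eval(), processor


def make_cached_loader(load_fn: Callable[[str], Tuple[object, object]]):
    """Wrap a loader so repeated calls with the same model id reuse the result."""

    @lru_cache(maxsize=1)  # keep only the most recently loaded model in memory
    def cached(model_id: str = MODEL_ID):
        return load_fn(model_id)

    return cached


# Nothing is downloaded until get_model() is first called.
get_model = make_cached_loader(load_florence)
```

Passing the loader in as an argument keeps the caching logic testable without downloading any weights.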