Commit · 20e8d5d

Initial commit: Florence-2 Document & Image Analyzer Space

Features:
- Multi-format support (PNG, JPG, PDF)
- Florence-2 model integration
- Object detection with bounding boxes
- OCR text extraction
- Dense captioning and detailed descriptions
- Interactive Gradio interface
- PDF page-by-page processing
- Visual overlay annotations
- .gitignore +55 -0
- README.md +63 -0
- USAGE.md +168 -0
- app.py +387 -0
- config.py +65 -0
- deploy.py +174 -0
- examples.py +316 -0
- packages.txt +3 -0
- requirements.txt +26 -0
- test_app.py +83 -0
.gitignore ADDED
@@ -0,0 +1,55 @@
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# Virtual environments
+venv/
+env/
+ENV/
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Temporary files
+*.tmp
+*.temp
+temp/
+tmp/
+
+# Model cache (Hugging Face)
+.cache/
+models/
+
+# Logs
+*.log
+logs/
+
+# Gradio temporary files
+flagged/
+gradio_cached_examples/
README.md ADDED
@@ -0,0 +1,63 @@
+---
+title: Florence-2 Document & Image Analyzer
+emoji: 📄
+colorFrom: blue
+colorTo: purple
+sdk: gradio
+sdk_version: 4.44.0
+app_file: app.py
+pinned: false
+license: apache-2.0
+short_description: Analyze images and PDFs with Florence-2 vision model
+tags:
+- computer-vision
+- florence-2
+- document-analysis
+- pdf-processing
+- image-analysis
+- object-detection
+---
+
+# Florence-2 Document & Image Analyzer
+
+An interactive Hugging Face Space that uses Microsoft's Florence-2 vision model to analyze uploaded images and PDF documents. The application provides comprehensive visual analysis with bounding box overlays, object detection, and detailed captions.
+
+## Features
+
+- **Multi-format Support**: Upload PNG, JPG, JPEG images or PDF documents
+- **PDF Processing**: Automatically converts PDF pages to images for analysis
+- **Florence-2 Integration**: Uses the powerful Florence-2 model for:
+  - Object detection with bounding boxes
+  - Dense captioning
+  - OCR text detection
+  - Visual question answering
+- **Interactive Overlays**: View original and annotated versions side-by-side
+- **Batch Processing**: Handle multi-page PDFs efficiently
+- **User-Friendly Interface**: Clean Gradio interface with clear instructions
+
+## How to Use
+
+1. **Upload a file**: Choose an image (PNG/JPG/JPEG) or PDF document
+2. **Select analysis type**: Choose from various Florence-2 tasks
+3. **View results**: See original and annotated versions with overlays
+4. **Download results**: Save processed images with annotations
+
+## Model Information
+
+This Space uses Microsoft's Florence-2 model, a foundation vision model that can handle various computer vision and vision-language tasks with a single model architecture.
+
+## Technical Details
+
+- **Framework**: Gradio 4.44.0
+- **Model**: Microsoft Florence-2 (microsoft/Florence-2-large)
+- **PDF Processing**: pdf2image for page-by-page conversion
+- **Visualization**: PIL and OpenCV for overlay rendering
+- **Hardware**: Optimized for CPU and GPU inference
+
+## Examples
+
+Upload any document or image to see Florence-2 in action:
+- **Documents**: Analyze layouts, detect text regions, identify tables
+- **Photos**: Object detection, scene understanding, detailed captions
+- **Screenshots**: UI element detection, text extraction
+- **Technical diagrams**: Component identification and labeling
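Under the hood, Florence-2 selects its task through a special prompt token rather than separate model heads. A minimal sketch of how the analysis types above could map to those tokens (token strings follow the Florence-2 model card; the dict keys and fallback behavior here are illustrative, not this Space's exact `config.py`):

```python
# Illustrative mapping from this Space's analysis-type names to
# Florence-2 task prompt tokens. Token strings come from the
# Florence-2 model card; the key names are assumptions.
FLORENCE_TASK_PROMPTS = {
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "dense_captioning": "<DENSE_REGION_CAPTION>",
    "ocr": "<OCR_WITH_REGION>",
    "region_proposal": "<REGION_PROPOSAL>",
}

def task_prompt(task_type: str) -> str:
    """Resolve a task name to its prompt token, falling back to
    detailed captioning for unknown names."""
    return FLORENCE_TASK_PROMPTS.get(
        task_type, FLORENCE_TASK_PROMPTS["detailed_caption"]
    )
```

The prompt token is prepended to the model input, so one checkpoint serves all of the analysis types listed above.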
USAGE.md ADDED
@@ -0,0 +1,168 @@
+# Usage Guide: Florence-2 Document & Image Analyzer
+
+## Quick Start
+
+1. **Launch the Space**: Open the Hugging Face Space URL
+2. **Upload a file**: Click "Upload Image or PDF" and select your file
+3. **Choose analysis type**: Select from the dropdown menu
+4. **Analyze**: Click the "🔍 Analyze" button
+5. **View results**: See original and annotated images side by side
+
+## Analysis Types
+
+### 📝 Detailed Caption
+- **Purpose**: Generate comprehensive descriptions of image content
+- **Best for**: Understanding overall scene content, accessibility descriptions
+- **Output**: Detailed text descriptions overlaid on images
+
+### 🎯 Object Detection
+- **Purpose**: Identify and locate objects with bounding boxes
+- **Best for**: Inventory analysis, object counting, spatial understanding
+- **Output**: Bounding boxes around detected objects with labels
+
+### 🔍 Dense Captioning
+- **Purpose**: Provide detailed captions for different regions
+- **Best for**: Complex scenes with multiple elements
+- **Output**: Multiple captions for different image regions
+
+### 📄 OCR Text Detection
+- **Purpose**: Extract and locate text in images
+- **Best for**: Document analysis, sign reading, text extraction
+- **Output**: Bounding boxes around text with extracted content
+
+### 🎪 Region Proposal
+- **Purpose**: Identify interesting or important regions
+- **Best for**: Finding areas of focus, preliminary analysis
+- **Output**: Highlighted regions of interest
+
+## Supported File Types
+
+### Images
+- **PNG**: High-quality images with transparency support
+- **JPG/JPEG**: Standard photo formats
+- **BMP**: Bitmap images
+- **TIFF**: High-quality document scans
+
+### Documents
+- **PDF**: Multi-page documents (converted to images automatically)
+  - Maximum pages: 20 (configurable)
+  - Resolution: 200 DPI
+  - All pages processed individually
+
+## Tips for Best Results
+
+### Image Quality
+- Use high-resolution images (recommended: at least 800x600)
+- Ensure good lighting and contrast
+- Avoid heavily compressed or blurry images
+- Clear, unobstructed view of subjects works best
+
+### PDF Documents
+- Scan documents at 200+ DPI for better text recognition
+- Ensure pages are properly oriented
+- Single-column layouts work better than complex multi-column designs
+- Consider splitting very large PDFs into smaller sections
+
+### Analysis Selection
+- **For documents**: Start with OCR to extract text
+- **For photos**: Try Object Detection first, then Detailed Caption
+- **For complex scenes**: Use Dense Captioning for comprehensive analysis
+- **For preliminary analysis**: Region Proposal can help identify areas of interest
+
+## Understanding Results
+
+### Gallery View
+- **Left images**: Original uploaded content
+- **Right images**: Annotated versions with Florence-2 analysis
+- Images are displayed in order (Page 1, Page 2, etc. for PDFs)
+
+### Status Panel
+- Real-time processing updates
+- Error messages and troubleshooting info
+- Summary of detected objects/text
+- Processing time and page counts
+
+### Annotations
+- **Bounding boxes**: Colored rectangles around detected elements
+- **Labels**: Text descriptions of detected objects/text
+- **Colors**: Different colors distinguish between different objects
+- **Coordinates**: Boxes positioned accurately on original image coordinates
+
+## Common Use Cases
+
+### 📋 Document Analysis
+1. Upload scanned documents or PDFs
+2. Use OCR to extract all text content
+3. Use Object Detection to identify tables, figures, signatures
+4. Review extracted information in the status panel
+
+### 📸 Photo Analysis
+1. Upload photos of scenes, objects, or people
+2. Use Object Detection to identify all visible objects
+3. Use Detailed Caption for comprehensive scene description
+4. Compare original and annotated versions
+
+### 🏢 Technical Diagrams
+1. Upload engineering drawings, flowcharts, or schematics
+2. Use Region Proposal to identify key components
+3. Use Dense Captioning for detailed component descriptions
+4. Extract text labels with OCR
+
+### 📊 Data Visualization
+1. Upload charts, graphs, or infographics
+2. Use Object Detection to identify chart elements
+3. Use OCR to extract data labels and values
+4. Use Detailed Caption for overall chart description
+
+## Troubleshooting
+
+### Model Loading Issues
+- **First run may be slow**: Florence-2 model downloads automatically (several GB)
+- **Memory errors**: Try using smaller images or fewer PDF pages
+- **Timeout errors**: Large files may need multiple attempts
+
+### Processing Failures
+- **Unsupported formats**: Convert to PNG/JPG/PDF first
+- **Large files**: Resize images or split PDFs into smaller sections
+- **Poor quality**: Use higher resolution scans or clearer photos
+
+### Performance Tips
+- **GPU acceleration**: Automatic if available, significantly faster processing
+- **Batch processing**: Process multiple pages efficiently
+- **Image optimization**: Resize very large images for faster processing
+
+## Privacy and Security
+
+- **No data storage**: Files are processed in memory only
+- **Temporary processing**: Uploaded files are not permanently saved
+- **Local processing**: All analysis happens on Hugging Face infrastructure
+- **No external API calls**: Florence-2 runs locally within the Space
+
+## Advanced Features
+
+### Custom Configuration
+- Model parameters can be adjusted in `config.py`
+- Different Florence-2 model variants available
+- Processing limits configurable for different deployment scenarios
+
+### API Integration
+- Space can be used via Gradio API for programmatic access
+- Batch processing support for multiple files
+- JSON output available for automated workflows
+
+## Getting Help
+
+If you encounter issues:
+
+1. **Check file format**: Ensure you're using supported formats (PNG, JPG, PDF)
+2. **Verify file size**: Large files may need to be resized or split
+3. **Try different analysis types**: Some work better for specific content types
+4. **Check status messages**: Detailed error information appears in the status panel
+5. **Report bugs**: Use Hugging Face Space discussion tab for persistent issues
+
+## Credits
+
+- **Florence-2 Model**: Microsoft Research
+- **Interface**: Built with Gradio
+- **PDF Processing**: pdf2image library
+- **Deployment**: Hugging Face Spaces
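The "Annotations" behavior described above (cycling colors, labels truncated with an ellipsis) reduces to a couple of lines. A standalone sketch, with an illustrative color list and length cap standing in for the Space's actual config values:

```python
# Sketch of the annotation labeling scheme: the color list and the
# 30-character cap are illustrative stand-ins for the Space's config.
BBOX_COLORS = ["red", "blue", "green", "orange", "purple"]

def label_style(index: int, label: str, max_len: int = 30):
    """Return a cycling box color and a display label that is
    truncated with '...' when it exceeds max_len characters."""
    color = BBOX_COLORS[index % len(BBOX_COLORS)]
    display = label if len(label) <= max_len else f"{label[:max_len - 3]}..."
    return color, display
```

Cycling with the modulo operator keeps adjacent boxes visually distinct even when a page has more detections than colors.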
app.py ADDED
@@ -0,0 +1,387 @@
+import gradio as gr
+import torch
+from PIL import Image, ImageDraw, ImageFont
+import numpy as np
+import io
+import base64
+from pathlib import Path
+import tempfile
+import os
+from typing import List, Tuple, Dict, Any, Optional
+import json
+import time
+
+# Import configuration
+from config import *
+
+# PDF processing
+try:
+    from pdf2image import convert_from_path, convert_from_bytes
+    PDF_AVAILABLE = True
+except ImportError:
+    PDF_AVAILABLE = False
+    print("Warning: pdf2image not available. PDF processing will be disabled.")
+
+# Florence-2 model imports
+try:
+    from transformers import AutoProcessor, AutoModelForCausalLM
+    FLORENCE_AVAILABLE = True
+except ImportError:
+    FLORENCE_AVAILABLE = False
+    print("Warning: transformers not available. Florence-2 processing will be disabled.")
+
+class Florence2Analyzer:
+    def __init__(self):
+        self.model = None
+        self.processor = None
+        self.device = "cpu" if FORCE_CPU else ("cuda" if torch.cuda.is_available() else "cpu")
+        self._load_model()
+
+    def _load_model(self):
+        """Load Florence-2 model and processor"""
+        if not FLORENCE_AVAILABLE:
+            print("Florence-2 not available - transformers library not found")
+            return
+
+        try:
+            print(f"Loading Florence-2 model: {FLORENCE_MODEL_ID}")
+            start_time = time.time()
+
+            self.model = AutoModelForCausalLM.from_pretrained(
+                FLORENCE_MODEL_ID,
+                torch_dtype=torch.float16 if (torch.cuda.is_available() and not FORCE_CPU) else torch.float32,
+                trust_remote_code=True
+            ).to(self.device)
+
+            self.processor = AutoProcessor.from_pretrained(FLORENCE_MODEL_ID, trust_remote_code=True)
+
+            load_time = time.time() - start_time
+            print(f"Florence-2 model loaded successfully on {self.device} in {load_time:.2f} seconds")
+        except Exception as e:
+            print(f"Error loading Florence-2 model: {e}")
+            self.model = None
+            self.processor = None
+
+    def analyze_image(self, image: Image.Image, task_type: str = "detailed_caption") -> Dict[str, Any]:
+        """Analyze image with Florence-2 model"""
+        if not self.model or not self.processor:
+            return {"error": ERROR_MESSAGES["model_not_loaded"], "success": False}
+
+        try:
+            # Get task configuration
+            task_config = FLORENCE_TASKS.get(task_type, FLORENCE_TASKS["detailed_caption"])
+            task_prompt = task_config["prompt"]
+
+            # Resize image if too large
+            if image.size[0] > MAX_IMAGE_SIZE[0] or image.size[1] > MAX_IMAGE_SIZE[1]:
+                image.thumbnail(MAX_IMAGE_SIZE, Image.Resampling.LANCZOS)
+                print(f"Resized image to {image.size}")
+
+            # Process image
+            inputs = self.processor(text=task_prompt, images=image, return_tensors="pt").to(self.device)
+
+            # Generate
+            generated_ids = self.model.generate(
+                input_ids=inputs["input_ids"],
+                pixel_values=inputs["pixel_values"],
+                max_new_tokens=task_config["max_tokens"],
+                num_beams=3,
+                do_sample=False
+            )
+
+            # Decode response
+            generated_text = self.processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
+            parsed_answer = self.processor.post_process_generation(
+                generated_text,
+                task=task_prompt,
+                image_size=(image.width, image.height)
+            )
+
+            return {
+                "task_type": task_type,
+                "raw_text": generated_text,
+                "parsed_results": parsed_answer,
+                "success": True,
+                "processing_time": time.time()
+            }
+
+        except Exception as e:
+            return {"error": f"Analysis failed: {str(e)}", "success": False}
+
+def convert_pdf_to_images(pdf_file) -> List[Image.Image]:
+    """Convert PDF pages to PIL Images"""
+    if not PDF_AVAILABLE:
+        raise ValueError("PDF processing not available. Please install pdf2image.")
+
+    try:
+        # Handle different input types
+        if hasattr(pdf_file, 'read'):
+            # File-like object
+            pdf_bytes = pdf_file.read()
+            images = convert_from_bytes(pdf_bytes, dpi=PDF_DPI, fmt='RGB')
+        elif isinstance(pdf_file, str) and os.path.exists(pdf_file):
+            # File path
+            images = convert_from_path(pdf_file, dpi=PDF_DPI, fmt='RGB')
+        else:
+            raise ValueError("Invalid PDF input format")
+
+        # Limit number of pages
+        if len(images) > MAX_PDF_PAGES:
+            print(f"Warning: PDF has {len(images)} pages, processing only first {MAX_PDF_PAGES}")
+            images = images[:MAX_PDF_PAGES]
+
+        return images
+    except Exception as e:
+        raise ValueError(f"Failed to convert PDF: {str(e)}")
+
+def draw_bounding_boxes(image: Image.Image, results: Dict[str, Any]) -> Image.Image:
+    """Draw bounding boxes and labels on image"""
+    if not results.get("success", False):
+        return image
+
+    # Create a copy to draw on
+    annotated_image = image.copy()
+    draw = ImageDraw.Draw(annotated_image)
+
+    try:
+        # Load a font
+        try:
+            font = ImageFont.truetype("arial.ttf", FONT_SIZE)
+        except OSError:
+            try:
+                font = ImageFont.truetype("DejaVuSans.ttf", FONT_SIZE)
+            except OSError:
+                font = ImageFont.load_default()
+
+        parsed_results = results.get("parsed_results", {})
+
+        # Handle object detection and dense captioning results
+        if "bboxes" in parsed_results and "labels" in parsed_results:
+            bboxes = parsed_results["bboxes"]
+            labels = parsed_results["labels"]
+
+            for i, (bbox, label) in enumerate(zip(bboxes, labels)):
+                color = BBOX_COLORS[i % len(BBOX_COLORS)]
+                x1, y1, x2, y2 = bbox
+
+                # Draw bounding box
+                draw.rectangle([x1, y1, x2, y2], outline=color, width=BBOX_WIDTH)
+
+                # Prepare label text (truncate if too long)
+                display_label = label if len(label) <= 30 else f"{label[:27]}..."
+
+                # Draw label background
+                text_bbox = draw.textbbox((x1, y1), display_label, font=font)
+                text_width = text_bbox[2] - text_bbox[0]
+                text_height = text_bbox[3] - text_bbox[1]
+
+                # Ensure label fits within image bounds
+                label_x = min(x1, image.width - text_width - 5)
+                label_y = max(y1 - text_height - 5, 5)
+
+                # Draw background rectangle
+                draw.rectangle([label_x - 2, label_y - 2, label_x + text_width + 2, label_y + text_height + 2],
+                               fill=color)
+
+                # Draw label text
+                draw.text((label_x, label_y), display_label, fill="white", font=font)
+
+        # Handle OCR results
+        elif "quad_boxes" in parsed_results and "labels" in parsed_results:
+            quad_boxes = parsed_results["quad_boxes"]
+            labels = parsed_results["labels"]
+
+            for i, (quad, label) in enumerate(zip(quad_boxes, labels)):
+                color = BBOX_COLORS[i % len(BBOX_COLORS)]
+
+                # Draw quadrilateral for OCR results
+                if len(quad) >= 8:  # quad should have 8 coordinates (4 points)
+                    points = [(quad[j], quad[j+1]) for j in range(0, 8, 2)]
+                    draw.polygon(points, outline=color, width=BBOX_WIDTH)
+
+                    # Draw label near first point
+                    x, y = points[0]
+                    display_label = label if len(label) <= 20 else f"{label[:17]}..."
+
+                    text_bbox = draw.textbbox((x, y), display_label, font=font)
+                    draw.rectangle([text_bbox[0]-2, text_bbox[1]-2, text_bbox[2]+2, text_bbox[3]+2],
+                                   fill=color)
+                    draw.text((x, y), display_label, fill="white", font=font)
+
+    except Exception as e:
+        print(f"Error drawing annotations: {e}")
+
+    return annotated_image
+
+def process_uploaded_file(file, task_type: str) -> Tuple[List[Image.Image], List[Image.Image], str]:
+    """Process uploaded file (image or PDF) and return original and annotated versions"""
+    if file is None:
+        return [], [], "No file uploaded."
+
+    analyzer = Florence2Analyzer()
+    original_images = []
+    annotated_images = []
+    status_message = ""
+
+    try:
+        # Determine file type (file is a path string with type="filepath")
+        file_extension = Path(file).suffix.lower()
+
+        if file_extension == '.pdf':
+            if not PDF_AVAILABLE:
+                return [], [], "PDF processing not available. Please install pdf2image."
+
+            # Convert PDF to images
+            status_message += "Converting PDF to images...\n"
+            pdf_images = convert_pdf_to_images(file)
+            status_message += f"Successfully converted {len(pdf_images)} pages.\n"
+
+            for i, img in enumerate(pdf_images):
+                status_message += f"Processing page {i+1}...\n"
+
+                # Analyze with Florence-2
+                results = analyzer.analyze_image(img, task_type)
+
+                if results.get("success", False):
+                    annotated_img = draw_bounding_boxes(img, results)
+                    original_images.append(img)
+                    annotated_images.append(annotated_img)
+                    status_message += f"Page {i+1} analyzed successfully.\n"
+                else:
+                    status_message += f"Page {i+1} analysis failed: {results.get('error', 'Unknown error')}\n"
+                    original_images.append(img)
+                    annotated_images.append(img)  # Fallback to original
+
+        elif file_extension in ['.png', '.jpg', '.jpeg', '.bmp', '.tiff']:
+            # Process single image
+            status_message += "Processing image...\n"
+
+            img = Image.open(file).convert('RGB')
+            results = analyzer.analyze_image(img, task_type)
+
+            if results.get("success", False):
+                annotated_img = draw_bounding_boxes(img, results)
+                original_images.append(img)
+                annotated_images.append(annotated_img)
+                status_message += "Image analyzed successfully.\n"
+
+                # Add detailed results to status
+                if "parsed_results" in results:
+                    parsed = results["parsed_results"]
+                    if task_type == "detailed_caption" and isinstance(parsed, dict):
+                        caption = parsed.get("detailed_caption", "No caption generated")
+                        status_message += f"Caption: {caption}\n"
+                    elif "labels" in parsed:
+                        labels = parsed["labels"]
+                        status_message += f"Detected objects: {', '.join(labels[:5])}{'...' if len(labels) > 5 else ''}\n"
+            else:
+                status_message += f"Analysis failed: {results.get('error', 'Unknown error')}\n"
+                original_images.append(img)
+                annotated_images.append(img)
+        else:
+            return [], [], f"Unsupported file type: {file_extension}. Please upload PNG, JPG, JPEG, or PDF files."
+
+    except Exception as e:
+        return [], [], f"Error processing file: {str(e)}"
+
+    return original_images, annotated_images, status_message
+
+def create_gallery_content(original_images: List[Image.Image], annotated_images: List[Image.Image]) -> List[Tuple[Image.Image, str]]:
+    """Create content for Gradio gallery showing both original and annotated versions"""
+    gallery_content = []
+
+    for i, (orig, anno) in enumerate(zip(original_images, annotated_images)):
+        # Add original image
+        gallery_content.append((orig, f"Page/Image {i+1} - Original"))
+        # Add annotated image
+        gallery_content.append((anno, f"Page/Image {i+1} - Analyzed"))
+
+    return gallery_content
+
+# Create Gradio interface
+def create_interface():
+    with gr.Blocks(title="Florence-2 Document & Image Analyzer", theme=gr.themes.Soft()) as demo:
+        gr.Markdown("""
+        # 📄 Florence-2 Document & Image Analyzer
+
+        Upload images (PNG, JPG, JPEG) or PDF documents to analyze them with Microsoft's Florence-2 vision model.
+        The model can detect objects, generate captions, perform OCR, and more!
+        """)
+
+        with gr.Row():
+            with gr.Column(scale=1):
+                file_upload = gr.File(
+                    label="Upload Image or PDF",
+                    file_types=[".png", ".jpg", ".jpeg", ".pdf"],
+                    type="filepath"
+                )
+
+                task_type = gr.Dropdown(
+                    choices=[(config["description"], task_name) for task_name, config in FLORENCE_TASKS.items()],
+                    value="object_detection",
+                    label="Analysis Type",
+                    info="Choose what type of analysis to perform"
+                )
+
+                analyze_btn = gr.Button("🔍 Analyze", variant="primary")
+
+                status_text = gr.Textbox(
+                    label="Status",
+                    lines=8,
+                    interactive=False,
+                    placeholder="Upload a file and click Analyze to see results..."
+                )
+
+            with gr.Column(scale=2):
+                gallery = gr.Gallery(
+                    label="Results (Original vs Analyzed)",
+                    show_label=True,
+                    elem_id="gallery",
+                    columns=2,
+                    rows=2,
+                    object_fit="contain",
+                    height="auto"
+                )
+
+        # Event handler
+        def process_and_display(file, task):
+            if file is None:
+                return [], "Please upload a file first."
+
+            original_imgs, annotated_imgs, status = process_uploaded_file(file, task)
+            gallery_content = create_gallery_content(original_imgs, annotated_imgs)
+
+            return gallery_content, status
+
+        analyze_btn.click(
+            fn=process_and_display,
+            inputs=[file_upload, task_type],
+            outputs=[gallery, status_text]
+        )
+
+        # Example section
+        gr.Markdown("""
+        ## 💡 Tips for Best Results
+
+        - **Images**: Upload clear, high-resolution images for better analysis
+        - **PDFs**: Multi-page PDFs will be processed page by page
+        - **Object Detection**: Great for identifying and locating objects in images
+        - **Detailed Caption**: Provides comprehensive descriptions of image content
+        - **OCR**: Perfect for extracting text from documents and images
+        - **Dense Captioning**: Provides detailed captions for different regions
+
+        ## 🎯 Supported Formats
+        - **Images**: PNG, JPG, JPEG, BMP, TIFF
+        - **Documents**: PDF (converted to images automatically)
+        """)
+
+    return demo
+
+# Launch the application
+if __name__ == "__main__":
+    demo = create_interface()
+    demo.launch(
+        share=SHARE_LINK,
+        server_port=SERVER_PORT,
+        show_error=True
+    )
config.py
ADDED
|
@@ -0,0 +1,65 @@
"""
Configuration settings for Florence-2 Document & Image Analyzer
"""

# Model configuration
FLORENCE_MODEL_ID = "microsoft/Florence-2-large"

# Alternative models (comment/uncomment as needed)
# FLORENCE_MODEL_ID = "microsoft/Florence-2-base"  # Smaller, faster model

# Processing configuration
MAX_PDF_PAGES = 20             # Maximum number of PDF pages to process
PDF_DPI = 200                  # DPI for PDF-to-image conversion
MAX_IMAGE_SIZE = (1920, 1920)  # Maximum image dimensions

# Gradio configuration
GRADIO_THEME = "soft"  # Options: default, soft, monochrome, etc.
SHARE_LINK = True      # Create a public share link
SERVER_PORT = 7860     # Default Gradio port

# Device configuration
FORCE_CPU = False  # Set to True to force CPU usage even if a GPU is available

# Visualization configuration
BBOX_COLORS = ["red", "blue", "green", "orange", "purple", "yellow", "pink", "cyan"]
BBOX_WIDTH = 2
FONT_SIZE = 12

# Task configurations
FLORENCE_TASKS = {
    "detailed_caption": {
        "prompt": "<MORE_DETAILED_CAPTION>",
        "description": "Generate detailed descriptions of the image content",
        "max_tokens": 1024
    },
    "object_detection": {
        "prompt": "<OD>",
        "description": "Detect and locate objects with bounding boxes",
        "max_tokens": 512
    },
    "dense_captioning": {
        "prompt": "<DENSE_REGION_CAPTION>",
        "description": "Provide captions for different regions in the image",
        "max_tokens": 1024
    },
    "ocr": {
        "prompt": "<OCR>",
        "description": "Extract and locate text in the image",
        "max_tokens": 512
    },
    "region_proposal": {
        "prompt": "<REGION_PROPOSAL>",
        "description": "Identify interesting regions in the image",
        "max_tokens": 256
    }
}

# Error messages
ERROR_MESSAGES = {
    "model_not_loaded": "Florence-2 model is not available. Please check your internet connection and try again.",
    "unsupported_format": "Unsupported file format. Please upload PNG, JPG, JPEG, or PDF files.",
    "pdf_too_large": f"PDF has too many pages (max: {MAX_PDF_PAGES}). Please use a smaller document.",
    "processing_failed": "Failed to process the file. Please try again with a different image.",
    "no_file": "Please upload a file first.",
}
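As a sketch of how a `FLORENCE_TASKS` entry might be consumed (the `resolve_task` helper is hypothetical; app.py's actual lookup may differ), a prompt and token budget can be resolved from the config like this:

```python
# Hypothetical helper showing how a FLORENCE_TASKS entry could be resolved.
# The dict below mirrors two entries from config.py for a self-contained sketch.
FLORENCE_TASKS = {
    "object_detection": {"prompt": "<OD>", "description": "Detect objects", "max_tokens": 512},
    "ocr": {"prompt": "<OCR>", "description": "Extract text", "max_tokens": 512},
}

def resolve_task(task_name: str):
    """Return (prompt, max_tokens) for a task, raising for unknown task names."""
    if task_name not in FLORENCE_TASKS:
        raise KeyError(f"Unknown task: {task_name}")
    cfg = FLORENCE_TASKS[task_name]
    return cfg["prompt"], cfg["max_tokens"]

prompt, max_tokens = resolve_task("ocr")
print(prompt, max_tokens)  # <OCR> 512
```

Keeping prompts and token limits in one dict means the UI dropdown and the generation call can both be driven from the same table.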
deploy.py
ADDED
|
@@ -0,0 +1,174 @@
#!/usr/bin/env python3
"""
Deployment script for Florence-2 Document & Image Analyzer
This script helps prepare and test the Hugging Face Space before deployment
"""

import subprocess
import sys
import os

def check_dependencies():
    """Check that all required dependencies are available"""
    print("Checking dependencies...")

    # Maps pip distribution names to their importable module names
    # (e.g. the "Pillow" distribution is imported as "PIL")
    required_packages = {
        "gradio": "gradio",
        "torch": "torch",
        "transformers": "transformers",
        "Pillow": "PIL",
        "pdf2image": "pdf2image",
        "numpy": "numpy",
    }

    missing_packages = []

    for package, module_name in required_packages.items():
        try:
            __import__(module_name)
            print(f"  OK {package}")
        except ImportError:
            print(f"  MISSING {package}")
            missing_packages.append(package)

    if missing_packages:
        print(f"\nMissing packages: {', '.join(missing_packages)}")
        print("Run: pip install -r requirements.txt")
        return False

    print("All dependencies available")
    return True

def validate_files():
    """Validate that all required files are present"""
    print("\nValidating files...")

    required_files = [
        "README.md",
        "app.py",
        "requirements.txt",
        "config.py",
        "packages.txt"
    ]

    missing_files = []

    for file_name in required_files:
        if os.path.exists(file_name):
            print(f"  OK {file_name}")
        else:
            print(f"  MISSING {file_name}")
            missing_files.append(file_name)

    if missing_files:
        print(f"\nMissing files: {', '.join(missing_files)}")
        return False

    print("All required files present")
    return True

def test_import():
    """Test importing the main application"""
    print("\nTesting application import...")

    try:
        from app import Florence2Analyzer, create_interface
        print("App modules imported successfully")

        # Test interface creation
        demo = create_interface()
        print("Gradio interface created successfully")

        return True
    except Exception as e:
        print(f"Import failed: {e}")
        return False

def run_tests():
    """Run basic functionality tests"""
    print("\nRunning basic tests...")

    try:
        # Run the test script
        result = subprocess.run([sys.executable, "test_app.py"],
                                capture_output=True, text=True)

        if result.returncode == 0:
            print("Tests passed")
            print(result.stdout)
            return True
        else:
            print("Tests failed")
            print(result.stderr)
            return False
    except Exception as e:
        print(f"Test execution failed: {e}")
        return False

def show_deployment_info():
    """Show information about deploying to Hugging Face"""
    print("\nDeployment Information")
    print("=" * 50)

    print("\nTo deploy to Hugging Face Spaces:")
    print("1. Create a new Space at https://huggingface.co/spaces")
    print("2. Choose 'Gradio' as the SDK")
    print("3. Upload or git push these files:")

    files_to_upload = [
        "README.md (Space configuration)",
        "app.py (Main application)",
        "requirements.txt (Python dependencies)",
        "config.py (Configuration settings)",
        "packages.txt (System dependencies)",
        ".gitignore (Git ignore rules)"
    ]

    for file_info in files_to_upload:
        print(f"  - {file_info}")

    print("\nFirst-time deployment notes:")
    print("- Florence-2 model (~5GB) will download automatically")
    print("- Initial startup may take 5-10 minutes")
    print("- Subsequent starts will be much faster")
    print("- GPU hardware recommended for better performance")

    print("\nOptional configurations:")
    print("- Edit config.py to change model settings")
    print("- Modify FLORENCE_MODEL_ID for different model variants")
    print("- Adjust MAX_PDF_PAGES for different page limits")

def main():
    """Main deployment preparation function"""
    print("Florence-2 Space Deployment Preparation")
    print("=" * 50)

    # Run all checks
    checks = [
        ("Dependencies", check_dependencies),
        ("Files", validate_files),
        ("Import", test_import),
        ("Tests", run_tests)
    ]

    all_passed = True

    for check_name, check_func in checks:
        if not check_func():
            all_passed = False
            print(f"\n{check_name} check failed")
        else:
            print(f"\n{check_name} check passed")

    if all_passed:
        print("\nAll checks passed! Ready for deployment.")
        show_deployment_info()
    else:
        print("\nSome checks failed. Please fix issues before deployment.")

    return all_passed

if __name__ == "__main__":
    success = main()
    sys.exit(0 if success else 1)
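The dependency check above has to bridge pip distribution names and Python module names, which differ for some packages (`Pillow` installs the `PIL` module). A minimal sketch of that mapping, using `importlib.util.find_spec` to probe availability without importing heavy packages (the `IMPORT_NAMES` table is an assumption covering only this project's dependencies):

```python
import importlib.util

# Pip distribution name -> importable module name, where they differ.
# This mapping is an assumption; extend it as dependencies are added.
IMPORT_NAMES = {"Pillow": "PIL", "opencv-python": "cv2"}

def importable_name(dist_name: str) -> str:
    """Return the module name to import for a given pip distribution name."""
    return IMPORT_NAMES.get(dist_name, dist_name)

def is_installed(dist_name: str) -> bool:
    """Check availability without actually importing the package."""
    return importlib.util.find_spec(importable_name(dist_name)) is not None

print(importable_name("Pillow"))  # PIL
print(importable_name("numpy"))   # numpy
```

`find_spec` only consults the import machinery, so the check stays fast even for packages like `torch` whose import alone can take seconds.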
examples.py
ADDED
|
@@ -0,0 +1,316 @@
"""
Example usage patterns for Florence-2 Document & Image Analyzer
This file contains examples of how to use the Space programmatically
"""

import io

import requests
from PIL import Image

class Florence2SpaceClient:
    """Client to interact with the Florence-2 Hugging Face Space API"""

    def __init__(self, space_url: str):
        """Initialize the client with the Space URL"""
        self.space_url = space_url.rstrip('/')
        self.api_url = f"{self.space_url}/api/predict"

    def analyze_image_from_path(self, image_path: str, task_type: str = "object_detection"):
        """Analyze an image file"""
        try:
            with open(image_path, 'rb') as f:
                files = {'file': f}
                data = {'task_type': task_type}

                response = requests.post(self.api_url, files=files, data=data)
                return response.json()
        except Exception as e:
            return {"error": f"Failed to process image: {e}"}

    def analyze_image_from_url(self, image_url: str, task_type: str = "object_detection"):
        """Download and analyze an image from a URL"""
        try:
            # Download the image
            img_response = requests.get(image_url)
            img_response.raise_for_status()

            # Convert to a PIL Image
            image = Image.open(io.BytesIO(img_response.content))

            # Save temporarily and analyze
            temp_path = "temp_image.png"
            image.save(temp_path)

            result = self.analyze_image_from_path(temp_path, task_type)

            # Clean up
            import os
            if os.path.exists(temp_path):
                os.remove(temp_path)

            return result
        except Exception as e:
            return {"error": f"Failed to process URL: {e}"}

def example_document_analysis():
    """Example: analyze a document with OCR"""
    print("📄 Document Analysis Example")
    print("-" * 30)

    # This would work with a real Space deployment:
    # client = Florence2SpaceClient("https://your-username-florence2-analyzer.hf.space")

    print("Use case: Extract text from a scanned document")
    print("1. Upload a PDF or image of the document")
    print("2. Select 'OCR Text Detection' as the analysis type")
    print("3. View the extracted text with bounding boxes")
    print("4. Copy the text from the status panel")

    # Example API call (pseudo-code)
    example_code = """
# Real usage example:
client = Florence2SpaceClient("https://your-space-url.hf.space")
result = client.analyze_image_from_path("document.pdf", "ocr")

if result.get("success"):
    print("Extracted text:")
    for text in result["parsed_results"]["labels"]:
        print(f"- {text}")
"""
    print("\nCode example:")
    print(example_code)

def example_photo_analysis():
    """Example: analyze photos for objects"""
    print("\n📸 Photo Analysis Example")
    print("-" * 30)

    print("Use case: Identify objects in vacation photos")
    print("1. Upload a JPG/PNG photo")
    print("2. Select 'Object Detection' as the analysis type")
    print("3. View detected objects with bounding boxes")
    print("4. Use 'Detailed Caption' for a scene description")

    # Example workflow
    workflow = """
# Multi-step analysis workflow:

# Step 1: Object detection
objects = client.analyze_image_from_path("vacation.jpg", "object_detection")

# Step 2: Detailed description
caption = client.analyze_image_from_path("vacation.jpg", "detailed_caption")

# Step 3: Dense captioning for regions
regions = client.analyze_image_from_path("vacation.jpg", "dense_captioning")
"""
    print("\nWorkflow example:")
    print(workflow)

def example_technical_diagram():
    """Example: analyze technical diagrams"""
    print("\n🔧 Technical Diagram Example")
    print("-" * 30)

    print("Use case: Analyze engineering drawings or flowcharts")
    print("1. Upload a diagram image or PDF")
    print("2. Use 'Region Proposal' to identify components")
    print("3. Use 'OCR' to extract labels and text")
    print("4. Use 'Dense Captioning' for component descriptions")

    technical_workflow = """
# Technical analysis pipeline:

# Identify key regions
regions = client.analyze_image_from_path("flowchart.png", "region_proposal")

# Extract all text/labels
text = client.analyze_image_from_path("flowchart.png", "ocr")

# Get detailed component descriptions
descriptions = client.analyze_image_from_path("flowchart.png", "dense_captioning")

# Combine results for a comprehensive analysis
analysis = {
    "regions": regions,
    "text_content": text,
    "descriptions": descriptions
}
"""
    print("\nTechnical workflow:")
    print(technical_workflow)

def example_batch_processing():
    """Example: process multiple files"""
    print("\n📚 Batch Processing Example")
    print("-" * 30)

    print("Use case: Analyze multiple documents in a folder")

    batch_code = """
from pathlib import Path

def batch_analyze_folder(folder_path, task_type="ocr"):
    client = Florence2SpaceClient("https://your-space-url.hf.space")
    results = []

    # Get all supported files
    supported_extensions = ['.png', '.jpg', '.jpeg', '.pdf']
    files = []

    for ext in supported_extensions:
        files.extend(Path(folder_path).glob(f"*{ext}"))
        files.extend(Path(folder_path).glob(f"*{ext.upper()}"))

    print(f"Found {len(files)} files to process")

    for file_path in files:
        print(f"Processing: {file_path.name}")

        result = client.analyze_image_from_path(str(file_path), task_type)

        results.append({
            "file": file_path.name,
            "result": result,
            "success": result.get("success", False)
        })

    return results

# Usage
results = batch_analyze_folder("./documents", "ocr")

# Generate report
successful = sum(1 for r in results if r["success"])
print(f"Successfully processed: {successful}/{len(results)} files")
"""
    print("Batch processing implementation:")
    print(batch_code)

def example_error_handling():
    """Example: proper error handling"""
    print("\n⚠️ Error Handling Example")
    print("-" * 30)

    error_handling_code = """
import os
import time
from pathlib import Path

def robust_analysis(file_path, task_type="object_detection"):
    client = Florence2SpaceClient("https://your-space-url.hf.space")

    try:
        # Check that the file exists and is a valid format
        if not os.path.exists(file_path):
            return {"error": "File not found", "success": False}

        file_ext = Path(file_path).suffix.lower()
        supported = ['.png', '.jpg', '.jpeg', '.pdf', '.bmp', '.tiff']

        if file_ext not in supported:
            return {"error": f"Unsupported format: {file_ext}", "success": False}

        # Perform analysis with retry logic
        max_retries = 3
        for attempt in range(max_retries):
            result = client.analyze_image_from_path(file_path, task_type)

            if result.get("success"):
                return result
            elif "model not loaded" in result.get("error", "").lower():
                print(f"Model loading, retry {attempt + 1}/{max_retries}")
                time.sleep(10)  # Wait for the model to load
            else:
                break

        return result

    except Exception as e:
        return {"error": f"Unexpected error: {e}", "success": False}

# Usage with error handling
result = robust_analysis("document.pdf", "ocr")

if result.get("success"):
    print("Analysis successful!")
    # Process results...
else:
    print(f"Analysis failed: {result.get('error')}")
    # Handle the error...
"""
    print("Robust error handling:")
    print(error_handling_code)

def show_integration_examples():
    """Show how to integrate with other tools"""
    print("\n🔗 Integration Examples")
    print("-" * 30)

    integration_examples = """
# 1. Integration with document management systems
def process_uploaded_documents(upload_folder):
    for file_path in Path(upload_folder).iterdir():
        if file_path.suffix.lower() == '.pdf':
            # Extract text with Florence-2
            result = client.analyze_image_from_path(str(file_path), "ocr")

            # Save the extracted text
            if result.get("success"):
                text_content = "\\n".join(result["parsed_results"]["labels"])
                text_file = file_path.with_suffix('.txt')
                text_file.write_text(text_content)

# 2. Integration with databases
def store_analysis_results(image_path, database_connection):
    result = client.analyze_image_from_path(image_path, "object_detection")

    if result.get("success"):
        objects = result["parsed_results"]["labels"]

        # Store in the database
        cursor = database_connection.cursor()
        for obj in objects:
            cursor.execute(
                "INSERT INTO detected_objects (image_path, object_name) VALUES (?, ?)",
                (image_path, obj)
            )
        database_connection.commit()

# 3. Integration with web scraping
def analyze_web_images(urls):
    results = []
    for url in urls:
        result = client.analyze_image_from_url(url, "detailed_caption")
        results.append({
            "url": url,
            "description": result.get("parsed_results", {}).get("detailed_caption", "")
        })
    return results
"""
    print("Integration patterns:")
    print(integration_examples)

def main():
    """Main examples function"""
    print("🎯 Florence-2 Document & Image Analyzer - Usage Examples")
    print("=" * 60)

    # Show all examples
    example_document_analysis()
    example_photo_analysis()
    example_technical_diagram()
    example_batch_processing()
    example_error_handling()
    show_integration_examples()

    print("\n" + "=" * 60)
    print("📝 Notes:")
    print("• Replace 'https://your-space-url.hf.space' with your actual Space URL")
    print("• The first request may be slow due to model loading")
    print("• GPU Spaces process images much faster than CPU Spaces")
    print("• Check the Space logs for detailed error information")
    print("• Consider rate limiting for batch processing")

    print("\n🚀 Ready to deploy and test your Florence-2 Space!")

if __name__ == "__main__":
    main()
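The batch report at the end of `batch_analyze_folder` above can be factored into a small, testable helper. A minimal sketch (the `summarize_batch` name and the result-dict shape `{"file": ..., "success": ...}` follow the batch example, but are otherwise an assumption):

```python
def summarize_batch(results):
    """Return (successful, total, failed_file_names) for a batch run.

    Each result is expected to carry at least "file" and "success" keys,
    matching the dicts built by the batch example above.
    """
    successful = sum(1 for r in results if r.get("success"))
    failed = [r["file"] for r in results if not r.get("success")]
    return successful, len(results), failed

# Example: three files, one failure
results = [
    {"file": "a.pdf", "success": True},
    {"file": "b.png", "success": False},
    {"file": "c.jpg", "success": True},
]
ok, total, failed = summarize_batch(results)
print(f"Successfully processed: {ok}/{total}; failed: {failed}")
```

Collecting the failed file names (not just the count) makes it easy to re-run only the failures after a transient error such as model loading.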
packages.txt
ADDED
|
@@ -0,0 +1,3 @@
poppler-utils
libgl1-mesa-glx
libglib2.0-0
requirements.txt
ADDED
|
@@ -0,0 +1,26 @@
# Core dependencies
gradio==4.44.0
torch>=2.0.0
torchvision>=0.15.0
transformers>=4.35.0
Pillow>=9.0.0
numpy>=1.21.0

# Florence-2 specific dependencies
timm>=0.9.0
einops>=0.7.0
safetensors>=0.4.0
accelerate>=0.21.0

# PDF processing (pdf2image releases are on the 1.x line)
pdf2image>=1.16.0

# Image processing and visualization
opencv-python>=4.8.0
matplotlib>=3.6.0

# Additional utilities
requests>=2.28.0
packaging>=21.0
sentencepiece>=0.1.99
protobuf>=3.20.0
test_app.py
ADDED
|
@@ -0,0 +1,83 @@
#!/usr/bin/env python3
"""
Test script for Florence-2 Document & Image Analyzer
Run this to verify the application works correctly before deployment
"""

import tempfile
import os
from PIL import Image
import numpy as np

def create_test_image():
    """Create a simple test image"""
    # A plain white canvas; a real test could draw shapes on it
    img = Image.new('RGB', (400, 300), color='white')
    return img

def test_basic_functionality():
    """Test basic app functionality"""
    print("Testing Florence-2 Document & Image Analyzer...")

    try:
        # Import main modules
        from app import Florence2Analyzer, process_uploaded_file, create_interface
        print("Successfully imported app modules")

        # Test model loading (this might take a while on first run)
        print("Testing model loading...")
        analyzer = Florence2Analyzer()

        if analyzer.model is None:
            print("Warning: Florence-2 model not loaded (this is expected on first run)")
        else:
            print("Florence-2 model loaded successfully")

        # Test image processing
        print("Testing image processing...")
        test_img = create_test_image()

        # Save the test image temporarily
        with tempfile.NamedTemporaryFile(suffix='.png', delete=False) as tmp_file:
            test_img.save(tmp_file.name)

            # Test processing (mock file object)
            class MockFile:
                def __init__(self, path):
                    self.name = path

            mock_file = MockFile(tmp_file.name)

            try:
                original_imgs, annotated_imgs, status = process_uploaded_file(mock_file, "detailed_caption")
                print(f"Image processing completed. Status: {status[:100]}...")
            except Exception as e:
                print(f"Image processing test failed (expected on first run): {e}")
            finally:
                os.unlink(tmp_file.name)

        # Test interface creation
        print("Testing Gradio interface creation...")
        demo = create_interface()
        print("Gradio interface created successfully")

        print("\nBasic functionality tests completed!")
        print("\nNext steps:")
        print("1. Upload this Space to Hugging Face")
        print("2. The model will download automatically on first run")
        print("3. Test with real images and PDFs")

        return True

    except ImportError as e:
        print(f"Import error: {e}")
        print("Make sure all dependencies are installed: pip install -r requirements.txt")
        return False
    except Exception as e:
        print(f"Unexpected error: {e}")
        return False

if __name__ == "__main__":
    success = test_basic_functionality()
    exit(0 if success else 1)