Working version 1: GOT OCR works with latex output
README.md
CHANGED
|
@@ -31,10 +31,11 @@ Markit is a powerful tool that converts various document formats (PDF, DOCX, ima
|
|
| 31 |
- **PyPdfium**: Fast PDF parsing using the PDFium engine
|
| 32 |
- **Docling**: Advanced document structure analysis
|
| 33 |
- **Gemini Flash**: AI-powered conversion using Google's Gemini API
|
| 34 |
-
- **GOT-OCR**: State-of-the-art OCR model for images (JPG/PNG only)
|
| 35 |
- **OCR Integration**: Extract text from images and scanned documents using Tesseract OCR
|
| 36 |
- **Interactive UI**: User-friendly Gradio interface with page navigation for large documents
|
| 37 |
- **AI-Powered Chat**: Interact with your documents using AI to ask questions about content
|
| 38 |
|
| 39 |
## System Architecture
|
| 40 |
The application is built with a modular architecture:
|
|
@@ -85,14 +86,16 @@ The GOT-OCR parser requires:
|
|
| 85 |
1. CUDA-capable GPU with sufficient memory
|
| 86 |
2. The following dependencies will be installed automatically:
|
| 87 |
```bash
|
| 88 |
-
torch
|
| 89 |
-
torchvision
|
| 90 |
-
transformers
|
| 91 |
-
|
| 92 |
-
verovio
|
| 93 |
-
|
| 94 |
```
|
| 95 |
3. Note that GOT-OCR only supports JPG and PNG image formats
|
| 96 |
|
| 97 |
## Deploying to Hugging Face Spaces
|
| 98 |
|
|
@@ -126,6 +129,8 @@ build:
|
|
| 126 |
- **None**: No OCR processing (for documents with selectable text)
|
| 127 |
- **Tesseract**: Basic OCR using Tesseract
|
| 128 |
- **Advanced**: Enhanced OCR with layout preservation (available with specific parsers)
|
| 129 |
4. Select your desired output format:
|
| 130 |
- **Markdown**: Clean, readable markdown format
|
| 131 |
- **JSON**: Structured data representation
|
|
@@ -152,8 +157,11 @@ build:
|
|
| 152 |
- Verify that all required dependencies are installed correctly
|
| 153 |
- Remember that GOT-OCR only supports JPG and PNG image formats
|
| 154 |
- If you encounter CUDA out-of-memory errors, try using a smaller image
|
| 155 |
-
-
|
| 156 |
-
- If you see errors about
|
| 157 |
|
| 158 |
### General Issues
|
| 159 |
- Check the console logs for error messages
|
|
@@ -186,6 +194,7 @@ markit/
|
|
| 186 |
│   │   ├── parser_interface.py  # Parser interface
|
| 187 |
│   │   ├── parser_registry.py  # Parser registry
|
| 188 |
│   │   ├── docling_parser.py  # Docling parser
|
| 189 |
│   │   └── pypdfium_parser.py  # PyPDFium parser
|
| 190 |
│   ├── ui/  # User interface
|
| 191 |
│   │   ├── __init__.py  # Package initialization
|
|
@@ -194,4 +203,14 @@ markit/
|
|
| 194 |
│   └── __init__.py  # Package initialization
|
| 195 |
└── tests/  # Tests
|
| 196 |
    └── __init__.py  # Package initialization
|
| 197 |
-
```
|
| 31 |
- **PyPdfium**: Fast PDF parsing using the PDFium engine
|
| 32 |
- **Docling**: Advanced document structure analysis
|
| 33 |
- **Gemini Flash**: AI-powered conversion using Google's Gemini API
|
| 34 |
+
- **GOT-OCR**: State-of-the-art OCR model for images (JPG/PNG only) with plain text and formatted text options
|
| 35 |
- **OCR Integration**: Extract text from images and scanned documents using Tesseract OCR
|
| 36 |
- **Interactive UI**: User-friendly Gradio interface with page navigation for large documents
|
| 37 |
- **AI-Powered Chat**: Interact with your documents using AI to ask questions about content
|
| 38 |
+
- **ZeroGPU Support**: Optimized for Hugging Face Spaces with Stateless GPU environments
|
| 39 |
|
| 40 |
## System Architecture
|
| 41 |
The application is built with a modular architecture:
|
|
|
|
| 86 |
1. CUDA-capable GPU with sufficient memory
|
| 87 |
2. The following dependencies will be installed automatically:
|
| 88 |
```bash
|
| 89 |
+
torch
|
| 90 |
+
torchvision
|
| 91 |
+
git+https://github.com/huggingface/transformers.git@main # Latest transformers from GitHub
|
| 92 |
+
accelerate
|
| 93 |
+
verovio
|
| 94 |
+
numpy==1.26.3 # Specific version required
|
| 95 |
+
opencv-python
|
| 96 |
```
|
| 97 |
3. Note that GOT-OCR only supports JPG and PNG image formats
|
| 98 |
+
4. On Hugging Face Spaces, ZeroGPU integration is automatic and optimized for Stateless GPU environments
|
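The JPG/PNG restriction in item 3 can be checked before an image ever reaches the model. The following is a minimal sketch, assuming the check is made on the file extension alone. `SUPPORTED_SUFFIXES` and `is_supported_image` are hypothetical names, not part of Markit, and treating `.jpeg` as JPG is an assumption.

```python
from pathlib import Path

# Assumption: ".jpeg" is treated as JPG; the README only names JPG and PNG.
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png"}


def is_supported_image(path):
    """Return True if the file extension is one GOT-OCR is documented to accept."""
    # Lower-case the suffix so "SCAN.PNG" and "scan.png" are treated alike.
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES
```

A caller could use this to reject a PDF or TIFF upload with a clear message instead of letting the parser fail later.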
| 99 |
|
| 100 |
## Deploying to Hugging Face Spaces
|
| 101 |
|
|
|
|
| 129 |
- **None**: No OCR processing (for documents with selectable text)
|
| 130 |
- **Tesseract**: Basic OCR using Tesseract
|
| 131 |
- **Advanced**: Enhanced OCR with layout preservation (available with specific parsers)
|
| 132 |
+
- **Plain Text**: For GOT-OCR, extracts raw text without formatting
|
| 133 |
+
- **Formatted Text**: For GOT-OCR, preserves formatting and converts to Markdown
|
| 134 |
4. Select your desired output format:
|
| 135 |
- **Markdown**: Clean, readable markdown format
|
| 136 |
- **JSON**: Structured data representation
|
|
|
|
| 157 |
- Verify that all required dependencies are installed correctly
|
| 158 |
- Remember that GOT-OCR only supports JPG and PNG image formats
|
| 159 |
- If you encounter CUDA out-of-memory errors, try using a smaller image
|
| 160 |
+
- In Hugging Face Spaces with Stateless GPU, ensure the `spaces` module is imported before any CUDA initialization
|
| 161 |
+
- If you see errors about "CUDA must not be initialized in the main process", verify the import order in your app.py
|
| 162 |
+
- If you encounter "cannot pickle '_thread.lock' object" errors, this indicates thread locks are being passed to the GPU function
|
| 163 |
+
- The GOT-OCR parser has been optimized for ZeroGPU in Stateless GPU environments with proper serialization handling
|
| 164 |
+
- For local development, the parser will fall back to CPU processing if GPU is not available
|
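The CPU fallback described in the last bullet might look roughly like the sketch below. This is not the parser's actual code: `pick_device` is a hypothetical helper, and the only library call it relies on is PyTorch's `torch.cuda.is_available()`. Returning `"cpu"` when PyTorch is missing is an assumption made so the sketch runs anywhere.

```python
def pick_device():
    """Return "cuda" when a GPU is usable, otherwise "cpu"."""
    try:
        # Imported lazily so this helper also works where torch is not installed.
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running GOT-OCR on: {device}")
```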
| 165 |
|
| 166 |
### General Issues
|
| 167 |
- Check the console logs for error messages
|
|
|
|
| 194 |
│   │   ├── parser_interface.py  # Parser interface
|
| 195 |
│   │   ├── parser_registry.py  # Parser registry
|
| 196 |
│   │   ├── docling_parser.py  # Docling parser
|
| 197 |
+
│   │   ├── got_ocr_parser.py  # GOT-OCR parser for images
|
| 198 |
│   │   └── pypdfium_parser.py  # PyPDFium parser
|
| 199 |
│   ├── ui/  # User interface
|
| 200 |
│   │   ├── __init__.py  # Package initialization
|
|
|
|
| 203 |
│   └── __init__.py  # Package initialization
|
| 204 |
└── tests/  # Tests
|
| 205 |
    └── __init__.py  # Package initialization
|
| 206 |
+
```
|
| 207 |
+
|
| 208 |
+
### ZeroGPU Integration Notes
|
| 209 |
+
|
| 210 |
+
When developing for Hugging Face Spaces with Stateless GPU:
|
| 211 |
+
|
| 212 |
+
1. Always import the `spaces` module before any CUDA initialization
|
| 213 |
+
2. Place all CUDA operations inside functions decorated with `@spaces.GPU()`
|
| 214 |
+
3. Ensure only picklable objects are passed to GPU-decorated functions
|
| 215 |
+
4. Use wrapper functions to filter out unpicklable objects like thread locks
|
| 216 |
+
5. For advanced use cases, consider implementing fallback mechanisms for serialization errors
|
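The five notes above can be sketched together as follows. This is an illustration under stated assumptions, not the code in `got_ocr_parser.py`: `run_ocr`, `parse_image`, `is_picklable`, `picklable_only`, and the no-op decorator used when `spaces` is not installed are all hypothetical, and the GPU function returns a placeholder string instead of running a model.

```python
import pickle
import threading

try:
    # Note 1: import `spaces` first, before anything that could initialize CUDA.
    import spaces
    gpu = spaces.GPU
except ImportError:
    # Assumption: outside Hugging Face Spaces, fall back to a no-op decorator.
    def gpu(*args, **kwargs):
        # Supports both @gpu and @gpu(...) usage.
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda fn: fn


def is_picklable(value):
    """Return True if `value` can be serialized with pickle."""
    try:
        pickle.dumps(value)
        return True
    except Exception:
        return False


def picklable_only(options):
    """Note 4: drop entries (such as thread locks) that cannot be pickled."""
    return {key: value for key, value in options.items() if is_picklable(value)}


@gpu()
def run_ocr(image_path, options):
    # Note 2: all CUDA work (model loading, inference) would live in here.
    return f"processed {image_path} with {sorted(options)}"


def parse_image(image_path, **options):
    # Note 3: only picklable arguments cross into the GPU-decorated function.
    return run_ocr(image_path, picklable_only(options))


print(parse_image("page.png", ocr_type="format", lock=threading.Lock()))
```

Filtering in a wrapper keeps the GPU-decorated function's signature simple, at the cost of silently dropping arguments; logging the dropped keys is a reasonable addition where that matters.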