---
language:
- en
- zh
- es
- fr
- de
- ja
- ko
- ar
- hi
- ru
license: apache-2.0
tags:
- ocr
- vision-language
- qwen2-vl
- custom-model
- text-extraction
- document-ai
- high-accuracy
library_name: transformers
pipeline_tag: image-to-text
base_model: Qwen/Qwen2-VL-2B-Instruct
---
# textract-ai - FIXED VERSION ✅
**🎉 FIXED: Hub loading now works properly!**
A high-accuracy OCR model based on Qwen2-VL-2B-Instruct, now with proper Hugging Face Hub support.
## ✅ What's Fixed
- **Hub Loading**: `AutoModel.from_pretrained()` now works correctly
- **from_pretrained Method**: Proper implementation added
- **Configuration**: Fixed model configuration for Hub compatibility
- **Error Handling**: Improved error handling and fallbacks
## 🚀 Quick Start (NOW WORKS!)
```python
from transformers import AutoModel
from PIL import Image
# Load model from Hub (FIXED!)
model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)
# Load image
image = Image.open("your_image.jpg")
# Extract text
result = model.generate_ocr_text(image, use_native=True)
print(f"Text: {result['text']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"Success: {result['success']}")
```
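The result above is read as a dictionary with `text`, `confidence`, and `success` keys. A minimal sketch of turning it into a one-line summary without assuming every key is present; `summarize_result` is a hypothetical helper, not part of the model's API:

```python
def summarize_result(result: dict) -> str:
    """Format an OCR result dict as a one-line summary.

    Assumes the keys shown in the Quick Start ('text', 'confidence',
    'success'); missing keys fall back to safe defaults.
    """
    if not result.get("success", False):
        return "OCR failed"
    text = (result.get("text") or "").strip()
    confidence = result.get("confidence")
    if confidence is None:
        return f"{len(text)} chars extracted"
    return f"{len(text)} chars extracted ({confidence:.1%} confidence)"
```

Call it as `print(summarize_result(result))` after `generate_ocr_text` returns.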
## 📊 Performance
- 🎯 **Accuracy**: High-accuracy OCR (reported confidence up to 95%)
- ⏱️ **Speed**: ~13 seconds per image (high quality)
- 🌍 **Languages**: Multi-language support
- 💻 **Device**: CPU and GPU support
- 📄 **Documents**: Excellent for complex documents
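The per-image time will likely vary with hardware and image size. A small sketch for measuring it on your own setup; `time_call` is a hypothetical helper that wraps any callable and is not part of this model:

```python
import time


def time_call(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

For example, `result, seconds = time_call(model.generate_ocr_text, image, use_native=True)` times a single high-accuracy run.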
## 🛠️ Features
- ✅ **Hub Loading**: Works with `AutoModel.from_pretrained()`
- ✅ **High Accuracy**: Based on Qwen2-VL-2B-Instruct
- ✅ **Multi-language**: Supports many languages
- ✅ **Document OCR**: Excellent for invoices, forms, documents
- ✅ **Robust Processing**: Multiple extraction methods
- ✅ **Production Ready**: Error handling included
## 📝 Usage Examples
### Basic Usage
```python
from transformers import AutoModel
from PIL import Image
model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)
image = Image.open("document.jpg")
result = model.generate_ocr_text(image, use_native=True)
```
### High Accuracy Mode
```python
result = model.generate_ocr_text(image, use_native=True) # Best accuracy
```
### Fast Mode
```python
result = model.generate_ocr_text(image, use_native=False) # Faster processing
```
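The two modes can be combined: run fast mode first and rerun in high-accuracy mode only when the result looks weak. This is a sketch, assuming both modes return the `success` and `confidence` keys shown in the Quick Start; the function name and the `min_confidence` threshold are illustrative, not documented defaults:

```python
def ocr_fast_then_accurate(model, image, min_confidence=0.8):
    """Try fast mode first; rerun in high-accuracy mode if the result is weak.

    min_confidence is an illustrative threshold, not a documented default.
    """
    result = model.generate_ocr_text(image, use_native=False)
    if result.get("success") and result.get("confidence", 0.0) >= min_confidence:
        return result
    # Fast result failed or scored too low: fall back to the slower mode.
    return model.generate_ocr_text(image, use_native=True)
```

This trades a second pass on difficult images for lower latency on easy ones.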
### File Path Input
```python
result = model.generate_ocr_text("path/to/your/image.jpg")
```
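Because a file path is accepted directly, a folder of images can be processed in a loop. A minimal sketch; `ocr_directory` and the suffix list are hypothetical and only illustrate the pattern:

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # illustrative; extend as needed


def ocr_directory(model, folder):
    """Run OCR on every image file in folder; return {filename: result}."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            # Pass the path as a string, as in the file path example above.
            results[path.name] = model.generate_ocr_text(str(path))
    return results
```

Files are visited in sorted order, so the returned dictionary has a stable ordering across runs.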
## 🔧 Installation
```bash
pip install torch transformers pillow
```
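To confirm the packages are visible to the interpreter you plan to use, a quick check with the standard library can help. This is a sketch; `missing_modules` is a hypothetical helper, and note that Pillow is imported as `PIL`:

```python
import importlib.util

# Import names for the pip packages above; Pillow is imported as PIL.
REQUIRED_MODULES = ("torch", "transformers", "PIL")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from `missing_modules()` means all three packages were found.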
## 📈 Model Details
- **Base Model**: Qwen/Qwen2-VL-2B-Instruct
- **Model Size**: ~2.5B parameters
- **Architecture**: Vision-Language Transformer
- **Optimization**: OCR-specific processing
- **Training**: Custom OCR pipeline
## 🆚 Comparison
| Feature | Before (Broken) | After (FIXED) |
|---------|----------------|---------------|
| Hub Loading | ❌ ValueError | ✅ Works perfectly |
| from_pretrained | ❌ Missing | ✅ Implemented |
| AutoModel | ❌ Failed | ✅ Compatible |
| Configuration | ❌ Invalid | ✅ Proper config |
## 🎯 Use Cases
- **High-Accuracy OCR**: When accuracy is most important
- **Document Processing**: Complex invoices, forms, contracts
- **Multi-language Text**: International documents
- **Professional OCR**: Business and enterprise use
- **Research Applications**: Academic and research projects
## 🔗 Related Models
- **pixeltext-ai**: https://huggingface.co/BabaK07/pixeltext-ai (PaliGemma-based, faster)
- **Base Model**: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct
## 📞 Support
For issues or questions, please check the model repository or contact the author.
---
**Status**: ✅ FIXED and ready for production use!