File size: 3,736 Bytes
84b2551 b8a8a54 84b2551 b8a8a54 84b2551 b8a8a54 84b2551 b8a8a54 84b2551 8f8ea37 84b2551 b8a8a54 84b2551 b8a8a54 84b2551 b8a8a54 84b2551 b8a8a54 84b2551 8f8ea37 b8a8a54 84b2551 b8a8a54 84b2551 b8a8a54 84b2551 b8a8a54 84b2551 b8a8a54 8f8ea37 b8a8a54 84b2551 b8a8a54 84b2551 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 |
---
language:
- en
- zh
- es
- fr
- de
- ja
- ko
- ar
- hi
- ru
license: apache-2.0
tags:
- ocr
- vision-language
- paligemma
- custom-model
- text-extraction
- document-ai
- multi-language
library_name: transformers
pipeline_tag: image-to-text
base_model: google/paligemma-3b-pt-224
---
# pixeltext-ai - FIXED VERSION β
**π FIXED: Hub loading now works properly!**
A high-performance OCR model based on PaliGemma-3B, now with proper Hugging Face Hub support.
## β
What's Fixed
- **Hub Loading**: `AutoModel.from_pretrained()` now works correctly
- **from_pretrained Method**: Proper implementation added
- **Configuration**: Fixed model configuration for Hub compatibility
- **Error Handling**: Improved error handling and fallbacks
## π Quick Start (NOW WORKS!)
```python
from transformers import AutoModel
from PIL import Image
# Load model from Hub (FIXED!)
model = AutoModel.from_pretrained("BabaK07/pixeltext-ai", trust_remote_code=True)
# Load image
image = Image.open("your_image.jpg")
# Extract text
result = model.generate_ocr_text(image)
print(f"Text: {result['text']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"Success: {result['success']}")
```
## π Performance
- β‘ **Speed**: ~3 seconds per image
- π― **Accuracy**: Up to 95% confidence
- π **Languages**: 100+ supported
- π» **Device**: CPU and GPU support
- π **Batch**: Multiple image processing
## π οΈ Features
- β
**Hub Loading**: Works with `AutoModel.from_pretrained()`
- β
**Fast Inference**: Optimized for speed
- β
**High Accuracy**: Based on PaliGemma-3B
- β
**Multi-language**: Supports 100+ languages
- β
**Batch Processing**: Handle multiple images
- β
**Custom Prompts**: Tailor extraction for specific needs
- β
**Production Ready**: Error handling included
## π Usage Examples
### Basic Usage
```python
from transformers import AutoModel
from PIL import Image
model = AutoModel.from_pretrained("BabaK07/pixeltext-ai", trust_remote_code=True)
image = Image.open("document.jpg")
result = model.generate_ocr_text(image)
```
### Custom Prompts
```python
result = model.generate_ocr_text(
image,
prompt="<image>Extract all invoice details including amounts:"
)
```
### Batch Processing
```python
images = [Image.open(f"doc_{i}.jpg") for i in range(5)]
results = model.batch_ocr(images)
```
### File Path Input
```python
result = model.generate_ocr_text("path/to/your/image.jpg")
```
## π§ Installation
```bash
pip install torch transformers pillow
```
## π Model Details
- **Base Model**: google/paligemma-3b-pt-224
- **Model Size**: ~3B parameters
- **Architecture**: Vision-Language Transformer
- **Optimization**: OCR-specific enhancements
- **Training**: Custom OCR pipeline
## π Comparison
| Feature | Before (Broken) | After (FIXED) |
|---------|----------------|---------------|
| Hub Loading | β AttributeError | β
Works perfectly |
| from_pretrained | β Missing | β
Implemented |
| AutoModel | β Failed | β
Compatible |
| Configuration | β Invalid | β
Proper config |
## π― Use Cases
- **Document Digitization**: Convert scanned documents
- **Invoice Processing**: Extract invoice data
- **Form Processing**: Digitize forms
- **Receipt OCR**: Extract receipt information
- **Multi-language Documents**: Handle international text
- **Batch Processing**: Process document collections
## π Related Models
- **textract-ai**: https://huggingface.co/BabaK07/textract-ai (Qwen-based, higher accuracy)
- **Base Model**: https://huggingface.co/google/paligemma-3b-pt-224
## π Support
For issues or questions, please check the model repository or contact the author.
---
**Status**: β
FIXED and ready for production use!
|