---
language:
- en
- zh
- es
- fr
- de
- ja
- ko
- ar
- hi
- ru
license: apache-2.0
tags:
- ocr
- vision-language
- qwen2-vl
- custom-model
- text-extraction
- document-ai
- high-accuracy
library_name: transformers
pipeline_tag: image-to-text
base_model: Qwen/Qwen2-VL-2B-Instruct
---

# textract-ai - FIXED VERSION ✅

**🎉 FIXED: Hub loading now works properly!**

A high-accuracy OCR model based on Qwen2-VL-2B-Instruct, now with proper Hugging Face Hub support.

## ✅ What's Fixed

- **Hub Loading**: `AutoModel.from_pretrained()` now works correctly
- **`from_pretrained` Method**: A proper implementation has been added
- **Configuration**: The model configuration has been fixed for Hub compatibility
- **Error Handling**: Improved error handling and fallbacks

## 🚀 Quick Start (NOW WORKS!)

```python
from transformers import AutoModel
from PIL import Image

# Load the model from the Hub; trust_remote_code=True is required for the custom model class
model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)

# Load an image
image = Image.open("your_image.jpg")

# Extract text
result = model.generate_ocr_text(image, use_native=True)

print(f"Text: {result['text']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"Success: {result['success']}")
```

## 📊 Performance

- 🎯 **Accuracy**: High-accuracy OCR (confidence scores of up to 95%)
- ⏱️ **Speed**: ~13 seconds per image in high-quality mode
- 🌍 **Languages**: Multi-language support
- 💻 **Device**: CPU and GPU support
- 📄 **Documents**: Well suited to complex documents

## 🛠️ Features

- ✅ **Hub Loading**: Works with `AutoModel.from_pretrained()`
- ✅ **High Accuracy**: Based on Qwen2-VL-2B-Instruct
- ✅ **Multi-language**: Supports many languages
- ✅ **Document OCR**: Suited to invoices, forms, and other documents
- ✅ **Robust Processing**: Multiple extraction methods
- ✅ **Production Ready**: Error handling included

## 📝 Usage Examples

### Basic Usage

```python
from transformers import AutoModel
from PIL import Image

model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)
image = Image.open("document.jpg")
result = model.generate_ocr_text(image, use_native=True)
```

### High Accuracy Mode

```python
result = model.generate_ocr_text(image, use_native=True)  # Best accuracy
```

### Fast Mode

```python
result = model.generate_ocr_text(image, use_native=False)  # Faster processing
```

### File Path Input

```python
result = model.generate_ocr_text("path/to/your/image.jpg")
```

## 🔧 Installation

```bash
pip install torch transformers pillow
```

## 📈 Model Details

- **Base Model**: Qwen/Qwen2-VL-2B-Instruct
- **Model Size**: ~2.5B parameters
- **Architecture**: Vision-Language Transformer
- **Optimization**: OCR-specific processing
- **Training**: Custom OCR pipeline

## 🆚 Comparison

| Feature | Before (Broken) | After (Fixed) |
|---------|-----------------|---------------|
| Hub Loading | ❌ ValueError | ✅ Works |
| `from_pretrained` | ❌ Missing | ✅ Implemented |
| `AutoModel` | ❌ Failed | ✅ Compatible |
| Configuration | ❌ Invalid | ✅ Proper config |

## 🎯 Use Cases

- **High-Accuracy OCR**: When accuracy matters most
- **Document Processing**: Complex invoices, forms, and contracts
- **Multi-language Text**: International documents
- **Professional OCR**: Business and enterprise use
- **Research Applications**: Academic and research projects

## 🔗 Related Models

- **pixeltext-ai**: https://huggingface.co/BabaK07/pixeltext-ai (PaliGemma-based, faster)
- **Base Model**: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct

## 📞 Support

For issues or questions, please check the model repository or contact the author.

---

**Status**: ✅ FIXED and ready for production use!
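The usage examples call `generate_ocr_text` once per image in either high-accuracy (`use_native=True`) or fast (`use_native=False`) mode. For batch jobs, a small wrapper can retry in fast mode when the high-accuracy pass reports failure or raises. This is a hypothetical sketch, not part of the model's API: `ocr_with_fallback`, `ocr_batch`, and the added `mode`/`error` keys are illustrative names, and it assumes only that `generate_ocr_text` returns a dict with the `text`, `confidence`, and `success` keys shown in the Quick Start.

```python
from typing import Any, Dict, Iterable, List


def _run_pass(model: Any, image: Any, use_native: bool) -> Dict[str, Any]:
    """Run one OCR pass; any exception is turned into a failed result dict.

    Assumes model.generate_ocr_text(image, use_native=...) returns a dict with
    "text", "confidence" and "success" keys, as in the Quick Start example.
    """
    try:
        # Copy the result so we never mutate a dict the model might reuse.
        return dict(model.generate_ocr_text(image, use_native=use_native))
    except Exception as exc:
        return {"text": "", "confidence": 0.0, "success": False, "error": str(exc)}


def ocr_with_fallback(model: Any, image: Any) -> Dict[str, Any]:
    """Hypothetical helper: try high-accuracy mode first, then fast mode."""
    result = _run_pass(model, image, use_native=True)
    if result.get("success"):
        result["mode"] = "native"
        return result
    # High-accuracy pass failed: retry with the faster extraction path.
    result = _run_pass(model, image, use_native=False)
    result["mode"] = "fast"
    return result


def ocr_batch(model: Any, images: Iterable[Any]) -> List[Dict[str, Any]]:
    """Apply ocr_with_fallback to each image (PIL image or file path)."""
    return [ocr_with_fallback(model, img) for img in images]


# Example usage (downloads the model):
# from transformers import AutoModel
# model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)
# for res in ocr_batch(model, ["invoice.jpg", "form.png"]):
#     print(res["mode"], res["success"], res["text"][:80])
```

Trying the slower, more accurate mode first keeps the best-quality output whenever it succeeds; the fast mode only runs as a recovery path, so the extra cost is paid only on failures.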