--- title: TextLens - AI-Powered OCR emoji: ๐Ÿ” colorFrom: blue colorTo: purple sdk: gradio sdk_version: 4.0.0 app_file: app.py pinned: false license: mit --- # ๐Ÿ” TextLens - AI-Powered OCR [![Deploy to HuggingFace](https://img.shields.io/badge/๐Ÿค—-Deploy%20to%20Spaces-blue)](https://huggingface.co/spaces/GoConqurer/textlens-ocr) [![GitHub](https://img.shields.io/badge/GitHub-Repository-green)](https://github.com/KumarAmrit30/textlens-ocr) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE) [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/) A state-of-the-art Vision-Language Model (VLM) based OCR application that extracts text from images using Microsoft Florence-2 with intelligent fallback systems and enterprise-grade zero downtime deployment. ## ๐Ÿš€ Live Demo **๐Ÿ”— Try it now:** [https://huggingface.co/spaces/GoConqurer/textlens-ocr](https://huggingface.co/spaces/GoConqurer/textlens-ocr) ![TextLens Demo](https://img.shields.io/badge/Demo-Live-brightgreen) ## โœจ Key Features ### ๐Ÿค– Advanced AI-Powered OCR - **Microsoft Florence-2 VLM**: State-of-the-art vision-language model for text extraction - **Intelligent Fallback System**: Automatic fallback to EasyOCR if primary model fails - **Multi-Model Support**: Florence-2-base and Florence-2-large variants - **Real-time Processing**: Instant text extraction on image upload ### ๐ŸŽจ Modern User Experience - **Clean UI**: Professional Gradio interface with intuitive design - **Multiple Input Methods**: Upload files, use webcam, or paste from clipboard - **Copy-to-Clipboard**: One-click text copying functionality - **Responsive Design**: Works seamlessly on desktop and mobile devices - **Dark/Light Theme**: Automatic theme adaptation ### โšก Performance & Reliability - **GPU Acceleration**: Supports CUDA, MPS (Apple Silicon), and CPU inference - **Smart Device Detection**: Automatically uses best available hardware - **Error Resilience**: Robust error handling with graceful degradation - **Memory Optimization**: Efficient model loading and cleanup ### ๐Ÿ›ก๏ธ Enterprise Features - **Zero Downtime Deployment**: Blue-green deployment with health checks - **Health Monitoring**: Built-in `/health` and `/ready` endpoints - **Graceful Shutdown**: Signal handling for clean application restarts - **Production Ready**: Scalable architecture with automated deployment ## ๐Ÿ—๏ธ Architecture ``` textlens-ocr/ โ”œโ”€โ”€ ๐Ÿ“ฑ Frontend (Gradio UI) โ”‚ โ”œโ”€โ”€ ui/interface.py # Main interface components โ”‚ โ”œโ”€โ”€ ui/handlers.py # Event handlers & logic โ”‚ โ””โ”€โ”€ ui/styles.py # CSS styling & themes โ”œโ”€โ”€ ๐Ÿง  AI Models โ”‚ โ””โ”€โ”€ models/ocr_processor.py # OCR engine with fallbacks โ”œโ”€โ”€ ๐Ÿ”ง Utilities โ”‚ โ””โ”€โ”€ utils/image_utils.py # Image preprocessing โ”œโ”€โ”€ ๐Ÿš€ Deployment โ”‚ โ”œโ”€โ”€ .github/workflows/ # CI/CD pipelines โ”‚ โ”œโ”€โ”€ scripts/deploy.py # Manual deployment tools โ”‚ โ””โ”€โ”€ deployment.config.yml # Deployment configuration โ”œโ”€โ”€ ๐Ÿ“š Documentation โ”‚ โ”œโ”€โ”€ README.md # Main documentation โ”‚ โ””โ”€โ”€ DEPLOYMENT.md # Deployment guide โ””โ”€โ”€ โš™๏ธ Configuration โ”œโ”€โ”€ app.py # Main application entry โ””โ”€โ”€ requirements.txt # Dependencies ``` ## ๐Ÿš€ Quick Start ### ๐ŸŒ Online (Recommended) **Instant access** - No installation required: ๐Ÿ‘‰ [**Launch TextLens**](https://huggingface.co/spaces/GoConqurer/textlens-ocr) ### ๐Ÿ’ป Local Development 1. **Clone Repository** ```bash git clone https://github.com/KumarAmrit30/textlens-ocr.git cd textlens-ocr ``` 2. **Setup Environment** ```bash python -m venv textlens_env source textlens_env/bin/activate # Windows: textlens_env\Scripts\activate pip install -r requirements.txt ``` 3. **Launch Application** ```bash python app.py ``` ๐ŸŒ Open: `http://localhost:7860` ### ๐Ÿงช Quick Test ```bash # Verify installation python -c "from models.ocr_processor import OCRProcessor; print('โœ… TextLens ready!')" ``` ## ๐Ÿ“Š Model Performance | Model | Size | Speed | Accuracy | Best For | | -------------------- | ----- | --------- | ------------ | ---------------------- | | **Florence-2-base** | 270M | โšก Fast | ๐Ÿ“ˆ High | General OCR, Real-time | | **Florence-2-large** | 770M | ๐ŸŒ Medium | ๐Ÿ“Š Very High | High accuracy needs | | **EasyOCR** | ~100M | ๐Ÿš€ Medium | ๐Ÿ“‹ Good | Fallback, Multilingual | ## ๐ŸŽฏ Supported Use Cases | Category | Examples | Performance | | ------------------- | ------------------------------- | ----------- | | ๐Ÿ“„ **Documents** | PDFs, Scanned papers, Forms | โญโญโญโญโญ | | ๐Ÿงพ **Receipts** | Shopping receipts, Invoices | โญโญโญโญ | | ๐Ÿ“ฑ **Screenshots** | App interfaces, Error messages | โญโญโญโญโญ | | ๐Ÿš— **Vehicle** | License plates, VIN numbers | โญโญโญโญ | | ๐Ÿ“š **Books** | Printed text, Handwritten notes | โญโญโญโญ | | ๐ŸŒ **Multilingual** | Multiple languages | โญโญโญ | ## ๐Ÿ”ง Configuration ### ๐ŸŽ›๏ธ Model Selection ```python from models.ocr_processor import OCRProcessor # Fast inference (recommended) ocr = OCRProcessor(model_name="microsoft/Florence-2-base") # Maximum accuracy ocr = OCRProcessor(model_name="microsoft/Florence-2-large") ``` ### ๐ŸŽจ UI Customization Modify `ui/styles.py` to customize appearance: ```python # Change color scheme PRIMARY_COLOR = "#1f77b4" SECONDARY_COLOR = "#ff7f0e" # Update layout INTERFACE_WIDTH = "100%" ``` ### โš™๏ธ Environment Variables | Variable | Description | Default | | ---------------------- | -------------------- | ---------------------- | | `SPACE_ID` | HuggingFace Space ID | Auto-detected | | `DEPLOYMENT_STAGE` | deployment stage | `production` | | `TRANSFORMERS_CACHE` | Model cache path | `~/.cache/huggingface` | | `CUDA_VISIBLE_DEVICES` | GPU selection | All available | ## ๐Ÿš€ Deployment ### ๐Ÿค— HuggingFace Spaces (Recommended) **Automatic Deployment:** 1. Fork this repository 2. Push to `main`/`master` branch 3. GitHub Actions automatically deploys to HuggingFace Spaces 4. Access your deployed app at: `https://huggingface.co/spaces/USERNAME/textlens-ocr` **Manual Deployment:** 1. Go to [GitHub Actions](https://github.com/KumarAmrit30/textlens-ocr/actions) 2. Select "Deploy to HuggingFace Spaces" 3. Click "Run workflow" 4. Choose deployment type: - **Direct**: Quick deployment to production - **Blue-Green**: Zero downtime with staging validation ### ๐Ÿ”„ Zero Downtime Deployment Our enterprise-grade deployment system ensures **zero downtime** for users: **Features:** - ๐Ÿ”ต **Blue-Green Deployment**: Test in staging before production - ๐Ÿฅ **Health Monitoring**: Automatic health checks with retry logic - ๐Ÿ”„ **Graceful Shutdown**: Clean application restarts - ๐Ÿ“Š **Real-time Monitoring**: Deployment status tracking **Health Endpoints:** - `GET /health` - Application health status - `GET /ready` - Application readiness check **Deployment Flow:** ```mermaid graph LR A[Code Push] --> B[Validate] B --> C[Deploy Staging] C --> D[Health Check] D --> E[Deploy Production] E --> F[Verify] F --> G[Complete โœ…] ``` ### ๐Ÿณ Docker Deployment ```dockerfile FROM python:3.9-slim WORKDIR /app COPY requirements.txt . RUN pip install -r requirements.txt COPY . . EXPOSE 7860 CMD ["python", "app.py"] ``` Build and run: ```bash docker build -t textlens-ocr . docker run -p 7860:7860 textlens-ocr ``` ### โ˜๏ธ Cloud Platforms | Platform | Status | Guide | | ---------------------- | ------------- | ------------------------------------------------------------------- | | **HuggingFace Spaces** | โœ… Ready | [Deploy Now](https://huggingface.co/spaces/GoConqurer/textlens-ocr) | | **Google Colab** | โœ… Compatible | Open in Colab | | **AWS/GCP/Azure** | ๐Ÿ”ง Docker | Use Docker deployment | | **Heroku** | โš ๏ธ Limited | GPU not available | ## ๐Ÿงช Testing & Development ### ๐Ÿ” Running Tests ```bash # Basic functionality test python -c " from models.ocr_processor import OCRProcessor ocr = OCRProcessor() print(f'โœ… Model loaded: {ocr.get_model_info()}') " # Test with sample image python -c " from PIL import Image from models.ocr_processor import OCRProcessor import requests # Download test image img_url = 'https://via.placeholder.com/300x100/000000/FFFFFF?text=Hello+World' image = Image.open(requests.get(img_url, stream=True).raw) # Test OCR ocr = OCRProcessor() result = ocr.extract_text(image) print(f'โœ… OCR Result: {result}') " ``` ### ๐Ÿ› ๏ธ Development Tools ```bash # Install development dependencies pip install -r requirements.txt # Format code black . --line-length 88 # Type checking mypy models/ utils/ ui/ # Lint code flake8 --max-line-length 88 ``` ## ๐Ÿ“š API Reference ### OCRProcessor Class ```python from models.ocr_processor import OCRProcessor # Initialize processor ocr = OCRProcessor( model_name="microsoft/Florence-2-base", # Model selection device=None, # Auto-detect device torch_dtype=None # Auto-select dtype ) # Extract text from image text = ocr.extract_text(image) # Returns: str # Extract text with bounding boxes result = ocr.extract_text_with_regions(image) # Returns: dict with text and regions # Get model information info = ocr.get_model_info() # Returns: dict with model details # Cleanup resources ocr.cleanup() ``` ### Health Check API ```bash # Check application health curl https://huggingface.co/spaces/GoConqurer/textlens-ocr/health # Response: { "status": "healthy", "timestamp": 1640995200, "version": "1.0.0", "environment": "production" } # Check readiness curl https://huggingface.co/spaces/GoConqurer/textlens-ocr/ready # Response: { "status": "ready", "timestamp": 1640995200 } ``` ## ๐Ÿšจ Troubleshooting ### Common Issues | Issue | Symptoms | Solution | | ----------------------- | ------------------------ | --------------------------------------- | | **Model Loading Error** | ImportError, CUDA errors | Check GPU drivers, install CUDA toolkit | | **Memory Error** | Out of memory | Reduce batch size, use CPU inference | | **SSL Certificate** | SSL errors on macOS | Run certificate update command | | **Permission Error** | File access denied | Check file permissions, run as admin | ### Debug Commands ```bash # Check CUDA availability python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')" # Check transformers version python -c "import transformers; print(f'Transformers: {transformers.__version__}')" # Test health endpoint locally curl http://localhost:7860/health # View application logs tail -f textlens.log ``` ### Getting Help 1. ๐Ÿ“‹ **Check existing issues**: [GitHub Issues](https://github.com/KumarAmrit30/textlens-ocr/issues) 2. ๐Ÿ†• **Create new issue**: Provide error details and environment info 3. ๐Ÿ’ฌ **Join discussion**: [GitHub Discussions](https://github.com/KumarAmrit30/textlens-ocr/discussions) 4. ๐Ÿ“ง **Contact**: Create an issue for direct support ## ๐Ÿค Contributing We welcome contributions! Here's how to get started: ### ๐Ÿ”ง Development Setup 1. **Fork & Clone** ```bash git clone https://github.com/YOUR_USERNAME/textlens-ocr.git cd textlens-ocr ``` 2. **Create Branch** ```bash git checkout -b feature/your-feature-name ``` 3. **Make Changes** - Add new features or fix bugs - Update tests and documentation - Follow code style guidelines 4. **Test Changes** ```bash python -m pytest tests/ python -c "from models.ocr_processor import OCRProcessor; OCRProcessor()" ``` 5. **Submit PR** ```bash git add . git commit -m "feat: add your feature description" git push origin feature/your-feature-name ``` ### ๐Ÿ“ Contribution Guidelines - **Code Style**: Follow PEP 8, use Black formatter - **Documentation**: Update README and docstrings - **Tests**: Add tests for new functionality - **Commits**: Use conventional commit messages - **Issues**: Link PRs to relevant issues ## ๐Ÿ“„ License This project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details. ### ๐Ÿ™ Third-Party Licenses - **Microsoft Florence-2**: [MIT License](https://github.com/microsoft/Florence) - **HuggingFace Transformers**: [Apache License 2.0](https://github.com/huggingface/transformers) - **Gradio**: [Apache License 2.0](https://github.com/gradio-app/gradio) - **EasyOCR**: [Apache License 2.0](https://github.com/JaidedAI/EasyOCR) ## ๐ŸŒŸ Acknowledgments Special thanks to: - **Microsoft Research** for the incredible Florence-2 vision-language model - **HuggingFace** for the transformers library and Spaces platform - **Gradio Team** for the amazing web interface framework - **JaidedAI** for EasyOCR fallback capabilities - **Open Source Community** for continuous support and contributions ## ๐Ÿ“ˆ Project Status | Component | Status | Version | | ----------------- | ------------- | ------- | | **Core OCR** | โœ… Stable | v1.0.0 | | **Web UI** | โœ… Stable | v1.0.0 | | **Deployment** | โœ… Production | v1.0.0 | | **API** | โœ… Stable | v1.0.0 | | **Documentation** | โœ… Complete | v1.0.0 | ### ๐ŸŽฏ Roadmap - [ ] **Multi-language UI** support - [ ] **Batch processing** for multiple images - [ ] **API rate limiting** and authentication - [ ] **Custom model** fine-tuning support - [ ] **Mobile app** development - [ ] **Cloud storage** integration ## ๐Ÿ“ž Support & Community ### ๐Ÿ”— Links - **๐Ÿ  Homepage**: [GitHub Repository](https://github.com/KumarAmrit30/textlens-ocr) - **๐Ÿš€ Live Demo**: [HuggingFace Spaces](https://huggingface.co/spaces/GoConqurer/textlens-ocr) - **๐Ÿ“‹ Issues**: [Report Bugs](https://github.com/KumarAmrit30/textlens-ocr/issues) - **๐Ÿ’ฌ Discussions**: [GitHub Discussions](https://github.com/KumarAmrit30/textlens-ocr/discussions) - **๐Ÿ“– Documentation**: [Deployment Guide](DEPLOYMENT.md) ### ๐Ÿ“Š Stats ![GitHub stars](https://img.shields.io/github/stars/KumarAmrit30/textlens-ocr?style=social) ![GitHub forks](https://img.shields.io/github/forks/KumarAmrit30/textlens-ocr?style=social) ![GitHub watchers](https://img.shields.io/github/watchers/KumarAmrit30/textlens-ocr?style=social) ---
**Made with โค๏ธ for the AI community** [โญ Star this repo](https://github.com/KumarAmrit30/textlens-ocr) โ€ข [๐Ÿ”— Try the demo](https://huggingface.co/spaces/GoConqurer/textlens-ocr) โ€ข [๐Ÿ“– Read docs](DEPLOYMENT.md)