title: OmniFile AI Processor
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: mit
OmniFile AI Processor v4.3.0
A Comprehensive AI System for File Processing, Text Analysis & Handwriting Recognition
Version: v4.3.0 | Status: CI-Verified
Live Demo (HF Spaces) | Documentation | Report Bug | Suggestions
About the Author
Dr Abdulmalek Tamer Al-husseini
Location: Homs, Syria
Description
OmniFile AI Processor is a production-ready, multimodal AI system that integrates six projects into a unified platform for document intelligence:
OmniFile_Processor + HandwrittenOCR + handwriting-ocr + arabic-ocr-pro + advanced-ocr + OCR-Enhancer
The platform supports Arabic, English, and German, with specialized modules for computer vision, language processing, security, and export.
Features
Computer Vision & OCR
- Multi-Engine OCR: 4 engines (TrOCR, EasyOCR, Tesseract, PaddleOCR) with intelligent engine selection
- Result Fusion: 4 strategies (highest confidence, weighted average, voting, longest text)
- Advanced Preprocessing: CLAHE, deskew, denoise, Otsu thresholding, ONNX Runtime acceleration
- Layout Analysis: automatic detection of tables, headers, footers, and document structure
- Table Extraction: Hough line detection + contour analysis for structured data extraction
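The four fusion strategies can be illustrated with a small sketch. The `EngineResult` type and the function names below are hypothetical and are not taken from `result_fusion.py`; in particular, "weighted average" is interpreted here as summing confidence per distinct text, which is only one plausible reading.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineResult:
    """Hypothetical container for one engine's output."""
    engine: str
    text: str
    confidence: float  # assumed to lie in [0, 1]


def fuse_highest_confidence(results):
    # Keep the text from the engine that reported the highest confidence.
    return max(results, key=lambda r: r.confidence).text


def fuse_voting(results):
    # Majority vote over identical texts; ties go to the text seen first.
    counts = Counter(r.text for r in results)
    return counts.most_common(1)[0][0]


def fuse_weighted(results):
    # Sum confidence per distinct text and keep the text with the largest total.
    totals = defaultdict(float)
    for r in results:
        totals[r.text] += r.confidence
    return max(totals, key=totals.get)


def fuse_longest_text(results):
    # Assume the longest transcription dropped the fewest characters.
    return max(results, key=lambda r: len(r.text)).text


results = [
    EngineResult("trocr", "Hello world", 0.80),
    EngineResult("easyocr", "Hello world", 0.60),
    EngineResult("tesseract", "Hello world!", 0.90),
]
print(fuse_highest_confidence(results))  # Hello world!
print(fuse_voting(results))              # Hello world
print(fuse_weighted(results))            # Hello world
print(fuse_longest_text(results))        # Hello world!
```

The example shows why more than one strategy is useful: the single most confident engine and the majority of engines can disagree on the same input.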
Natural Language Processing
- Multilingual Spell Correction: Arabic, English, German with user-learning capability (186+ Arabic corrections)
- RTL Text Processing: full Arabic reshaping + BiDi support with 40+ normalization mappings
- Mixed-Text Handling: Arabic/English/numbers with medical term protection
- Translation Engine: Helsinki-NLP/opus-mt supporting 6 language pairs
- AI Summarization: BART (facebook/bart-large-cnn) + Arabic (UAE-Code/mbart-summarization-ar)
- Entity Extraction & Text Classification: BERT-based NER with 6-category classification
AI Enhancement
- GPT & Gemini Refinement: context-aware OCR correction with block-type-specific prompts
- SSIM Pattern Matching: self-learning from corrected word images with SQLite pattern database
Multi-Format Export
- 6 Export Formats: DOCX (RTL support), HTML, searchable PDF, Excel, JSON (with BBox), TXT (UTF-8 BOM)
Security & Privacy
- PII Detection: Presidio-based sensitive data scanning + detect-secrets
- File Encryption: Fernet (AES-128) with folder support
- Code Protection: prevents spell correction inside code blocks
- Audit Logging: file + Redis audit trail with rate limiting (slowapi + Nginx)
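One way to keep a spell corrector away from code is to mask code spans with placeholders, correct the remaining prose, and then restore the spans. The sketch below assumes Markdown-style fenced blocks and inline backticks; the function names are hypothetical and this is not the logic of `code_protector.py`.

```python
import re

FENCE = "`" * 3  # Markdown code fence, built here to keep this listing readable

# Fenced blocks are tried first, then inline code spans.
CODE_PATTERN = re.compile(rf"{FENCE}.*?{FENCE}|`[^`\n]+`", re.DOTALL)


def protect_code(text):
    """Replace code spans with placeholders; return masked text and the spans."""
    spans = []

    def _mask(match):
        spans.append(match.group(0))
        # NUL-delimited index; assumes the corrector leaves it untouched.
        return f"\x00{len(spans) - 1}\x00"

    return CODE_PATTERN.sub(_mask, text), spans


def restore_code(text, spans):
    """Put the original code spans back in place of their placeholders."""
    return re.sub(r"\x00(\d+)\x00", lambda m: spans[int(m.group(1))], text)


def correct_outside_code(text, corrector):
    masked, spans = protect_code(text)
    return restore_code(corrector(masked), spans)


def toy_corrector(text):
    # Stand-in for a real spell checker.
    return text.replace("teh", "the")


print(correct_outside_code("Call teh function `teh_helper()` first.", toy_corrector))
# Call the function `teh_helper()` first.
```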
Evaluation
- CER/WER Metrics: OCR accuracy evaluation with Arabic normalization + Levenshtein distance
- Quality Grading: A+ to F with actionable recommendations
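CER and WER are both edit distances divided by the reference length, computed over characters and words respectively. The following dependency-free sketch shows the standard definitions; it omits the Arabic normalization step and the grading thresholds, which are specific to the project and not reproduced here.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of words)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    # Character error rate: character edits divided by reference length.
    return levenshtein(reference, hypothesis) / max(len(reference), 1)


def wer(reference, hypothesis):
    # Word error rate: word edits divided by reference word count.
    ref_words, hyp_words = reference.split(), hypothesis.split()
    return levenshtein(ref_words, hyp_words) / max(len(ref_words), 1)


print(cer("kitten", "sitting"))                     # 0.5
print(wer("the quick brown fox", "the quick fox"))  # 0.25
```

Both rates can exceed 1.0 when the hypothesis is much longer than the reference, so they are error rates rather than percentages of correct text.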
Multiple Interfaces
- 5 UIs: Streamlit (6 tabs), Gradio (7 tabs), React + shadcn/ui (dark/light), CLI, PyQt6 desktop
- FastAPI Backend: full REST API with Swagger documentation
Scalability & Deployment
- Docker + Compose: one-command deployment with all services
- Kubernetes Ready: complete K8s manifests with HPA (2-10 pods auto-scaling)
- Celery + Redis: asynchronous task processing for heavy workloads
Quick Start
Option 1: HuggingFace Spaces (Recommended for Demo)
The project is deployed and available at: https://huggingface.co/spaces/DrAbdulmalek/handwriting-ocr
To deploy your own instance:
git clone https://github.com/DrAbdulmalek/OmniFile_Processor.git
cd OmniFile_Processor
pip install -r requirements-hf.txt
python -m src.gradio_ui
Option 2: Local Installation (Linux / macOS / Windows)
# Clone the repository
git clone https://github.com/DrAbdulmalek/OmniFile_Processor.git
cd OmniFile_Processor
# Install dependencies
pip install -r requirements-full.txt # Everything (~6-8 GB, ~15-30 min)
# Or install in layers:
# pip install -r requirements-core.txt # Minimum (~1.5 GB, ~3 min)
# pip install -r requirements-core.txt -r requirements-ocr.txt # + OCR engines
# pip install -r requirements-core.txt -r requirements-nlp.txt # + NLP
# Run with your preferred interface
streamlit run app.py # Streamlit UI (6 tabs)
python -m src.gradio_ui # Gradio UI (7 tabs)
python main.py # CLI interface
cd frontend && npm install && npm run dev # React Frontend
Option 3: Docker Compose (Full Stack)
git clone https://github.com/DrAbdulmalek/OmniFile_Processor.git
cd OmniFile_Processor
docker-compose up -d
# Access:
# API Docs: http://localhost:5001/docs
# Streamlit: http://localhost:7860
# React: http://localhost:3000
# Nginx Proxy: http://localhost
Option 4: Google Colab
!git clone https://github.com/DrAbdulmalek/OmniFile_Processor.git
%cd OmniFile_Processor
!pip install -r requirements-full.txt # Everything (~6-8 GB, ~15-30 min)
# Or install in layers:
# pip install -r requirements-core.txt # Minimum (~1.5 GB, ~3 min)
# pip install -r requirements-core.txt -r requirements-ocr.txt # + OCR engines
# pip install -r requirements-core.txt -r requirements-nlp.txt # + NLP
!streamlit run app.py --server.port 7860
Project Structure
Architecture Note: the project currently contains two parallel systems:
- modules/: the extended, theoretical architecture. It holds clearly organized modules (vision, nlp, security, export, ai, evaluation) with Pydantic v2 models, and is the intended future structure of the project.
- src/: the working HandwrittenOCR engine. It contains the application actually used by the Gradio interface (src/gradio_ui.py) and HF Spaces, and includes TrOCR Batch, LoRA Fine-tuning, Active Learning, and Study Guide.
- Root files (app.py, database.py, config.py): the integration layer that connects the architecture and the engines.
Currently adopted approach: src/ is the code that actually runs the Gradio interface and HF Spaces, while modules/ represents the organized, theoretical structure of the extended project. The gradual migration from src/ to modules/ will be carried out in stages through separate Pull Requests.
OmniFile_Processor/
├── app.py  # Main Streamlit UI
├── config.py  # Central configuration v4.1.1
├── database.py  # SQLite database layer
├── main.py  # Local / CLI entry point
├── tasks.py  # Celery async tasks
├── requirements.txt  # Full dependencies (legacy)
├── requirements-core.txt  # Core only (~1.5 GB)
├── requirements-ocr.txt  # OCR engines layer
├── requirements-nlp.txt  # NLP layer
├── requirements-full.txt  # Everything (~6-8 GB)
├── requirements-hf.txt  # HuggingFace Spaces (minimal)
├── Dockerfile  # Docker image
├── docker-compose.yml  # Full stack orchestration
├── nginx.conf  # Nginx load balancer
├── LICENSE  # MIT License
│
├── modules/
│   ├── core/  # Core data models
│   │   └── structure.py  # Pydantic v2 models
│   │
│   ├── vision/  # Computer Vision & OCR
│   │   ├── ocr_engine.py  # 4 OCR engines + ONNX + Quantization
│   │   ├── image_preprocessor.py  # CLAHE + Denoise + Deskew + Otsu
│   │   ├── pdf_processor.py  # Multi-format PDF processing
│   │   ├── text_reconstructor.py  # RTL/LTR sentence reconstruction
│   │   ├── result_fusion.py  # 4 fusion strategies
│   │   ├── layout_analyzer.py  # Layout analysis (tables, headers)
│   │   └── table_extractor.py  # Table extraction (Hough + contours)
│   │
│   ├── nlp/  # Natural Language Processing
│   │   ├── spell_corrector.py  # 3-language correction + learning
│   │   ├── translator.py  # Helsinki-NLP translation
│   │   ├── summarizer.py  # BART summarization
│   │   ├── entity_extractor.py  # BERT-based NER
│   │   ├── language_detector.py  # Language detection
│   │   ├── text_classifier.py  # 6-category classification
│   │   ├── arabic_rtl.py  # Full RTL processing
│   │   ├── mixed_text.py  # Arabic/English mixed text
│   │   ├── ai_corrector.py  # GPT-based correction
│   │   ├── arabic_nlp_utils.py  # Semantic similarity for Arabic OCR
│   │   └── correction_dict.json  # 186+ Arabic corrections
│   │
│   ├── ai/  # AI Enhancement
│   │   ├── pattern_matcher.py  # SSIM pattern matching
│   │   ├── pattern_db.py  # SQLite pattern database
│   │   └── gemini_refiner.py  # Gemini AI refinement
│   │
│   ├── security/  # Security & Privacy
│   │   ├── file_scanner.py  # Security scanning
│   │   ├── sensitive_data_scanner.py  # PII detection (Presidio)
│   │   ├── encryption.py  # Fernet encryption (AES-128)
│   │   ├── code_protector.py  # Code block protection
│   │   ├── file_organizer.py  # Auto file organization
│   │   ├── archive_handler.py  # Archive management
│   │   ├── backup_manager.py  # Backup management
│   │   ├── audit_logger.py  # Audit logging
│   │   └── secure_file_handler.py  # Safe file handling
│   │
│   ├── export/  # Multi-Format Export
│   │   ├── exporter.py  # DOCX/HTML/PDF/JSON/TXT/Excel
│   │   └── layout_preserving.py  # DOCX export with visual layout preservation
│   │
│   └── evaluation/  # Evaluation & Metrics
│       └── metrics.py  # CER/WER + quality grading
│
├── frontend/  # React + shadcn/ui Web App
│   ├── src/
│   │   ├── App.jsx  # Main application
│   │   ├── components/  # UI components
│   │   │   ├── FileUpload.jsx
│   │   │   ├── ProcessingOptions.jsx
│   │   │   └── ResultsDisplay.jsx
│   │   └── services/api.js  # API client
│   └── package.json
│
├── backend/  # FastAPI Backend
│   └── main.py  # REST API endpoints
│
├── data/
│   └── arabic_fixes.json  # 186 Arabic corrections
├── data_seed/
│   └── correction_dict_seed.json  # Seed data
├── artifacts/
│   └── correction_dict.json  # Learned corrections
│
├── src/  # HandwrittenOCR Engine
├── mobile/  # Static PWA (offline review)
├── mobile_review/  # Flask server (remote team review)
│   ├── server.py  # REST API review server
│   ├── templates/review.html  # Touch-friendly review UI
│   └── README.md  # mobile/ vs mobile_review/ guide
├── tests/  # pytest test suite (13 files)
├── .github/workflows/  # CI/CD
│   ├── ci.yml  # Tests on push/PR
│   └── release.yml  # Auto-release on tags
├── notebooks/  # Jupyter Notebooks
│   └── OmniFile_Gradio_Debugger.ipynb  # Gradio interactive debugger (Colab-ready)
├── docs/  # Documentation
│   ├── TESTING_GUIDE.md  # Testing & development guide
│   ├── API_DOCS.md
│   ├── USER_GUIDE.md
│   └── DEVELOPER_GUIDE.md
├── k8s/  # Kubernetes manifests
│   ├── namespace.yaml
│   ├── backend.yaml
│   ├── celery.yaml
│   ├── redis.yaml
│   ├── nginx.yaml
│   ├── hpa.yaml
│   └── storage.yaml
└── examples/  # Usage examples
    ├── ocr_basic.py
    ├── nlp_pipeline.py
    └── evaluation_example.py
Module Descriptions
1. modules/core/ - Core Data Models
The foundational layer defining all shared data structures using Pydantic v2. Provides type-safe models for OCR results, processing options, document metadata, and inter-module communication.
| File | Description |
|---|---|
| `structure.py` | Pydantic v2 models for OCRResult, ProcessingOptions, Document, BoundingBox, and all shared types |
2. modules/vision/ - Computer Vision & OCR Engine
The heart of the system. Handles image preprocessing, OCR with 4 engines, result fusion, PDF processing, layout analysis, table extraction, and text reconstruction.
| File | Description |
|---|---|
| `ocr_engine.py` | Multi-engine OCR (TrOCR, EasyOCR, Tesseract, PaddleOCR) with ONNX Runtime & INT8 quantization |
| `image_preprocessor.py` | CLAHE, Gaussian denoise, deskew (Hough), Otsu thresholding, adaptive binarization |
| `pdf_processor.py` | PyMuPDF-based PDF processing with pdf2image fallback |
| `text_reconstructor.py` | RTL/LTR sentence reconstruction with language-aware ordering |
| `result_fusion.py` | 4 fusion strategies: highest confidence, weighted average, voting, longest text |
| `layout_analyzer.py` | Document layout analysis: tables, headers, footers, sections |
| `table_extractor.py` | Table extraction using Hough lines + contour analysis |
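Of the preprocessing steps listed for `image_preprocessor.py`, Otsu thresholding is the easiest to show in isolation: it picks the gray level that maximizes the variance between the background and foreground classes. The sketch below is a dependency-free illustration on a flat list of 8-bit pixel values. It is not the project's implementation, which presumably relies on an image library; the function name is hypothetical.

```python
def otsu_threshold(pixels):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form the background class, the rest the foreground.
    Expects integers in the range 0..255.
    """
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level, count in enumerate(histogram):
        weight_bg += count
        if weight_bg == 0:
            continue  # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is background from here on
        sum_bg += level * count
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


# Two dark ink levels and two bright paper levels.
pixels = [20] * 50 + [30] * 50 + [200] * 50 + [220] * 50
threshold = otsu_threshold(pixels)
binary = [1 if p > threshold else 0 for p in pixels]
print(threshold)    # 30
print(sum(binary))  # 100
```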
3. modules/nlp/ - Natural Language Processing
Multilingual text processing including spell correction, translation, summarization, entity extraction, text classification, and Arabic RTL handling.
| File | Description |
|---|---|
| `spell_corrector.py` | 3-language spell correction (AR, EN, DE) with user learning & Python keyword protection |
| `translator.py` | Helsinki-NLP/opus-mt machine translation (6 language pairs) |
| `summarizer.py` | BART summarization (English + Arabic) |
| `entity_extractor.py` | BERT-based Named Entity Recognition |
| `language_detector.py` | Automatic language detection (AR, EN, DE) |
| `text_classifier.py` | 6-category text classification |
| `arabic_rtl.py` | Full RTL processing: arabic_reshaper + python-bidi + 40+ normalization rules |
| `mixed_text.py` | Arabic/English/number mixed text handler with medical term protection |
| `ai_corrector.py` | GPT-based OCR correction with context awareness |
| `correction_dict.json` | 186+ common Arabic OCR error corrections |
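Arabic normalization usually means removing diacritics and folding visually similar letter variants into one form before comparing or correcting text. The sketch below shows a small, illustrative subset of such rules; it does not reproduce the project's 40+ mappings, and the specific folds chosen (for example teh marbuta to heh) are common conventions that the project may or may not share.

```python
import re

# Illustrative subset only; not the mappings shipped in arabic_rtl.py.
CHAR_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maksura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
    "\u0640": None,      # tatweel (kashida) is removed
})
DIACRITICS = re.compile("[\u064B-\u0652]")  # fathatan through sukun


def normalize_arabic(text):
    """Strip diacritics, unify letter variants, and collapse whitespace."""
    text = DIACRITICS.sub("", text)
    text = text.translate(CHAR_MAP)
    return re.sub(r"\s+", " ", text).strip()


# "Ahmad" written with hamza and short-vowel marks.
sample = "\u0623\u064E\u062D\u0652\u0645\u064E\u062F"
print(normalize_arabic(sample) == "\u0627\u062D\u0645\u062F")  # True
```

Applying the same normalization to both the reference and the OCR output keeps purely orthographic variation from being counted as recognition errors.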
4. modules/security/ - Security & Privacy
Comprehensive security module for PII detection, encryption, code protection, file organization, backup management, and audit logging.
| File | Description |
|---|---|
| `file_scanner.py` | File security scanning and validation |
| `sensitive_data_scanner.py` | PII detection using Microsoft Presidio + detect-secrets |
| `encryption.py` | Fernet symmetric encryption (AES-128) with folder support |
| `code_protector.py` | Prevents spell correction inside code blocks (Python, JS, etc.) |
| `file_organizer.py` | Automatic file organization by type and content |
| `archive_handler.py` | ZIP archive creation/extraction with integrity checks |
| `backup_manager.py` | Automatic and manual backup management |
| `audit_logger.py` | File + Redis audit trail with statistics |
| `secure_file_handler.py` | Path traversal prevention + safe tempfile handling |
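Path traversal prevention generally comes down to resolving a user-supplied path against a trusted base directory and rejecting anything that ends up outside it. The sketch below illustrates that check with `pathlib`; the function name and the `/srv/uploads` directory are hypothetical, and the actual behavior of `secure_file_handler.py` may differ.

```python
from pathlib import Path


def resolve_inside(base_dir, user_path):
    """Return the resolved path if it stays inside base_dir, else raise ValueError."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks that exist.
    candidate = (base / user_path).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate


print(resolve_inside("/srv/uploads", "reports/scan.pdf").name)  # scan.pdf

try:
    resolve_inside("/srv/uploads", "../../etc/passwd")
except ValueError as error:
    print("rejected:", error)
```

An absolute `user_path` replaces the base when joined, so it is rejected by the same check unless it already points inside the base directory.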
5. modules/ai/ - AI Enhancement
Advanced AI capabilities including self-learning pattern matching and Gemini-based refinement.
| File | Description |
|---|---|
| `pattern_matcher.py` | SSIM-based visual pattern matching: learns from corrected word images |
| `pattern_db.py` | SQLite database for storing and retrieving visual OCR patterns |
| `gemini_refiner.py` | Google Gemini API integration for context-aware OCR refinement |
6. modules/export/ - Multi-Format Export
Export processed documents to 6 different formats with proper RTL support.
| File | Description |
|---|---|
| `exporter.py` | Export to DOCX (RTL bidi), HTML (dir="rtl"), searchable PDF (invisible text), Excel (RTL alignment), JSON (full structure with BBox), TXT (UTF-8 BOM) |
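Two of the formats above need nothing beyond the standard library, which makes them a convenient illustration of the RTL details: a UTF-8 byte order mark for TXT and the `dir="rtl"` attribute for HTML. The function names and the `lang` parameter below are hypothetical; this is a sketch of the idea, not the code in `exporter.py`.

```python
import html
from pathlib import Path


def export_txt(text, path):
    # "utf-8-sig" prepends the UTF-8 byte order mark (EF BB BF), which helps
    # some editors detect the encoding of Arabic text.
    Path(path).write_text(text, encoding="utf-8-sig")


def export_html(text, path, rtl=True, lang="ar"):
    # dir="rtl" makes the browser lay out paragraphs right-to-left.
    direction = "rtl" if rtl else "ltr"
    paragraphs = "\n".join(
        f"<p>{html.escape(line)}</p>" for line in text.splitlines() if line.strip()
    )
    document = (
        "<!DOCTYPE html>\n"
        f'<html lang="{lang}" dir="{direction}">\n'
        '<head><meta charset="utf-8"></head>\n'
        f"<body>\n{paragraphs}\n</body>\n</html>\n"
    )
    Path(path).write_text(document, encoding="utf-8")
```

Calling `export_html(text, "out.html", rtl=False, lang="en")` would produce the left-to-right variant for English or German output.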
7. modules/evaluation/ - Evaluation & Metrics
OCR accuracy evaluation with Arabic-aware normalization.
| File | Description |
|---|---|
| `metrics.py` | CER/WER computation, Arabic text normalization, Levenshtein distance (zero dependencies), quality grading (A+ to F) |
API Documentation
The FastAPI backend exposes the following REST endpoints. Full interactive documentation is available at /docs (Swagger UI) when the backend is running.
Base URL
http://localhost:5001/api/v1
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /ocr/process | Process an image/PDF file with OCR |
| POST | /ocr/process-batch | Batch process multiple files |
| GET | /ocr/result/{task_id} | Get OCR result by task ID |
| POST | /nlp/correct | Spell-correct text (AR/EN/DE) |
| POST | /nlp/translate | Translate text between languages |
| POST | /nlp/summarize | Summarize text |
| POST | /nlp/entities | Extract named entities |
| POST | /nlp/classify | Classify text into categories |
| POST | /export/{format} | Export results to DOCX/HTML/PDF/JSON/TXT/Excel |
| POST | /security/scan | Scan file for PII and security issues |
| POST | /security/encrypt | Encrypt a file |
| POST | /security/decrypt | Decrypt a file |
| GET | /health | Health check endpoint |
| GET | /metrics | System performance metrics |
For the complete API reference with request/response schemas, see docs/API_DOCS.md or access the Swagger UI at /docs.
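As a rough illustration of calling one of these endpoints from Python, the sketch below builds a JSON POST request for `/nlp/correct` with the standard library. The request body fields (`text`, `language`) are assumptions made for the example, not the documented schema; check docs/API_DOCS.md or the Swagger UI for the real request format.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001/api/v1"  # base URL as listed above


def build_json_request(endpoint, payload):
    """Build (but do not send) a JSON POST request for an endpoint."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Field names "text" and "language" are assumptions, not the documented schema.
request = build_json_request("/nlp/correct", {"text": "helo wrld", "language": "en"})
print(request.get_method(), request.full_url)
# POST http://localhost:5001/api/v1/nlp/correct

# Sending the request requires a running backend:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.load(response))
```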
Screenshots
Screenshots will be added in a future update. For now, try the live demo to see the application in action.
See CHANGELOG.md for the complete version history.
Project Statistics
| Metric | Value |
|---|---|
| Python Files | 72+ |
| Lines of Code | ~28,000 |
| Total Files | 152+ |
| OCR Engines | 4 (TrOCR, EasyOCR, Tesseract, PaddleOCR) |
| Fusion Strategies | 4 |
| Supported Languages | 3 (EN, AR, DE) |
| Export Formats | 6 (DOCX, HTML, PDF, JSON, TXT, Excel) |
| Test Files | 13 |
| Merged Projects | 6 |
| Security Modules | 9 |
| NLP Capabilities | 10 |
| API Endpoints | 14+ |
Supported Languages
| Language | Code | RTL Support | OCR | Spell Check | Translation |
|---|---|---|---|---|---|
| Arabic | ar | Yes | Yes | Yes | Yes |
| English | en | N/A (LTR) | Yes | Yes | Yes |
| German (Deutsch) | de | N/A (LTR) | Yes | Yes | Yes |
Contributing
Contributions are welcome! Please follow these steps:
How to Contribute
1. **Fork** the repository
2. **Clone** your fork locally
   ```bash
   git clone https://github.com/your-username/OmniFile_Processor.git
   cd OmniFile_Processor
   ```
3. **Create** a feature branch
   ```bash
   git checkout -b feature/your-feature-name
   ```
4. **Make your changes** and ensure tests pass
   ```bash
   pip install -r requirements-full.txt  # Everything (~6-8 GB, ~15-30 min)
   # Or install in layers:
   # pip install -r requirements-core.txt  # Minimum (~1.5 GB, ~3 min)
   # pip install -r requirements-core.txt -r requirements-ocr.txt  # + OCR engines
   # pip install -r requirements-core.txt -r requirements-nlp.txt  # + NLP
   pytest tests/ -v
   ```
5. **Commit** with a descriptive message
   ```bash
   git commit -m "feat: add your feature description"
   ```
6. **Push** to your fork
   ```bash
   git push origin feature/your-feature-name
   ```
7. **Open a Pull Request** against the `main` branch
Development Guidelines
- Follow PEP 8 style guidelines
- Add docstrings to all new functions and classes
- Write tests for new features (place in `tests/`)
- Update the relevant documentation in `docs/`
- Use type hints throughout your code
- Ensure RTL handling is tested for any text-related changes
License
This project is licensed under the MIT License.
MIT License
Copyright (c) 2026 Dr Abdulmalek Tamer Al-husseini, Homs, Syria
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
See LICENSE for the full text.
Links
| Resource | Link |
|---|---|
| GitHub Repository | https://github.com/DrAbdulmalek/OmniFile_Processor |
| HuggingFace Spaces | https://huggingface.co/spaces/DrAbdulmalek/handwriting-ocr |
| User Guide | docs/USER_GUIDE.md |
| Developer Guide | docs/DEVELOPER_GUIDE.md |
| Testing Guide | docs/TESTING_GUIDE.md |
| API Documentation | docs/API_DOCS.md |
| Suggestions | SUGGESTIONS.md |
| License | LICENSE |
Built with ❤️ by Dr Abdulmalek Tamer Al-husseini | Homs, Syria | Abdulmalek.husseini@gmail.com
⭐ If you find this project useful, please give it a star on GitHub!





