| title: README | |
| emoji: 📄 | |
| colorFrom: gray | |
| colorTo: red | |
| sdk: static | |
| pinned: false | |
| license: cc-by-nc-sa-4.0 | |
| <p align="center"> | |
| <img src="https://custom-images.strikinglycdn.com/res/hrscywv4p/image/upload/c_limit,fl_lossy,h_9000,w_1200,f_auto,q_auto/1392442/315225_240194.png" | |
| alt="OdiaOCR Logo" width="800"/> | |
| </p> | |
| ## About | |
| This initiative builds on **Odia Lipi**, a focused effort to address the longstanding challenges of digitizing Odia text from images, scanned documents, palm leaves, manuscripts, newspapers, and handwritten pages. | |
| The goal is to host open **OCR datasets, models, tools, and benchmarks** that empower researchers, developers, linguists, and archivists to **extract machine‑readable text** from complex Indic scripts. This is essential for education, cultural preservation, digital accessibility, and downstream AI applications. | |
| --- | |
| ## Vision | |
| To build **robust, open, and community‑driven Odia OCR datasets and models** that can accurately recognize both **printed and handwritten Odia script**, overcoming limitations of existing OCR tools and making Odia text fully searchable, editable, and usable in modern AI workflows. | |
| --- | |
| ## Problem Statement | |
| Odia, like many other Indic languages, is **underserved by existing OCR systems**. The main obstacles are: | |
| - Complex ligatures and diacritics in Odia script | |
| - Limited high‑quality annotated OCR datasets | |
| - Lack of reliable handwritten text recognition | |
| - Inadequate open‑source OCR models for Indic scripts | |
| Without dedicated solutions, a significant portion of Odia content remains inaccessible for digital archiving and AI processing. | |
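To illustrate the first obstacle: a conjunct that a reader sees as one glyph is stored as several Unicode code points, which complicates both recognition and ground-truth alignment. The sketch below uses only the Python standard library. The conjunct କ୍ଷ is chosen purely for illustration and is not taken from any project dataset. Note that Unicode character names use the older spelling "ORIYA".

```python
import unicodedata

# One visual conjunct, encoded as KA + VIRAMA + SSA (illustrative choice).
conjunct = "\u0b15\u0b4d\u0b37"  # କ୍ଷ


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for code_point, name in describe(conjunct):
    print(code_point, name)
```

An OCR model therefore has to map one visual unit to a sequence of code points, and an evaluation script has to decide whether to count errors per code point or per visual cluster.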
| --- | |
| ## What We Work On | |
| - **Odia and Indic OCR Dataset Creation & Curation** | |
| - **OCR Model Training & Evaluation** (Printed + Handwritten) | |
| - **OCR Annotation Tools & Workflows** | |
| - **Benchmarks & Quality Metrics** | |
| - **Integration with Multimodal NLP and Language Models** (text + image) | |
| This project aims to make Odia text **searchable, editable, and machine‑interpretable**, enabling downstream language technologies such as translation, summarization, and speech‑to‑text. | |
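As one possible quality metric, OCR output is commonly scored with character error rate (CER): the edit distance between the reference and the predicted text, divided by the reference length. The sketch below is an assumption about how such a metric could be computed, not a description of this project's benchmark. The function names `edit_distance` and `cer` are hypothetical, and the comparison is done per Unicode code point after NFC normalization.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted per code point."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Example: the prediction drops the virama from a three-code-point conjunct.
score = cer("\u0b15\u0b4d\u0b37", "\u0b15\u0b37")
print(score)
```

Counting per code point penalizes a missed virama as one error out of three. A benchmark could instead count per grapheme cluster, which would treat the whole conjunct as a single wrong unit.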
| --- | |
| ## How to Contribute | |
| We welcome contributions from researchers, students, linguists, and developers for: | |
| - Dataset annotation and quality verification | |
| - Model training and evaluation | |
| - Benchmark creation | |
| - Tool development for OCR preprocessing and postprocessing | |
| Feel free to open issues, share data sources, or propose collaborations. | |
| --- | |
| 🧩 **Visit the org page:** https://huggingface.co/OdiaGenAIOCR |