---
title: README
emoji: 📄
colorFrom: gray
colorTo: red
sdk: static
pinned: false
license: cc-by-nc-sa-4.0
---


## About

This initiative builds on **Odia Lipi**, a focused effort to address the longstanding challenges of digitizing Odia text from images, scanned documents, palm leaves, manuscripts, newspapers, and handwritten pages. The goal is to host open **OCR datasets, models, tools, and benchmarks** that empower researchers, developers, linguists, and archivists to **extract machine-readable text** from complex Indic scripts. Machine-readable Odia text is essential for education, cultural preservation, digital accessibility, and downstream AI applications.

---

## Vision

To build **robust, open, and community-driven Odia OCR datasets and models** that can accurately recognize both **printed and handwritten Odia script**, overcoming the limitations of existing OCR tools and making Odia text fully searchable, editable, and usable in modern AI workflows.

---

## Problem Statement

Odia, like many other Indic languages, is **underserved by existing OCR systems**, which struggle with:

- Complex ligatures and diacritics in Odia script
- Limited high-quality annotated OCR datasets
- Lack of reliable handwritten text recognition
- Inadequate open-source OCR models for Indic scripts

Without dedicated solutions, a significant portion of Odia content remains inaccessible for digital archiving and AI processing.

---

## What We Work On

- **Odia and Indic OCR Dataset Creation & Curation**
- **OCR Model Training & Evaluation** (Printed + Handwritten)
- **OCR Annotation Tools & Workflows**
- **Benchmarks & Quality Metrics**
- **Integration with Multimodal NLP and Language Models** (text + image)

This project aims to make Odia text **searchable, editable, and machine-interpretable**, enabling downstream language technologies such as translation, summarization, and speech-to-text.

---

## How to Contribute

We welcome contributions from researchers, students, linguists, and developers for:

- Dataset annotation and quality verification
- Model training and evaluation
- Benchmark creation
- Tool development for OCR preprocessing and postprocessing

Feel free to open issues, share data sources, or propose collaborations.

---

🧩 **Visit the org page:** https://huggingface.co/OdiaGenAIOCR