---
title: README
emoji: 📄
colorFrom: gray
colorTo: red
sdk: static
pinned: false
license: cc-by-nc-sa-4.0
---
## About
This initiative builds on **Odia Lipi**, a focused effort to address the longstanding challenges of digitizing Odia text from images, scanned documents, palm leaves, manuscripts, newspapers, and handwritten pages.

The goal is to host open **OCR datasets, models, tools, and benchmarks** that empower researchers, developers, linguists, and archivists to **extract machine‑readable text** from complex Indic scripts. Machine‑readable text is essential for education, cultural preservation, digital accessibility, and downstream AI applications.

---
## Vision
To build **robust, open, and community‑driven Odia OCR datasets and models** that can accurately recognize both **printed and handwritten Odia script**, overcoming limitations of existing OCR tools and making Odia text fully searchable, editable, and usable in modern AI workflows.

---
## Problem Statement
Odia, like many other Indic languages, is **underserved by existing OCR systems**. The main obstacles are:
- Complex ligatures and diacritics in Odia script
- Limited high‑quality annotated OCR datasets
- Lack of reliable handwritten text recognition
- Inadequate open‑source OCR models for Indic scripts

Without dedicated solutions, a significant portion of Odia content remains inaccessible for digital archiving and AI processing.

---
## What We Work On
- **Odia and Indic OCR Dataset Creation & Curation**
- **OCR Model Training & Evaluation** (Printed + Handwritten)
- **OCR Annotation Tools & Workflows**
- **Benchmarks & Quality Metrics**
- **Integration with Multimodal NLP and Language Models** (text + image)

This project aims to make Odia text **searchable, editable, and machine‑interpretable**, enabling downstream language technologies such as translation, summarization, and speech‑to‑text.

---
## How to Contribute
We welcome contributions from researchers, students, linguists, and developers for:
- Dataset annotation and quality verification
- Model training and evaluation
- Benchmark creation
- Tool development for OCR preprocessing and postprocessing

Feel free to open issues, share data sources, or propose collaborations.

---
🧩 **Visit the org page:** https://huggingface.co/OdiaGenAIOCR