---
title: README
emoji: 📄
colorFrom: gray
colorTo: red
sdk: static
pinned: false
license: cc-by-nc-sa-4.0
---
<p align="center">
<img src="https://custom-images.strikinglycdn.com/res/hrscywv4p/image/upload/c_limit,fl_lossy,h_9000,w_1200,f_auto,q_auto/1392442/315225_240194.png"
alt="OdiaOCR Logo" width="800"/>
</p>

## About
This initiative builds on **Odia Lipi**, a focused effort to address the longstanding challenges of digitizing Odia text from images, scanned documents, palm leaves, manuscripts, newspapers, and handwritten pages.

The goal is to host open **OCR datasets, models, tools, and benchmarks** that empower researchers, developers, linguists, and archivists to **extract machine‑readable text** from complex Indic scripts. This is essential for education, cultural preservation, digital accessibility, and downstream AI applications.

---
## Vision
To build **robust, open, and community‑driven Odia OCR datasets and models** that can accurately recognize both **printed and handwritten Odia script**, overcoming limitations of existing OCR tools and making Odia text fully searchable, editable, and usable in modern AI workflows.

---
## Problem Statement
Odia, like many other Indic languages, is **underserved by existing OCR systems**, which struggle with:
- Complex ligatures and diacritics in Odia script
- Limited high‑quality annotated OCR datasets
- Lack of reliable handwritten text recognition
- Inadequate open‑source OCR models for Indic scripts

Without dedicated solutions, a significant portion of Odia content remains inaccessible for digital archiving and AI processing.

---
## What We Work On
- **Odia and Indic OCR Dataset Creation & Curation**
- **OCR Model Training & Evaluation** (Printed + Handwritten)
- **OCR Annotation Tools & Workflows**
- **Benchmarks & Quality Metrics**
- **Integration with Multimodal NLP and Language Models** (text + image)

This project aims to make Odia text **searchable, editable, and machine‑interpretable**, enabling downstream language technologies such as translation, summarization, and speech‑to‑text.

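
As an illustration of the kind of quality metric a benchmark might report, here is a minimal sketch of character error rate (CER): the edit distance between the OCR output and the reference text, divided by the reference length. CER is one common choice for OCR evaluation, not a metric this README specifies, and the function names below are hypothetical. The sketch counts Unicode code points, so a missing Odia vowel sign or nukta counts as one error; counting grapheme clusters instead would be an equally reasonable design.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of hypothesis `hyp` against reference `ref`."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: the OCR output drops the nukta (U+0B3C).
reference = "\u0b13\u0b21\u0b3c\u0b3f\u0b06"  # five code points
ocr_output = "\u0b13\u0b21\u0b3f\u0b06"       # four code points
print(cer(reference, ocr_output))
```
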
---
## How to Contribute
We welcome contributions from researchers, students, linguists, and developers for:
- Dataset annotation and quality verification
- Model training and evaluation
- Benchmark creation
- Tool development for OCR preprocessing and postprocessing

Feel free to open issues, share data sources, or propose collaborations.

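
For contributors thinking about postprocessing tools, one small and self-contained step is Unicode normalization of OCR output, so that visually identical Odia strings compare equal during evaluation. The sketch below is an assumption about what such a tool could look like, not part of this project's codebase; the function name `normalize_ocr_text` and the choice of NFC are illustrative.

```python
import unicodedata


def normalize_ocr_text(text: str) -> str:
    """NFC-normalize text and collapse whitespace runs within each line."""
    text = unicodedata.normalize("NFC", text)
    # Split on any whitespace and rejoin with single spaces, line by line.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(lines)


# The two-part vowel sign (U+0B47 followed by U+0B3E) composes to U+0B4B under NFC.
decomposed = "\u0b15\u0b47\u0b3e"
composed = "\u0b15\u0b4b"
print(normalize_ocr_text(decomposed) == composed)
```
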
---
🧩 **Visit the org page:** https://huggingface.co/OdiaGenAIOCR