---

title: README
emoji: 📄
colorFrom: gray
colorTo: red
sdk: static
pinned: false
license: cc-by-nc-sa-4.0
---


<p align="center">
  <img src="https://custom-images.strikinglycdn.com/res/hrscywv4p/image/upload/c_limit,fl_lossy,h_9000,w_1200,f_auto,q_auto/1392442/315225_240194.png" 
       alt="OdiaOCR Logo" width="800"/>
</p>


## About

This initiative builds on **Odia Lipi**, a focused effort to address the long-standing challenges of digitizing Odia text from images, scanned documents, palm leaves, manuscripts, newspapers, and handwritten pages.

The goal is to host open **OCR datasets, models, tools, and benchmarks** that empower researchers, developers, linguists, and archivists to **extract machine‑readable text** from complex Indic scripts. This is essential for education, cultural preservation, digital accessibility, and downstream AI applications.

---

## Vision

To build **robust, open, and community‑driven Odia OCR datasets and models** that can accurately recognize both **printed and handwritten Odia script**, overcoming limitations of existing OCR tools and making Odia text fully searchable, editable, and usable in modern AI workflows. 

---

## Problem Statement

Odia, like many other Indic languages, is **underserved by existing OCR systems**. The main obstacles are:
- Complex ligatures and diacritics in Odia script  
- Limited high‑quality annotated OCR datasets  
- Lack of reliable handwritten text recognition  
- Inadequate open‑source OCR models for Indic scripts  

Without dedicated solutions, a significant portion of Odia content remains inaccessible for digital archiving and AI processing.

---

## What We Work On

- **Odia and Indic OCR Dataset Creation & Curation**  
- **OCR Model Training & Evaluation** (Printed + Handwritten)  
- **OCR Annotation Tools & Workflows**  
- **Benchmarks & Quality Metrics**  
- **Integration with Multimodal NLP and Language Models** (text + image)  

This project aims to make Odia text **searchable, editable, and machine‑interpretable**, enabling downstream language technologies such as translation, summarization, and speech‑to‑text. 
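
As an illustration of the kind of quality metric an OCR benchmark might report, here is a minimal sketch of character error rate (CER): the edit distance between the reference text and the OCR output, divided by the reference length. This is not the project's official evaluation code. The function names `levenshtein` and `character_error_rate` are hypothetical, and the choice to NFC-normalize and count Unicode code points (rather than grapheme clusters) is an assumption made for simplicity.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER over NFC-normalized code points (an assumption, not a fixed standard)."""
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference text must not be empty")
    return levenshtein(ref, hyp) / len(ref)


# The word "Odia" in Odia script: 5 code points, including the nukta (U+0B3C).
reference = "\u0b13\u0b21\u0b3c\u0b3f\u0b06"
# Hypothetical OCR output that dropped the nukta.
hypothesis = "\u0b13\u0b21\u0b3f\u0b06"

print(character_error_rate(reference, hypothesis))  # 1 edit / 5 code points = 0.2
```

Counting code points means a missed diacritic and a missed base letter each cost one error. A benchmark that wants to score whole written syllables would need to segment into grapheme clusters first.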

---

## How to Contribute

We welcome contributions from researchers, students, linguists, and developers for:

- Dataset annotation and quality verification  
- Model training and evaluation  
- Benchmark creation  
- Tool development for OCR preprocessing and postprocessing

Feel free to open issues, share data sources, or propose collaborations.

---

🧩 **Visit the org page:** https://huggingface.co/OdiaGenAIOCR