Update README.md
README.md
CHANGED
@@ -1,10 +1,67 @@
+---
+title: README
+emoji: 📄
+colorFrom: gray
+colorTo: red
+sdk: static
+pinned: false
+license: cc-by-nc-sa-4.0
+---
+
+<p align="center">
+<img src="https://custom-images.strikinglycdn.com/res/hrscywv4p/image/upload/c_limit,fl_lossy,h_9000,w_1200,f_auto,q_auto/1392442/315225_240194.png"
+alt="OdiaOCR Logo" width="800"/>
+</p>
+
+
+## About
+
+This initiative builds on **Odia Lipi**, a focused effort to address the longstanding challenges of digitizing Odia text from images, scanned documents, palm leaves, manuscripts, newspapers, and handwritten pages.
+
+The goal is to host open **OCR datasets, models, tools, and benchmarks** that empower researchers, developers, linguists, and archivists to **extract machine‑readable text** from complex Indic scripts. This is essential for education, cultural preservation, digital accessibility, and downstream AI applications.
+
+---
+
+## Vision
+
+To build **robust, open, and community‑driven Odia OCR datasets and models** that can accurately recognize both **printed and handwritten Odia script**, overcoming limitations of existing OCR tools and making Odia text fully searchable, editable, and usable in modern AI workflows.
+
+---
+
+## Problem Statement
+
+Odia, like many other Indic languages, is **underserved by existing OCR systems**, which struggle with:
+- Complex ligatures and diacritics in Odia script
+- Limited high‑quality annotated OCR datasets
+- Lack of reliable handwritten text recognition
+- Inadequate open‑source OCR models for Indic scripts
+Without dedicated solutions, a significant portion of Odia content remains inaccessible for digital archiving and AI processing.
+
+---
+
+## What We Work On
+
+- **Odia and Indic OCR Dataset Creation & Curation**
+- **OCR Model Training & Evaluation** (Printed + Handwritten)
+- **OCR Annotation Tools & Workflows**
+- **Benchmarks & Quality Metrics**
+- **Integration with Multimodal NLP and Language Models** (text + image)
+
+This project aims to make Odia text **searchable, editable, and machine‑interpretable**, enabling downstream language technologies such as translation, summarization, and speech‑to‑text.
+
 ---
+
+## How to Contribute
+
+We welcome contributions from researchers, students, linguists, and developers for:
+
+- Dataset annotation and quality verification
+- Model training and evaluation
+- Benchmark creation
+- Tool development for OCR preprocessing and postprocessing
+
+Feel free to open issues, share data sources, or propose collaborations.
+
 ---

+🧩 **Visit the org page:** https://huggingface.co/OdiaGenAIOCR