shantipriya committed on
Commit feaa2a8 · verified · 1 Parent(s): 851f8c3

Update README.md

Files changed (1)
  1. README.md +64 -7
README.md CHANGED
@@ -1,10 +1,67 @@
  ---
- title: README
- emoji: 🌖
- colorFrom: green
- colorTo: red
- sdk: static
- pinned: false
  ---
 
- Edit this `README.md` markdown file to author your organization card.
+ ---
+ title: README
+ emoji: 📄
+ colorFrom: gray
+ colorTo: red
+ sdk: static
+ pinned: false
+ license: cc-by-nc-sa-4.0
+ ---
+
+ <p align="center">
+ <img src="https://custom-images.strikinglycdn.com/res/hrscywv4p/image/upload/c_limit,fl_lossy,h_9000,w_1200,f_auto,q_auto/1392442/315225_240194.png"
+ alt="OdiaOCR Logo" width="800"/>
+ </p>
+
+
+ ## About
+
+ This initiative builds on the **Odia Lipi**, a focused effort to address the longstanding challenges of digitizing Odia text from images, scanned documents, palm leaves, manuscripts, newspapers, and handwritten pages.
+
+ The goal is to host open **OCR datasets, models, tools, and benchmarks** that empower researchers, developers, linguists, and archivists to **extract machine‑readable text** from complex Indic scripts. This is essential for education, cultural preservation, digital accessibility, and downstream AI applications.
+
+ ---
+
+ ## Vision
+
+ To build **robust, open, and community‑driven Odia OCR datasets and models** that can accurately recognize both **printed and handwritten Odia script**, overcoming limitations of existing OCR tools and making Odia text fully searchable, editable, and usable in modern AI workflows.
+
+ ---
+
+ ## Problem Statement
+
+ Odia, like many other Indic languages, is **underserved by existing OCR systems**, which struggle with:
+ - Complex ligatures and diacritics in Odia script
+ - Limited high‑quality annotated OCR datasets
+ - Lack of reliable handwritten text recognition
+ - Inadequate open‑source OCR models for Indic scripts
+ Without dedicated solutions, a significant portion of Odia content remains inaccessible for digital archiving and AI processing.
+
+ ---
+
+ ## What We Work On
+
+ - **Odia and Indic OCR Dataset Creation & Curation**
+ - **OCR Model Training & Evaluation** (Printed + Handwritten)
+ - **OCR Annotation Tools & Workflows**
+ - **Benchmarks & Quality Metrics**
+ - **Integration with Multimodal NLP and Language Models** (text + image)
+
+ This project aims to make Odia text **searchable, editable, and machine‑interpretable**, enabling downstream language technologies such as translation, summarization, and speech‑to‑text.
+
  ---
+
+ ## How to Contribute
+
+ We welcome contributions from researchers, students, linguists, and developers for:
+
+ - Dataset annotation and quality verification
+ - Model training and evaluation
+ - Benchmark creation
+ - Tool development for OCR preprocessing and postprocessing
+
+ Feel free to open issues, share data sources, or propose collaborations.
+
  ---
 
+ 🧩 **Visit the org page:** https://huggingface.co/OdiaGenAIOCR