Spaces: OdiaGenAIOCR / README

shantipriya committed on Jan 10

Commit feaa2a8 (verified) · 1 Parent(s): 851f8c3

Update README.md

README.md +64 -7

@@ -1,10 +1,67 @@
 ---
-title: README
-emoji: 🌖
-colorFrom: green
-colorTo: red
-sdk: static
-pinned: false
 ---
-Edit this `README.md` markdown file to author your organization card.

+---
+title: README
+emoji: 📄
+colorFrom: gray
+colorTo: red
+sdk: static
+pinned: false
+license: cc-by-nc-sa-4.0
+---
+<p align="center">
+  <img src="https://custom-images.strikinglycdn.com/res/hrscywv4p/image/upload/c_limit,fl_lossy,h_9000,w_1200,f_auto,q_auto/1392442/315225_240194.png"
+       alt="OdiaOCR Logo" width="800"/>
+</p>
+## About
+This initiative builds on the **Odia Lipi** — a focused effort to address the longstanding challenges of digitizing Odia text from images, scanned documents, palm leaves, manuscripts, newspapers, and handwritten pages.
+The goal is to host open **OCR datasets, models, tools, and benchmarks** that empower researchers, developers, linguists, and archivists to **extract machine‑readable text** from complex Indic scripts. This is essential for education, cultural preservation, digital accessibility, and downstream AI applications.
+---
+## Vision
+To build **robust, open, and community‑driven Odia OCR datasets and models** that can accurately recognize both **printed and handwritten Odia script**, overcoming limitations of existing OCR tools and making Odia text fully searchable, editable, and usable in modern AI workflows.
+---
+## Problem Statement
+Odia, like many other Indic languages, is **underserved by existing OCR systems**, which struggle with:
+- Complex ligatures and diacritics in Odia script
+- Limited high‑quality annotated OCR datasets
+- Lack of reliable handwritten text recognition
+- Inadequate open‑source OCR models for Indic scripts
+Without dedicated solutions, a significant portion of Odia content remains inaccessible for digital archiving and AI processing.
+---
+## What We Work On
+- **Odia and Indic OCR Dataset Creation & Curation**
+- **OCR Model Training & Evaluation** (Printed + Handwritten)
+- **OCR Annotation Tools & Workflows**
+- **Benchmarks & Quality Metrics**
+- **Integration with Multimodal NLP and Language Models** (text + image)
+This project aims to make Odia text **searchable, editable, and machine‑interpretable**, enabling downstream language technologies such as translation, summarization, and speech‑to‑text.
 ---
+## How to Contribute
+We welcome contributions from researchers, students, linguists, and developers for:
+- Dataset annotation and quality verification
+- Model training and evaluation
+- Benchmark creation
+- Tool development for OCR preprocessing and postprocessing
+Feel free to open issues, share data sources, or propose collaborations.
 ---
+🧩 **Visit the org page:** https://huggingface.co/OdiaGenAIOCR
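
The README lists "Benchmarks & Quality Metrics" without naming a metric. As an illustration only, the sketch below computes character error rate (CER), a common OCR metric, over Unicode code points. The function names `edit_distance` and `character_error_rate` are hypothetical and not part of this repository, and the choice of NFC normalization is an assumption; a real Odia benchmark might prefer to count grapheme clusters instead.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / reference length, after NFC normalization.

    Assumption: normalizing both sides keeps canonically equivalent
    encodings of the same Odia letter from being counted as errors.
    """
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        # Empty reference: any output counts as fully wrong.
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Ground truth vs. an OCR output that dropped the nukta sign.
    truth = "\u0b13\u0b21\u0b3c\u0b3f\u0b06"
    ocr_output = "\u0b13\u0b21\u0b3f\u0b06"
    print(f"CER: {character_error_rate(truth, ocr_output):.2f}")
```

Counting code points keeps the sketch dependency-free, but it weights a missing diacritic the same as a missing base letter, so the score should be read with that in mind.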