	---
	title: README
	emoji: 📄
	colorFrom: gray
	colorTo: red
	sdk: static
	pinned: false
	license: cc-by-nc-sa-4.0
	---

	<p align="center">
	<img src="https://custom-images.strikinglycdn.com/res/hrscywv4p/image/upload/c_limit,fl_lossy,h_9000,w_1200,f_auto,q_auto/1392442/315225_240194.png"
	alt="OdiaOCR Logo" width="800"/>
	</p>


	## About

	This initiative builds on Odia Lipi, a focused effort to address the longstanding challenges of digitizing Odia text from images, scanned documents, palm leaves, manuscripts, newspapers, and handwritten pages.

	The goal is to host open OCR datasets, models, tools, and benchmarks that empower researchers, developers, linguists, and archivists to extract machine‑readable text from complex Indic scripts. This is essential for education, cultural preservation, digital accessibility, and downstream AI applications.

	---

	## Vision

	To build robust, open, and community‑driven Odia OCR datasets and models that can accurately recognize both printed and handwritten Odia script, overcoming limitations of existing OCR tools and making Odia text fully searchable, editable, and usable in modern AI workflows.

	---

	## Problem Statement

	Odia, like many other Indic languages, is underserved by existing OCR systems. The main obstacles are:
	- Complex ligatures and diacritics in Odia script
	- Limited high‑quality annotated OCR datasets
	- Lack of reliable handwritten text recognition
	- Inadequate open‑source OCR models for Indic scripts

	Without dedicated solutions, a significant portion of Odia content remains inaccessible for digital archiving and AI processing.
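
To make the first obstacle above concrete, here is a minimal sketch of how a single Odia conjunct is stored as several Unicode code points. The helper name `code_points` is hypothetical and chosen only for this illustration; it uses nothing beyond Python's standard `unicodedata` module and is not part of any project tooling.

```python
import unicodedata


def code_points(text):
    """Return (hex code point, Unicode name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# The conjunct "kssa" renders as one visual unit but is encoded as
# three code points: consonant KA + VIRAMA + consonant SSA.
conjunct = "\u0B15\u0B4D\u0B37"

for hex_value, name in code_points(conjunct):
    print(hex_value, name)
```

An OCR model has to map one visual shape back to such a multi-code-point sequence, which is part of why tools that assume one glyph per character tend to do poorly on Odia.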

	---

	## What We Work On

	- Odia and Indic OCR Dataset Creation & Curation
	- OCR Model Training & Evaluation (Printed + Handwritten)
	- OCR Annotation Tools & Workflows
	- Benchmarks & Quality Metrics
	- Integration with Multimodal NLP and Language Models (text + image)

	This project aims to make Odia text searchable, editable, and machine‑interpretable, enabling downstream language technologies such as translation, summarization, and speech‑to‑text.
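
As one illustration of a quality metric, character error rate (CER) is widely used to score OCR output against a reference transcription. This README does not state which metrics the project's benchmarks use, so the sketch below is an assumption for illustration only. The function names `edit_distance` and `cer` are hypothetical, and normalizing both strings to NFC before comparison is a design choice of this sketch rather than a documented project convention.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert/delete/substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length.

    Both strings are normalized to NFC first so that canonically
    equivalent encodings of the same Odia letter are not counted as errors.
    """
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Usage: compare an OCR output string with its ground-truth transcription.
print(cer("abcd", "abxd"))
```

The comparison here is at code-point level. Counting errors per grapheme cluster instead would weight conjuncts differently, so which unit to use is worth deciding up front when defining a benchmark.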

	---

	## How to Contribute

	We welcome contributions from researchers, students, linguists, and developers for:

	- Dataset annotation and quality verification
	- Model training and evaluation
	- Benchmark creation
	- Tool development for OCR preprocessing and postprocessing

	Feel free to open issues, share data sources, or propose collaborations.

	---

	🧩 Visit the org page: https://huggingface.co/OdiaGenAIOCR