---
title: README
emoji: 🏢
colorFrom: green
colorTo: pink
sdk: static
pinned: false
---
## About
OdiaGenAI is an open research initiative advancing Generative AI, LLMs, and multimodal technologies for Odia and low-resource Indic languages through community-driven, open-source collaboration.
---
## Vision
Empowering Odia and low-resource Indic languages through open, multimodal, and community-owned AI.
---
## Related Hugging Face Organizations
OdiaGenAI maintains, or collaborates closely with, the following Hugging Face organizations focused on Odia and Indic LLMs:
* **🔗 [OdiaGenAI](https://huggingface.co/OdiaGenAI)** – Main organization for Odia datasets, models, and AI tools (text, speech, OCR, multimodal).
* **🔗 [OdiaGenAI-LLM](https://huggingface.co/OdiaGenAI-LLM)** – Focused LLM organization with additional Odia and Indic-centric model releases (e.g., Mistral, LLaMA variants).
* **🔗 [odiagenmllm](https://huggingface.co/odiagenmllm)** – Organization hosting multilingual and Odia-focused LLM projects, benchmarks, and community models.
* **🔗 [OdiaGenAIdata](https://huggingface.co/OdiaGenAIdata)** – Dataset-centric organization hosting large corpora for Odia pretraining and evaluation.
* **🔗 [OdiaGenAIOCR](https://huggingface.co/OdiaGenAIOCR)** – Organization dedicated to **Odia OCR datasets, models, and tools** for printed and handwritten text recognition.
* **🔗 [Hindi-data-hub](https://huggingface.co/Hindi-data-hub)** – A community-driven hub for **Hindi language datasets and models**, supporting Indic language research.
* **🔗 [HydraIndicLM](https://huggingface.co/HydraIndicLM)** – An Indic LLM initiative focused on building and hosting language models and benchmarks for multiple Indic languages.
* **🔗 [ShopIntel](https://huggingface.co/ShopIntel)** – Organization oriented toward **multilingual models and industry-focused AI research**, including support for Indic languages.
* **🔗 [Indic-Benchmark](https://huggingface.co/Indic-Benchmark)** – Initiative providing **benchmarks and evaluation suites** for multiple Indic languages across NLP tasks.
## Objectives
OdiaGenAI focuses on:
- **Foundation Models for Odia and Indic Languages**
- **Instruction-tuned and Task-specific LLMs for Indic Use Cases**
- **Speech and OCR Technologies for Odia and Indic Languages**
- **Multimodal AI (Text + Vision + Speech) for Low-resource Languages**
- **Open Data Creation, Benchmarks, and Evaluation Frameworks**
All outputs are released for **research and non-commercial use**.
---
## Why OdiaGenAI?
* **Low-resource challenge** — Odia support in existing LLMs is limited due to scarce training data.
* **Openness** — Proprietary models restrict access; we provide free, open models and datasets.
* **Ethics & privacy** — Transparent data practices and community ownership of language tech.
---
## Focus Research Areas
### 1. Literature & Benchmarking
Survey and evaluate generative AI and multimodal models for Odia.
### 2. Development
Curate datasets; build tokenizers, models, and training pipelines.
### 3. Deployment & Access
Host models and tools via **Hugging Face**, along with APIs and demos.
---
## Who Can Use OdiaGenAI?
* Researchers, students, developers, and NGOs.
Models and datasets are available via **Hugging Face for research and non-commercial purposes**. Contact us for special use cases.
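As a rough illustration of browsing what is published, the sketch below lists a few model repositories under the `OdiaGenAI` organization. It assumes the `huggingface_hub` package is installed and uses its `list_models` function; the helper name `list_org_model_ids` and its `lister` parameter are hypothetical, introduced here only so the function can be tried offline with a stub. It is not an official OdiaGenAI API.

```python
from typing import Callable, Iterable, List, Optional


def list_org_model_ids(
    org: str = "OdiaGenAI",
    limit: int = 5,
    lister: Optional[Callable[..., Iterable]] = None,
) -> List[str]:
    """Return up to `limit` model repo ids published under `org` on the Hub.

    `lister` is a hypothetical injection point for offline use; by default
    the real `huggingface_hub.list_models` is called (needs network access).
    """
    if lister is None:
        # Imported lazily so the helper can be exercised without the package.
        from huggingface_hub import list_models

        lister = list_models
    # Each returned item exposes the repository id as `.id`.
    return [model.id for model in lister(author=org, limit=limit)]


# Example usage (requires network access and `pip install huggingface_hub`):
#     for repo_id in list_org_model_ids(limit=3):
#         print(repo_id)
```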
---
## Key Application Areas
* Education
* Healthcare
* Governance
## Contributors
* [Shantipriya Parida (Founder)](https://www.linkedin.com/in/shantipriya-parida-9781a9127/)
* [Sambit Sekhar (Founder)](https://www.linkedin.com/in/sambit-sekhar-ai/)
* [Swateek Jena](https://www.linkedin.com/in/swateek/)
* [Abhijeet Parida](https://www.linkedin.com/in/a-parida/)
* [Dr. Satya Ranjan Dash](https://ksca.kiit.ac.in/profiles/satya-ranjan-dash/)
*About our logo:* The [Olive Ridley](https://roundglasssustain.com/photostories/olive-ridley-turtles-endangered) sea turtle, a threatened species, is among the world's smallest and most abundant marine turtles. It travels thousands of kilometers across the ocean to nest. The Gahirmatha Marine Sanctuary in [Odisha](https://en.wikipedia.org/wiki/Odisha) is the largest known mass-nesting rookery for olive ridley sea turtles worldwide.
## Citation
If you find this repository useful, please consider giving it a 👏 and citing it:
```
@misc{OdiaGenAI,
author = {Shantipriya Parida and Sambit Sekhar and Swateek Jena and Abhijeet Parida and Satya Ranjan Dash},
title = {OdiaGenAI: Generative AI and LLM Initiative for the Odia Language},
year = {2023},
publisher = {Hugging Face},
journal = {Hugging Face repository},
howpublished = {\url{https://huggingface.co/OdiaGenAI}},
}
```
## License
This work is licensed under a
[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]
[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png
[cc-by-nc-sa-shield]: https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg