---
title: README
emoji: 🏢
colorFrom: green
colorTo: pink
sdk: static
pinned: false
---

## About

OdiaGenAI is an open research initiative advancing Generative AI, LLMs, and multimodal technologies for Odia and low-resource Indic languages through community-driven, open-source collaboration.

---

## Vision

Empowering Odia and low-resource Indic languages through open, multimodal, and community-owned AI.

---

## Related Hugging Face Organizations

OdiaGenAI maintains or collaborates closely with the following Hugging Face organizations focused on Odia and Indic LLMs:

* **🔗 [OdiaGenAI](https://huggingface.co/OdiaGenAI)** – Main organization for Odia datasets, models, and AI tools (text, speech, OCR, multimodal).
* **🔗 [OdiaGenAI-LLM](https://huggingface.co/OdiaGenAI-LLM)** – Focused LLM organization with additional Odia and Indic-centric model releases (e.g., Mistral, LLaMA variants).
* **🔗 [odiagenmllm](https://huggingface.co/odiagenmllm)** – Organization hosting multilingual and Odia-focused LLM projects, benchmarks, and community models.
* **🔗 [OdiaGenAIdata](https://huggingface.co/OdiaGenAIdata)** – Dataset-centric organization hosting large corpora for Odia pretraining and evaluation.
* **🔗 [OdiaGenAIOCR](https://huggingface.co/OdiaGenAIOCR)** – Organization dedicated to **Odia OCR datasets, models, and tools** for printed and handwritten text recognition.
* **🔗 [Hindi-data-hub](https://huggingface.co/Hindi-data-hub)** – A community-driven hub for **Hindi language datasets and models**, supporting Indic language research.
* **🔗 [HydraIndicLM](https://huggingface.co/HydraIndicLM)** – An Indic LLM initiative focused on building and hosting language models and benchmarks for multiple Indic languages.
* **🔗 [ShopIntel](https://huggingface.co/ShopIntel)** – Organization oriented toward **multilingual models and industry-focused AI research**, including support for Indic languages.
* **🔗 [Indic-Benchmark](https://huggingface.co/Indic-Benchmark)** – Initiative providing **benchmarks and evaluation suites** for multiple Indic languages across NLP tasks.

## Objectives

OdiaGenAI focuses on:

- **Foundation Models for Odia and Indic Languages**
- **Instruction-tuned and Task-specific LLMs for Indic Use Cases**
- **Speech and OCR Technologies for Odia and Indic Languages**
- **Multimodal AI (Text + Vision + Speech) for Low-resource Languages**
- **Open Data Creation, Benchmarks, and Evaluation Frameworks**

All outputs are released for **research and non-commercial use**.

---

## Why OdiaGenAI?

* **Low-resource challenge:** Odia support in existing LLMs is limited because training data is scarce.
* **Openness:** Proprietary models restrict access; we provide free, open models and datasets.
* **Ethics & privacy:** Transparent data practices and community ownership of language tech.

---

## Focus Research Areas

### 1. Literature & Benchmarking

Survey and evaluate generative AI and multimodal models for Odia.

### 2. Development

Curate datasets; build tokenizers, models, and training pipelines.

### 3. Deployment & Access

Host models and tools via **Hugging Face**, along with APIs and demos.
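
As a rough illustration of the Hugging Face access path, the sketch below lists a few model repositories published under an organization. It assumes the `huggingface_hub` package is installed and the Hub is reachable. The helper name `list_org_models` and its `lister` parameter are hypothetical and not part of any OdiaGenAI tooling. `lister` exists only so the helper can be tried offline with a stub.

```python
from typing import Callable, Iterable, List, Optional


def list_org_models(
    org: str = "OdiaGenAI",
    limit: int = 5,
    lister: Optional[Callable[..., Iterable]] = None,
) -> List[str]:
    """Return up to `limit` model repo ids published under `org` on the Hub.

    `lister` is a hypothetical injection point: any callable that accepts
    `author` and `limit` keyword arguments and yields objects with an `id`
    attribute. When omitted, `huggingface_hub.list_models` is used.
    """
    if lister is None:
        # Imported lazily so the helper can be exercised without the package
        # (or network access) when a stub lister is supplied.
        from huggingface_hub import list_models

        lister = list_models
    # Recent huggingface_hub versions yield ModelInfo objects exposing `.id`.
    return [info.id for info in lister(author=org, limit=limit)]


# Example call (requires network access):
#     for repo_id in list_org_models("OdiaGenAI", limit=5):
#         print(repo_id)
```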

---

## Who Can Use OdiaGenAI?

OdiaGenAI is intended for researchers, students, developers, and NGOs.

Models and datasets are available via **Hugging Face for research and non-commercial purposes**. For other use cases, please contact us.

---

## Key Application Areas

* Education
* Healthcare
* Governance

## Contributors

* [Shantipriya Parida (Founder)](https://www.linkedin.com/in/shantipriya-parida-9781a9127/)
* [Sambit Sekhar (Founder)](https://www.linkedin.com/in/sambit-sekhar-ai/)
* [Swateek Jena](https://www.linkedin.com/in/swateek/)
* [Abhijeet Parida](https://www.linkedin.com/in/a-parida/)
* [Dr. Satya Ranjan Dash](https://ksca.kiit.ac.in/profiles/satya-ranjan-dash/)

*About our logo:* The [Olive Ridley](https://roundglasssustain.com/photostories/olive-ridley-turtles-endangered) sea turtle, listed as Vulnerable on the IUCN Red List, is one of the smallest and the most abundant of the world's sea turtles. It travels thousands of kilometers across the ocean to nest. The Gahirmatha Marine Sanctuary in [Odisha](https://en.wikipedia.org/wiki/Odisha) is the largest known mass nesting rookery for olive ridley sea turtles worldwide.

## Citation

If you find this repository useful, please consider giving it a 👏 and citing:

```
@misc{OdiaGenAI,
  author = {Shantipriya Parida and Sambit Sekhar and Swateek Jena and Abhijeet Parida and Satya Ranjan Dash},
  title = {OdiaGenAI: Generative AI and LLM Initiative for the Odia Language},
  year = {2023},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/OdiaGenAI}},
}
```

## License

This work is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png
[cc-by-nc-sa-shield]: https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg