---
title: README
emoji: 🏢
colorFrom: green
colorTo: pink
sdk: static
pinned: false
---

## About

OdiaGenAI is an open research initiative advancing Generative AI, LLMs, and multimodal technologies for Odia and low-resource Indic languages through community-driven, open-source collaboration.

---

## Vision

Empowering Odia and low-resource Indic languages through open, multimodal, and community-owned AI.

---

## Related Hugging Face Organizations

OdiaGenAI maintains or collaborates with the following Hugging Face organizations focused on Odia and Indic LLMs:

* **🔗 [OdiaGenAI](https://huggingface.co/OdiaGenAI)** – Main organization for Odia datasets, models, and AI tools (text, speech, OCR, multimodal).
* **🔗 [OdiaGenAI-LLM](https://huggingface.co/OdiaGenAI-LLM)** – Focused LLM organization with additional Odia and Indic-centric model releases (e.g., Mistral, LLaMA variants).
* **🔗 [odiagenmllm](https://huggingface.co/odiagenmllm)** – Organization hosting multilingual and Odia-focused LLM projects, benchmarks, and community models.
* **🔗 [OdiaGenAIdata](https://huggingface.co/OdiaGenAIdata)** – Dataset-centric organization hosting large corpora for Odia pretraining and evaluation.
* **🔗 [OdiaGenAIOCR](https://huggingface.co/OdiaGenAIOCR)** – Organization dedicated to **Odia OCR datasets, models, and tools** for printed and handwritten text recognition.
* **🔗 [Hindi-data-hub](https://huggingface.co/Hindi-data-hub)** – A community-driven hub for **Hindi language datasets and models**, supporting Indic language research.
* **🔗 [HydraIndicLM](https://huggingface.co/HydraIndicLM)** – An Indic LLM initiative focused on building and hosting language models and benchmarks for multiple Indic languages.
* **🔗 [ShopIntel](https://huggingface.co/ShopIntel)** – Organization oriented toward **multilingual models and industry-focused AI research**, including support for Indic languages.
* **🔗 [Indic-Benchmark](https://huggingface.co/Indic-Benchmark)** – Initiative providing **benchmarks and evaluation suites** for multiple Indic languages across NLP tasks.

## Objectives

OdiaGenAI focuses on:

- **Foundation Models for Odia and Indic Languages**
- **Instruction-tuned and Task-specific LLMs for Indic Use Cases**
- **Speech and OCR Technologies for Odia and Indic Languages**
- **Multimodal AI (Text + Vision + Speech) for Low-resource Languages**
- **Open Data Creation, Benchmarks, and Evaluation Frameworks**

All outputs are released for **research and non-commercial use**.

---

## Why OdiaGenAI?

* **Low-resource challenge**: Odia support in existing LLMs is limited due to scarce training data.
* **Openness**: Proprietary models restrict access; we provide free, open models and datasets.
* **Ethics & privacy**: Transparent data practices and community ownership of language tech.

---

## Focus Research Areas

### 1. Literature & Benchmarking

Survey and evaluate generative AI and multimodal models for Odia.

### 2. Development

Curate datasets; build tokenizers, models, and training pipelines.

### 3. Deployment & Access

Host models and tools via **Hugging Face**, along with APIs and demos.

---

## Who Can Use OdiaGenAI?

* Researchers, students, developers, and NGOs.

Models and datasets are available via **Hugging Face for research and non-commercial purposes**. Contact us for special use cases.
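Because the models and datasets are published on the Hugging Face Hub, one way to discover them programmatically is through the `huggingface_hub` client. The following is a minimal sketch, not an official OdiaGenAI tool: the helper names `repo_url` and `list_org_models` are hypothetical, the repository name `example-model` is a placeholder rather than a real release, and the listing call assumes `huggingface_hub` is installed and network access is available.

```python
from typing import List

# Path prefixes the Hub uses for each repository type.
_PREFIXES = {"model": "", "dataset": "datasets/", "space": "spaces/"}


def repo_url(org: str, name: str, kind: str = "model") -> str:
    """Build the Hub page URL for a repository owned by `org`."""
    if kind not in _PREFIXES:
        raise ValueError(f"unknown repository kind: {kind!r}")
    return f"https://huggingface.co/{_PREFIXES[kind]}{org}/{name}"


def list_org_models(org: str = "OdiaGenAI", limit: int = 10) -> List[str]:
    """Return up to `limit` model repo ids published under `org` (needs network)."""
    # Imported lazily so the URL helper works without the third-party package.
    from huggingface_hub import HfApi  # pip install huggingface_hub

    return [model.id for model in HfApi().list_models(author=org, limit=limit)]


if __name__ == "__main__":
    # "example-model" is a placeholder name used only for illustration.
    print(repo_url("OdiaGenAI", "example-model"))
    for repo_id in list_org_models():
        print(repo_id)
```

The same pattern applies to the other organizations listed above by passing a different `org` value. Check each repository's license before use, since outputs are released for research and non-commercial purposes.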

---

## Key Application Areas

* Education
* Healthcare
* Governance

## Contributors

* [Shantipriya Parida (Founder)](https://www.linkedin.com/in/shantipriya-parida-9781a9127/)
* [Sambit Sekhar (Founder)](https://www.linkedin.com/in/sambit-sekhar-ai/)
* [Swateek Jena](https://www.linkedin.com/in/swateek/)
* [Abhijeet Parida](https://www.linkedin.com/in/a-parida/)
* [Dr. Satya Ranjan Dash](https://ksca.kiit.ac.in/profiles/satya-ranjan-dash/)

*About our logo:* The [Olive Ridley](https://roundglasssustain.com/photostories/olive-ridley-turtles-endangered) sea turtle, classified as Vulnerable on the IUCN Red List, is among the smallest and most abundant marine turtles. It travels thousands of kilometers across the ocean to nest. The Gahirmatha Marine Sanctuary in [Odisha](https://en.wikipedia.org/wiki/Odisha) is the largest known mass nesting rookery for olive ridley sea turtles worldwide.

## Citation

If you find this repository useful, please consider giving it a 👏 and citing:

```
@misc{OdiaGenAI,
  author = {Shantipriya Parida and Sambit Sekhar and Swateek Jena and Abhijeet Parida and Satya Ranjan Dash},
  title = {OdiaGenAI: Generative AI and LLM Initiative for the Odia Language},
  year = {2023},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/OdiaGenAI}},
}
```

## License

This work is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png
[cc-by-nc-sa-shield]: https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg