---
title: README
emoji: 🏢
colorFrom: green
colorTo: pink
sdk: static
pinned: false
---

## About
OdiaGenAI is an open research initiative advancing Generative AI, LLMs, and multimodal technologies for Odia and low-resource Indic languages through community-driven, open-source collaboration.

---

## Vision
Empowering Odia and low-resource Indic languages through open, multimodal, and community-owned AI.

---

## Related Hugging Face Organizations

OdiaGenAI maintains or works closely with the following Hugging Face organizations focused on Odia and Indic LLMs:

* **🔗 [OdiaGenAI](https://huggingface.co/OdiaGenAI)** – Main organization for Odia datasets, models, and AI tools (text, speech, OCR, multimodal).  
  https://huggingface.co/OdiaGenAI

* **🔗 [OdiaGenAI-LLM](https://huggingface.co/OdiaGenAI-LLM)** – Focused LLM organization with additional Odia and Indic‑centric model releases (e.g., Mistral, LLaMA variants).  
  https://huggingface.co/OdiaGenAI-LLM

* **🔗 [odiagenmllm](https://huggingface.co/odiagenmllm)** – Organization hosting multilingual and Odia‑focused LLM projects, benchmarks, and community models.  
  https://huggingface.co/odiagenmllm

* **🔗 [OdiaGenAIdata](https://huggingface.co/OdiaGenAIdata)** – Dataset‑centric organization hosting large corpora for Odia pretraining and evaluation.  
  https://huggingface.co/OdiaGenAIdata

* **🔗 [OdiaGenAIOCR](https://huggingface.co/OdiaGenAIOCR)** – Organization dedicated to **Odia OCR datasets, models, and tools** for printed and handwritten text recognition.  
  https://huggingface.co/OdiaGenAIOCR

* **🔗 [Hindi‑data‑hub](https://huggingface.co/Hindi-data-hub)** – A community‑driven hub for **Hindi language datasets and models**, supporting Indic language research.  
  https://huggingface.co/Hindi-data-hub

* **🔗 [HydraIndicLM](https://huggingface.co/HydraIndicLM)** – An Indic LLM initiative focused on building and hosting language models and benchmarks for multiple Indic languages.  
  https://huggingface.co/HydraIndicLM

* **🔗 [ShopIntel](https://huggingface.co/ShopIntel)** – Organization oriented toward **multilingual models and industry‑focused AI research**, including support for Indic languages.  
  https://huggingface.co/ShopIntel

* **🔗 [Indic‑Benchmark](https://huggingface.co/Indic-Benchmark)** – Initiative providing **benchmarks and evaluation suites** for multiple Indic languages across NLP tasks.  
  https://huggingface.co/Indic-Benchmark 


## Objectives
OdiaGenAI focuses on:

- **Foundation Models for Odia and Indic Languages**
- **Instruction-tuned and Task-specific LLMs for Indic Use Cases**
- **Speech and OCR Technologies for Odia and Indic Languages**
- **Multimodal AI (Text + Vision + Speech) for Low-resource Languages**
- **Open Data Creation, Benchmarks, and Evaluation Frameworks**


All outputs are released for **research and non-commercial use**.

---

## Why OdiaGenAI?

* **Low-resource challenge** — Odia support in existing LLMs is limited due to scarce training data.
* **Openness** — Proprietary models restrict access; we provide free, open models and datasets.
* **Ethics & privacy** — Transparent data practices and community ownership of language tech.

---

## Focus Research Areas

### 1. Literature & Benchmarking
Survey and evaluate generative AI and multimodal models for Odia.

### 2. Development
Curate datasets; build tokenizers, models, and training pipelines.

### 3. Deployment & Access
Host models and tools via **Hugging Face**, along with APIs and demos.
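
As a rough illustration of programmatic access, the sketch below lists the public repositories of an organization through the Hugging Face Hub REST API, using only the Python standard library. The helper names `org_listing_url` and `list_org_repos` are hypothetical and not part of any OdiaGenAI tooling; the `huggingface_hub` client library offers equivalent functionality.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

HUB_API = "https://huggingface.co/api"


def org_listing_url(org: str, kind: str = "models", limit: int = 20) -> str:
    """Build the Hub API URL that lists an organization's public repos.

    Hypothetical helper for illustration; `kind` selects the repo type.
    """
    if kind not in ("models", "datasets", "spaces"):
        raise ValueError(f"unsupported kind: {kind!r}")
    query = urlencode({"author": org, "limit": limit})
    return f"{HUB_API}/{kind}?{query}"


def list_org_repos(org: str, kind: str = "models", limit: int = 20) -> list:
    """Fetch repo ids for an organization (requires network access)."""
    with urlopen(org_listing_url(org, kind, limit), timeout=30) as resp:
        return [item["id"] for item in json.load(resp)]


# Example (performs a network request, so it is left commented out):
# for repo_id in list_org_repos("OdiaGenAI", kind="datasets"):
#     print(repo_id)
```

Keeping the URL construction separate from the network call makes the request easy to inspect or reuse with another HTTP client.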

---

## Who Can Use OdiaGenAI?
Researchers, students, developers, and NGOs can all use OdiaGenAI resources.
Models and datasets are available on **Hugging Face** for research and non-commercial purposes. Contact us for special use cases.

---

## Key Application Areas
* Education
* Healthcare
* Governance
  


## Contributors
* [Shantipriya Parida (Founder)](https://www.linkedin.com/in/shantipriya-parida-9781a9127/)
* [Sambit Sekhar (Founder)](https://www.linkedin.com/in/sambit-sekhar-ai/)
* [Swateek Jena](https://www.linkedin.com/in/swateek/)
* [Abhijeet Parida](https://www.linkedin.com/in/a-parida/)
* [Dr. Satya Ranjan Dash](https://ksca.kiit.ac.in/profiles/satya-ranjan-dash/)


*About our logo:* The [Olive Ridley](https://roundglasssustain.com/photostories/olive-ridley-turtles-endangered) sea turtle, listed as vulnerable on the IUCN Red List, is one of the smallest and the most abundant of the world's sea turtles. Olive Ridleys travel thousands of kilometers across the ocean to nest. The Gahirmatha Marine Sanctuary in [Odisha](https://en.wikipedia.org/wiki/Odisha) is the largest known mass nesting rookery for Olive Ridley sea turtles worldwide.

## Citation

If you find this repository useful, please consider giving 👏 and citing:

```
@misc{OdiaGenAI,
  author = {Shantipriya Parida and Sambit Sekhar and Swateek Jena and Abhijeet Parida and Satya Ranjan Dash},
  title = {OdiaGenAI: Generative AI and LLM Initiative for the Odia Language},
  year = {2023},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/OdiaGenAI}},
}
```

## License

This work is licensed under a
[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png
[cc-by-nc-sa-shield]: https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg