shantipriya committed on
Commit 281448e · verified · 1 Parent(s): ce19c71

Update README.md

Files changed (1)
  1. README.md +53 -27
README.md CHANGED
@@ -8,52 +8,78 @@ pinned: false
  ---

  ## About
- The Odia Generative AI (in short, OdiaGenAI) is an initiative to research Generative AI and Large Language Models (LLMs) for the low-resource Odia language.

- ## Objective
- The OdiaGenAI aims to
- 1. Build pre-trained Odia LLM,
- 2. Fine-tuned Odia LLM, and
- 3. Instruct LLM (Odia).

- The data, code, and models will be available to the public for research and non-commercial purposes.

- ## Why OdiaGenAI

- * **First**: Though many LLMs support multilingual, including Odia language, the performance for various tasks (e.g., content generation, question-answering) is limited due to the amount of ingested data for Odia.
- * **Second**: There are subscriptions or fees associated with the high-performing LLMs.

- * **Third**: The usage (privacy) and bias of data input to these LLMs are in question.

- ## What are the focus research areas of OdiaGenAI
- We have divided the primary focus areas into three parts.

- **1. Literature Survey:** Investigate the latest developments in Generative AI and LLMs and analyze current methods to support the Odia language for different tasks.

- **2. Development:** Developing pre-trained and fine-tuned Odia LLM, which includes dataset preparation, model training, evaluation, prompt engineering, and API development.

- **3. Deployment:** Deploy the Odia LLM models for public access for research and non-commercial purposes.

- ## Who can use OdiaGenAI LLMs
- The models (pre-trained/fine-tuned) will be available through Hugging Face for research and non-commercial purposes. Feel free to contact us for a domain-specific application or particular use cases.

- ## What are the use cases of OdiaGenAI LLMs
- There are several use cases of OdiaGenAI LLMs. Three primary domains relating to Odisha which we are focusing to use the developed LLM are:
  * Education
  * Healthcare
- * Governance
- * Tourism
- * Agriculture
- * Industrial Application

  ## Contributors
  * [Shantipriya Parida (Founder)](https://www.linkedin.com/in/shantipriya-parida-9781a9127/)
  * [Sambit Sekhar (Founder)](https://www.linkedin.com/in/sambit-sekhar-ai/)
- * [Soumendra Kumar Sahoo](https://www.linkedin.com/in/soumendrak/)
  * [Swateek Jena](https://www.linkedin.com/in/swateek/)
  * [Abhijeet Parida](https://www.linkedin.com/in/a-parida/)
  * [Dr. Satya Ranjan Dash](https://ksca.kiit.ac.in/profiles/satya-ranjan-dash/)
- * [Guneet Singh Kohli](https://www.linkedin.com/in/guneetsk99/)

  *About our logo:* The critically endangered [Olive Ridley](https://roundglasssustain.com/photostories/olive-ridley-turtles-endangered) sea turtle is the world's smallest and most prevalent marine turtle. It travels thousands of kilometers in the ocean for nesting. The Gahirmatha Marine Sanctuary in [Odisha](https://en.wikipedia.org/wiki/Odisha) is the largest known mass nesting rookery for olive ridley sea turtles worldwide.

@@ -63,7 +89,7 @@ If you find this repository useful, please consider giving 👍 and citing:

  ```
  @misc{OdiaGenAI,
- author = {Shantipriya Parida and Sambit Sekhar and Soumendra Kumar Sahoo and Swateek Jena and Abhijeet Parida and Satya Ranjan Dash and Guneet Singh Kohli},
  title = {OdiaGenAI: Generative AI and LLM Initiative for the Odia Language},
  year = {2023},
  publisher = {Hugging Face},
 
  ---

  ## About
+ OdiaGenAI is an open research initiative advancing Generative AI, LLMs, and multimodal technologies for Odia and low-resource Indic languages through community-driven, open-source collaboration.

+ ---
+
+ ## Vision
+ Empowering Odia and low-resource Indic languages through open, multimodal, and community-owned AI.
+
+ ---
+
+ ## Related Hugging Face Organizations
+
+ OdiaGenAI collaborates with and maintains close ties to other HF organizations that focus on Odia and Indic LLMs:
+
+ * **🔗 [OdiaGenAI](https://huggingface.co/OdiaGenAI)** – Main org for Odia datasets, models, and AI tools (text, speech, OCR, multimodal).
+ * **🔗 [OdiaGenAI-LLM](https://huggingface.co/OdiaGenAI-LLM)** – Focused LLM org with additional Odia and Indic-centric model releases (Mistral, LLaMA, instruction sets).
+ * **🔗 [OdiaGenAIdata](https://huggingface.co/OdiaGenAIdata)** – Dataset-centric org hosting large corpora for Odia pretraining and evaluation (if separate).
+
+
+ ## Objectives
+ OdiaGenAI focuses on:
+
+ - **Foundation Models for Odia and Indic Languages**
+ - **Instruction-tuned and Task-specific LLMs for Indic Use Cases**
+ - **Speech and OCR Technologies for Odia and Indic Languages**
+ - **Multimodal AI (Text + Vision + Speech) for Low-resource Languages**
+ - **Open Data Creation, Benchmarks, and Evaluation Frameworks**


+ All outputs are released for **research and non-commercial use**.

+ ---
+
+ ## Why OdiaGenAI?
+
+ * **Low-resource challenge** – Odia support in existing LLMs is limited due to scarce training data.
+ * **Openness** – Proprietary models restrict access; we provide free, open models and datasets.
+ * **Ethics & privacy** – Transparent data practices and community ownership of language tech.
+
+ ---

+ ## Focus Research Areas

+ ### 1. Literature & Benchmarking
+ Survey and evaluate generative AI and multimodal models for Odia.

+ ### 2. Development
+ Curate datasets; build tokenizers, models, and training pipelines.

+ ### 3. Deployment & Access
+ Host models and tools via **Hugging Face**, along with APIs and demos.

+ ---
+
+ ## Who Can Use OdiaGenAI?
+ * Researchers, students, developers, and NGOs.
+ Models and datasets are available via **Hugging Face for research and non-commercial purposes**. Contact us for special use cases.

+ ---

+ ## Key Application Areas
  * Education
  * Healthcare
+ * Governance
+

  ## Contributors
  * [Shantipriya Parida (Founder)](https://www.linkedin.com/in/shantipriya-parida-9781a9127/)
  * [Sambit Sekhar (Founder)](https://www.linkedin.com/in/sambit-sekhar-ai/)
  * [Swateek Jena](https://www.linkedin.com/in/swateek/)
  * [Abhijeet Parida](https://www.linkedin.com/in/a-parida/)
  * [Dr. Satya Ranjan Dash](https://ksca.kiit.ac.in/profiles/satya-ranjan-dash/)
+

  *About our logo:* The critically endangered [Olive Ridley](https://roundglasssustain.com/photostories/olive-ridley-turtles-endangered) sea turtle is the world's smallest and most prevalent marine turtle. It travels thousands of kilometers in the ocean for nesting. The Gahirmatha Marine Sanctuary in [Odisha](https://en.wikipedia.org/wiki/Odisha) is the largest known mass nesting rookery for olive ridley sea turtles worldwide.

 

  ```
  @misc{OdiaGenAI,
+ author = {Shantipriya Parida and Sambit Sekhar and Swateek Jena and Abhijeet Parida and Satya Ranjan Dash},
  title = {OdiaGenAI: Generative AI and LLM Initiative for the Odia Language},
  year = {2023},
  publisher = {Hugging Face},