Update README.md
README.md
CHANGED
|
@@ -14,23 +14,32 @@ tags:
|
|
| 14 |
- meghalaya
|
| 15 |
- arunachal-pradesh
|
| 16 |
- sikkim
|
| 17 |
-
- neodac
|
| 18 |
language:
|
| 19 |
- en
|
| 20 |
pipeline_tag: text-generation
|
| 21 |
library_name: transformers
|
| 22 |
widget:
|
| 23 |
-
- example_title:
|
| 24 |
-
text:
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
---
|
| 30 |
|
| 31 |
-
# Neodac: Northeast India Cultural AI Model
|
| 32 |
|
| 33 |
-
**Neodac** (Northeast India Cultural) is a specialized language model fine-tuned on cultural knowledge of Northeast India's eight states. Built on Google's Gemma 3 1B Instruct, Neodac provides authentic, detailed responses about the rich cultural heritage of the region.
|
| 34 |
|
| 35 |
## 🎯 Model Overview
|
| 36 |
|
|
@@ -64,15 +73,15 @@ from transformers import AutoTokenizer, AutoModelForCausalLM
|
|
| 64 |
import torch
|
| 65 |
|
| 66 |
# Load model and tokenizer
|
| 67 |
-
tokenizer = AutoTokenizer.from_pretrained("MWirelabs/neodac")
|
| 68 |
model = AutoModelForCausalLM.from_pretrained(
|
| 69 |
-
"MWirelabs/neodac",
|
| 70 |
torch_dtype=torch.bfloat16,
|
| 71 |
device_map="auto"
|
| 72 |
)
|
| 73 |
|
| 74 |
# Example usage
|
| 75 |
-
def ask_neodac(question):
|
| 76 |
    prompt = f"<start_of_turn>user\n{question}<end_of_turn>\n<start_of_turn>model\n"
|
| 77 |
    inputs = tokenizer(prompt, return_tensors="pt")
|
| 78 |
|
|
@@ -89,7 +98,7 @@ def ask_neodac(question):
|
|
| 89 |
    return response.split("<start_of_turn>model\n")[-1].strip()
|
| 90 |
|
| 91 |
# Ask about Northeast India culture
|
| 92 |
-
response = ask_neodac("What is the significance of bamboo in Northeast India?")
|
| 93 |
print(response)
|
| 94 |
```
|
| 95 |
|
|
@@ -108,10 +117,9 @@ print(response)
|
|
| 108 |
- **Batch Size**: 8 per device
|
| 109 |
- **Precision**: bfloat16
|
| 110 |
- **Max Sequence Length**: 512 tokens
|
| 111 |
-
- **Training Time**: ~17 minutes
|
| 112 |
|
| 113 |
### Improvements Over Base Model
|
| 114 |
-
| Aspect | Base Gemma 3 1B-IT | Neodac |
|
| 115 |
|--------|-------------------|---------|
|
| 116 |
| Cultural Accuracy | ❌ Hallucinations | ✅ Factually correct |
|
| 117 |
| Response Detail | ⚠️ Generic/brief | ✅ Rich & comprehensive |
|
|
@@ -125,7 +133,7 @@ print(response)
|
|
| 125 |
**Base Model Response:**
|
| 126 |
> Claims Bihu is about Lord Shiva (incorrect)
|
| 127 |
|
| 128 |
-
**Neodac Response:**
|
| 129 |
> Bihu is the most important festival of Assam, celebrated by all Assamese people. There are three Bihus that mark different stages of the agricultural calendar: Rongali (or Bohag) Bihu in spring, Kati (or Kongali) Bihu in autumn, and Magh (or Bhogali) Bihu in winter.
|
| 130 |
|
| 131 |
## 🎯 Use Cases
|
|
@@ -158,22 +166,22 @@ The model was evaluated on cultural accuracy, response completeness, and factual
|
|
| 158 |
|
| 159 |
## Citation
|
| 160 |
|
| 161 |
-
If you use Neodac in your research or applications, please cite:
|
| 162 |
|
| 163 |
```bibtex
|
| 164 |
@misc{neodac2025,
|
| 165 |
-
title={Neodac: A Specialized Language Model for Northeast India Cultural Knowledge},
|
| 166 |
author={MWire Labs},
|
| 167 |
year={2025},
|
| 168 |
publisher={Hugging Face},
|
| 169 |
-
url={https://huggingface.co/MWirelabs/neodac},
|
| 170 |
note={Fine-tuned from google/gemma-3-1b-it for cultural preservation and education}
|
| 171 |
}
|
| 172 |
```
|
| 173 |
|
| 174 |
## 🤝 Contributing
|
| 175 |
|
| 176 |
-
Interested in improving Neodac? We welcome:
|
| 177 |
- Additional cultural data from Northeast India
|
| 178 |
- Feedback on cultural accuracy
|
| 179 |
- Suggestions for new cultural domains
|
|
@@ -192,4 +200,4 @@ This model is released under the Apache 2.0 license, same as the base Gemma mode
|
|
| 192 |
|
| 193 |
---
|
| 194 |
|
| 195 |
-
*Neodac represents a step forward in culturally aware AI, preserving and making accessible the rich heritage of Northeast India through technology.*
|
|
|
|
| 14 |
- meghalaya
|
| 15 |
- arunachal-pradesh
|
| 16 |
- sikkim
|
| 17 |
+
- neodac-mini
|
| 18 |
language:
|
| 19 |
- en
|
| 20 |
pipeline_tag: text-generation
|
| 21 |
library_name: transformers
|
| 22 |
widget:
|
| 23 |
+
- example_title: Bihu Festival
|
| 24 |
+
text: |
|
| 25 |
+
<start_of_turn>user
|
| 26 |
+
What is Bihu festival?<end_of_turn>
|
| 27 |
+
<start_of_turn>model
|
| 28 |
+
- example_title: Hornbill Festival
|
| 29 |
+
text: |
|
| 30 |
+
<start_of_turn>user
|
| 31 |
+
Tell me about Hornbill Festival.<end_of_turn>
|
| 32 |
+
<start_of_turn>model
|
| 33 |
+
- example_title: Assamese Cuisine
|
| 34 |
+
text: |
|
| 35 |
+
<start_of_turn>user
|
| 36 |
+
What is traditional Assamese cuisine?<end_of_turn>
|
| 37 |
+
<start_of_turn>model
|
| 38 |
---
|
| 39 |
|
| 40 |
+
# Neodac-mini: Northeast India Cultural AI Model
|
| 41 |
|
| 42 |
+
**Neodac-mini** (Northeast India Cultural) is a specialized language model fine-tuned on cultural knowledge of Northeast India's eight states. Built on Google's Gemma 3 1B Instruct, Neodac-mini provides authentic, detailed responses about the rich cultural heritage of the region.
|
| 43 |
|
| 44 |
## 🎯 Model Overview
|
| 45 |
|
|
|
|
| 73 |
import torch
|
| 74 |
|
| 75 |
# Load model and tokenizer
|
| 76 |
+
tokenizer = AutoTokenizer.from_pretrained("MWirelabs/neodac-mini")
|
| 77 |
model = AutoModelForCausalLM.from_pretrained(
|
| 78 |
+
"MWirelabs/neodac-mini",
|
| 79 |
torch_dtype=torch.bfloat16,
|
| 80 |
device_map="auto"
|
| 81 |
)
|
| 82 |
|
| 83 |
# Example usage
|
| 84 |
+
def ask_neodac_mini(question):
|
| 85 |
    prompt = f"<start_of_turn>user\n{question}<end_of_turn>\n<start_of_turn>model\n"
|
| 86 |
    inputs = tokenizer(prompt, return_tensors="pt")
|
| 87 |
|
|
|
|
| 98 |
    return response.split("<start_of_turn>model\n")[-1].strip()
|
| 99 |
|
| 100 |
# Ask about Northeast India culture
|
| 101 |
+
response = ask_neodac_mini("What is the significance of bamboo in Northeast India?")
|
| 102 |
print(response)
|
| 103 |
```
|
| 104 |
|
|
|
|
| 117 |
- **Batch Size**: 8 per device
|
| 118 |
- **Precision**: bfloat16
|
| 119 |
- **Max Sequence Length**: 512 tokens
|
|
|
|
| 120 |
|
| 121 |
### Improvements Over Base Model
|
| 122 |
+
| Aspect | Base Gemma 3 1B-IT | Neodac-mini |
|
| 123 |
|--------|-------------------|---------|
|
| 124 |
| Cultural Accuracy | ❌ Hallucinations | ✅ Factually correct |
|
| 125 |
| Response Detail | ⚠️ Generic/brief | ✅ Rich & comprehensive |
|
|
|
|
| 133 |
**Base Model Response:**
|
| 134 |
> Claims Bihu is about Lord Shiva (incorrect)
|
| 135 |
|
| 136 |
+
**Neodac-mini Response:**
|
| 137 |
> Bihu is the most important festival of Assam, celebrated by all Assamese people. There are three Bihus that mark different stages of the agricultural calendar: Rongali (or Bohag) Bihu in spring, Kati (or Kongali) Bihu in autumn, and Magh (or Bhogali) Bihu in winter.
|
| 138 |
|
| 139 |
## 🎯 Use Cases
|
|
|
|
| 166 |
|
| 167 |
## Citation
|
| 168 |
|
| 169 |
+
If you use Neodac-mini in your research or applications, please cite:
|
| 170 |
|
| 171 |
```bibtex
|
| 172 |
@misc{neodac2025,
|
| 173 |
+
title={Neodac-mini: A Specialized Language Model for Northeast India Cultural Knowledge},
|
| 174 |
author={MWire Labs},
|
| 175 |
year={2025},
|
| 176 |
publisher={Hugging Face},
|
| 177 |
+
url={https://huggingface.co/MWirelabs/neodac-mini},
|
| 178 |
note={Fine-tuned from google/gemma-3-1b-it for cultural preservation and education}
|
| 179 |
}
|
| 180 |
```
|
| 181 |
|
| 182 |
## 🤝 Contributing
|
| 183 |
|
| 184 |
+
Interested in improving Neodac-mini? We welcome:
|
| 185 |
- Additional cultural data from Northeast India
|
| 186 |
- Feedback on cultural accuracy
|
| 187 |
- Suggestions for new cultural domains
|
|
|
|
| 200 |
|
| 201 |
---
|
| 202 |
|
| 203 |
+
*Neodac-mini represents a step forward in culturally aware AI, preserving and making accessible the rich heritage of Northeast India through technology.*
|
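The widget entries and the usage example in this diff share one prompt layout: a user turn, an end-of-turn marker, and an opening model turn. As a reading aid, here is a minimal sketch of that layout in plain Python. The helper names `build_prompt`, `extract_answer`, and `widget_entry` are hypothetical (they appear in neither the README nor `transformers`), and the YAML indentation in `widget_entry` is an assumption, since the diff view does not preserve leading whitespace.

```python
# Sketch only: helper names are hypothetical, not part of the README or transformers.

USER_OPEN = "<start_of_turn>user\n"
TURN_CLOSE = "<end_of_turn>\n"
MODEL_OPEN = "<start_of_turn>model\n"


def build_prompt(question: str) -> str:
    """Wrap one user question in the turn markers used by the usage example."""
    return f"{USER_OPEN}{question.strip()}{TURN_CLOSE}{MODEL_OPEN}"


def extract_answer(decoded: str) -> str:
    """Return the text after the last model marker, without a trailing end-of-turn marker."""
    answer = decoded.split(MODEL_OPEN)[-1]
    return answer.split("<end_of_turn>")[0].strip()


def widget_entry(title: str, question: str) -> str:
    """Render one widget item as YAML text (indentation widths are assumed)."""
    lines = [
        f"- example_title: {title}",
        "  text: |",
        "    <start_of_turn>user",
        f"    {question}<end_of_turn>",
        "    <start_of_turn>model",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("What is Bihu festival?"))
    print(widget_entry("Bihu Festival", "What is Bihu festival?"))
```

Keeping the markers in named constants means the widget text and the runtime prompt cannot drift apart if the template is ever edited. `extract_answer` assumes the decoded output still contains the turn markers, as the `split` call in the usage example implies.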