Badnyal committed on
Commit
6b083e4
·
verified ·
1 Parent(s): f521fbf

Update README.md

Files changed (1)
  1. README.md +29 -21
README.md CHANGED
@@ -14,23 +14,32 @@ tags:
14
  - meghalaya
15
  - arunachal-pradesh
16
  - sikkim
17
- - neodac
18
  language:
19
  - en
20
  pipeline_tag: text-generation
21
  library_name: transformers
22
  widget:
23
- - example_title: "Bihu Festival"
24
-   text: "<start_of_turn>user\nWhat is Bihu festival?<end_of_turn>\n<start_of_turn>model\n"
25
- - example_title: "Hornbill Festival"
26
-   text: "<start_of_turn>user\nTell me about Hornbill Festival.<end_of_turn>\n<start_of_turn>model\n"
27
- - example_title: "Assamese Cuisine"
28
-   text: "<start_of_turn>user\nWhat is traditional Assamese cuisine?<end_of_turn>\n<start_of_turn>model\n"
29
  ---
30
 
31
- # Neodac: Northeast India Cultural AI Model
32
 
33
- **Neodac** (Northeast India Cultural) is a specialized language model fine-tuned on cultural knowledge of Northeast India's eight states. Built on Google's Gemma 3 1B Instruct, Neodac provides authentic, detailed responses about the rich cultural heritage of the region.
34
 
35
  ## 🎯 Model Overview
36
 
@@ -64,15 +73,15 @@ from transformers import AutoTokenizer, AutoModelForCausalLM
64
  import torch
65
 
66
  # Load model and tokenizer
67
- tokenizer = AutoTokenizer.from_pretrained("MWirelabs/neodac")
68
  model = AutoModelForCausalLM.from_pretrained(
69
-     "MWirelabs/neodac",
70
      torch_dtype=torch.bfloat16,
71
      device_map="auto"
72
  )
73
 
74
  # Example usage
75
- def ask_neodac(question):
76
      prompt = f"<start_of_turn>user\n{question}<end_of_turn>\n<start_of_turn>model\n"
77
      inputs = tokenizer(prompt, return_tensors="pt")
78
 
@@ -89,7 +98,7 @@ def ask_neodac(question):
89
      return response.split("<start_of_turn>model\n")[-1].strip()
90
 
91
  # Ask about Northeast India culture
92
- response = ask_neodac("What is the significance of bamboo in Northeast India?")
93
  print(response)
94
  ```
95
 
@@ -108,10 +117,9 @@ print(response)
108
  - **Batch Size**: 8 per device
109
  - **Precision**: bfloat16
110
  - **Max Sequence Length**: 512 tokens
111
- - **Training Time**: ~17 minutes
112
 
113
  ### Improvements Over Base Model
114
- | Aspect | Base Gemma 3 1B-IT | Neodac |
115
  |--------|-------------------|---------|
116
  | Cultural Accuracy | ❌ Hallucinations | ✅ Factually correct |
117
  | Response Detail | ⚠️ Generic/brief | ✅ Rich & comprehensive |
@@ -125,7 +133,7 @@ print(response)
125
  **Base Model Response:**
126
  > Claims Bihu is about Lord Shiva (incorrect)
127
 
128
- **Neodac Response:**
129
  > Bihu is the most important festival of Assam, celebrated by all Assamese people. There are three Bihus that mark different stages of the agricultural calendar: Rongali (or Bohag) Bihu in spring, Kati (or Kongali) Bihu in autumn, and Magh (or Bhogali) Bihu in winter.
130
 
131
  ## 🎯 Use Cases
@@ -158,22 +166,22 @@ The model was evaluated on cultural accuracy, response completeness, and factual
158
 
159
  ## 📜 Citation
160
 
161
- If you use Neodac in your research or applications, please cite:
162
 
163
  ```bibtex
164
  @misc{neodac2025,
165
- title={Neodac: A Specialized Language Model for Northeast India Cultural Knowledge},
166
  author={MWire Labs},
167
  year={2025},
168
  publisher={Hugging Face},
169
- url={https://huggingface.co/MWirelabs/neodac},
170
  note={Fine-tuned from google/gemma-3-1b-it for cultural preservation and education}
171
  }
172
  ```
173
 
174
  ## 🤝 Contributing
175
 
176
- Interested in improving Neodac? We welcome:
177
  - Additional cultural data from Northeast India
178
  - Feedback on cultural accuracy
179
  - Suggestions for new cultural domains
@@ -192,4 +200,4 @@ This model is released under the Apache 2.0 license, same as the base Gemma mode
192
 
193
  ---
194
 
195
- *Neodac represents a step forward in culturally aware AI, preserving and making accessible the rich heritage of Northeast India through technology.*
 
14
  - meghalaya
15
  - arunachal-pradesh
16
  - sikkim
17
+ - neodac-mini
18
  language:
19
  - en
20
  pipeline_tag: text-generation
21
  library_name: transformers
22
  widget:
23
+ - example_title: Bihu Festival
24
+   text: |
25
+     <start_of_turn>user
26
+     What is Bihu festival?<end_of_turn>
27
+     <start_of_turn>model
28
+ - example_title: Hornbill Festival
29
+   text: |
30
+     <start_of_turn>user
31
+     Tell me about Hornbill Festival.<end_of_turn>
32
+     <start_of_turn>model
33
+ - example_title: Assamese Cuisine
34
+   text: |
35
+     <start_of_turn>user
36
+     What is traditional Assamese cuisine?<end_of_turn>
37
+     <start_of_turn>model
38
  ---
39
 
40
+ # Neodac-mini: Northeast India Cultural AI Model
41
 
42
+ **Neodac-mini** (Northeast India Cultural) is a specialized language model fine-tuned on cultural knowledge of Northeast India's eight states. Built on Google's Gemma 3 1B Instruct, Neodac-mini provides authentic, detailed responses about the rich cultural heritage of the region.
43
 
44
  ## 🎯 Model Overview
45
 
 
73
  import torch
74
 
75
  # Load model and tokenizer
76
+ tokenizer = AutoTokenizer.from_pretrained("MWirelabs/neodac-mini")
77
  model = AutoModelForCausalLM.from_pretrained(
78
+     "MWirelabs/neodac-mini",
79
      torch_dtype=torch.bfloat16,
80
      device_map="auto"
81
  )
82
 
83
  # Example usage
84
+ def ask_neodac_mini(question):
85
      prompt = f"<start_of_turn>user\n{question}<end_of_turn>\n<start_of_turn>model\n"
86
      inputs = tokenizer(prompt, return_tensors="pt")
87
 
 
98
      return response.split("<start_of_turn>model\n")[-1].strip()
99
 
100
  # Ask about Northeast India culture
101
+ response = ask_neodac_mini("What is the significance of bamboo in Northeast India?")
102
  print(response)
103
  ```
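The usage example above builds the Gemma turn markers by hand and then splits the decoded text on the model-turn marker. As a minimal sketch, the string handling can be isolated and checked without loading the model. The helper names `build_prompt` and `extract_reply` are hypothetical and do not appear in the README; the sample reply is a stand-in string, not real model output.

```python
# Hypothetical helpers (not part of the README): they isolate the string
# handling from the usage example so it can be tested without a model.

MODEL_TURN = "<start_of_turn>model\n"


def build_prompt(question: str) -> str:
    """Wrap one user question in the turn markers used by the usage example."""
    return f"<start_of_turn>user\n{question}<end_of_turn>\n{MODEL_TURN}"


def extract_reply(decoded: str) -> str:
    """Return the text after the last model-turn marker, trimmed."""
    reply = decoded.split(MODEL_TURN)[-1]
    # Drop an end-of-turn marker in case the decoded text still contains it.
    return reply.replace("<end_of_turn>", "").strip()


if __name__ == "__main__":
    prompt = build_prompt("What is Bihu festival?")
    # Stand-in for decoded generation output: the prompt followed by a reply.
    decoded = prompt + "Bihu is a festival of Assam.<end_of_turn>"
    print(extract_reply(decoded))
```

If the decoded text does not contain the model-turn marker at all, `str.split` returns the whole string, so `extract_reply` falls back to the full decoded text rather than raising an error.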
104
 
 
117
  - **Batch Size**: 8 per device
118
  - **Precision**: bfloat16
119
  - **Max Sequence Length**: 512 tokens
 
120
 
121
  ### Improvements Over Base Model
122
+ | Aspect | Base Gemma 3 1B-IT | Neodac-mini |
123
  |--------|-------------------|---------|
124
  | Cultural Accuracy | ❌ Hallucinations | ✅ Factually correct |
125
  | Response Detail | ⚠️ Generic/brief | ✅ Rich & comprehensive |
 
133
  **Base Model Response:**
134
  > Claims Bihu is about Lord Shiva (incorrect)
135
 
136
+ **Neodac-mini Response:**
137
  > Bihu is the most important festival of Assam, celebrated by all Assamese people. There are three Bihus that mark different stages of the agricultural calendar: Rongali (or Bohag) Bihu in spring, Kati (or Kongali) Bihu in autumn, and Magh (or Bhogali) Bihu in winter.
138
 
139
  ## 🎯 Use Cases
 
166
 
167
  ## 📜 Citation
168
 
169
+ If you use Neodac-mini in your research or applications, please cite:
170
 
171
  ```bibtex
172
  @misc{neodac2025,
173
+ title={Neodac-mini: A Specialized Language Model for Northeast India Cultural Knowledge},
174
  author={MWire Labs},
175
  year={2025},
176
  publisher={Hugging Face},
177
+ url={https://huggingface.co/MWirelabs/neodac-mini},
178
  note={Fine-tuned from google/gemma-3-1b-it for cultural preservation and education}
179
  }
180
  ```
181
 
182
  ## 🤝 Contributing
183
 
184
+ Interested in improving Neodac-mini? We welcome:
185
  - Additional cultural data from Northeast India
186
  - Feedback on cultural accuracy
187
  - Suggestions for new cultural domains
 
200
 
201
  ---
202
 
203
+ *Neodac-mini represents a step forward in culturally aware AI, preserving and making accessible the rich heritage of Northeast India through technology.*