--- license: apache-2.0 base_model: google/gemma-3-1b-it tags: - gemma - northeast-india - cultural - fine-tuned - assam - manipur - nagaland - mizoram - tripura - meghalaya - arunachal-pradesh - sikkim - neodac-mini language: - en pipeline_tag: text-generation library_name: transformers widget: - example_title: Bihu Festival text: | user What is Bihu festival? model - example_title: Hornbill Festival text: | user Tell me about Hornbill Festival. model - example_title: Assamese Cuisine text: | user What is traditional Assamese cuisine? model --- # Neodac-mini: Northeast India Cultural AI Model **Neodac-mini** (Northeast India Cultural) is a specialized language model fine-tuned on cultural knowledge of Northeast India's eight states. Built on Google's Gemma 3 1B Instruct, Neodac-mini provides authentic, detailed responses about the rich cultural heritage of the region. ## 🎯 Model Overview - **Base Model**: [google/gemma-3-1b-it](https://huggingface.co/google/gemma-3-1b-it) - **Specialization**: Northeast India Cultural Knowledge - **Training Data**: 6,205 culturally authentic Q&A pairs - **Coverage**: All 8 Northeast Indian states - **Languages**: English (with cultural context) ## 🌟 Key Features ### Cultural Domains Covered - **Festivals & Celebrations**: Bihu, Hornbill, Losar, Chapchar Kut, etc. - **Traditional Arts**: Dance forms, music, crafts, weaving - **Cuisine**: Regional foods, cooking methods, traditional recipes - **Tribal Heritage**: Community practices, languages, customs - **Geography**: Cultural significance of places and landmarks - **Literature**: Folk tales, oral traditions, regional literature ### Model Capabilities - ✅ Accurate cultural information without hallucinations - ✅ Detailed responses about regional traditions - ✅ Authentic representation of tribal communities - ✅ Contextual understanding of cultural nuances - ✅ Preservation of cultural knowledge through AI ## 🚀 Quick Start ```python from transformers import AutoTokenizer, AutoModelForCausalLM import torch # Load model and tokenizer tokenizer = AutoTokenizer.from_pretrained("MWirelabs/neodac-mini") model = AutoModelForCausalLM.from_pretrained( "MWirelabs/neodac-mini", torch_dtype=torch.bfloat16, device_map="auto" ) # Example usage def ask_neodac-mini(question): prompt = f"user\n{question}\nmodel\n" inputs = tokenizer(prompt, return_tensors="pt") with torch.no_grad(): outputs = model.generate( **inputs, max_length=300, temperature=0.7, do_sample=True, pad_token_id=tokenizer.eos_token_id ) response = tokenizer.decode(outputs[0], skip_special_tokens=True) return response.split("model\n")[-1].strip() # Ask about Northeast India culture response = ask_neodac-mini("What is the significance of bamboo in Northeast India?") print(response) ``` ## 📊 Training Details ### Dataset - **Size**: 6,205 cultural Q&A pairs - **Sources**: Regional cultural databases, wiki content, expert curation - **Quality**: Manually verified for cultural authenticity - **Split**: 90% training, 10% validation ### Training Configuration - **Hardware**: NVIDIA A40 40GB - **Epochs**: 5 (enhanced from initial 3) - **Learning Rate**: 2e-5 (optimized for detailed responses) - **Batch Size**: 8 per device - **Precision**: bfloat16 - **Max Sequence Length**: 512 tokens ### Improvements Over Base Model | Aspect | Base Gemma 3 1B-IT | Neodac-mini | |--------|-------------------|---------| | Cultural Accuracy | ❌ Hallucinations | ✅ Factually correct | | Response Detail | ⚠️ Generic/brief | ✅ Rich & comprehensive | | Regional Context | ❌ Limited knowledge | ✅ Deep cultural understanding | | Tribal Information | ❌ Inaccurate/missing | ✅ Authentic representation | ## 🎪 Example Comparisons ### Question: "What is Bihu festival?" **Base Model Response:** > Claims Bihu is about Lord Shiva (incorrect) **Neodac-mini Response:** > Bihu is the most important festival of Assam, celebrated by all Assamese people. There are three Bihus that mark different stages of the agricultural calendar: Rongali (or Bohag) Bihu in spring, Kati (or Kongali) Bihu in autumn, and Magh (or Bhogali) Bihu in winter. ## 🎯 Use Cases ### Cultural Education - Educational institutions teaching Northeast India studies - Cultural preservation initiatives - Tourism and travel information ### Research & Documentation - Academic research on regional culture - Cultural anthropology studies - Digital heritage preservation ### Community Applications - Cultural chatbots for tourism - Educational tools for diaspora communities - Content creation for cultural media ## ⚠️ Limitations - **Geographic Scope**: Specialized for Northeast India only - **Language**: Responses in English (cultural terms may be in local languages) - **Temporal Knowledge**: Training data has knowledge cutoff - **Bias Inheritance**: May inherit biases from base model and training data ## 🔬 Evaluation & Performance The model was evaluated on cultural accuracy, response completeness, and factual correctness. Significant improvements were observed over the base model in all cultural domains. ## 📜 Citation If you use Neodac-mini in your research or applications, please cite: ```bibtex @misc{neodac2025, title={Neodac-mini: A Specialized Language Model for Northeast India Cultural Knowledge}, author={MWire Labs}, year={2025}, publisher={Hugging Face}, url={https://huggingface.co/MWirelabs/neodac-mini}, note={Fine-tuned from google/gemma-3-1b-it for cultural preservation and education} } ``` ## 🤝 Contributing Interested in improving Neodac-mini? We welcome: - Additional cultural data from Northeast India - Feedback on cultural accuracy - Suggestions for new cultural domains - Community validation of responses ## 📄 License This model is released under the Apache 2.0 license, same as the base Gemma model. ## 🙏 Acknowledgments - Google for the Gemma 3 1B-IT base model - Cultural experts and communities of Northeast India - Contributors to the cultural dataset - Hugging Face for the platform and tools --- *Neodac-mini represents a step forward in culturally-aware AI, preserving and making accessible the rich heritage of Northeast India through technology.*