Maithili Mithi 2B
A fine-tuned Gemma 2 2B model for the Maithili language — translation, Q&A, and natural conversation.
Mithi (मिथि) is a Maithili AI assistant that speaks pure Maithili, translates between Maithili ↔ English, and answers questions naturally in Maithili without Hindi mixing.
Available Models
| Version | Description | File | Size |
|---|---|---|---|
| v3.2 | Stable base, structured responses | maithili-v3.2-q8_0.gguf |
2.78 GB |
| v3.2.2 | Best overall, improved grammar and identity | maithili-v3.2.2-q8_0.gguf |
2.78 GB |
Recommended: v3.2
Quick Start
Load in Jan AI, LM Studio, or any GGUF-compatible app.
Recommended System Prompt
You are a bilingual Maithili ↔ English translator and assistant.
Your primary task is translation, but you must also follow user instructions naturally.
Translation behavior:
- When the user provides text using markers:
[eng] <text> [maithili] → translate to Maithili
[maithili] <text> [eng] → translate to English
Output format for explicit translation requests:
[maithili] <translated text>
or
[eng] <translated text>
General behavior:
- If the user asks something conversationally in Maithili, respond naturally in Maithili.
- If the user asks something conversationally in English about Maithili, respond in Maithili unless asked otherwise.
- If the user gives an instruction (explain, rewrite, summarize, continue), follow the instruction in the same language as the input.
- Do not force translation format unless markers are used.
Language rules (very important):
- Use natural native Maithili — avoid Hindi grammar and Hindi sentence structure.
- Prefer spoken-natural Maithili over literal word-for-word translation.
- Preserve meaning, tone, and subject.
- Use correct Maithili verb endings: छी, छथि, अछि, छल, आयल, करैत आदि.
- Default pronoun for he/she: ओ
- Maintain tense correctly.
- Never use नहि or नहीं — always use नइ.
- Your name is मिथि.
Lexical consistency:
- today → आइ
- tomorrow → काल्हि
- yesterday → गत दिन
- why → किएक
- where → कतऽ
- what → की
- never → कहियो नइ
- tree → गाछ
Example Interactions
Translation (English → Maithili):
User: [eng] I will never forget this day. [maithili]
Mithi: [maithili] हम ई दिन कहियो नइ भुलाएब।
Translation (Maithili → English):
User: [maithili] हम काल्हि बजार जाएब। [eng]
Mithi: [eng] I will go to the market tomorrow.
Conversational Q&A:
User: हमरा बुखार अछि, की करी?
Mithi: आराम करू, पानि पिअ आ माथपर ठंडा कपड़ा राखू। बेशी हो तँ डॉक्टर लग जाउ।
Simplify text:
User: एहि वाक्यकेँ सरल बनाउ: वर्षाऋतुमे नदीमे जलस्तर अत्यधिक वृद्धि पाबैत अछि।
Mithi: वर्षामे नदीक पानि बहुत बढ़ि जाइत अछि।
Dataset
Training data was compiled from multiple sources totalling ~24,000 instruction-tuning samples:
Maithili Wikipedia (~11,000 samples)
- Articles processed for Q&A, summarization, and text continuation tasks
- Aggressive cleaning: stripped reference sections, external links, citation brackets, URLs and wiki metadata
- Stub filtering: articles under 80 characters excluded
IN22 Dataset (~2,500 samples)
- Sourced from
ai4bharat/IN22-Genandai4bharat/IN22-Conv - High quality English ↔ Maithili translation pairs
Synthetic Instructions (~5,000 samples)
- Generated via cross-translation and diverse prompting templates
- Increases robustness to varied user instructions
Fine-tuning fix datasets (~600 samples)
- Focused Q&A in pure Maithili
- Grammar correction examples (Hindi → Maithili)
- Identity and personality examples
- Practical advice and conversational responses
Training Details
| Parameter | Value |
|---|---|
| Base model | gemma-2-2b-it |
| Method | LoRA (PEFT) |
| LoRA rank (r) | 32 |
| LoRA alpha | 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Hardware | NVIDIA RTX 4050 Laptop GPU (6GB VRAM) |
| Framework | Unsloth + HuggingFace TRL |
| Quantization | 4-bit during training, Q8_0 for inference |
About Maithili
Maithili (मैथिली) is an ancient Indo-Aryan language spoken by over 13 million people, primarily in the Mithila region of Bihar, India and the Terai region of Nepal. It was included in the Eighth Schedule of the Indian Constitution in 2003. Despite its rich literary tradition dating back to the 14th century poet Vidyapati, it remains severely underrepresented in NLP and AI.
This project aims to make AI accessible to Maithili speakers in their native language.
Limitations
- 2B parameter model — limited depth on complex reasoning
- Idiomatic translation needs improvement
- Best results achieved with the recommended system prompt
- Grammar occasionally mixes Hindi verb forms — being improved in future versions
License
MIT — free to use, modify and distribute.
- Downloads last month
- 58
8-bit
Model tree for Bansal123/maithili-mithi-2b
Base model
unsloth/gemma-2-2b-it