🇮🇳 Kurukh (Oraon) to Hindi Translator
This is a sequence-to-sequence transformer model that translates Kurukh (Oraon), a low-resource language, into Hindi. It was fine-tuned from Google's mT5-small checkpoint on a custom dataset of approximately 10,000 sentence pairs.
📊 Model Details
- Model Architecture: Google mT5-small (Multilingual T5)
- Task: Machine Translation (Kurukh → Hindi)
- Script: Devanagari (for both input and output)
- Final Training Loss: 1.64
- Developer: Ankit Lakra
🚀 Live Demo
You can try this model directly using the interactive demo:
👉 Kurukh Translator (Live Space)
📚 Training Data
The model was trained on a dataset aggregated from several kinds of sources, with the aim of improving grammatical generalization:
- Classic Literature: Parallel sentences extracted from historical Kurukh texts and grammar books.
- Dictionaries: Vocabulary and phrases from the Bharatavani Government Dictionary and other lexical resources.
- Community Data: Spoken sentences, folk songs, and daily conversation logs collected from community resources.
⚙️ Training Procedure
The model was trained using Hugging Face Transformers on a T4 GPU.
- Optimizer: Adafactor
- Batch Size: 16
- Learning Rate: 3e-4
- Epochs: 20
- Precision: Full precision (FP32). Mixed precision (FP16) was disabled for stability.
- Strategy: Aggressive fine-tuning with gradient accumulation.
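As a rough illustration of how these settings interact, the sketch below estimates the number of optimizer steps implied by the figures in this card. The gradient accumulation factor is not stated here, so the values tried (1, 2, 4) are hypothetical, and the listed batch size of 16 is assumed to be the per-step batch before accumulation. The Trainer's own step count may differ slightly in how it rounds the last partial batch.

```python
import math

# Values stated in this model card.
NUM_PAIRS = 10_000   # approximate dataset size
BATCH_SIZE = 16      # assumed to be the batch size before accumulation
EPOCHS = 20


def optimizer_steps(num_examples, batch_size, epochs, grad_accum_steps=1):
    """Return (effective_batch_size, estimated_total_optimizer_steps)."""
    effective_batch = batch_size * grad_accum_steps
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs


# The accumulation factors below are hypothetical; the card does not give one.
for accum in (1, 2, 4):
    eff, steps = optimizer_steps(NUM_PAIRS, BATCH_SIZE, EPOCHS, accum)
    print(f"accum={accum}: effective batch {eff}, about {steps} optimizer steps")
```

Larger accumulation factors give a larger effective batch and fewer weight updates for the same number of epochs, which is one way to smooth gradients on a memory-limited GPU such as the T4.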
⚠️ Limitations & Bias
- Formal vs. Informal: The model performs well on standard, formal Kurukh but may struggle with casual internet slang or regional variants that do not appear in standard literature.
- Script: The model expects Devanagari input only. Romanized Kurukh (e.g., "Ninghai Name") must be transliterated to Devanagari before it is passed to the model.
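Because romanized input is not supported, it can help to check the script before calling the model. The following is a minimal sketch, not part of the released model: the helper name `looks_devanagari` and the 0.8 threshold are hypothetical choices, and the check only tests whether characters fall in the Unicode Devanagari block (U+0900 to U+097F). It does not transliterate anything.

```python
DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F  # Unicode Devanagari block


def is_devanagari_char(ch):
    """True if the character lies in the Devanagari block."""
    return DEVANAGARI_START <= ord(ch) <= DEVANAGARI_END


def looks_devanagari(text, threshold=0.8):
    """Heuristic: is the text mostly written in Devanagari?

    Only letters and Devanagari signs are counted; spaces, ASCII digits
    and ASCII punctuation are ignored. The threshold is an arbitrary,
    illustrative value.
    """
    relevant = [ch for ch in text if ch.isalpha() or is_devanagari_char(ch)]
    if not relevant:
        return False
    devanagari_count = sum(is_devanagari_char(ch) for ch in relevant)
    return devanagari_count / len(relevant) >= threshold


print(looks_devanagari("निघै नामे इन्द्रा हिकै?"))
print(looks_devanagari("Ninghai Name"))
```

Input that fails the check would need to be transliterated by a separate tool first.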
📜 License
This model is released under the Apache 2.0 License.
💻 How to use
from transformers import pipeline
# 1. Load the Model
translator = pipeline("text2text-generation", model="ankitklakra/kurukh-to-hindi")
# 2. Translate a Sentence
# Input: "निघै नामे इन्द्रा हिकै?" (What is your name?)
text = "निघै नामे इन्द्रा हिकै?"
result = translator(text, max_length=128)
# 3. Print Result
print(result[0]['generated_text'])
# Expected Output: "तुम्हारा नाम क्या है?"
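To translate several sentences, the same call can be wrapped in a small loop. The sketch below is illustrative only: `translate_all` is a hypothetical helper, and `fake_translator` is a stand-in that mimics the result shape used above (a list containing a dict with a `generated_text` key) so that the example runs without downloading the model. In real use, the `translator` pipeline object from the snippet above would be passed instead.

```python
def translate_all(translator, sentences, max_length=128):
    """Translate each sentence with a pipeline-like callable, skipping blanks."""
    translations = []
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue  # nothing to translate
        result = translator(sentence, max_length=max_length)
        translations.append(result[0]["generated_text"])
    return translations


# Stand-in for the real pipeline; it only echoes the input in a marker.
def fake_translator(text, max_length=128):
    return [{"generated_text": f"<hindi:{text}>"}]


print(translate_all(fake_translator, ["निघै नामे इन्द्रा हिकै?", "   "]))
```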