🇮🇳 Kurukh (Oraon) to Hindi Translator

This is a sequence-to-sequence transformer model that translates the low-resource Kurukh (Oraon) language into Hindi. It was fine-tuned from Google's mT5-small checkpoint on a custom dataset of approximately 10,000 Kurukh-Hindi sentence pairs.

📊 Model Details

  • Model Architecture: Google mT5-small (Multilingual T5), Hub ID google/mt5-small
  • Task: Machine Translation (Kurukh → Hindi)
  • Script: Devanagari (for both input and output)
  • Final Training Loss: 1.64
  • Developer: Ankit Lakra
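One way to read the loss figure above: if 1.64 is the mean per-token cross-entropy in nats (the quantity Hugging Face seq2seq models return as `loss`), it corresponds to a training perplexity of about 5, i.e. on average the model is roughly as uncertain as a uniform choice among five tokens. This is an interpretation under that assumption, not a reported metric, and training perplexity does not directly measure translation quality on unseen sentences.

```latex
\mathcal{L} = -\frac{1}{N}\sum_{t=1}^{N} \ln p_\theta\!\left(y_t \mid y_{<t},\, x\right),
\qquad
\mathrm{PPL} = e^{\mathcal{L}} = e^{1.64} \approx 5.16
```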

🚀 Live Demo

You can try this model directly using the interactive demo:

👉 Kurukh Translator (Live Space)

📚 Training Data

The model was trained on a dataset aggregated from several sources, so that it sees a range of registers and grammatical constructions rather than a single style of text:

  1. Classic Literature: Parallel sentences extracted from historical Kurukh texts and grammar books.
  2. Dictionaries: Vocabulary and phrases from the Bharatavani Government Dictionary and other lexical resources.
  3. Community Data: Spoken sentences, folk songs, and daily conversation logs collected from community resources.
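Combining sources like these typically means normalizing and deduplicating the pairs before holding out a validation set. The sketch below does this with the standard library only; the `(kurukh, hindi)` tuple layout, the helper names `normalize`, `merge_sources`, and `train_val_split`, and the 10% validation fraction are all hypothetical choices for illustration, not the author's actual pipeline.

```python
import random
import unicodedata


def normalize(text: str) -> str:
    """NFC-normalize and collapse whitespace so equivalent Devanagari strings match."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def merge_sources(*sources):
    """Merge lists of (kurukh, hindi) pairs, dropping empty and duplicate pairs.

    The first occurrence of each normalized pair is kept, in input order.
    """
    seen = set()
    merged = []
    for pairs in sources:
        for kurukh, hindi in pairs:
            pair = (normalize(kurukh), normalize(hindi))
            if not pair[0] or not pair[1] or pair in seen:
                continue  # skip pairs with an empty side, and exact duplicates
            seen.add(pair)
            merged.append(pair)
    return merged


def train_val_split(pairs, val_fraction=0.1, seed=42):
    """Shuffle deterministically and hold out `val_fraction` of the pairs."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical input: the same pair arrives from two sources with stray spaces,
# plus one pair whose Kurukh side is empty.
dictionary_pairs = [("निघै नामे इन्द्रा हिकै?", "तुम्हारा नाम क्या है?")]
community_pairs = [("निघै  नामे इन्द्रा हिकै? ", "तुम्हारा नाम क्या है?"), ("", "खाली")]

pairs = merge_sources(dictionary_pairs, community_pairs)
print(len(pairs))  # 1
train, val = train_val_split(pairs)
```

Keying deduplication on the normalized pair (rather than on the Kurukh side alone) keeps legitimate alternative Hindi translations of the same sentence.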

⚙️ Training Procedure

The model was trained using Hugging Face Transformers on a T4 GPU.

  • Optimizer: Adafactor
  • Batch Size: 16
  • Learning Rate: 3e-4
  • Epochs: 20
  • Precision: Full FP32; mixed precision (FP16) was disabled for stability.
  • Strategy: Aggressive fine-tuning with gradient accumulation.
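Because the card lists both a batch size and gradient accumulation, it can help to see how the two combine. The sketch below assumes that 16 is the per-step (micro-)batch size and uses a hypothetical accumulation factor of 4, since the card does not state one; it estimates the effective batch size and the number of optimizer updates over 20 epochs of roughly 10,000 pairs.

```python
import math

# Values taken from the card, except GRAD_ACCUM_STEPS, which is a hypothetical
# placeholder because the accumulation factor is not stated.
NUM_PAIRS = 10_000   # "approximately 10,000 sentence pairs"
BATCH_SIZE = 16      # assumed to be the per-step (micro-)batch size
GRAD_ACCUM_STEPS = 4  # hypothetical
EPOCHS = 20


def training_schedule(num_pairs, batch_size, accum_steps, epochs):
    """Return (effective batch size, optimizer updates per epoch, total updates)."""
    effective_batch = batch_size * accum_steps
    micro_batches_per_epoch = math.ceil(num_pairs / batch_size)
    # One optimizer update per `accum_steps` micro-batches. This is an estimate:
    # how a leftover partial group is handled depends on the trainer version.
    updates_per_epoch = math.ceil(micro_batches_per_epoch / accum_steps)
    return effective_batch, updates_per_epoch, updates_per_epoch * epochs


effective, per_epoch, total = training_schedule(NUM_PAIRS, BATCH_SIZE, GRAD_ACCUM_STEPS, EPOCHS)
print(f"effective batch={effective}, updates/epoch={per_epoch}, total updates={total}")
```

If 16 is instead the effective batch size, the update count matches calling the function with an accumulation factor of 1: about 625 updates per epoch, 12,500 in total.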

⚠️ Limitations & Bias

  • Formal vs. Informal: The model performs well on standard, formal Kurukh but may struggle with casual internet slang or regional variations that do not appear in standard literature.
  • Script: The model strictly expects Devanagari script. Romanized Kurukh (e.g., "Ninghai Name") must be transliterated to Devanagari before input.
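Since the model expects Devanagari input, a cheap check before calling it can catch romanized text early. The function below is a sketch, not part of the released model: it verifies that every letter and combining mark falls in the basic Devanagari Unicode block (U+0900 to U+097F) and ignores spaces, digits, and punctuation. The name `is_devanagari` is hypothetical.

```python
import unicodedata

# Basic Devanagari Unicode block. Devanagari Extended (U+A8E0 to U+A8FF) and
# the Vedic Extensions block are not covered by this sketch.
DEVANAGARI_FIRST, DEVANAGARI_LAST = 0x0900, 0x097F


def is_devanagari(text: str) -> bool:
    """Return True if all letters and combining marks in `text` are Devanagari."""
    script_chars = [
        ch for ch in text
        # Vowel signs (matras) and the virama are combining marks (category M*),
        # not letters, so they are collected explicitly.
        if ch.isalpha() or unicodedata.category(ch).startswith("M")
    ]
    if not script_chars:
        return False  # nothing to check, e.g. only digits or punctuation
    return all(DEVANAGARI_FIRST <= ord(ch) <= DEVANAGARI_LAST for ch in script_chars)


print(is_devanagari("निघै नामे इन्द्रा हिकै?"))  # True
print(is_devanagari("Ninghai Name"))           # False
```

Inputs that fail the check need to be transliterated first; the sketch only detects the problem and does not transliterate.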

📜 License

This model is released under the Apache 2.0 License.


💻 How to Use

from transformers import pipeline

# 1. Load the Model
translator = pipeline("text2text-generation", model="ankitklakra/kurukh-to-hindi")

# 2. Translate a Sentence
# Input: "निघै नामे इन्द्रा हिकै?" (What is your name?)
text = "निघै नामे इन्द्रा हिकै?"
result = translator(text, max_length=128)

# 3. Print Result
print(result[0]['generated_text'])
# Expected Output: "तुम्हारा नाम क्या है?"
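The model was trained on sentence pairs and the example above caps generation at max_length=128, so longer passages are probably better translated one sentence at a time. A minimal sketch, assuming sentences end with a danda (।), a double danda (॥), or one of `?`, `!`, `.` followed by whitespace; the `split_sentences` helper is hypothetical and does not ship with the model.

```python
import re

# Split on whitespace that follows sentence-final punctuation:
# Devanagari danda (।), double danda (॥), and ASCII ? ! .
_SENTENCE_BOUNDARY = re.compile(r"(?<=[।॥?!.])\s+")


def split_sentences(text: str) -> list[str]:
    """Split a Devanagari passage into sentences, dropping empty pieces."""
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]


# Placeholder letters stand in for real sentences here.
print(split_sentences("क ख। ग घ? ङ"))  # ['क ख।', 'ग घ?', 'ङ']
```

Each piece can then be passed to the `translator` pipeline from the example above, e.g. `[translator(s, max_length=128)[0]["generated_text"] for s in split_sentences(passage)]`, and the outputs joined with spaces. Splitting on `.` will also break after abbreviations, so the pattern may need adjusting for real text.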

📦 Model Files

  • Format: Safetensors
  • Model Size: 0.3B parameters
  • Tensor Type: F32