Pankaj8922/better-opus-mt-en-hi
Fine-tuned MarianMT model for English → Hindi translation. This model is trained on AI4Bharat's Samanantar dataset, which contains over 10 million high-quality parallel sentences.
Model Details
- Base model: Helsinki-NLP/opus-mt-en-hi
- Fine-tuned on: ai4bharat/samanantar (English → Hindi subset)
- Total params: ~77M (MarianMT)
- Framework: Hugging Face Transformers
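As a rough illustration of how a MarianMT checkpoint like this one is typically loaded with Transformers, here is a minimal sketch. The repository id is taken from the card title and is an assumption about where the weights are hosted; `chunks` and `translate` are hypothetical helper names, and the batch size and token limit are arbitrary choices, not settings from this card.

```python
from typing import Iterator, List, Sequence

# Assumed repo id, copied from the card title; swap in a local path if needed.
MODEL_ID = "Pankaj8922/better-opus-mt-en-hi"


def chunks(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def translate(sentences: Sequence[str], model_id: str = MODEL_ID,
              batch_size: int = 8, max_new_tokens: int = 128) -> List[str]:
    """Translate English sentences to Hindi in small batches."""
    # Imported lazily: requires `transformers`, `torch` and `sentencepiece`.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)

    outputs: List[str] = []
    for batch in chunks(sentences, batch_size):
        encoded = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**encoded, max_new_tokens=max_new_tokens)
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs
```

Calling `translate(["How are you?"])` downloads the checkpoint on first use and returns a list with one Hindi string per input sentence.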
Performance (BLEU / chrF on 500 samples from Namratap/En-Hindi)
| Domain | Base BLEU | Fine-tuned BLEU | Base chrF | Fine-tuned chrF |
|---|---|---|---|---|
| Healthcare | 15.54 | 27.95 | 38.06 | 54.09 |
| Gen News | 14.11 | 26.31 | 39.07 | 52.98 |
| Culture/Tourism | 12.76 | 18.49 | 35.07 | 41.32 |
| Education | 20.28 | 28.82 | 43.84 | 49.68 |
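- BLEU improves by +5.7 to +12.4 points across domains (smallest gain on Culture/Tourism, largest on Healthcare)
- chrF improves by up to +16 points (Healthcare), indicating closer character-level overlap with the reference translations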
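The per-domain gains can be recomputed directly from the table. The short sketch below does only that arithmetic; the scores are copied from the table above, and `SCORES` and `gains` are illustrative names rather than part of any evaluation script shipped with the model.

```python
# (base BLEU, fine-tuned BLEU, base chrF, fine-tuned chrF), copied from the table
SCORES = {
    "Healthcare": (15.54, 27.95, 38.06, 54.09),
    "Gen News": (14.11, 26.31, 39.07, 52.98),
    "Culture/Tourism": (12.76, 18.49, 35.07, 41.32),
    "Education": (20.28, 28.82, 43.84, 49.68),
}


def gains(scores):
    """Return {domain: (BLEU gain, chrF gain)}, rounded to two decimals."""
    return {
        domain: (round(ft_bleu - base_bleu, 2), round(ft_chrf - base_chrf, 2))
        for domain, (base_bleu, ft_bleu, base_chrf, ft_chrf) in scores.items()
    }


for domain, (bleu_gain, chrf_gain) in gains(SCORES).items():
    print(f"{domain:<16} BLEU +{bleu_gain:.2f}  chrF +{chrf_gain:.2f}")
```

This gives BLEU gains of 12.41, 12.20, 5.73 and 8.54, and chrF gains of 16.03, 13.91, 6.25 and 5.84, in table order.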
Use Cases
- Book and news translation into Hindi
- Offline/secure translation pipelines
- Domain-adapted fine-tuning
Files Included
- pytorch_model.bin: fine-tuned model weights
- config.json: model architecture
- tokenizer_config.json, vocab.json, source.spm, target.spm: tokenizer
- generation_config.json: default decoding setup
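For offline pipelines, it can be useful to check that a local copy of the repository is complete before loading it. The sketch below uses only the standard library; the file names come from the list above, while `EXPECTED_FILES` and `missing_files` are hypothetical names introduced for illustration.

```python
from pathlib import Path

# File names as listed in this card; adjust if the repository layout changes.
EXPECTED_FILES = (
    "pytorch_model.bin",
    "config.json",
    "tokenizer_config.json",
    "vocab.json",
    "source.spm",
    "target.spm",
    "generation_config.json",
)


def missing_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from `model_dir`."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]
```

If `missing_files(path)` returns an empty list, the directory can be passed to `from_pretrained(path, local_files_only=True)` so that Transformers does not attempt any network access.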
License
Apache 2.0 (same as the original model and the Samanantar dataset)