Update README.md
---
tags:
- multilingual
- sequence-to-sequence
---

# Malayalam → Hindi Translation Model (Fairseq)

This is a **Neural Machine Translation (NMT)** model trained to translate between **Malayalam (ml)** and **Hindi (hi)** using the **Fairseq** framework. It was trained on a custom, curated low-resource parallel corpus.

## Model Architecture

- Framework: Fairseq (PyTorch)
- Architecture: Transformer
- Type: Sequence-to-sequence
- Layers: 6 encoder / 6 decoder
- Embedding size: 512
- FFN size: 2048
- Tokenizer: SentencePiece (trained jointly on ml-hi)
- Vocabulary size: 32,000 (joint BPE)
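
The dimensions listed above match Fairseq's base Transformer preset. For orientation, a training command that realizes these settings might look as follows; only the architecture dimensions come from this card, while the paths, optimizer, and schedule flags are illustrative assumptions:

```bash
# Sketch only: data-bin/, checkpoints/ and all optimizer/scheduler settings
# below are placeholder assumptions, not values stated in the model card.
fairseq-train data-bin \
    --source-lang ml --target-lang hi \
    --arch transformer \
    --encoder-layers 6 --decoder-layers 6 \
    --encoder-embed-dim 512 --decoder-embed-dim 512 \
    --encoder-ffn-embed-dim 2048 --decoder-ffn-embed-dim 2048 \
    --optimizer adam --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 --save-dir checkpoints
```
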

## Training Details

| Setting | Value |
|----------------------|------------------------|
| Hardware | 1 x V100 32GB GPU |
| Training time | ~16 hours |

## Evaluation

The model was evaluated on a manually annotated Malayalam-Hindi test set consisting of 10,000 sentence pairs.

| BLEU | 11.08 | 29.56 |
| COMET | 0.76 | 0.62 |
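
For readers unfamiliar with the metric: BLEU is a clipped n-gram precision combined with a brevity penalty. A minimal, self-contained illustration of the computation (whitespace tokenization, one reference per sentence; real evaluations normally use `sacrebleu`, so this toy score will not match the table exactly):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU: clipped n-gram precisions (n=1..4) + brevity penalty."""
    match, total = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            hg, rg = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            match[n - 1] += sum(min(c, rg[g]) for g, c in hg.items())
            total[n - 1] += max(len(h) - n + 1, 0)
    if min(match) == 0:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(match, total)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)

# Identical hypothesis and reference gives the maximum score of 100.
print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))
```
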

## Usage

### In Fairseq (CLI)

```bash
fairseq-interactive /data-bin \
```
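
The command above is truncated in the card. A complete invocation typically adds the language pair, checkpoint path, and tokenizer flags; the file names here (`checkpoint_best.pt`, `spm.model`) echo the note further down but are assumptions, not confirmed paths:

```bash
# Sketch only: every path below is a placeholder.
echo "<Malayalam sentence>" | fairseq-interactive /data-bin \
    --source-lang ml --target-lang hi \
    --path checkpoints/checkpoint_best.pt \
    --bpe sentencepiece --sentencepiece-model spm.model \
    --beam 5 --remove-bpe sentencepiece
```
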

### In Python (Torch-based loading)

```python
import torch
# (model loading lines truncated in the original card)
model.eval()
```

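The Python block above is truncated after `import torch`. One common way to load a Fairseq checkpoint in Python is the `from_pretrained` hub interface; this is a sketch under the assumption that the checkpoint, dictionaries, and `spm.model` are laid out as the note below describes, with every path a placeholder:

```python
# Sketch only: requires fairseq + torch and the released model files.
# All paths/file names are placeholder assumptions, not confirmed by the card.
from fairseq.models.transformer import TransformerModel

ml2hi = TransformerModel.from_pretrained(
    "/path/to/model",                  # directory containing the checkpoint
    checkpoint_file="checkpoint_best.pt",
    data_name_or_path="/data-bin",     # holds dict.ml.txt / dict.hi.txt
    bpe="sentencepiece",
    sentencepiece_model="/path/to/spm.model",
)
ml2hi.eval()
print(ml2hi.translate("<Malayalam sentence>"))
```
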
> Note: To use this model effectively, you need the SentencePiece model (`spm.model`) and the exact Fairseq dictionary files (`dict.ml.txt`, `dict.hi.txt`).

## Dataset

This model was trained on a custom dataset compiled from:
* [AI4Bharat OPUS Corpus](https://github.com/AI4Bharat/IndicTrans)
* Manually aligned Malayalam-Hindi sentences from news and educational data
* Crawled parallel content from Indian government websites (under open license)

Preprocessing was done with:

* Normalization
* Language ID filtering
* Sentence length and alignment heuristics
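
The card does not spell out the length and alignment heuristics. A typical length-ratio filter for parallel corpora looks something like this; the thresholds are illustrative, not the values actually used:

```python
def keep_pair(src: str, tgt: str,
              min_len: int = 1, max_len: int = 250,
              max_ratio: float = 2.5) -> bool:
    """Keep a sentence pair only if both sides have a sane token count and
    their length ratio (in either direction) stays below max_ratio."""
    s, t = src.split(), tgt.split()
    if not (min_len <= len(s) <= max_len and min_len <= len(t) <= max_len):
        return False
    ratio = max(len(s), len(t)) / min(len(s), len(t))
    return ratio <= max_ratio

# A balanced pair survives; a wildly mismatched one is dropped.
pairs = [("one two", "एक दो"), ("a b c d e f g h", "x")]
kept = [p for p in pairs if keep_pair(*p)]
print(len(kept))
```
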

## License

## Citation

```bibtex
@misc{malayalam-hindi-nmt,
  author       = {Navaneeth Sreedharan and Sneha S and Renimol V R},
  title        = {Malayalam-Hindi Neural Machine Translation using Fairseq},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/icfoss/Malayalam-Hindi-Translation-Model-fairseq}}
}
```

## Contact / Contributions

For queries or collaboration, contact `navaneeth@icfoss.com`. Contributions are welcome via pull requests or issues.