---
colorFrom: indigo
colorTo: indigo
emoji: πŸ‘
---
# English β†’ Hindi Translation with Seq2Seq + Multi-Head Attention
This Streamlit Space demonstrates **LSTM-based sequence-to-sequence (Seq2Seq) models augmented with attention**. Specifically, it showcases **multi-head cross-attention** between the decoder and the encoder outputs in an English-to-Hindi translation setting.
---
## πŸš€ Purpose
This Space is designed to **illustrate how LSTM-based Seq2Seq models combined with attention mechanisms** can perform language translation. It is intended for educational and demonstration purposes, highlighting:
- Encoder-Decoder architecture using LSTMs
- Multi-head attention for better context understanding
- Sequence-to-sequence translation from English to Hindi
- Comparison between **smaller (12M parameters)** and **larger (42M parameters)** models
---
## 🧠 Models
| Model | Parameters | Vocabulary | Training Data | Repository |
|-------|------------|-----------|---------------|------------|
| Model A | 12M | 50k | 20k rows | [seq2seq-lstm-multiheadattention-12.3](https://huggingface.co/Daksh0505/Seq2Seq-LSTM-MultiHeadAttention) |
| Model B | 42M | 256k | 100k rows | [seq2seq-lstm-multiheadattention-42](https://huggingface.co/Daksh0505/Seq2Seq-LSTM-MultiHeadAttention) |
- **Model A** performs better on the smaller dataset it was trained on.
- **Model B** has higher capacity but requires more diverse data to generalize well.
---
## πŸ“‹ Features
- Select a model size (12M or 42M parameters)
- View **model architecture** layer-by-layer
- Choose a sentence from the dataset to translate
- Compare **original vs predicted translation**
- Highlight how multi-head attention improves Seq2Seq performance
---
## πŸ›  How it Works
1. **Encoder**:
- Processes the input English sentence
- Embedding β†’ Layer Normalization β†’ Dropout β†’ BiLSTM β†’ Hidden states
2. **Decoder**:
- Receives previous token embeddings and encoder states
- Applies multi-head cross-attention over encoder outputs
   - Generates tokens one at a time until the `<end>` token is produced
3. **Prediction**:
- Step-by-step decoding using trained weights
   - The output Hindi sentence is reconstructed token by token
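The decoder's multi-head cross-attention step described above can be sketched in plain NumPy. This is an illustrative re-implementation, not the Space's actual code: the weight matrices are random stand-ins for trained parameters, and the dimensions are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(dec_states, enc_outputs, weights, num_heads):
    """Queries come from the decoder, keys/values from the encoder outputs.

    dec_states:  (T_dec, d_model) decoder hidden states
    enc_outputs: (T_enc, d_model) encoder outputs
    """
    Wq, Wk, Wv, Wo = weights
    d_model = dec_states.shape[-1]
    d_head = d_model // num_heads

    # Project, then split into heads: (num_heads, T, d_head)
    def split(x, W):
        return (x @ W).reshape(x.shape[0], num_heads, d_head).transpose(1, 0, 2)

    q = split(dec_states, Wq)
    k = split(enc_outputs, Wk)
    v = split(enc_outputs, Wv)

    # Scaled dot-product attention per head: (num_heads, T_dec, T_enc)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)

    # Weighted sum of values, then concatenate heads and project back
    ctx = attn @ v                                        # (num_heads, T_dec, d_head)
    ctx = ctx.transpose(1, 0, 2).reshape(dec_states.shape[0], d_model)
    return ctx @ Wo, attn

rng = np.random.default_rng(0)
d_model, heads, T_enc, T_dec = 64, 4, 7, 5
weights = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4)]
dec = rng.normal(size=(T_dec, d_model))
enc = rng.normal(size=(T_enc, d_model))
ctx, attn = multi_head_cross_attention(dec, enc, weights, heads)
```

Each decoder position gets a context vector that is a softmax-weighted mix of all encoder outputs, computed independently per head; the trained model does the same with learned projections.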
---
## πŸ’» Usage
1. Select the model size from the dropdown
2. Expand **Show Model Architecture** to see layer details
3. Select a sentence from the dataset
4. Click **Translate** to view the predicted Hindi translation
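Behind the **Translate** button, decoding proceeds one token at a time. A minimal greedy-decoding sketch, where `decoder_step` is a hypothetical stand-in for the trained decoder (the Space's actual function names and token ids may differ):

```python
def greedy_decode(decoder_step, states, start_id, end_id, max_len=20):
    """Generate token ids one at a time until <end> or max_len."""
    tokens = [start_id]
    for _ in range(max_len):
        next_id, states = decoder_step(tokens[-1], states)
        if next_id == end_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the <start> token

# Toy decoder that deterministically emits 3, then 4, then <end> (id 1)
def toy_step(prev_id, state):
    seq = {0: 3, 3: 4, 4: 1}
    return seq[prev_id], state

print(greedy_decode(toy_step, None, start_id=0, end_id=1))  # [3, 4]
```

The real decoder step would embed the previous token, run the LSTM with cross-attention over the encoder outputs, and take an argmax over the output vocabulary.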
---
## ⚠️ Notes
- Model performance depends on **training data size and domain**
- The smaller model (12M) performs better when training data is limited
- The larger model (42M) requires **more data** or **fine-tuning** to perform well on small datasets
---
## πŸ“š References
- **Seq2Seq with Attention**: [Bahdanau et al., 2014](https://arxiv.org/abs/1409.0473)
- **Multi-Head Attention**: [Vaswani et al., 2017](https://arxiv.org/abs/1706.03762)
---
## πŸ‘¨β€πŸ’» Author
**Daksh Bhardwaj**
- Email: dakshbhardwaj0505@gmail.com
- GitHub: [Daksh5555](https://github.com/daksh5555)