---
colorTo: indigo
colorFrom: indigo
emoji: π
---
|
|
# English → Hindi Translation with Seq2Seq + Multi-Head Attention
|
|
|
|
|
This Streamlit Space demonstrates the **power of LSTM-based models with attention mechanisms** for sequence-to-sequence (Seq2Seq) tasks. Specifically, it showcases **multi-head cross-attention** in an English-to-Hindi translation setting.
|
|
|
|
|
--- |
|
|
|
|
|
## Purpose
|
|
|
|
|
This Space is designed to **illustrate how LSTM-based Seq2Seq models combined with attention mechanisms** can perform language translation. It is intended for educational and demonstration purposes, highlighting: |
|
|
|
|
|
- Encoder-Decoder architecture using LSTMs |
|
|
- Multi-head attention for better context understanding |
|
|
- Sequence-to-sequence translation from English to Hindi |
|
|
- Comparison between **smaller (12M parameters)** and **larger (42M parameters)** models |
|
|
|
|
|
--- |
|
|
|
|
|
## Models
|
|
|
|
|
| Model | Parameters | Vocabulary | Training Data | Repository | |
|
|
|-------|------------|-----------|---------------|------------| |
|
|
| Model A | 12M | 50k | 20k rows | [seq2seq-lstm-multiheadattention-12.3](https://huggingface.co/Daksh0505/Seq2Seq-LSTM-MultiHeadAttention) | |
|
|
| Model B | 42M | 256k | 100k rows | [seq2seq-lstm-multiheadattention-42](https://huggingface.co/Daksh0505/Seq2Seq-LSTM-MultiHeadAttention) | |
|
|
|
|
|
- **Model A** performs better on the smaller, in-domain dataset it was trained on.
|
|
- **Model B** has higher capacity but requires more, and more diverse, data to generalize well (a hedged loading sketch follows below).
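
To try either checkpoint outside this Space, the weights can be fetched from the linked repository with `huggingface_hub`. The snippet below is only a sketch: the filename `model_12m.h5` is an assumed placeholder (check the repository's file listing for the real artifact names), and the checkpoint is assumed to be saved in Keras format.

```python
# Hedged loading sketch -- the filename below is an assumption; browse the
# repository files on the Hub to find the actual checkpoint names.
from huggingface_hub import hf_hub_download
import tensorflow as tf

weights_path = hf_hub_download(
    repo_id="Daksh0505/Seq2Seq-LSTM-MultiHeadAttention",
    filename="model_12m.h5",  # hypothetical name for the 12M-parameter model
)
# Custom layers (if any) would need to be passed via `custom_objects`.
model = tf.keras.models.load_model(weights_path, compile=False)
model.summary()
```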
|
|
|
|
|
--- |
|
|
|
|
|
## Features
|
|
|
|
|
- Select a model size (12M or 42M parameters) |
|
|
- View **model architecture** layer-by-layer |
|
|
- Choose a sentence from the dataset to translate |
|
|
- Compare **original vs predicted translation** |
|
|
- Highlight how multi-head attention improves Seq2Seq performance |
|
|
|
|
|
--- |
|
|
|
|
|
## How it Works
|
|
|
|
|
1. **Encoder**: |
|
|
- Processes the input English sentence |
|
|
   - Embedding → Layer Normalization → Dropout → BiLSTM → Hidden states
|
|
|
|
|
2. **Decoder**: |
|
|
- Receives previous token embeddings and encoder states |
|
|
- Applies multi-head cross-attention over encoder outputs |
|
|
   - Generates the next token repeatedly until the `<end>` token is produced
|
|
|
|
|
3. **Prediction**: |
|
|
- Step-by-step decoding using trained weights |
|
|
   - The output Hindi sentence is reconstructed token by token (a minimal architecture sketch follows below)
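
The walkthrough above maps naturally onto standard Keras layers. The block below is a minimal sketch, not the Space's actual training code: vocabulary sizes, embedding width, LSTM units, and the number of attention heads are illustrative assumptions.

```python
# Minimal Keras sketch of the encoder-decoder with multi-head cross-attention.
# Vocabulary sizes and hyperparameters below are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_EN, VOCAB_HI = 50_000, 50_000   # assumed source/target vocabulary sizes
EMB_DIM, UNITS, HEADS = 256, 512, 4   # assumed hyperparameters

# --- Encoder: Embedding -> LayerNorm -> Dropout -> BiLSTM ---
enc_tokens = layers.Input(shape=(None,), name="encoder_tokens")
x = layers.Embedding(VOCAB_EN, EMB_DIM)(enc_tokens)
x = layers.LayerNormalization()(x)
x = layers.Dropout(0.2)(x)
enc_outputs, fh, fc, bh, bc = layers.Bidirectional(
    layers.LSTM(UNITS, return_sequences=True, return_state=True)
)(x)
# Concatenate forward/backward states to initialise the decoder
state_h = layers.Concatenate()([fh, bh])
state_c = layers.Concatenate()([fc, bc])

# --- Decoder: Embedding -> LSTM -> multi-head cross-attention -> softmax ---
dec_tokens = layers.Input(shape=(None,), name="decoder_tokens")
y = layers.Embedding(VOCAB_HI, EMB_DIM)(dec_tokens)
dec_seq, _, _ = layers.LSTM(
    2 * UNITS, return_sequences=True, return_state=True
)(y, initial_state=[state_h, state_c])
# Queries come from the decoder, keys/values from the encoder outputs
context = layers.MultiHeadAttention(num_heads=HEADS, key_dim=64)(
    query=dec_seq, value=enc_outputs, key=enc_outputs
)
z = layers.Concatenate()([dec_seq, context])
probs = layers.Dense(VOCAB_HI, activation="softmax")(z)

model = tf.keras.Model([enc_tokens, dec_tokens], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

At inference time the same decoder runs one step at a time: feed the `<start>` token, take the most probable output token, append it to the decoder input, and repeat until `<end>` (or a length cap) is produced.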
|
|
|
|
|
--- |
|
|
|
|
|
## Usage
|
|
|
|
|
1. Select the model size from the dropdown |
|
|
2. Expand **Show Model Architecture** to see layer details |
|
|
3. Select a sentence from the dataset |
|
|
4. Click **Translate** to view the predicted Hindi translation (an illustrative UI sketch follows below)
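
Under the hood, these four steps are plain Streamlit widgets. The sketch below is illustrative only: `SENTENCES`, `load_model`, and `translate` are hypothetical stand-ins for the Space's real data and helper functions.

```python
import streamlit as st

# Hypothetical stand-ins for the Space's real dataset and helpers.
SENTENCES = ["how are you?", "where is the station?"]

def load_model(size: str):
    """Placeholder: the real app loads the 12M or 42M checkpoint here."""

def translate(model, sentence: str) -> str:
    """Placeholder: the real app runs step-by-step decoding here."""
    return "..."

size = st.selectbox("Model size", ["12M parameters", "42M parameters"])
model = load_model(size)

with st.expander("Show Model Architecture"):
    st.text("Layer-by-layer summary would be printed here.")

sentence = st.selectbox("Pick an English sentence", SENTENCES)

if st.button("Translate"):
    st.write("**Original:**", sentence)
    st.write("**Predicted Hindi:**", translate(model, sentence))
```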
|
|
|
|
|
--- |
|
|
|
|
|
## Notes
|
|
|
|
|
- Model performance depends on **training data size and domain** |
|
|
- The smaller model (12M) generalizes better on smaller datasets
- The larger model (42M) requires **more data** and **fine-tuning** to perform well on small datasets
|
|
|
|
|
--- |
|
|
|
|
|
## References
|
|
|
|
|
- **Seq2Seq with Attention**: [Bahdanau et al., 2014](https://arxiv.org/abs/1409.0473) |
|
|
- **Multi-Head Attention**: [Vaswani et al., 2017](https://arxiv.org/abs/1706.03762) |
|
|
|
|
|
--- |
|
|
|
|
|
## Author
|
|
|
|
|
Daksh Bhardwaj |
|
|
Email: dakshbhardwaj0505@gmail.com |
|
|
GitHub: [Daksh5555](https://github.com/daksh5555) |