---
colorTo: indigo
colorFrom: indigo
emoji: π
---
|
|
# English → Hindi Translation with Seq2Seq + Multi-Head Attention
|
|
|
|
|
This Streamlit Space demonstrates the **power of LSTM-based models with attention mechanisms** for sequence-to-sequence (Seq2Seq) tasks. Specifically, it showcases **multi-head cross-attention** in an English-to-Hindi translation setting.
|
|
|
|
|
--- |
|
|
|
|
|
## Purpose
|
|
|
|
|
This Space is designed to **illustrate how LSTM-based Seq2Seq models combined with attention mechanisms** can perform language translation. It is intended for educational and demonstration purposes, highlighting: |
|
|
|
|
|
- Encoder-Decoder architecture using LSTMs |
|
|
- Multi-head attention for better context understanding |
|
|
- Sequence-to-sequence translation from English to Hindi |
|
|
- Comparison between **smaller (12M parameters)** and **larger (42M parameters)** models |
|
|
|
|
|
--- |
|
|
|
|
|
## Models
|
|
|
|
|
| Model | Parameters | Vocabulary | Training Data | Repository | |
|
|
|-------|------------|-----------|---------------|------------| |
|
|
| Model A | 12M | 50k | 20k rows | [seq2seq-lstm-multiheadattention-12.3](https://huggingface.co/Daksh0505/Seq2Seq-LSTM-MultiHeadAttention) | |
|
|
| Model B | 42M | 256k | 100k rows | [seq2seq-lstm-multiheadattention-42](https://huggingface.co/Daksh0505/Seq2Seq-LSTM-MultiHeadAttention) | |
|
|
|
|
|
- **Model A** performs better on the smaller, in-domain dataset it was trained on.
|
|
- **Model B** has higher capacity but requires more, and more diverse, data to generalize well (a hedged loading sketch follows below).
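
To try either checkpoint outside this Space, the weights can be fetched from the linked repository with `huggingface_hub`. The snippet below is only a sketch: the filename `model_12m.h5` is an assumed placeholder (check the repository's file listing for the real artifact names), and the checkpoint is assumed to be saved in Keras format.

```python
# Hedged loading sketch -- the filename below is an assumption; browse the
# repository files on the Hub to find the actual checkpoint names.
from huggingface_hub import hf_hub_download
import tensorflow as tf

weights_path = hf_hub_download(
    repo_id="Daksh0505/Seq2Seq-LSTM-MultiHeadAttention",
    filename="model_12m.h5",  # hypothetical name for the 12M-parameter model
)
# Custom layers (if any) would need to be passed via `custom_objects`.
model = tf.keras.models.load_model(weights_path, compile=False)
model.summary()
```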
|
|
|
|
|
--- |
|
|
|
|
|
## Features
|
|
|
|
|
- Select a model size (12M or 42M parameters) |
|
|
- View **model architecture** layer-by-layer |
|
|
- Choose a sentence from the dataset to translate |
|
|
- Compare **original vs predicted translation** |
|
|
- Highlight how multi-head attention improves Seq2Seq performance |
|
|
|
|
|
--- |
|
|
|
|
|
## How it Works
|
|
|
|
|
1. **Encoder**: |
|
|
- Processes the input English sentence |
|
|
   - Embedding → Layer Normalization → Dropout → BiLSTM → Hidden states
|
|
|
|
|
2. **Decoder**: |
|
|
- Receives previous token embeddings and encoder states |
|
|
- Applies multi-head cross-attention over encoder outputs |
|
|
   - Generates the next token repeatedly until the `<end>` token is produced
|
|
|
|
|
3. **Prediction**: |
|
|
- Step-by-step decoding using trained weights |
|
|
   - The output Hindi sentence is reconstructed token by token (a minimal architecture sketch follows below)
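
The walkthrough above maps naturally onto standard Keras layers. The block below is a minimal sketch, not the Space's actual training code: vocabulary sizes, embedding width, LSTM units, and the number of attention heads are illustrative assumptions.

```python
# Minimal Keras sketch of the encoder-decoder with multi-head cross-attention.
# Vocabulary sizes and hyperparameters below are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_EN, VOCAB_HI = 50_000, 50_000   # assumed source/target vocabulary sizes
EMB_DIM, UNITS, HEADS = 256, 512, 4   # assumed hyperparameters

# --- Encoder: Embedding -> LayerNorm -> Dropout -> BiLSTM ---
enc_tokens = layers.Input(shape=(None,), name="encoder_tokens")
x = layers.Embedding(VOCAB_EN, EMB_DIM)(enc_tokens)
x = layers.LayerNormalization()(x)
x = layers.Dropout(0.2)(x)
enc_outputs, fh, fc, bh, bc = layers.Bidirectional(
    layers.LSTM(UNITS, return_sequences=True, return_state=True)
)(x)
# Concatenate forward/backward states to initialise the decoder
state_h = layers.Concatenate()([fh, bh])
state_c = layers.Concatenate()([fc, bc])

# --- Decoder: Embedding -> LSTM -> multi-head cross-attention -> softmax ---
dec_tokens = layers.Input(shape=(None,), name="decoder_tokens")
y = layers.Embedding(VOCAB_HI, EMB_DIM)(dec_tokens)
dec_seq, _, _ = layers.LSTM(
    2 * UNITS, return_sequences=True, return_state=True
)(y, initial_state=[state_h, state_c])
# Queries come from the decoder, keys/values from the encoder outputs
context = layers.MultiHeadAttention(num_heads=HEADS, key_dim=64)(
    query=dec_seq, value=enc_outputs, key=enc_outputs
)
z = layers.Concatenate()([dec_seq, context])
probs = layers.Dense(VOCAB_HI, activation="softmax")(z)

model = tf.keras.Model([enc_tokens, dec_tokens], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

At inference time the same decoder runs one step at a time: feed the `<start>` token, take the most probable output token, append it to the decoder input, and repeat until `<end>` (or a length cap) is produced.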
|
|
|
|
|
--- |
|
|
|
|
|
## Usage
|
|
|
|
|
1. Select the model size from the dropdown |
|
|
2. Expand **Show Model Architecture** to see layer details |
|
|
3. Select a sentence from the dataset |
|
|
4. Click **Translate** to view the predicted Hindi translation (an illustrative UI sketch follows below)
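
Under the hood, these four steps are plain Streamlit widgets. The sketch below is illustrative only: `SENTENCES`, `load_model`, and `translate` are hypothetical stand-ins for the Space's real data and helper functions.

```python
import streamlit as st

# Hypothetical stand-ins for the Space's real dataset and helpers.
SENTENCES = ["how are you?", "where is the station?"]

def load_model(size: str):
    """Placeholder: the real app loads the 12M or 42M checkpoint here."""

def translate(model, sentence: str) -> str:
    """Placeholder: the real app runs step-by-step decoding here."""
    return "..."

size = st.selectbox("Model size", ["12M parameters", "42M parameters"])
model = load_model(size)

with st.expander("Show Model Architecture"):
    st.text("Layer-by-layer summary would be printed here.")

sentence = st.selectbox("Pick an English sentence", SENTENCES)

if st.button("Translate"):
    st.write("**Original:**", sentence)
    st.write("**Predicted Hindi:**", translate(model, sentence))
```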
|
|
|
|
|
--- |
|
|
|
|
|
## Notes
|
|
|
|
|
- Model performance depends on **training data size and domain** |
|
|
- The smaller model (12M) generalizes better on smaller datasets
- The larger model (42M) requires **more data** and **fine-tuning** to perform well on small datasets
|
|
|
|
|
--- |
|
|
|
|
|
## References
|
|
|
|
|
- **Seq2Seq with Attention**: [Bahdanau et al., 2014](https://arxiv.org/abs/1409.0473) |
|
|
- **Multi-Head Attention**: [Vaswani et al., 2017](https://arxiv.org/abs/1706.03762) |
|
|
|
|
|
--- |
|
|
|
|
|
## Author
|
|
|
|
|
Daksh Bhardwaj |
|
|
Email: dakshbhardwaj0505@gmail.com |
|
|
GitHub: [Daksh5555](https://github.com/daksh5555) |