LSTM Seq2Seq Model for Translation
This repository contains the implementation of an LSTM-based Seq2Seq model for translation tasks. The model has been trained on a bilingual dataset and evaluated using BLEU and ChrF scores to measure translation quality.
Model Architecture
The model is a Seq2Seq architecture that uses:
- Embedding Layer: To convert input tokens into dense vectors.
- LSTM Encoder: To encode the source language sequences into a hidden representation.
- LSTM Decoder: To generate the translated target language sequences from the hidden representation.
- Linear Layer: To map the decoder output to the target vocabulary space.
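The four components above can be sketched in PyTorch roughly as follows. This is an illustrative sketch, not the repository's exact code: the vocabulary sizes, embedding dimension, and hidden size are placeholder assumptions, and the decoder is shown with teacher forcing.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    # Sketch of the architecture described above; all dimensions
    # (vocab sizes, emb_dim, hid_dim) are placeholder assumptions.
    def __init__(self, src_vocab=8000, tgt_vocab=8000, emb_dim=256, hid_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)   # embedding layer
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)  # LSTM encoder
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)  # LSTM decoder
        self.out = nn.Linear(hid_dim, tgt_vocab)          # linear projection

    def forward(self, src, tgt):
        # Encode the source sequence into a final (hidden, cell) state.
        _, state = self.encoder(self.src_emb(src))
        # Decode the target sequence (teacher forcing) from that state.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)  # logits over the target vocabulary

model = Seq2Seq()
logits = model(torch.randint(0, 8000, (32, 10)), torch.randint(0, 8000, (32, 12)))
print(logits.shape)  # torch.Size([32, 12, 8000])
```

The decoder output has one logit vector per target position, which the linear layer maps into the target vocabulary space before the softmax implied by the loss.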
Training Details
- Training Loss: Cross-entropy loss with padding tokens ignored.
- Optimizer: Adam with a learning rate of 0.001.
- Number of Epochs: 10.
- Batch Size: 32.
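The loss computation matching these settings can be sketched as below (assuming PyTorch). The padding index and tensor shapes are illustrative assumptions, not values taken from the notebook.

```python
import torch
import torch.nn as nn

PAD_IDX = 0  # assumed padding token id; the real value depends on the tokenizer
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)  # padding tokens ignored
# optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# One step on illustrative tensors: (batch=32, seq=12, vocab=8000).
logits = torch.randn(32, 12, 8000, requires_grad=True)
targets = torch.randint(0, 8000, (32, 12))

# CrossEntropyLoss expects (N, C) logits and (N,) targets,
# so batch and time dimensions are flattened together.
loss = criterion(logits.reshape(-1, 8000), targets.reshape(-1))
loss.backward()  # gradients flow back through the flattened logits
print(float(loss))
```

With `ignore_index=PAD_IDX`, positions whose target equals the padding id contribute nothing to the loss or the gradients.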
Evaluation Metrics
The model's performance was evaluated using:
- BLEU Score: Measures n-gram overlap between the generated and reference translations.
- ChrF Score: A character n-gram F-score that evaluates translation quality at the sub-word level.
Results
The training and validation loss, along with BLEU and ChrF scores, were plotted to analyze the model's performance:
- Training Loss: Decreased steadily over the epochs, indicating effective learning.
- Validation Loss: Showed minimal improvement, suggesting potential overfitting.
- BLEU Score: Improved gradually but remained relatively low, indicating that further tuning may be needed.
- ChrF Score: Showed a consistent increase, reflecting better character-level accuracy in translations.
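Plots like the ones described above can be produced with matplotlib, e.g. as below for the loss curves. The numeric values here are placeholders purely for illustration; the real numbers come from the training run (and the CSV files listed in the next section).

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Placeholder values for illustration only; the actual curves come
# from the training logs, not from these hard-coded lists.
epochs = list(range(1, 11))
train_loss = [5.1, 4.2, 3.6, 3.1, 2.8, 2.5, 2.3, 2.1, 2.0, 1.9]
val_loss = [4.9, 4.4, 4.2, 4.1, 4.0, 4.0, 3.9, 3.9, 3.9, 3.9]

plt.plot(epochs, train_loss, label="train")
plt.plot(epochs, val_loss, label="validation")
plt.xlabel("Epoch")
plt.ylabel("Cross-entropy loss")
plt.legend()
plt.savefig("loss_plot.png")
```

The same pattern (one `plt.plot` per metric, then `savefig`) covers the BLEU and ChrF plots.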
Files Included
- LSTM_model.ipynb: The Jupyter notebook containing the full implementation of the model, including data loading, training, and evaluation.
- bleu_scores.csv: CSV file containing BLEU scores for each epoch.
- chrf_scores.csv: CSV file containing ChrF scores for each epoch.
- loss_plot.png: Plot of training and validation loss.
- bleu_score_plot.png: Plot of BLEU scores over epochs.
- chrf_score_plot.png: Plot of ChrF scores over epochs.
Future Work
- Hyperparameter Tuning: Experiment with different hyperparameters to improve model performance.
- Data Augmentation: Use data augmentation techniques to improve the model's ability to generalize.
- Advanced Architectures: Consider using attention mechanisms or transformer models for better performance.
License
This project is licensed under the MIT License. See the LICENSE file for more details.