# LSTM Seq2Seq Model for Translation

This repository contains the implementation of an LSTM-based Seq2Seq model for translation tasks. The model has been trained on a bilingual dataset and evaluated using BLEU and ChrF scores to measure translation quality.
## Model Architecture

The model is a Seq2Seq architecture that uses:

- **Embedding Layer**: Converts input tokens into dense vectors.
- **LSTM Encoder**: Encodes the source-language sequence into a hidden representation.
- **LSTM Decoder**: Generates the translated target-language sequence from that representation.
- **Linear Layer**: Maps the decoder output to the target vocabulary space.
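The four components above can be sketched in PyTorch roughly as follows. The class name, dimensions, and hyperparameters here are illustrative assumptions, not the repository's actual code:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder LSTM sketch; dimensions are placeholders."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)   # embedding layer (source)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)   # embedding layer (target)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)  # LSTM encoder
        self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)  # LSTM decoder
        self.fc = nn.Linear(hidden_dim, tgt_vocab)        # linear layer to vocab space

    def forward(self, src, tgt):
        # Encode the source sequence into its final hidden/cell state.
        _, (h, c) = self.encoder(self.src_emb(src))
        # Decode the target sequence conditioned on that state (teacher forcing).
        out, _ = self.decoder(self.tgt_emb(tgt), (h, c))
        return self.fc(out)  # logits of shape (batch, tgt_len, tgt_vocab)
```

The decoder is initialized with the encoder's final `(h, c)` state, which is how the hidden representation is passed between the two LSTMs.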
## Training Details

- **Loss**: Cross-entropy loss, with padding tokens ignored.
- **Optimizer**: Adam with a learning rate of 0.001.
- **Epochs**: 10.
- **Batch Size**: 32.
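Under those settings, one epoch of the training loop might look like the sketch below. The `model(src, tgt_in)` interface and the `PAD_IDX` value are assumptions; the notebook's actual loop may differ:

```python
import torch
import torch.nn as nn

PAD_IDX = 0  # assumed id of the padding token

# Padding positions contribute nothing to the loss.
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)

def train_epoch(model, loader, optimizer, criterion):
    """One pass over the data with teacher forcing; returns mean batch loss."""
    model.train()
    total = 0.0
    for src, tgt in loader:
        optimizer.zero_grad()
        logits = model(src, tgt[:, :-1])  # feed target shifted right
        # Flatten to (batch*steps, vocab) vs (batch*steps,) for cross-entropy.
        loss = criterion(logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / len(loader)

# optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # as listed above
```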
## Evaluation Metrics

The model's performance was evaluated using:

- **BLEU Score**: Measures n-gram overlap between the generated and reference translations.
- **ChrF Score**: A character n-gram F-score for evaluating translation quality.
## Results

The training and validation loss, along with BLEU and ChrF scores, were plotted to analyze the model's performance:

- **Training Loss**: Decreased steadily over the epochs, indicating effective learning.
- **Validation Loss**: Showed minimal improvement, suggesting potential overfitting.
- **BLEU Score**: Improved gradually but remained relatively low, indicating that further tuning may be needed.
- **ChrF Score**: Showed a consistent increase, reflecting better character-level accuracy in translations.
## Files Included

- **LSTM_model.ipynb**: The Jupyter notebook containing the full implementation of the model, including data loading, training, and evaluation.
- **bleu_scores.csv**: CSV file containing BLEU scores for each epoch.
- **chrf_scores.csv**: CSV file containing ChrF scores for each epoch.
- **loss_plot.png**: Plot of training and validation loss.
- **bleu_score_plot.png**: Plot of BLEU scores over epochs.
- **chrf_score_plot.png**: Plot of ChrF scores over epochs.
## Future Work

- **Hyperparameter Tuning**: Experiment with different hyperparameters to improve model performance.
- **Data Augmentation**: Use data augmentation techniques to improve the model's ability to generalize.
- **Advanced Architectures**: Consider attention mechanisms or Transformer models for better performance.
## License

This project is licensed under the MIT License. See the LICENSE file for details.