Seq2Seq Translation Model Using LSTM
This repository contains a Seq2Seq translation model built using LSTM networks. The model is designed for translating between two languages and has been evaluated using BLEU and ChrF metrics to measure translation quality.
Model Overview
The Seq2Seq model architecture includes:
- Embedding Layer: Converts input tokens into dense vector representations.
- LSTM Encoder: Encodes the source language sequences into a context-aware hidden representation.
- LSTM Decoder: Decodes the hidden representation back into the target language.
- Linear Layer: Maps the decoder outputs to the target vocabulary, producing predictions for each token.
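The four components above can be sketched as a single PyTorch module. This is an illustrative reconstruction, not the repository's exact code: the embedding and hidden sizes are assumptions, and the decoder here consumes the full target sequence (teacher forcing) rather than decoding step by step.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Sketch of the architecture described above.

    Layer sizes (emb_dim, hid_dim) are illustrative assumptions,
    not values taken from this repository.
    """
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)   # embedding layer
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)          # linear projection

    def forward(self, src, tgt):
        # Encode the source; reuse the final (hidden, cell) state
        # to initialize the decoder.
        _, state = self.encoder(self.src_emb(src))
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        # Map decoder outputs to target-vocabulary logits:
        # shape (batch, tgt_len, tgt_vocab).
        return self.out(dec_out)
```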
Training Details
- Loss Function: Cross-entropy loss with the padding token ignored to focus only on meaningful tokens.
- Optimizer: Adam optimizer with a learning rate of 0.001.
- Number of Epochs: 10
- Batch Size: 32
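A minimal training loop with these settings might look as follows. The tiny stand-in model, vocabulary size, and random data are placeholders so the snippet runs on its own; the loss, optimizer, learning rate, epoch count, and batch size match the values listed above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
PAD_IDX = 0  # assumed padding index

# Toy stand-in model so the loop is self-contained; in the repository
# this would be the LSTM Seq2Seq model.
VOCAB = 50
model = nn.Sequential(nn.Embedding(VOCAB, 32), nn.Linear(32, VOCAB))

# Cross-entropy loss that ignores the padding token.
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# One random batch of 32 sequences (placeholder data).
src = torch.randint(1, VOCAB, (32, 10))
tgt = torch.randint(1, VOCAB, (32, 10))

for epoch in range(10):  # 10 epochs, as listed above
    optimizer.zero_grad()
    logits = model(src)  # (batch, seq_len, vocab)
    # Flatten to (batch * seq_len, vocab) for cross-entropy.
    loss = criterion(logits.reshape(-1, VOCAB), tgt.reshape(-1))
    loss.backward()
    optimizer.step()
```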
Evaluation Metrics
The performance of the model is evaluated using:
- BLEU Score: A metric for comparing the similarity of generated translations to reference translations using n-gram precision.
- ChrF Score: A character-level metric that evaluates translation accuracy and is more sensitive to morphological variations.
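To make the n-gram-precision idea behind BLEU concrete, here is a minimal pure-Python sentence-level sketch: the geometric mean of clipped 1- to 4-gram precisions, scaled by a brevity penalty. Real evaluation would use an established implementation (e.g. sacrebleu); this simplified version omits smoothing.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Minimal sentence-level BLEU (no smoothing): geometric mean of
    clipped n-gram precisions times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp = ngram_counts(hypothesis, n)
        ref = ngram_counts(reference, n)
        overlap = sum((hyp & ref).values())      # clipped n-gram matches
        total = max(sum(hyp.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Penalize hypotheses shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return bp * geo_mean
```

ChrF works analogously but over character n-grams (combining precision and recall as an F-score), which is why it is more forgiving of small morphological differences.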
Results
Training and Validation Loss
- The training loss steadily decreased over the course of training, indicating that the model was learning effectively.
- The validation loss plateaued while the training loss kept falling, suggesting that the model may be overfitting to the training data.
BLEU and ChrF Scores
- BLEU Score: Improved gradually but remained low, indicating substantial room for improvement in translation quality.
- ChrF Score: Showed consistent improvement, suggesting better character-level accuracy in the model’s translations.
Files Included
- Seq2Seq_model.ipynb: Jupyter notebook with the full implementation of the Seq2Seq model, including data loading, model training, and evaluation.
- bleu_scores.csv: CSV file containing BLEU scores for each epoch.
- chrf_scores.csv: CSV file containing ChrF scores for each epoch.
- loss_plot.png: Visualization of training and validation loss over epochs.
- bleu_score_plot.png: Plot of BLEU scores over the training epochs.
- chrf_score_plot.png: Plot of ChrF scores over the training epochs.
Future Work
- Incorporate Attention Mechanisms: Integrating attention layers could improve translation quality.
- Use Transformers: Experiment with transformer-based models for better performance.
- Hyperparameter Tuning: Further fine-tune the model’s hyperparameters to optimize performance.
- Data Augmentation: Explore data augmentation techniques to diversify the training data and improve model robustness.
License
This project is licensed under the MIT License. See the LICENSE file for more details.