Seq2Seq Translation Model Using LSTM
This repository contains a Seq2Seq translation model built using LSTM networks. The model is designed for translating between two languages and has been evaluated using BLEU and ChrF metrics to measure translation quality.
Model Overview
The Seq2Seq model architecture includes:
- Embedding Layer: Converts input tokens into dense vector representations.
- LSTM Encoder: Encodes the source language sequences into a context-aware hidden representation.
- LSTM Decoder: Decodes the hidden representation back into the target language.
- Linear Layer: Maps the decoder outputs to the target vocabulary, producing predictions for each token.
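The four components above can be sketched as a single PyTorch module. This is an illustrative reconstruction, not the repository's exact code: the embedding and hidden sizes are assumptions, and the decoder here consumes the full target sequence (teacher forcing) rather than decoding step by step.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Sketch of the architecture described above.

    Layer sizes (emb_dim, hid_dim) are illustrative assumptions,
    not values taken from this repository.
    """
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)   # embedding layer
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)          # linear projection

    def forward(self, src, tgt):
        # Encode the source; reuse the final (hidden, cell) state
        # to initialize the decoder.
        _, state = self.encoder(self.src_emb(src))
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        # Map decoder outputs to target-vocabulary logits:
        # shape (batch, tgt_len, tgt_vocab).
        return self.out(dec_out)
```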
Training Details
- Loss Function: Cross-entropy loss with the padding token ignored to focus only on meaningful tokens.
- Optimizer: Adam optimizer with a learning rate of 0.001.
- Number of Epochs: 10
- Batch Size: 32
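A minimal training loop with these settings might look as follows. The tiny stand-in model, vocabulary size, and random data are placeholders so the snippet runs on its own; the loss, optimizer, learning rate, epoch count, and batch size match the values listed above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
PAD_IDX = 0  # assumed padding index

# Toy stand-in model so the loop is self-contained; in the repository
# this would be the LSTM Seq2Seq model.
VOCAB = 50
model = nn.Sequential(nn.Embedding(VOCAB, 32), nn.Linear(32, VOCAB))

# Cross-entropy loss that ignores the padding token.
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# One random batch of 32 sequences (placeholder data).
src = torch.randint(1, VOCAB, (32, 10))
tgt = torch.randint(1, VOCAB, (32, 10))

for epoch in range(10):  # 10 epochs, as listed above
    optimizer.zero_grad()
    logits = model(src)  # (batch, seq_len, vocab)
    # Flatten to (batch * seq_len, vocab) for cross-entropy.
    loss = criterion(logits.reshape(-1, VOCAB), tgt.reshape(-1))
    loss.backward()
    optimizer.step()
```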
Evaluation Metrics
The performance of the model is evaluated using:
- BLEU Score: A metric for comparing the similarity of generated translations to reference translations using n-gram precision.
- ChrF Score: A character-level metric that evaluates translation accuracy and is more sensitive to morphological variations.
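To make the n-gram-precision idea behind BLEU concrete, here is a minimal pure-Python sentence-level sketch: the geometric mean of clipped 1- to 4-gram precisions, scaled by a brevity penalty. Real evaluation would use an established implementation (e.g. sacrebleu); this simplified version omits smoothing.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Minimal sentence-level BLEU (no smoothing): geometric mean of
    clipped n-gram precisions times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp = ngram_counts(hypothesis, n)
        ref = ngram_counts(reference, n)
        overlap = sum((hyp & ref).values())      # clipped n-gram matches
        total = max(sum(hyp.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Penalize hypotheses shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return bp * geo_mean
```

ChrF works analogously but over character n-grams (combining precision and recall as an F-score), which is why it is more forgiving of small morphological differences.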
Results
Training and Validation Loss
- The training loss steadily decreased over the course of training, indicating that the model was learning effectively.
- The validation loss plateaued while the training loss kept falling, suggesting that the model may be overfitting to the training data.
BLEU and ChrF Scores
- BLEU Score: Improved gradually but remained low, indicating substantial room for improvement in translation quality.
- ChrF Score: Showed consistent improvement, suggesting better character-level accuracy in the model’s translations.
Files Included
- Seq2Seq_model.ipynb: Jupyter notebook with the full implementation of the Seq2Seq model, including data loading, model training, and evaluation.
- bleu_scores.csv: CSV file containing BLEU scores for each epoch.
- chrf_scores.csv: CSV file containing ChrF scores for each epoch.
- loss_plot.png: Visualization of training and validation loss over epochs.
- bleu_score_plot.png: Plot of BLEU scores over the training epochs.
- chrf_score_plot.png: Plot of ChrF scores over the training epochs.
Future Work
- Incorporate Attention Mechanisms: Integrating attention layers could improve translation quality.
- Use Transformers: Experiment with transformer-based models for better performance.
- Hyperparameter Tuning: Further fine-tune the model’s hyperparameters to optimize performance.
- Data Augmentation: Explore data augmentation techniques to diversify the training data and improve model robustness.
License
This project is licensed under the MIT License. See the LICENSE file for more details.