# LSTM Seq2Seq Model for Translation

This repository contains an implementation of an LSTM-based sequence-to-sequence (Seq2Seq) model for machine translation. The model was trained on a bilingual dataset, and its translation quality was evaluated with BLEU and ChrF scores.

## Model Architecture

The model is a Seq2Seq architecture that uses:
- **Embedding Layer**: To convert input tokens into dense vectors.
- **LSTM Encoder**: To encode the source language sequences into a hidden representation.
- **LSTM Decoder**: To generate the translated target language sequences from the hidden representation.
- **Linear Layer**: To map the decoder output to the target vocabulary space.
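The framework is not stated in this README; assuming PyTorch (common for notebook-based LSTM models), the four components above could be wired together roughly as follows. All sizes (`src_vocab`, `tgt_vocab`, `emb_dim`, `hid_dim`) are illustrative placeholders, not the values used in `LSTM_model.ipynb`.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    # Illustrative sketch; actual vocabulary/embedding/hidden sizes
    # are defined in the notebook.
    def __init__(self, src_vocab=8000, tgt_vocab=8000, emb_dim=256, hid_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)              # embedding layer
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)   # LSTM encoder
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)   # LSTM decoder
        self.out = nn.Linear(hid_dim, tgt_vocab)                     # linear projection

    def forward(self, src, tgt):
        # Encode the source sequence; keep only the final (h, c) state.
        _, state = self.encoder(self.src_emb(src))
        # Decode the target sequence (teacher forcing), seeded with that state.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        # Map hidden states to target-vocabulary logits: (batch, tgt_len, tgt_vocab).
        return self.out(dec_out)
```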

## Training Details

- **Training Loss**: Cross-entropy loss with padding tokens ignored.
- **Optimizer**: Adam optimizer with a learning rate of 0.001.
- **Number of Epochs**: 10 epochs.
- **Batch Size**: 32.
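A single training step under this setup might look like the sketch below (again assuming PyTorch; `PAD_IDX` is a hypothetical padding-token index that would come from the tokenizer in practice). The key detail from the list above is `ignore_index`, which excludes padding positions from the cross-entropy loss.

```python
import torch
import torch.nn as nn

PAD_IDX = 0  # hypothetical; the real index is set by the tokenizer

def train_step(model, optimizer, src, tgt):
    # Cross-entropy against the shifted target, ignoring padding tokens.
    criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)
    # Feed tgt[:, :-1] and predict tgt[:, 1:] (next-token prediction).
    logits = model(src, tgt[:, :-1])
    loss = criterion(logits.reshape(-1, logits.size(-1)),
                     tgt[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The optimizer described above would be constructed as `torch.optim.Adam(model.parameters(), lr=0.001)`, then `train_step` called over batches of 32 for 10 epochs.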

## Evaluation Metrics

The model's performance was evaluated using:
- **BLEU Score**: A metric to measure the similarity between the generated and reference translations.
- **ChrF Score**: A character-based metric for evaluating translation quality.
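To illustrate what ChrF measures, here is a minimal pure-Python sketch of a character n-gram F-score (standard ChrF uses n-grams up to 6 and beta = 2). The scores in this repository were presumably computed with a standard evaluation library; this sketch omits such a library's corpus-level aggregation and edge-case handling.

```python
from collections import Counter

def char_ngrams(text, n):
    # Character n-grams, ignoring whitespace.
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    # Average F-beta over character n-gram orders 1..max_n, scaled to 0-100.
    scores = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # strings too short for this n-gram order
        overlap = sum((hyp & ref).values())
        precision = overlap / sum(hyp.values())
        recall = overlap / sum(ref.values())
        if precision + recall == 0:
            scores.append(0.0)
            continue
        scores.append((1 + beta**2) * precision * recall
                      / (beta**2 * precision + recall))
    return 100 * sum(scores) / len(scores) if scores else 0.0
```

BLEU, by contrast, works on word n-grams, which makes ChrF more forgiving of morphological variants that differ by only a few characters.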

## Results

The training and validation loss, along with BLEU and ChrF scores, were plotted to analyze the model's performance:
- **Training Loss**: Decreased steadily over the epochs, indicating effective learning.
- **Validation Loss**: Showed minimal improvement, suggesting potential overfitting.
- **BLEU Score**: Improved gradually but remained relatively low, indicating that further tuning may be needed.
- **ChrF Score**: Showed a consistent increase, reflecting better character-level accuracy in translations.

## Files Included

- **LSTM_model.ipynb**: The Jupyter notebook containing the full implementation of the model, including data loading, training, and evaluation.
- **bleu_scores.csv**: CSV file containing BLEU scores for each epoch.
- **chrf_scores.csv**: CSV file containing ChrF scores for each epoch.
- **loss_plot.png**: Plot of training and validation loss.
- **bleu_score_plot.png**: Plot of BLEU scores over epochs.
- **chrf_score_plot.png**: Plot of ChrF scores over epochs.

## Future Work

- **Hyperparameter Tuning**: Experiment with different hyperparameters to improve model performance.
- **Data Augmentation**: Use data augmentation techniques to improve the model's ability to generalize.
- **Advanced Architectures**: Consider using attention mechanisms or transformer models for better performance.

## License

This project is licensed under the MIT License. See the LICENSE file for more details.