NanduVardhanreddy committed (verified) · commit 3eae388 · 1 parent: 6f82a1a

Update README.md

Files changed (1): README.md (+114 −3)
# Translator Project using LSTM and Seq2Seq Models

## Table of Contents

- Project Overview
- Dataset
- Model Architectures
  1. LSTM-based Model
  2. Seq2Seq Model
- Evaluation Metrics
- Results
  - Training Curves
  - BLEU and CHRF Scores
- Installation and Setup
- How to Run
- File Structure
- Future Enhancements
- Acknowledgments
## Project Overview

This project builds translation models between English and Assamese using two different neural network architectures:

- An LSTM-based model
- A Seq2Seq model (without attention)

The primary objective is to train models that translate between the two languages and to evaluate their performance with metrics such as BLEU and CHRF scores.
## Dataset

The project uses two datasets:

- English dataset (`alpaca_cleaned.json`)
- Assamese dataset (`Assamese.json`)

Each dataset contains parallel text records with `instruction`, `input`, and `output` fields. The `input` field is used as the source sentence and the `output` field as the target sentence.
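Assuming each file holds a JSON list of such records, the sentence pairs can be extracted with a small helper along these lines (`load_pairs` is a hypothetical name, not a function from the notebooks):

```python
import json

def load_pairs(path):
    """Load (source, target) sentence pairs from a dataset file.

    Assumes the file is a JSON list of records with "instruction",
    "input", and "output" fields, as described above; records with an
    empty "input" are skipped.
    """
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    return [(r["input"], r["output"]) for r in records if r.get("input")]

# Example (file names from this repository):
# pairs = load_pairs("alpaca_cleaned.json")
```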
## Model Architectures

### 1. LSTM-based Model

The LSTM model uses:

- An embedding layer for token representations.
- A stacked LSTM layer to capture sequential dependencies.
- A fully connected layer to generate token predictions.

The model was trained using `CrossEntropyLoss` and the Adam optimizer.
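The layers listed above can be sketched in PyTorch roughly as follows. The layer sizes here are illustrative placeholders, not the values used in the notebooks:

```python
import torch
import torch.nn as nn

class LSTMTranslator(nn.Module):
    """Embedding -> stacked LSTM -> linear projection over the vocabulary.

    A minimal sketch of the architecture described above; sizes are
    illustrative, not taken from lstm_model.ipynb.
    """
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512, num_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):             # tokens: (batch, seq_len)
        embedded = self.embedding(tokens)  # (batch, seq_len, emb_dim)
        outputs, _ = self.lstm(embedded)   # (batch, seq_len, hidden_dim)
        return self.fc(outputs)            # (batch, seq_len, vocab_size)
```

Training pairs these logits with `nn.CrossEntropyLoss` and `torch.optim.Adam`, as stated above.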
### 2. Seq2Seq Model

The Seq2Seq model is implemented with:

- An embedding layer.
- An encoder-decoder LSTM architecture without attention.

The encoder processes the source sequence, and the decoder generates the target sequence. This model is also trained using `CrossEntropyLoss` with the Adam optimizer.
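A minimal sketch of this encoder-decoder setup (again with illustrative sizes; the actual implementation lives in `seq2seq_model.ipynb`): without attention, the only link between the two halves is the encoder's final state, which initializes the decoder.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Encoder-decoder LSTM without attention.

    The encoder's final (hidden, cell) state seeds the decoder, which is
    teacher-forced on the target sequence during training.
    """
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encode the source; keep only the final (hidden, cell) state.
        _, state = self.encoder(self.src_emb(src))
        # Decode the target sequence conditioned on that state.
        out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.fc(out)  # (batch, tgt_len, tgt_vocab)
```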
## Evaluation Metrics

The models are evaluated using:

- **BLEU score**: measures the n-gram overlap between predicted and reference translations.
- **CHRF score**: evaluates character-level matches between predictions and references, which is useful for morphologically rich languages such as Assamese.
## Results

### Training Curves

The training and validation loss curves for both models are plotted to monitor convergence.

### BLEU and CHRF Scores

The models were evaluated on at least 1000 data points for sentence-level BLEU and CHRF scores. The scores are saved to CSV files:

- `bleu_scores_lstm.csv`
- `bleu_scores_seq2seq.csv`
- `chrf_scores_lstm.csv`
- `chrf_scores_seq2seq.csv`

Sample results:

| Model      | Average BLEU Score | Average CHRF Score |
|------------|--------------------|--------------------|
| LSTM-based | 0.45               | 0.67               |
| Seq2Seq    | 0.52               | 0.70               |
## Installation and Setup

### Prerequisites

Make sure you have the following installed:

- Python 3.x
- Google Colab or Jupyter Notebook
- Libraries: `torch`, `transformers`, `evaluate`, `pandas`, `matplotlib`

### Installation

To install the required packages, run:

```bash
pip install torch transformers evaluate matplotlib pandas
```
## How to Run

1. **Clone the repository:**

   ```bash
   git clone <repository-link>
   cd <repository-folder>
   ```

2. **Upload data:** ensure the `Assamese.json` and `alpaca_cleaned.json` files are in the appropriate directory.

3. **Run the notebooks** in Google Colab or Jupyter Notebook:
   - LSTM-based model: `lstm_model.ipynb`
   - Seq2Seq model: `seq2seq_model.ipynb`

4. **Generate BLEU and CHRF scores:** the notebooks generate predictions and save the scores to the CSV files listed above.
## File Structure

```
project-root/
├── Assamese.json
├── alpaca_cleaned.json
├── lstm_model.ipynb
├── seq2seq_model.ipynb
├── bleu_scores_lstm.csv
├── bleu_scores_seq2seq.csv
├── chrf_scores_lstm.csv
├── chrf_scores_seq2seq.csv
└── README.md
```
## Future Enhancements

- Implement attention mechanisms to improve translation quality.
- Experiment with transformer models for better performance.
- Optimize the models for faster inference using techniques such as quantization.
## Acknowledgments

- Hugging Face, for providing easy-to-use NLP evaluation metrics.
- The University of New Haven, for guidance and support throughout the project.
- The creators of the datasets used for training and evaluation.