NanduVardhanreddy
/

LSTM_Seq2Seq_Model_Classes

NanduVardhanreddy committed on Nov 16, 2024

Commit

3eae388

verified ·

1 Parent(s): 6f82a1a

Update README.md

Files changed (1)

README.md +114 -3

@@ -1,3 +1,114 @@
----
-license: mit
----

+{Translator Project using LSTM and Seq2Seq Models
+Table of Contents
+Project Overview
+Dataset
+Model Architectures
+1. LSTM-based Model
+2. Seq2Seq Model
+Evaluation Metrics
+Results
+Training Curves
+BLEU and CHRF Scores
+Installation and Setup
+How to Run
+File Structure
+Future Enhancements
+Acknowledgments
+Project Overview
+This project builds models that translate text between English and Assamese using two neural network architectures:
+LSTM-based model
+Seq2Seq model (without attention)
+The primary objective is to train models that can translate between the two languages and evaluate their performance using metrics like BLEU and CHRF scores.
+Dataset
+The project uses two datasets:
+English dataset (alpaca_cleaned.json)
+Assamese dataset (Assamese.json)
+Both files contain parallel text in which each record has instruction, input, and output fields.
+The input field is used as the source sentence and the output field as the target sentence.
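The field mapping described above can be sketched as follows. This is a minimal illustration, not the project's loader: it assumes each file is a JSON array of objects, and the `load_pairs` helper and the sample record are hypothetical.

```python
import json


def load_pairs(json_text):
    """Return (source, target) pairs from a JSON array of records.

    Assumption: every record is an object with "instruction", "input",
    and "output" keys, as the dataset description states.
    """
    records = json.loads(json_text)
    pairs = []
    for record in records:
        source = record.get("input", "").strip()
        target = record.get("output", "").strip()
        # Skip records where either side is empty; they cannot be trained on.
        if source and target:
            pairs.append((source, target))
    return pairs


# Hypothetical sample data standing in for the real dataset files.
sample = json.dumps([
    {"instruction": "Translate", "input": "Hello", "output": "নমস্কাৰ"},
    {"instruction": "Translate", "input": "", "output": "unused"},
])
pairs = load_pairs(sample)
```

In a notebook, the same helper could be fed the contents of a file read with `open(path, encoding="utf-8").read()`.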
+Model Architectures
+1. LSTM-based Model
+The LSTM model uses:
+An embedding layer for token representations.
+A stacked LSTM layer to capture sequential dependencies.
+A fully connected layer to generate token predictions.
+The model was trained using CrossEntropyLoss and the Adam optimizer.
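One way the three layers and the training setup could fit together in PyTorch is sketched below. The class name, vocabulary sizes, layer dimensions, learning rate, and the use of index 0 for padding are all assumptions made for illustration; the notebook may differ.

```python
import torch
import torch.nn as nn


class LSTMTranslator(nn.Module):
    """Embedding -> stacked LSTM -> linear layer over the target vocabulary."""

    def __init__(self, src_vocab_size, tgt_vocab_size,
                 embed_dim=128, hidden_dim=256, num_layers=2):
        super().__init__()
        # Assumption: token id 0 is reserved for padding.
        self.embedding = nn.Embedding(src_vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, tgt_vocab_size)

    def forward(self, src_ids):
        embedded = self.embedding(src_ids)   # (batch, seq, embed_dim)
        outputs, _ = self.lstm(embedded)     # (batch, seq, hidden_dim)
        return self.fc(outputs)              # (batch, seq, tgt_vocab_size)


model = LSTMTranslator(src_vocab_size=1000, tgt_vocab_size=1200)
criterion = nn.CrossEntropyLoss(ignore_index=0)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on random token ids, just to show the tensor shapes.
src = torch.randint(1, 1000, (4, 10))
tgt = torch.randint(1, 1200, (4, 10))
logits = model(src)
loss = criterion(logits.reshape(-1, 1200), tgt.reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Note that this layout predicts one target token per source position, so source and target sequences must be padded to the same length.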
+2. Seq2Seq Model
+The Seq2Seq model is implemented with:
+An embedding layer.
+An encoder-decoder LSTM architecture without attention.
+The encoder processes the source sequence, and the decoder generates the target sequence.
+This model is also trained using CrossEntropyLoss with the Adam optimizer.
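The encoder-decoder flow can be sketched roughly as below. This is an illustrative outline under stated assumptions, not the notebook's code: the class names, the dimensions, separate embeddings for each language, padding id 0, and teacher forcing during training are all choices made here.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, src_ids):
        # Only the final (hidden, cell) state is passed on; no attention.
        _, state = self.lstm(self.embedding(src_ids))
        return state


class Decoder(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tgt_ids, state):
        outputs, state = self.lstm(self.embedding(tgt_ids), state)
        return self.fc(outputs), state


class Seq2Seq(nn.Module):
    def __init__(self, src_vocab_size, tgt_vocab_size,
                 embed_dim=128, hidden_dim=256):
        super().__init__()
        self.encoder = Encoder(src_vocab_size, embed_dim, hidden_dim)
        self.decoder = Decoder(tgt_vocab_size, embed_dim, hidden_dim)

    def forward(self, src_ids, tgt_in):
        # Teacher forcing: the decoder is fed the gold target prefix.
        state = self.encoder(src_ids)
        logits, _ = self.decoder(tgt_in, state)
        return logits


model = Seq2Seq(src_vocab_size=1000, tgt_vocab_size=1200)
src = torch.randint(1, 1000, (4, 10))
tgt = torch.randint(1, 1200, (4, 8))
# Feed tokens 0..n-2 and predict tokens 1..n-1.
logits = model(src, tgt[:, :-1])
criterion = nn.CrossEntropyLoss(ignore_index=0)
loss = criterion(logits.reshape(-1, 1200), tgt[:, 1:].reshape(-1))
```

Because the decoder sees the source only through the encoder's final state, long sentences are compressed into a single fixed-size vector.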
+Evaluation Metrics
+The models are evaluated using:
+BLEU Score: Measures the n-gram overlap between predicted and reference translations.
+CHRF Score: An F-score over character n-gram matches between predictions and references, useful for morphologically rich languages.
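To make the character-level idea concrete, here is a simplified character n-gram F-score in plain Python. It is a teaching sketch only: the function names are hypothetical, the defaults (n-grams up to 6, beta of 2, whitespace removed) are assumptions, and its values will not necessarily match the CHRF implementation in the `evaluate` library.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(prediction, reference, max_n=6, beta=2.0):
    """Average n-gram precision and recall, then combine them as F-beta."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        pred_counts = char_ngrams(prediction, n)
        ref_counts = char_ngrams(reference, n)
        if not pred_counts or not ref_counts:
            continue  # one side is shorter than n characters
        # Counter intersection keeps the minimum count of each shared n-gram.
        overlap = sum((pred_counts & ref_counts).values())
        precisions.append(overlap / sum(pred_counts.values()))
        recalls.append(overlap / sum(ref_counts.values()))
    if not precisions:
        return 0.0
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    if precision + recall == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
```

With beta above 1, recall is weighted more heavily than precision, so a prediction that covers most of the reference's characters scores well even if it contains some extra material.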
+Results
+Training Curves
+The training and validation loss curves for both models are plotted to monitor the convergence.
+BLEU and CHRF Scores
+Sentence-level BLEU and CHRF scores were computed on at least 1000 data points.
+The scores are saved into CSV files:
+bleu_scores_lstm.csv
+bleu_scores_seq2seq.csv
+chrf_scores_lstm.csv
+chrf_scores_seq2seq.csv
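Saving per-sentence scores and averaging them afterwards might look like the sketch below. The column names and both helper functions are hypothetical; the actual CSV layout produced by the notebooks is not shown in this README.

```python
import csv
import statistics


def save_scores(path, scores):
    """Write one row per sentence: its index and its score."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sentence_id", "score"])  # assumed column names
        for index, score in enumerate(scores):
            writer.writerow([index, f"{score:.4f}"])


def load_average(path):
    """Read the score column back and return its mean."""
    with open(path, newline="", encoding="utf-8") as handle:
        return statistics.mean(float(row["score"]) for row in csv.DictReader(handle))
```

For example, `save_scores("bleu_scores_lstm.csv", scores)` followed by `load_average("bleu_scores_lstm.csv")` would give a per-model average of the kind reported in a results table.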
+Sample Results:
+Model	Average BLEU Score	Average CHRF Score
+LSTM-based	0.45	0.67
+Seq2Seq	0.52	0.70
+Installation and Setup
+Prerequisites
+Make sure you have the following installed:
+Python 3.x
+Google Colab or Jupyter Notebook
+Libraries: torch, transformers, evaluate, pandas, matplotlib
+Installation
+To install the required packages, run:
+pip install torch transformers evaluate matplotlib pandas
+How to Run
+Clone the Repository:
+git clone <repository-link>
+cd <repository-folder>
+Upload Data: Ensure the Assamese.json and alpaca_cleaned.json files are in the project root, next to the notebooks.
+Run the Notebooks:
+Use the provided code in Google Colab or Jupyter Notebook.
+For LSTM-based model: lstm_model.ipynb
+For Seq2Seq model: seq2seq_model.ipynb
+Generate BLEU and CHRF Scores:
+Each notebook generates predictions and saves the scores to CSV files.
+File Structure
+project-root/
+├── Assamese.json
+├── alpaca_cleaned.json
+├── lstm_model.ipynb
+├── seq2seq_model.ipynb
+├── bleu_scores_lstm.csv
+├── bleu_scores_seq2seq.csv
+├── chrf_scores_lstm.csv
+├── chrf_scores_seq2seq.csv
+└── README.md
+Future Enhancements
+Implement attention mechanisms to improve translation quality.
+Experiment with transformer models for better performance.
+Optimize the models for faster inference using techniques like quantization.
+Acknowledgments
+Hugging Face for providing easy-to-use NLP evaluation metrics.
+University of New Haven for guidance and support throughout the project.
+The creators of the datasets used for training and evaluation.}