# LSTM and Seq-to-Seq Language Translator

This project implements language translation using two approaches:

- **LSTM-based Translator:** A model that translates between English and Hebrew using a basic encoder-decoder architecture.
- **Seq-to-Seq Translator:** A sequence-to-sequence model without attention for bidirectional translation between English and Hebrew.

Both models are trained on a parallel dataset of 1,000 sentence pairs and evaluated using BLEU and CHRF scores.

## Model Architectures

### 1. LSTM-Based Translator

The LSTM model is built from the following components:

- **Encoder:** Embedding and LSTM layers that encode English input sequences into latent representations.
- **Decoder:** Embedding and LSTM layers, initialized with the encoder's final states, that generate the Hebrew translation token by token.
- **Dense Layer:** A fully connected output layer with softmax activation that predicts the next word in the sequence.
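The components above can be sketched with the Keras functional API. The layer sizes here (vocabulary sizes of 1000, embedding dimension 128, 256 LSTM units) are illustrative assumptions, not values taken from the project:

```python
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

# Illustrative sizes; the real values depend on the fitted tokenizers.
src_vocab, tgt_vocab, embed_dim, units = 1000, 1000, 128, 256

# Encoder: embeds the English sequence and keeps the final LSTM states.
enc_inputs = Input(shape=(None,))
enc_embed = Embedding(src_vocab, embed_dim, mask_zero=True)(enc_inputs)
_, state_h, state_c = LSTM(units, return_state=True)(enc_embed)

# Decoder: starts from the encoder's states and predicts Hebrew tokens.
dec_inputs = Input(shape=(None,))
dec_embed = Embedding(tgt_vocab, embed_dim, mask_zero=True)(dec_inputs)
dec_out = LSTM(units, return_sequences=True)(dec_embed,
                                             initial_state=[state_h, state_c])
outputs = Dense(tgt_vocab, activation="softmax")(dec_out)

model = Model([enc_inputs, dec_inputs], outputs)
```

During training the decoder input is the target sequence shifted by one token (teacher forcing), so the softmax at each step predicts the next word.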
### 2. Seq-to-Seq Translator

The Seq-to-Seq model uses:

- **Encoder:** Like the LSTM-based translator's, this encodes the input sequence into context vectors.
- **Decoder:** Predicts the target sequence without attention, relying entirely on the encoded context.
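At inference time both decoders generate one token at a time until the `<end>` marker appears. A minimal greedy-decoding loop, using a toy stand-in for the trained decoder (`toy_scorer` below is hypothetical, not part of the project), looks like this:

```python
START, END = "<start>", "<end>"

def greedy_decode(next_token_scores, max_len=20):
    """Greedy decoding: feed tokens back in until <end> or max_len.

    `next_token_scores(prefix)` stands in for the decoder; it returns a
    {token: score} dict conditioned on the tokens generated so far.
    """
    tokens = [START]
    while len(tokens) < max_len:
        scores = next_token_scores(tokens)
        best = max(scores, key=scores.get)
        if best == END:
            break
        tokens.append(best)
    return tokens[1:]  # drop the <start> marker

# Toy scorer that always emits a fixed sentence, then <end>.
def toy_scorer(prefix, target=("shalom", "olam")):
    i = len(prefix) - 1  # number of tokens emitted so far
    nxt = target[i] if i < len(target) else END
    return {t: (1.0 if t == nxt else 0.0) for t in (*target, END)}

print(greedy_decode(toy_scorer))  # → ['shalom', 'olam']
```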
## Dataset

The models are trained on a custom parallel dataset of 1,000 English-Hebrew sentence pairs, formatted as JSON with the fields `english` and `hebrew`. The Hebrew text is wrapped in `<start>` and `<end>` tokens to delimit decoding.

**Preprocessing:**

- **Tokenization:** Text is tokenized using Keras' `Tokenizer`.
- **Padding:** Sequences are padded to a fixed length for training.

**Vocabulary Sizes:**

- English: [English Vocabulary Size]
- Hebrew: [Hebrew Vocabulary Size]
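The two preprocessing steps can be illustrated with a small pure-Python stand-in for Keras' `Tokenizer` and `pad_sequences` (post-padding is used here for simplicity; Keras pads at the front by default):

```python
def fit_tokenizer(sentences):
    """Build a word→index map, mirroring Keras' Tokenizer.

    Index 0 is reserved for padding, so real words start at 1.
    """
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab

def texts_to_padded(sentences, vocab, maxlen):
    """Convert text to index sequences, then pad/truncate to maxlen."""
    padded = []
    for sentence in sentences:
        seq = [vocab[w] for w in sentence.lower().split() if w in vocab]
        seq = seq[:maxlen] + [0] * (maxlen - len(seq))
        padded.append(seq)
    return padded

en_sentences = ["hello world", "hello there"]
vocab = fit_tokenizer(en_sentences)
print(texts_to_padded(en_sentences, vocab, maxlen=4))
# → [[1, 2, 0, 0], [1, 3, 0, 0]]
```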
## Training Details

**Training Parameters:**

- Optimizer: Adam
- Loss Function: Sparse Categorical Crossentropy
- Batch Size: 32
- Epochs: 20
- Validation Split: 20%

**Checkpoints:** Models are saved at their best-performing epochs, based on validation loss, using Keras' `ModelCheckpoint`.

**Training Metrics:** Both models track:

- Training Loss
- Validation Loss
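This training setup corresponds roughly to the Keras calls below. The tiny model and random integer data are placeholders so the sketch is self-contained; the real runs use the translator models, the actual dataset, and 20 epochs:

```python
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import ModelCheckpoint

# Placeholder sequence model standing in for a translator.
model = models.Sequential([
    layers.Embedding(50, 8),
    layers.LSTM(16, return_sequences=True),
    layers.Dense(50, activation="softmax"),
])

# Training configuration from this README: Adam optimizer and
# sparse categorical crossentropy loss.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Keep only the weights with the best validation loss so far.
checkpoint = ModelCheckpoint("best_model.keras", monitor="val_loss",
                             save_best_only=True)

# Random stand-in data; shortened to 2 epochs to keep the sketch fast.
x = np.random.randint(1, 50, size=(64, 6))
y = np.random.randint(1, 50, size=(64, 6))
history = model.fit(x, y, batch_size=32, epochs=2, validation_split=0.2,
                    callbacks=[checkpoint], verbose=0)
```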
## Evaluation Metrics

### 1. BLEU Score

The BLEU metric evaluates translation quality by comparing model output against reference translations; higher scores indicate better translations.

- LSTM Model BLEU: [BLEU Score for LSTM]
- Seq-to-Seq Model BLEU: [BLEU Score for Seq-to-Seq]

### 2. CHRF Score

The CHRF metric evaluates translations using character-level n-gram F-scores; higher scores indicate better translations.

- LSTM Model CHRF: [CHRF Score for LSTM]
- Seq-to-Seq Model CHRF: [CHRF Score for Seq-to-Seq]
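In practice these scores come from libraries such as SacreBLEU. As an illustration of what a character-level F-score measures, here is a simplified CHRF-style computation (not the official metric; it only averages character n-gram precision and recall after stripping spaces):

```python
from collections import Counter

def char_ngrams(text, n):
    """All character n-grams of the string, with spaces removed."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def simple_chrf(hypothesis, reference, max_n=3, beta=2.0):
    """Simplified CHRF: mean char n-gram precision/recall, F-beta score."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / max(sum(hyp.values()), 1))
        recalls.append(overlap / max(sum(ref.values()), 1))
    p = sum(precisions) / max_n
    r = sum(recalls) / max_n
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

print(simple_chrf("hello world", "hello world"))  # → 1.0
print(round(simple_chrf("hello there", "hello world"), 3))
```

A perfect match scores 1.0 and disjoint strings score 0.0; recall is weighted more heavily than precision (beta = 2), as in the real CHRF.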
## Results

- **Training Loss Comparison:** The Seq-to-Seq model converged to a slightly lower validation loss than the LSTM model.
- **Translation Quality:** The BLEU and CHRF scores indicate that both models produce reasonable translations, with the Seq-to-Seq model performing better on longer sentences.

## Acknowledgments

- Dataset: [Custom Parallel Dataset]
- Evaluation Tools: PyTorch BLEU, SacreBLEU CHRF