Commit 56bebd8 (verified) by tejagowda · Parent: fb50754

Create README.md

Bilingual Language Model for Next Token Prediction

Overview:
This project builds a neural language model for next-token prediction in two languages: English and Hebrew. The model uses an LSTM (Long Short-Term Memory) architecture, a type of recurrent neural network (RNN), to predict the next word in a sequence from the provided training data. Model quality is measured with the perplexity metric.
The final model and checkpoints are provided, along with training history including perplexity and loss values.

Model Architecture:
- **Embedding Layer**: Converts tokenized words into dense vector representations.
- **LSTM Layer**: Consists of 128 units to capture long-term dependencies in the sequence data.
- **Dense Output Layer**: Outputs a probability distribution over the vocabulary to predict the next word.
- **Total Vocabulary Size**: The model's vocabulary contains `[total_words]` tokens (combining both the English and Hebrew datasets).
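A minimal Keras sketch of this architecture (the 128 LSTM units come from the description above; the embedding dimension and function name are illustrative assumptions):

```python
import tensorflow as tf

def build_model(total_words, embed_dim=100):
    """Embedding -> LSTM(128) -> softmax over the vocabulary."""
    return tf.keras.Sequential([
        # Converts token ids into dense vectors (embed_dim is an assumed size)
        tf.keras.layers.Embedding(total_words, embed_dim),
        # 128 units, as described above
        tf.keras.layers.LSTM(128),
        # Probability distribution over the whole vocabulary
        tf.keras.layers.Dense(total_words, activation="softmax"),
    ])
```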

Datasets:
The model is trained using a combination of English and Hebrew text datasets. The input sequences are tokenized and padded to ensure consistent input length for training the model.
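A common way to build such training data is to expand each tokenized sentence into n-gram prefixes and pre-pad them to one fixed length. The pure-Python sketch below illustrates that scheme; the actual project presumably used Keras's `Tokenizer` and `pad_sequences`, so treat the details as an assumption:

```python
def make_padded_sequences(token_lists, pad_id=0):
    """Expand token-id lists into n-gram prefixes, pre-padded to equal length."""
    sequences = []
    for tokens in token_lists:
        # e.g. [5, 8, 3] -> [5, 8] and [5, 8, 3]
        for i in range(2, len(tokens) + 1):
            sequences.append(tokens[:i])
    max_len = max(len(s) for s in sequences)
    # Pre-pad with pad_id so every sequence has length max_len
    padded = [[pad_id] * (max_len - len(s)) + s for s in sequences]
    return padded, max_len
```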

Training:
The model was trained with the following parameters:
- Optimizer: Adam (Adaptive Moment Estimation)
- Loss Function: Categorical cross-entropy
- Batch Size: 64
- Epochs: 20
- Validation Split: 20%
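With those settings, the compile/fit calls look roughly as follows. Only the hyperparameters come from the list above; the toy model and random data are placeholders, not the project's actual corpus:

```python
import numpy as np
import tensorflow as tf

vocab, seq_len, n_samples = 200, 10, 64  # toy sizes, for illustration only
X = np.random.randint(1, vocab, size=(n_samples, seq_len))
y = tf.keras.utils.to_categorical(
    np.random.randint(1, vocab, size=(n_samples,)), num_classes=vocab
)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab, 32),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(vocab, activation="softmax"),
])

# Hyperparameters as listed above
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
history = model.fit(X, y, batch_size=64, epochs=20,
                    validation_split=0.2, verbose=0)
```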

Evaluation Metric (Perplexity):
Perplexity is used to measure the model's performance, with lower perplexity indicating better generalization to unseen data. The final perplexity scores are:
- Final training perplexity (reported in the included training history)
- Final validation perplexity (reported in the included training history)
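Perplexity is simply the exponential of the average cross-entropy loss (in nats), so it can be derived directly from the reported loss values:

```python
import math

def perplexity(cross_entropy_loss):
    # exp of the mean cross-entropy; a perfect model (loss 0) has perplexity 1
    return math.exp(cross_entropy_loss)

perplexity(0.0)  # -> 1.0
```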

Checkpoints:
A checkpoint mechanism is used to save the model at its best-performing stage based on validation loss. The best model checkpoint (`best_model.keras`) is included, which can be loaded for inference.
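In Keras this is done with the `ModelCheckpoint` callback, roughly:

```python
import tensorflow as tf

# Save only the best model (by validation loss) to best_model.keras
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.keras",
    monitor="val_loss",
    save_best_only=True,
)
# Then pass it to training: model.fit(..., callbacks=[checkpoint])
```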

Results:
The model demonstrates competitive performance in predicting next tokens for both English and Hebrew, achieving satisfactory perplexity scores on both training and validation datasets.

How to Use:
To use this model, follow these steps:

1. Clone the repository:
```bash
git clone https://huggingface.co/username/model-name
cd model-name
```

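2. Load the checkpoint and run inference. The helper below is a hypothetical sketch: the function name, `input_len`, and `pad_id` are assumptions, not part of the released files.

```python
import numpy as np
import tensorflow as tf

def predict_next_token(model, token_ids, input_len, pad_id=0):
    """Pre-pad (or truncate) the token ids and return the argmax next-token id."""
    ids = list(token_ids)[-input_len:]
    ids = [pad_id] * (input_len - len(ids)) + ids
    probs = model.predict(np.array([ids]), verbose=0)[0]
    return int(np.argmax(probs))

# model = tf.keras.models.load_model("best_model.keras")
# next_id = predict_next_token(model, tokenized_prompt, input_len=20)
```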
Files changed (1):
  1. README.md +5 -0

README.md ADDED
```
@@ -0,0 +1,5 @@
+ ---
+ language:
+ - en
+ - he
+ ---
```