---
language:
- en
- he
---

# Bilingual Language Model for Next Token Prediction

## Overview

This project builds a neural language model for next-token prediction in two languages: **English** and **Hebrew**. The model uses an LSTM (Long Short-Term Memory) architecture, a recurrent neural network (RNN) variant, to predict the next word in a sequence from the provided training data. Prediction quality is evaluated with the **perplexity** metric.

The final model and checkpoints are provided, along with the training history, including per-epoch loss and perplexity values.

## Model Architecture

The network stacks the following layers (a minimal code sketch follows the list):

- **Embedding Layer**: Converts token indices into dense vector representations.
- **LSTM Layer**: 128 units, capturing long-range dependencies in the sequence data.
- **Dense Output Layer**: A softmax layer that outputs a probability distribution over the vocabulary for the next word.
- **Total Vocabulary Size**: `[total_words]` tokens, combining the English and Hebrew datasets.

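As a reference, here is a minimal Keras sketch of that stack. The embedding dimension (100) is an illustrative assumption, not a published hyperparameter of this model; `total_words` comes from the preprocessing step sketched in the Dataset section below:

```python
import tensorflow as tf

def build_model(total_words: int) -> tf.keras.Model:
    """Embedding -> LSTM(128) -> softmax over the vocabulary."""
    return tf.keras.Sequential([
        # Map each token index to a dense 100-dimensional vector
        tf.keras.layers.Embedding(total_words, 100),
        # A 128-unit LSTM captures long-range dependencies in the sequence
        tf.keras.layers.LSTM(128),
        # Probability distribution over the vocabulary for the next word
        tf.keras.layers.Dense(total_words, activation="softmax"),
    ])
```
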
## Dataset

The model is trained on a combination of English and Hebrew text datasets. Input sequences are tokenized and padded to a fixed length so that every training example has a consistent shape.

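That preprocessing can be sketched with the standard Keras text utilities; the toy corpus and the n-gram construction below are illustrative assumptions, not the actual training data or tokenizer settings:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Toy bilingual corpus for illustration only
corpus = ["the cat sat on the mat", "שלום עולם טוב"]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
total_words = len(tokenizer.word_index) + 1  # +1 for the reserved padding index 0

# Turn each line into n-gram prefixes: [w1, w2], [w1, w2, w3], ...
sequences = []
for line in corpus:
    tokens = tokenizer.texts_to_sequences([line])[0]
    for i in range(2, len(tokens) + 1):
        sequences.append(tokens[:i])

# Left-pad to a uniform length; the last token of each row is the target
max_seq_len = max(len(s) for s in sequences)
padded = pad_sequences(sequences, maxlen=max_seq_len, padding="pre")
X, y = padded[:, :-1], padded[:, -1]
```
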
## Training

The model was trained with the following parameters; a matching Keras call is sketched after the list:

- **Optimizer**: Adam
- **Loss Function**: Categorical cross-entropy
- **Batch Size**: 64
- **Epochs**: 20
- **Validation Split**: 20%

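These settings map onto a Keras compile-and-fit call roughly as follows, reusing the `build_model`, `X`, `y`, and `total_words` placeholders from the sketches above (all assumptions, not the project's actual training script):

```python
import tensorflow as tf

model = build_model(total_words)  # from the architecture sketch above

# Categorical cross-entropy expects one-hot targets
y_onehot = tf.keras.utils.to_categorical(y, num_classes=total_words)

model.compile(optimizer="adam", loss="categorical_crossentropy")

history = model.fit(
    X, y_onehot,
    batch_size=64,
    epochs=20,
    validation_split=0.2,  # hold out 20% of the examples for validation
)
```
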
## Evaluation Metric: Perplexity

Perplexity measures how well the model predicts unseen data, with lower values indicating better generalization. It is the exponential of the average cross-entropy loss (see the helper after the list). The final perplexity scores are:

- **Final Training Perplexity**: `[Final Training Perplexity]`
- **Final Validation Perplexity**: `[Final Validation Perplexity]`

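That relationship gives a one-line conversion from the loss values Keras reports; the helper below assumes the loss is an average categorical cross-entropy in nats and reuses the `history` object from the training sketch:

```python
import numpy as np

def perplexity(cross_entropy: float) -> float:
    """Perplexity is the exponential of the average cross-entropy loss."""
    return float(np.exp(cross_entropy))

# Read the final-epoch losses from the history returned by model.fit
train_ppl = perplexity(history.history["loss"][-1])
val_ppl = perplexity(history.history["val_loss"][-1])
print(f"train perplexity: {train_ppl:.2f}, val perplexity: {val_ppl:.2f}")
```
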
## Checkpoints

A checkpoint callback saves the model at its best-performing epoch, as measured by validation loss. The best checkpoint (`best_model.keras`) is included and can be loaded directly for inference.

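With Keras this is typically done via the `ModelCheckpoint` callback; the exact configuration used for this model is an assumption:

```python
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.models import load_model

# Overwrite the file only when validation loss improves
checkpoint = ModelCheckpoint(
    "best_model.keras",
    monitor="val_loss",
    save_best_only=True,
)
# Passed to training as: model.fit(..., callbacks=[checkpoint])

# Later, restore the best checkpoint for inference
best_model = load_model("best_model.keras")
```
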
## Results

The model performs competitively at next-token prediction in both English and Hebrew, reaching the training and validation perplexity scores reported above.

## How to Use

To use this model, follow these steps:

1. **Clone the repository**:

```bash
git clone https://huggingface.co/username/model-name
cd model-name
```

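2. **Load the model and predict the next token**. The sketch below assumes a `tokenizer` fitted with the same settings as in training and the matching `max_seq_len`; neither is documented in this card, so treat them as placeholders:

```python
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences

model = load_model("best_model.keras")

def predict_next_word(seed_text: str) -> str:
    # Encode and pad the seed exactly as was done during training
    tokens = tokenizer.texts_to_sequences([seed_text])[0]
    padded = pad_sequences([tokens], maxlen=max_seq_len - 1, padding="pre")
    # Take the highest-probability entry of the softmax output
    probs = model.predict(padded, verbose=0)[0]
    next_id = int(np.argmax(probs))
    # Map the index back to its word ("" if the index is unknown)
    return tokenizer.index_word.get(next_id, "")

print(predict_next_word("the cat sat on the"))
```
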