---
library_name: transformers
license: mit
datasets:
- IRIISNEPAL/Nepali-Text-Corpus
language:
- ne
---

# Nepali GPT Model

## Overview

The `NepaliGPTModel` is a custom GPT-style pretrained transformer model designed for natural language processing tasks, with a focus on the Nepali language. It is built using PyTorch and made compatible with the Hugging Face `transformers` library for easy integration and deployment. This model is intended for text generation and other language modeling tasks, particularly for Nepali text.

This repository contains the model weights, tokenizer, and necessary files to load and use the model on Hugging Face.

---

## Model Details

### Architecture

- **Model Type**: `nep_gptv1`
- **Vocabulary Size**: 51,728
- **Embedding Dimension**: 768
- **Context Length**: 1,024
- **Number of Attention Heads**: 12
- **Number of Layers**: 9
- **Dropout Rate**: 0.1
- **QKV Bias**: `false` (attention layers do not use bias)
- **Torch Data Type**: `float32`
- **Transformers Version**: 4.51.0.dev0

The model follows a decoder-only transformer architecture, similar to GPT, with 9 transformer blocks, each containing multi-head attention and feedforward layers. It uses causal masking to ensure autoregressive generation.

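
To illustrate what causal masking does, here is a minimal sketch in plain Python. It is not the model's own PyTorch code, which is not shown in this document; the function name `causal_attention_weights` and the list-of-lists input are assumptions made for illustration. Each position is allowed to attend only to itself and to earlier positions, and the softmax is taken over those visible positions only.

```python
import math


def causal_attention_weights(scores):
    """Turn a square matrix of raw attention scores into causal attention weights.

    scores[i][j] is the raw score of query position i against key position j.
    Row i of the result is a softmax over positions 0..i; positions after i
    receive a weight of exactly 0.
    """
    n = len(scores)
    weights = []
    for i, row in enumerate(scores):
        # Position i may only look at positions 0..i (itself and the past).
        visible = row[: i + 1]
        # Subtract the row maximum before exponentiating for numerical stability.
        peak = max(visible)
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        # Future positions are masked out, so their weight is zero.
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return weights


if __name__ == "__main__":
    # With equal scores, each row spreads its weight evenly over the visible positions.
    for row in causal_attention_weights([[0.0] * 3 for _ in range(3)]):
        print(row)
```

Because the weight on every future position is zero, the prediction at position `i` cannot depend on later tokens, which is what allows the model to generate text one token at a time.
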

---

## Training Details

### Training Configuration

The model was trained with the following hyperparameters:

- **Batch Size**: 1
- **Learning Rate**: 3.00e-4
- **Weight Decay**: 0.01
- **Betas (for AdamW Optimizer)**: (0.9, 0.999)
- **Number of Workers**: 8
- **Maximum Epochs**: 1
- **Warmup Rate**: 0.15 (fraction of steps for learning rate warmup)
- **Initial Learning Rate**: 2.94e-4
- **Minimum Learning Rate**: 2.92e-4
- **Maximum Gradient Norm**: 1.0 (for gradient clipping)
- **Evaluation Frequency**: Every 8,000 steps
- **No Improvement Loss Count**: 5 (early stopping criterion)
- **Start Context**: `"रोगविज्ञान रोग वा चोटको कारण तथा प्रभावहरूको अध्ययन गर्ने"` (used for generation during training)

### Training and Evaluation Loss

The plot below shows the training and evaluation loss over the course of training:

![Training vs Evaluation Loss](training_vs_eval_loss.png)

- **Training Loss (Blue Line)**: Measured per epoch, showing a steady decrease from around 11 to below 4 over 65,000 global steps.
- **Evaluation Loss (Orange Line with 'x' Markers)**: Measured per evaluation step (every 8,000 steps), starting at around 5 and decreasing to just below 4.

The plot indicates that the model is learning effectively, with both training and evaluation losses converging, suggesting good generalization and minimal overfitting.

---

## Installation

To use this model, you’ll need to install the required dependencies:

```bash
pip install torch transformers
```

---

## Usage

### Loading the Model and Tokenizer

To load and use the model, first register the custom model class by importing the `model_nepaligpt.py` file included in this repository. You can then use the `transformers` library to load the model and tokenizer.

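
One way the registration step could look is sketched below. It assumes that the repository files have been downloaded to a local directory, and that importing `model_nepaligpt.py` (the name given in this repository's file list) is enough to run its registration code. The helper `import_repo_module` and its `repo_dir` argument are hypothetical names introduced here for illustration, not part of the repository.

```python
import importlib
import sys
from pathlib import Path


def import_repo_module(repo_dir, module_name="model_nepaligpt"):
    """Import a module from a local copy of the repository.

    Importing the module executes its top-level code, which is where any
    registration of the custom classes would run.

    repo_dir: hypothetical path to the downloaded repository files.
    module_name: file name without ".py"; the default is an assumption
        based on the repository's file list.
    """
    repo_path = Path(repo_dir).resolve()
    if not (repo_path / f"{module_name}.py").is_file():
        raise FileNotFoundError(f"{module_name}.py not found in {repo_path}")
    # Make the repository directory importable, then import the module.
    if str(repo_path) not in sys.path:
        sys.path.insert(0, str(repo_path))
    return importlib.import_module(module_name)
```

With a local copy in place, a call such as `import_repo_module("path/to/local/copy")` would run before the loading code that follows. If the module uses relative imports, this sketch would need to be adapted.
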
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model_name = "dinesh-bk/nepali-gpt"  # Replace with your Hugging Face repo name
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tokenize the prompt into PyTorch tensors
input_text = "नमस्ते, संसार!"  # "Hello, world!" in Nepali
inputs = tokenizer(input_text, return_tensors="pt")

# Generate text; max_length counts the prompt tokens plus the generated tokens
outputs = model.generate(**inputs, max_length=50)

# Decode the generated token IDs back into text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Generated text:", generated_text)
```

### Notes

- **Generation Options**: For more diverse outputs, pass sampling parameters such as `do_sample=True`, `top_k=50`, and `temperature=0.7` to `model.generate`. Note that `top_k` and `temperature` only take effect when `do_sample=True`.

---

## Files in This Repository

- `config.json`: Model configuration.
- `model.safetensors`: Model weights in SafeTensors format.
- `configuration_nepaligpt.py`: Defines the `NepaliGPTConfig` configuration of the model.
- `model_nepaligpt.py`: Defines the `TransformerBlock` and `NepaliGPTModel` classes, including registration.
- `special_tokens_map.json`: Special tokens for the tokenizer.
- `tokenizer_config.json`: Tokenizer settings.
- `tokenizer.json`: Core tokenizer configuration and vocabulary.

---

## Training Data

The model was trained on a dataset of Nepali text from Hugging Face (`IRIISNEPAL/Nepali-Text-Corpus`).

---

## Limitations

- **Training Duration**: The model was trained for only 1 epoch, which might limit its performance. Further training could improve results.
- **Overfitting**: While the training and evaluation losses are close, the small gap suggests potential for slight overfitting, especially with a single epoch.
- **Language Specificity**: The model is tailored for Nepali and may not generalize well to other languages without fine-tuning.

---

## License

This project is licensed under the MIT License. See the LICENSE file for details.


---

## Contact

For questions or contributions, please contact [![MLProdigy](https://img.shields.io/badge/GitHub-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/dinesh-bk)

---