Update README.md
README.md
---
license: other
license_name: ngen2-community-license
license_link: https://tnsaai-builds.framer.website/community/licenses/ngen2
language:
- en
- hi
- te
metrics:
- bleu
- perplexity
- accuracy
base_model:
- TNSA/NGen2-15M
pipeline_tag: text-generation
library_name: transformers
model_type: safetensors
new_version: TNSA/NGen3-15M
---

# NGen 2

When loading NGen 2 through the Transformers library, only the 15M variant (TNSA/NGen2-15M) is available for now; a loading example is sketched below.
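A minimal sketch of loading that checkpoint with `transformers`, assuming the hosted repo exposes a standard causal-LM interface (the repo id comes from the model card above; the prompt and generation settings are purely illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from the model card; pass trust_remote_code=True if the
# checkpoint ships custom modeling code (an assumption, not confirmed here).
model_id = "TNSA/NGen2-15M"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Simple greedy generation as an illustration.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```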

NGen 2 is an advanced Transformer model training pipeline that supports multiple model variants, ranging from a **nano** variant (approximately 120M parameters) to a **foundational** variant (approximately 1B parameters). The pipeline incorporates modern architectural improvements such as rotary positional embeddings, RMSNorm, and GEGLU activations to improve performance and training efficiency.

> **Note:** Although NGen 2 is designed to train models of up to about 1B parameters, its advanced architecture pushes its performance closer to that of much larger models.

## Model Variants

NGen 2 supports the following variants via the `--variant` flag:

- **nano**: ~120M parameters
- **small**: ~300M parameters
- **medium**: ~500M parameters
- **large**: ~700M parameters
- **foundational**: ~1B parameters

Each variant adjusts key hyperparameters such as the number of layers, the model dimension (`d_model`), the number of attention heads (`n_heads`), and the feed-forward dimension (`d_ff`); a purely hypothetical mapping is sketched below.
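The actual per-variant settings live in `train.py` and are not listed in this card. The dictionary below is only a hypothetical illustration of how `--variant` might select such hyperparameters; none of these numbers are the real NGen 2 configuration:

```python
# Hypothetical illustration only -- NOT the real NGen 2 settings.
# It merely shows the kind of mapping the --variant flag selects.
VARIANT_CONFIGS = {
    "nano":         {"n_layers": 12, "d_model": 768,  "n_heads": 12, "d_ff": 3072},
    "small":        {"n_layers": 20, "d_model": 1024, "n_heads": 16, "d_ff": 4096},
    "medium":       {"n_layers": 22, "d_model": 1280, "n_heads": 20, "d_ff": 5120},
    "large":        {"n_layers": 34, "d_model": 1280, "n_heads": 20, "d_ff": 5120},
    "foundational": {"n_layers": 32, "d_model": 1600, "n_heads": 25, "d_ff": 6400},
}

def get_variant_config(variant: str) -> dict:
    """Return the hyperparameter set selected by --variant."""
    return VARIANT_CONFIGS[variant]
```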

## Requirements

- Python 3.8+
- PyTorch
- Transformers
- Datasets
- DeepSpeed (optional, for efficient training)
- Azure ML SDK (for distributed training on Azure)

Install dependencies using pip (adjust as needed):

```bash
pip install torch transformers datasets deepspeed azureml-core
```

## Usage

### 1. Data Preparation

First, download and preprocess the OpenWebText dataset:

```bash
python prepare.py --output_dir ./_data_ --max_length 4096
```

This script downloads, tokenizes, and saves the dataset in Arrow format to the `./_data_` directory; a rough sketch of the equivalent steps is shown below.
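For orientation, the preparation step amounts to roughly the following flow with the `datasets` and `transformers` libraries. The actual `prepare.py` may differ; in particular, the GPT-2 tokenizer used here is an assumption:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Assumption: a GPT-2-style tokenizer; prepare.py may use a different one.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def tokenize(batch, max_length=4096):
    return tokenizer(batch["text"], truncation=True, max_length=max_length)

# OpenWebText is large; expect a long download and tokenization run.
# Newer versions of `datasets` may additionally require trust_remote_code=True.
dataset = load_dataset("openwebtext", split="train")
dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])

# save_to_disk writes Arrow files, matching the ./_data_ layout described above.
dataset.save_to_disk("./_data_")
```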

### 2. Local Training

The main training script is `train.py`. It loads the processed dataset (by default from `./_data_`), instantiates the desired model variant, and starts training.

**Example CLI Commands**

- Train the nano (120M) variant:

```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_nano --batch_size 4 --epochs 3 --variant nano
```

- Train the small (300M) variant:

```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_small --batch_size 4 --epochs 3 --variant small
```

- Train the medium (500M) variant:

```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_medium --batch_size 4 --epochs 3 --variant medium
```

- Train the large (700M) variant:

```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_large --batch_size 4 --epochs 3 --variant large
```

- Train the foundational (1B) variant with rotary embeddings enabled:

```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_foundational --batch_size 4 --epochs 3 --variant foundational --use_rotary
```

### 3. Training on Azure ML

**Step 1: Set Up Azure ML Resources**

Use `azure_setup.py` to create or connect to your Azure ML workspace and set up a compute cluster:

```bash
python azure_setup.py \
  --workspace_name MyWorkspace \
  --resource_group MyResourceGroup \
  --subscription_id YOUR_SUBSCRIPTION_ID \
  --location eastus \
  --compute_name gpu-cluster \
  --vm_size Standard_NC6 \
  --max_nodes 4 \
  --min_nodes 0
```
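`azure_setup.py` itself is not reproduced in this card. Assuming it follows the standard workspace-plus-cluster pattern of the v1 Azure ML SDK (`azureml-core`), the equivalent calls look roughly like this:

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

# Create the workspace, or reuse it if it already exists (exist_ok=True).
ws = Workspace.create(
    name="MyWorkspace",
    subscription_id="YOUR_SUBSCRIPTION_ID",
    resource_group="MyResourceGroup",
    location="eastus",
    exist_ok=True,
)

# Provision an autoscaling GPU cluster matching the CLI flags above.
compute_config = AmlCompute.provisioning_configuration(
    vm_size="Standard_NC6",
    min_nodes=0,
    max_nodes=4,
)
cluster = ComputeTarget.create(ws, "gpu-cluster", compute_config)
cluster.wait_for_completion(show_output=True)
```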

**Step 2: Submit a Training Job to Azure ML**

Use `submit_train.py` to submit your training script to Azure ML:

```bash
python submit_train.py \
  --experiment_name ngen3-experiment \
  --compute_target gpu-cluster \
  --script train.py \
  --dataset_dir ./_data_ \
  --output_dir ./checkpoints_foundational \
  --batch_size 4 \
  --epochs 3 \
  --variant foundational \
  --use_rotary
```
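Under the same assumption (v1 SDK), a submission script like `submit_train.py` boils down to roughly the following; the environment and Docker configuration are omitted for brevity, and the argument values simply mirror the CLI flags above:

```python
from azureml.core import Experiment, ScriptRunConfig, Workspace

# Reuse the workspace from Step 1 (expects a local config.json).
ws = Workspace.from_config()

# Package train.py and its arguments for the gpu-cluster compute target.
src = ScriptRunConfig(
    source_directory=".",
    script="train.py",
    arguments=[
        "--dataset_dir", "./_data_",
        "--output_dir", "./checkpoints_foundational",
        "--batch_size", "4",
        "--epochs", "3",
        "--variant", "foundational",
        "--use_rotary",
    ],
    compute_target="gpu-cluster",
)

run = Experiment(ws, name="ngen3-experiment").submit(src)
run.wait_for_completion(show_output=True)
```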

### 4. DeepSpeed Integration

The `deepspeed.json` file configures mixed-precision training and ZeRO optimizations. To use DeepSpeed, make sure it is installed and adjust your training script or submission command to enable DeepSpeed support; an illustrative configuration is sketched below.
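The project's `deepspeed.json` is not reproduced here; the snippet below only illustrates the kind of fp16 and ZeRO settings such a file typically contains (example values, not the project's actual configuration):

```python
import json

# Illustrative DeepSpeed settings: fp16 mixed precision plus ZeRO stage 2.
# These values are examples only and may differ from the project's deepspeed.json.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

with open("deepspeed.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

Training is then typically launched through the `deepspeed` launcher (for example `deepspeed train.py ...`), provided `train.py` is wired to read the config; how that wiring is exposed in this project is not documented here.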

## License

The NGen 2 project is developed and maintained by TNSA AI. The licensing model is dual:

- The nano and small variants are open source and released under the MIT License.
- The medium, large, and foundational variants are proprietary and are not open source. Use of these proprietary components is subject to TNSA AI's proprietary licensing terms.

## Copyright

© 2023 TNSA AI. All rights reserved. For terms of use, see the `LICENSE` file.
|