---
license: other
license_name: ngen2-community-license
license_link: https://tnsaai-builds.framer.website/community/licenses/ngen2
language:
- en
- hi
- te
metrics:
- bleu
- perplexity
- accuracy
base_model:
- TNSA/NGen2-15M
pipeline_tag: text-generation
library_name: transformers
model_type: safetensors
new_version: TNSA/NGen3-15M
---
# NGen 2

When loading NGen 2 with the `transformers` library, only the 15M variant (`TNSA/NGen2-15M`) is supported for now.
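A minimal loading sketch (this assumes the checkpoint works with the standard `AutoTokenizer`/`AutoModelForCausalLM` classes; if the repository ships a custom architecture, `trust_remote_code=True` may be required):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the 15M checkpoint from the Hub (the only variant usable
# via transformers for now, per the note above).
tokenizer = AutoTokenizer.from_pretrained("TNSA/NGen2-15M")
model = AutoModelForCausalLM.from_pretrained("TNSA/NGen2-15M")

# Generate a short continuation as a smoke test.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```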

NGen 2 is an advanced Transformer model training pipeline that supports multiple model variants, ranging from a **nano** variant (approximately 120M parameters) to a **foundational** variant (approximately 1B parameters). The pipeline incorporates modern architectural improvements such as rotary positional embeddings, RMSNorm, and GEGLU activations to boost performance and training efficiency.

> **Note:** Although NGen 2 is designed to train at most a 1B-parameter model, its advanced architecture pushes its performance closer to that of much larger models.
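For intuition, the sketch below shows illustrative PyTorch implementations of two of the components named above, RMSNorm and a GEGLU feed-forward block. This is a generic rendering of the techniques, not the project's actual source:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square norm: rescales features by their RMS, without mean-centering."""
    def __init__(self, d_model: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(d_model))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class GEGLUFeedForward(nn.Module):
    """GEGLU FFN: down(GELU(gate(x)) * up(x)), a gated variant of the usual MLP."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.gelu(self.gate(x)) * self.up(x))
```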

## Model Variants

NGen 2 supports the following variants via the `--variant` flag:

- **nano**: ~120M parameters  
- **small**: ~300M parameters  
- **medium**: ~500M parameters  
- **large**: ~700M parameters  
- **foundational**: ~1B parameters  

Each variant adjusts key hyperparameters such as the number of layers, model dimension (`d_model`), number of attention heads (`n_heads`), and the feed-forward dimension (`d_ff`).
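The concrete per-variant values are defined in the training code rather than documented here; the mapping below is purely hypothetical, only to illustrate the shape of the table that `--variant` selects from:

```python
# Hypothetical numbers for illustration only; the authoritative
# per-variant settings live in the project's train.py.
VARIANT_CONFIGS = {
    "nano":         dict(n_layers=12, d_model=768,  n_heads=12, d_ff=3072),
    "small":        dict(n_layers=20, d_model=1024, n_heads=16, d_ff=4096),
    "medium":       dict(n_layers=24, d_model=1280, n_heads=20, d_ff=5120),
    "large":        dict(n_layers=28, d_model=1408, n_heads=22, d_ff=5632),
    "foundational": dict(n_layers=32, d_model=1600, n_heads=25, d_ff=6400),
}
```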

## Requirements

- Python 3.8+
- PyTorch
- Transformers
- Datasets
- DeepSpeed (optional, for efficient training)
- Azure ML SDK (for distributed training on Azure)

Install dependencies using pip (adjust as needed):

```bash
pip install torch transformers datasets deepspeed azureml-core
```

## Usage

### 1. Data Preparation
First, download and preprocess the OpenWebText dataset:

```bash 
python prepare.py --output_dir ./_data_ --max_length 4096
```

This script downloads, tokenizes, and saves the dataset in Arrow format to the `./_data_` directory.
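Before training, you can sanity-check the processed data by reloading it (this assumes `prepare.py` saved the dataset with `save_to_disk`, which is how Arrow-format datasets are usually written):

```python
from datasets import load_from_disk

# Reload the tokenized Arrow dataset written by prepare.py.
ds = load_from_disk("./_data_")
print(ds)  # prints the columns (e.g. input_ids) and the number of rows
```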

### 2. Local Training

The main training script is `train.py`. It loads the processed dataset (by default from `./_data_`), instantiates the desired model variant, and starts training.

Example CLI commands:

- Train the nano (120M) variant:

```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_nano --batch_size 4 --epochs 3 --variant nano
```

- Train the small (300M) variant:

```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_small --batch_size 4 --epochs 3 --variant small
```

- Train the medium (500M) variant:

```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_medium --batch_size 4 --epochs 3 --variant medium
```

- Train the large (700M) variant:
```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_large --batch_size 4 --epochs 3 --variant large
```

- Train the foundational (1B) variant with rotary embeddings enabled:
```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_foundational --batch_size 4 --epochs 3 --variant foundational --use_rotary
```

### 3. Training on Azure ML

- Step 1: Set Up Azure ML Resources

Use `azure_setup.py` to create or connect to your Azure ML workspace and set up a compute cluster:

```bash
python azure_setup.py \
  --workspace_name MyWorkspace \
  --resource_group MyResourceGroup \
  --subscription_id YOUR_SUBSCRIPTION_ID \
  --location eastus \
  --compute_name gpu-cluster \
  --vm_size Standard_NC6 \
  --max_nodes 4 \
  --min_nodes 0
```
- Step 2: Submit a Training Job to Azure ML

Use `submit_train.py` to submit your training script to Azure ML:

```bash
python submit_train.py \
  --experiment_name ngen2-experiment \
  --compute_target gpu-cluster \
  --script train.py \
  --dataset_dir ./_data_ \
  --output_dir ./checkpoints_foundational \
  --batch_size 4 \
  --epochs 3 \
  --variant foundational \
  --use_rotary
```

### 4. DeepSpeed Integration

The `deepspeed.json` file configures mixed-precision training and ZeRO optimizations. To use DeepSpeed, ensure it is installed and adjust your training script or submission command to enable it, as sketched below.
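A representative configuration of that shape follows; these are standard DeepSpeed options, but check the repository's actual `deepspeed.json` for the values used in this project:

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```

DeepSpeed jobs are normally launched with the `deepspeed` CLI in place of `python` (for example, `deepspeed train.py ...`); whether `train.py` takes a flag pointing at the config file depends on how the script wires in DeepSpeed.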

## License

The NGen 2 project is developed and maintained by TNSA AI. The licensing model is dual:

- The nano and small variants are open source and released under the MIT License.
- The medium, large, and foundational variants are proprietary and are not open source. Use of these proprietary components is subject to TNSA AI's proprietary licensing terms.

## Copyright

© 2023 TNSA AI. All rights reserved. For usage terms, see the `LICENSE` file.