---
license: other
license_name: ngen2-community-license
license_link: https://tnsaai-builds.framer.website/community/licenses/ngen2
language:
- en
- hi
- te
metrics:
- bleu
- perplexity
- accuracy
base_model:
- TNSA/NGen2-15M
pipeline_tag: text-generation
library_name: transformers
model_type: safetensors
new_version: TNSA/NGen3-15M
---
# NGen 2
When using NGen 2 with the `transformers` library, only the 15M variant is available for now.
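A minimal sketch of loading the 15M checkpoint with `transformers` (the prompt and generation settings here are illustrative, not prescribed by the model card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer


def generate(prompt: str, model_name: str = "TNSA/NGen2-15M",
             max_new_tokens: int = 50) -> str:
    """Load the checkpoint and continue the prompt with greedy decoding."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("Once upon a time"))
```

The first call downloads the weights from the Hugging Face Hub; subsequent calls use the local cache.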
NGen 2 is an advanced Transformer model training pipeline that supports multiple model variants, ranging from a **nano** variant (approximately 120M parameters) to a **foundational** variant (approximately 1B parameters). The pipeline incorporates modern architectural improvements such as rotary positional embeddings, RMSNorm, and GEGLU activations to boost performance and training efficiency.
> **Note:** Although NGen 2 is designed to train at most a 1B-parameter model, its advanced architecture pushes its performance closer to that of much larger models.
## Model Variants
NGen 2 supports the following variants via the `--variant` flag:
- **nano**: ~120M parameters
- **small**: ~300M parameters
- **medium**: ~500M parameters
- **large**: ~700M parameters
- **foundational**: ~1B parameters
Each variant adjusts key hyperparameters such as the number of layers, model dimension (`d_model`), number of attention heads (`n_heads`), and the feed-forward dimension (`d_ff`).
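To give a feel for how `--variant` scales these hyperparameters, here is an illustrative mapping. The target parameter counts come from the list above, but the specific `n_layers`, `d_model`, `n_heads`, and `d_ff` values are hypothetical examples, not NGen 2's actual configurations:

```python
# Hypothetical variant configs; only the approximate parameter
# counts (in the comments) come from the README.
VARIANTS = {
    "nano":         dict(n_layers=12, d_model=768,  n_heads=12, d_ff=3072),  # ~120M
    "small":        dict(n_layers=20, d_model=1024, n_heads=16, d_ff=4096),  # ~300M
    "medium":       dict(n_layers=22, d_model=1280, n_heads=20, d_ff=5120),  # ~500M
    "large":        dict(n_layers=22, d_model=1536, n_heads=16, d_ff=6144),  # ~700M
    "foundational": dict(n_layers=18, d_model=2048, n_heads=16, d_ff=8192),  # ~1B
}


def estimate_params(cfg, vocab_size=50257):
    """Rough decoder-only parameter estimate: token embeddings plus, per
    layer, attention weights (4 * d_model^2) and feed-forward weights
    (2 * d_model * d_ff); norms and biases are ignored."""
    per_layer = 4 * cfg["d_model"] ** 2 + 2 * cfg["d_model"] * cfg["d_ff"]
    return vocab_size * cfg["d_model"] + cfg["n_layers"] * per_layer
```

Plugging each config into `estimate_params` lands close to the advertised counts, which is how such variant tables are typically tuned.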
## Requirements
- Python 3.8+
- PyTorch
- Transformers
- Datasets
- DeepSpeed (optional, for efficient training)
- Azure ML SDK (for distributed training on Azure)
Install dependencies using pip (adjust as needed):
```bash
pip install torch transformers datasets deepspeed azureml-core
```
## Usage
### 1. Data Preparation
First, download and preprocess the OpenWebText dataset:
```bash
python prepare.py --output_dir ./_data_ --max_length 4096
```
This script downloads, tokenizes, and saves the dataset in Arrow format to the `./_data_` directory.
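The packing step of such preprocessing can be sketched as follows (a hypothetical helper, not code from `prepare.py` itself): concatenated token ids are split into fixed-length blocks matching `--max_length`:

```python
def pack_blocks(token_ids, max_length=4096):
    """Split a flat list of token ids into fixed-length training blocks,
    dropping the trailing partial block (a common packing strategy)."""
    return [
        token_ids[i:i + max_length]
        for i in range(0, len(token_ids) - max_length + 1, max_length)
    ]
```

Each resulting block then becomes one training example of exactly `max_length` tokens.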
### 2. Local Training
The main training script is `train.py`. It loads the processed dataset (by default from `./_data_`), instantiates the desired model variant, and starts training.

Example CLI commands:
- Train the nano (120M) variant:
```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_nano --batch_size 4 --epochs 3 --variant nano
```
- Train the small (300M) variant:
```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_small --batch_size 4 --epochs 3 --variant small
```
- Train the medium (500M) variant:
```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_medium --batch_size 4 --epochs 3 --variant medium
```
- Train the large (700M) variant:
```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_large --batch_size 4 --epochs 3 --variant large
```
- Train the foundational (1B) variant with rotary embeddings enabled:
```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_foundational --batch_size 4 --epochs 3 --variant foundational --use_rotary
```
### 3. Training on Azure ML
**Step 1: Set up Azure ML resources**

Use `azure_setup.py` to create or connect to your Azure ML workspace and set up a compute cluster:
```bash
python azure_setup.py \
--workspace_name MyWorkspace \
--resource_group MyResourceGroup \
--subscription_id YOUR_SUBSCRIPTION_ID \
--location eastus \
--compute_name gpu-cluster \
--vm_size Standard_NC6 \
--max_nodes 4 \
--min_nodes 0
```
**Step 2: Submit a training job to Azure ML**

Use `submit_train.py` to submit your training script to Azure ML:
```bash
python submit_train.py \
--experiment_name ngen3-experiment \
--compute_target gpu-cluster \
--script train.py \
--dataset_dir ./_data_ \
--output_dir ./checkpoints_foundational \
--batch_size 4 \
--epochs 3 \
--variant foundational \
--use_rotary
```
### 4. DeepSpeed Integration
The `deepspeed.json` file configures mixed-precision training and ZeRO optimizations. To leverage DeepSpeed, ensure it is installed and adjust your training script or submission command to enable DeepSpeed support.
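A `deepspeed.json` along these lines enables fp16 and ZeRO stage 2; the batch-size and accumulation values are illustrative placeholders to adjust to your hardware:

```json
{
  "train_batch_size": 32,
  "gradient_accumulation_steps": 8,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true
  }
}
```

Assuming the training script parses DeepSpeed's arguments, training is then launched with the `deepspeed` launcher, e.g. `deepspeed train.py --deepspeed deepspeed.json ...`.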
## License
The NGen 2 project is developed and maintained by TNSA AI. The licensing model is dual:
- The nano and small variants are open source and released under the MIT License.
- The medium, large, and foundational variants are proprietary and are not open source. Use of these proprietary components is subject to TNSA AI's proprietary licensing terms.
## Copyright
© 2023 TNSA AI. All rights reserved. For usage terms, read the `LICENSE` file.