---
license: other
license_name: ngen2-community-license
license_link: https://tnsaai-builds.framer.website/community/licenses/ngen2
language:
- en
- hi
- te
metrics:
- bleu
- perplexity
- accuracy
base_model:
- TNSA/NGen2-15M
pipeline_tag: text-generation
library_name: transformers
model_type: safetensors
new_version: TNSA/NGen3-15M
---
# NGen 2

When loading NGen 2 with the `transformers` library, only the 15M variant (`TNSA/NGen2-15M`) is supported for now.
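A minimal loading sketch (this assumes the checkpoint works with the standard `AutoTokenizer`/`AutoModelForCausalLM` classes; if the repository ships a custom architecture, `trust_remote_code=True` may be required):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the 15M checkpoint from the Hub (the only variant usable
# via transformers for now, per the note above).
tokenizer = AutoTokenizer.from_pretrained("TNSA/NGen2-15M")
model = AutoModelForCausalLM.from_pretrained("TNSA/NGen2-15M")

# Generate a short continuation as a smoke test.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```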

NGen 2 is an advanced Transformer model training pipeline that supports multiple model variants, ranging from a **nano** variant (approximately 120M parameters) to a **foundational** variant (approximately 1B parameters). The pipeline incorporates modern architectural improvements such as rotary positional embeddings, RMSNorm, and GEGLU activations to boost performance and training efficiency.

> **Note:** Although NGen 2 is designed to train at most a 1B-parameter model, its advanced architecture pushes its performance closer to that of much larger models.
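For intuition, the sketch below shows illustrative PyTorch implementations of two of the components named above, RMSNorm and a GEGLU feed-forward block. This is a generic rendering of the techniques, not the project's actual source:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square norm: rescales features by their RMS, without mean-centering."""
    def __init__(self, d_model: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(d_model))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class GEGLUFeedForward(nn.Module):
    """GEGLU FFN: down(GELU(gate(x)) * up(x)), a gated variant of the usual MLP."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.gelu(self.gate(x)) * self.up(x))
```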

## Model Variants

NGen 2 supports the following variants via the `--variant` flag:

- **nano**: ~120M parameters  
- **small**: ~300M parameters  
- **medium**: ~500M parameters  
- **large**: ~700M parameters  
- **foundational**: ~1B parameters  

Each variant adjusts key hyperparameters such as the number of layers, model dimension (`d_model`), number of attention heads (`n_heads`), and the feed-forward dimension (`d_ff`).
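The concrete per-variant values are defined in the training code rather than documented here; the mapping below is purely hypothetical, only to illustrate the shape of the table that `--variant` selects from:

```python
# Hypothetical numbers for illustration only; the authoritative
# per-variant settings live in the project's train.py.
VARIANT_CONFIGS = {
    "nano":         dict(n_layers=12, d_model=768,  n_heads=12, d_ff=3072),
    "small":        dict(n_layers=20, d_model=1024, n_heads=16, d_ff=4096),
    "medium":       dict(n_layers=24, d_model=1280, n_heads=20, d_ff=5120),
    "large":        dict(n_layers=28, d_model=1408, n_heads=22, d_ff=5632),
    "foundational": dict(n_layers=32, d_model=1600, n_heads=25, d_ff=6400),
}
```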

## Requirements

- Python 3.8+
- PyTorch
- Transformers
- Datasets
- DeepSpeed (optional, for efficient training)
- Azure ML SDK (for distributed training on Azure)

Install dependencies using pip (adjust as needed):

```bash
pip install torch transformers datasets deepspeed azureml-core
```

## Usage

### 1. Data Preparation
First, download and preprocess the OpenWebText dataset:

```bash 
python prepare.py --output_dir ./_data_ --max_length 4096
```

This script downloads, tokenizes, and saves the dataset in Arrow format to the `./_data_` directory.
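Before training, you can sanity-check the processed data by reloading it (this assumes `prepare.py` saved the dataset with `save_to_disk`, which is how Arrow-format datasets are usually written):

```python
from datasets import load_from_disk

# Reload the tokenized Arrow dataset written by prepare.py.
ds = load_from_disk("./_data_")
print(ds)  # prints the columns (e.g. input_ids) and the number of rows
```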

### 2. Local Training

The main training script is `train.py`. It loads the processed dataset (by default from `./_data_`), instantiates the desired model variant, and starts training.

Example CLI commands:

- Train the nano (120M) variant:

```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_nano --batch_size 4 --epochs 3 --variant nano
```

- Train the small (300M) variant:

```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_small --batch_size 4 --epochs 3 --variant small
```

- Train the medium (500M) variant:

```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_medium --batch_size 4 --epochs 3 --variant medium
```

- Train the large (700M) variant:
```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_large --batch_size 4 --epochs 3 --variant large
```

- Train the foundational (1B) variant with rotary embeddings enabled:
```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_foundational --batch_size 4 --epochs 3 --variant foundational --use_rotary
```

### 3. Training on Azure ML

- Step 1: Set Up Azure ML Resources

Use `azure_setup.py` to create or connect to your Azure ML workspace and set up a compute cluster:

```bash
python azure_setup.py \
  --workspace_name MyWorkspace \
  --resource_group MyResourceGroup \
  --subscription_id YOUR_SUBSCRIPTION_ID \
  --location eastus \
  --compute_name gpu-cluster \
  --vm_size Standard_NC6 \
  --max_nodes 4 \
  --min_nodes 0
```
- Step 2: Submit a Training Job to Azure ML

Use `submit_train.py` to submit your training script to Azure ML:

```bash
python submit_train.py \
  --experiment_name ngen2-experiment \
  --compute_target gpu-cluster \
  --script train.py \
  --dataset_dir ./_data_ \
  --output_dir ./checkpoints_foundational \
  --batch_size 4 \
  --epochs 3 \
  --variant foundational \
  --use_rotary
```

### 4. DeepSpeed Integration

The `deepspeed.json` file configures mixed-precision training and ZeRO optimizations. To use DeepSpeed, ensure it is installed and adjust your training script or submission command to enable it, as sketched below.
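A representative configuration of that shape follows; these are standard DeepSpeed options, but check the repository's actual `deepspeed.json` for the values used in this project:

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```

DeepSpeed jobs are normally launched with the `deepspeed` CLI in place of `python` (for example, `deepspeed train.py ...`); whether `train.py` takes a flag pointing at the config file depends on how the script wires in DeepSpeed.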

## License

The NGen 2 project is developed and maintained by TNSA AI. The licensing model is dual:

- The nano and small variants are open source and released under the MIT License.
- The medium, large, and foundational variants are proprietary and are not open source. Use of these proprietary components is subject to TNSA AI's proprietary licensing terms.

## Copyright

© 2023 TNSA AI. All rights reserved. For usage terms, see the `LICENSE` file.