Thishyaketh committed
Commit f7f51b8 · verified · 1 Parent(s): ef2525a

Update README.md

Files changed (1): README.md (+146 −5)
README.md CHANGED
---
license: other
license_name: ngen2-community-license
license_link: https://tnsaai-builds.framer.website/community/licenses/ngen2
language:
- en
- hi
- te
metrics:
- bleu
- perplexity
- accuracy
base_model:
- TNSA/NGen2-15M
pipeline_tag: text-generation
library_name: transformers
model_type: safetensors
new_version: TNSA/NGen3-15M
---
# NGen 2

When using NGen 2 with the Transformers library, only the 15M variant is currently supported; a loading sketch follows below.
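
As a quick start, here is a minimal sketch of loading that variant through the `transformers` pipeline API (this assumes the hosted `TNSA/NGen2-15M` checkpoint works with the standard text-generation pipeline; the prompt and sampling settings are illustrative):

```python
from transformers import pipeline

# Load the 15M variant -- currently the only size usable via Transformers.
generator = pipeline("text-generation", model="TNSA/NGen2-15M")

# Generate a short continuation; sampling settings are illustrative.
output = generator("The future of AI is", max_new_tokens=50, do_sample=True)
print(output[0]["generated_text"])
```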

NGen 2 is an advanced Transformer model training pipeline that supports multiple model variants, ranging from a **nano** variant (approximately 120M parameters) to a **foundational** variant (approximately 1B parameters). The pipeline incorporates modern architectural improvements such as rotary positional embeddings, RMSNorm, and GEGLU activations to boost performance and training efficiency.

> **Note:** Although NGen 2 is designed to train a 1B-parameter model, its advanced architecture pushes its performance closer to that of much larger models.
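
As an illustration of one of those architectural pieces, a GEGLU feed-forward block in PyTorch looks roughly like the sketch below; the layer names and bias-free projections are assumptions, not necessarily what `train.py` uses:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GEGLU(nn.Module):
    """Gated-GELU feed-forward: down(GELU(x @ W_gate) * (x @ W_up))."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The GELU-activated gate modulates the linear "up" projection.
        return self.w_down(F.gelu(self.w_gate(x)) * self.w_up(x))
```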

## Model Variants

NGen 2 supports the following variants via the `--variant` flag:

- **nano**: ~120M parameters
- **small**: ~300M parameters
- **medium**: ~500M parameters
- **large**: ~700M parameters
- **foundational**: ~1B parameters

Each variant adjusts key hyperparameters such as the number of layers, model dimension (`d_model`), number of attention heads (`n_heads`), and the feed-forward dimension (`d_ff`).
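
Internally, selecting a variant just picks a preset of these hyperparameters. The exact values are defined in `train.py` and are not documented here, so the numbers in this sketch are illustrative placeholders rather than the real NGen 2 presets:

```python
# Hypothetical preset table; the real values live in train.py's variant
# configuration. Only the rough parameter counts are documented.
VARIANTS = {
    "nano":         dict(n_layers=12, d_model=768,  n_heads=12, d_ff=2048),
    "small":        dict(n_layers=20, d_model=1024, n_heads=16, d_ff=2816),
    "medium":       dict(n_layers=24, d_model=1280, n_heads=20, d_ff=3456),
    "large":        dict(n_layers=28, d_model=1408, n_heads=22, d_ff=3840),
    "foundational": dict(n_layers=32, d_model=1536, n_heads=24, d_ff=4096),
}

def config_for(variant: str) -> dict:
    """Map a --variant flag value to its hyperparameter preset."""
    return VARIANTS[variant]
```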

## Requirements

- Python 3.8+
- PyTorch
- Transformers
- Datasets
- DeepSpeed (optional, for efficient training)
- Azure ML SDK (for distributed training on Azure)

Install dependencies using pip (adjust as needed):

```bash
pip install torch transformers datasets deepspeed azureml-core
```

## Usage

### 1. Data Preparation

First, download and preprocess the OpenWebText dataset:

```bash
python prepare.py --output_dir ./_data_ --max_length 4096
```

This script downloads, tokenizes, and saves the dataset in Arrow format to the `./_data_` directory.
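
To sanity-check the result, the processed dataset can be reloaded with the `datasets` library (assuming `prepare.py` writes it via `save_to_disk`, the usual counterpart of the on-disk Arrow format):

```python
from datasets import load_from_disk

# Reload the preprocessed Arrow data that prepare.py wrote to ./_data_.
dataset = load_from_disk("./_data_")
print(dataset)  # prints splits, column names, and row counts
```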

### 2. Local Training

The main training script is `train.py`. It loads the processed dataset (by default from `./_data_`), instantiates the desired model variant, and starts training.

Example CLI commands:

- Train the nano (120M) variant:

```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_nano --batch_size 4 --epochs 3 --variant nano
```

- Train the small (300M) variant:

```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_small --batch_size 4 --epochs 3 --variant small
```

- Train the medium (500M) variant:

```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_medium --batch_size 4 --epochs 3 --variant medium
```

- Train the large (700M) variant:

```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_large --batch_size 4 --epochs 3 --variant large
```

- Train the foundational (1B) variant with rotary embeddings enabled:

```bash
python train.py --dataset_dir ./_data_ --output_dir ./checkpoints_foundational --batch_size 4 --epochs 3 --variant foundational --use_rotary
```

### 3. Training on Azure ML

**Step 1: Set up Azure ML resources**

Use `azure_setup.py` to create or connect to your Azure ML workspace and set up a compute cluster:

```bash
python azure_setup.py \
  --workspace_name MyWorkspace \
  --resource_group MyResourceGroup \
  --subscription_id YOUR_SUBSCRIPTION_ID \
  --location eastus \
  --compute_name gpu-cluster \
  --vm_size Standard_NC6 \
  --max_nodes 4 \
  --min_nodes 0
```
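
For reference, a setup script like this typically wraps the classic v1 `azureml-core` SDK; assuming that is what `azure_setup.py` does, the underlying calls look roughly like this sketch (names mirror the flags above):

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

# Create the workspace, or reuse it if it already exists.
ws = Workspace.create(
    name="MyWorkspace",
    subscription_id="YOUR_SUBSCRIPTION_ID",
    resource_group="MyResourceGroup",
    location="eastus",
    exist_ok=True,
)

# Provision an autoscaling GPU cluster: 0-4 nodes of Standard_NC6.
config = AmlCompute.provisioning_configuration(
    vm_size="Standard_NC6", min_nodes=0, max_nodes=4
)
cluster = ComputeTarget.create(ws, "gpu-cluster", config)
cluster.wait_for_completion(show_output=True)
```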

**Step 2: Submit a training job to Azure ML**

Use `submit_train.py` to submit your training script to Azure ML:

```bash
python submit_train.py \
  --experiment_name ngen3-experiment \
  --compute_target gpu-cluster \
  --script train.py \
  --dataset_dir ./_data_ \
  --output_dir ./checkpoints_foundational \
  --batch_size 4 \
  --epochs 3 \
  --variant foundational \
  --use_rotary
```
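
Under the same assumption, submission with the classic SDK boils down to a `ScriptRunConfig` plus an `Experiment`; a sketch mirroring the flags above:

```python
from azureml.core import Experiment, ScriptRunConfig, Workspace

ws = Workspace.from_config()  # assumes a local workspace config.json

# Package train.py with its CLI arguments for the gpu-cluster target.
src = ScriptRunConfig(
    source_directory=".",
    script="train.py",
    arguments=[
        "--dataset_dir", "./_data_",
        "--output_dir", "./checkpoints_foundational",
        "--batch_size", "4",
        "--epochs", "3",
        "--variant", "foundational",
        "--use_rotary",
    ],
    compute_target="gpu-cluster",
)

run = Experiment(ws, "ngen3-experiment").submit(src)
run.wait_for_completion(show_output=True)
```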

### 4. DeepSpeed Integration

The `deepspeed.json` file configures mixed-precision training and ZeRO optimizations. To use DeepSpeed, ensure it is installed and adjust your training script or submission command to enable DeepSpeed support, as sketched below.
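
The contents of `deepspeed.json` are not reproduced here, but wiring DeepSpeed into a PyTorch training script generally follows the pattern below; the stand-in model is an assumption, and the script must run under the `deepspeed` launcher (e.g. `deepspeed train.py ...`) so distributed state exists before `initialize` is called:

```python
import deepspeed
import torch
import torch.nn as nn

# Stand-in model; in practice this is the NGen 2 model built in train.py.
model = nn.Linear(512, 512)

# deepspeed.initialize reads the JSON config (mixed precision, ZeRO) and
# returns an engine that owns the optimizer and precision handling.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="deepspeed.json",
)

# In the training loop, backward/step go through the engine, which applies
# loss scaling and ZeRO partitioning as configured.
dtype = next(model_engine.parameters()).dtype  # fp16 if configured so
inputs = torch.randn(8, 512, device=model_engine.device, dtype=dtype)
loss = model_engine(inputs).pow(2).mean()
model_engine.backward(loss)
model_engine.step()
```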

## License

The NGen 2 project is developed and maintained by TNSA AI. The licensing model is dual:

- The nano and small variants are open source and released under the MIT License.
- The medium, large, and foundational variants are proprietary and are not open source. Use of these proprietary components is subject to TNSA AI's proprietary licensing terms.

## Copyright

© 2023 TNSA AI. All rights reserved. For usage terms, read the `LICENSE` file.