Thishyaketh committed d312d94 (verified) · Parent(s): 3f9d1c3

Update README.md

Files changed (1): README.md (+75 −7)

README.md CHANGED
Removed: the default organization-card stub, i.e. the Hugging Face card frontmatter (`title: README`, `emoji: 🔥`, `colorFrom: yellow`, `colorTo: gray`, `sdk: gradio`, `pinned: false`) and the placeholder line "Edit this `README.md` markdown file to author your organization card."

Added: the new NGen3 README, reproduced below.
# NGen3: Next-Generation Foundational Model

NGen3 is a production-level foundational language model inspired by state-of-the-art architectures such as GPT-4, Claude-3, and Llama 2. It is designed to be highly modular, efficient, and accessible via a flexible command-line interface (CLI). NGen3 supports multiple model variants, from 7M to 1B parameters, and offers a comprehensive suite of tools for:

- **Tokenization:** Process text from local files, URLs, or Hugging Face datasets (a minimal sketch appears below this list).
- **Training:** Train the model on tokenized data.
- **Sampling:** Generate text from trained models.
- **Exporting:** Save models and minimal tokenizer configurations in formats compatible with Hugging Face.
- **Knowledge Distillation:** Train a smaller student model using a larger teacher model.
- **Fine-Tuning:** Adapt a distilled model on conversational data (from local sources or directly from Hugging Face).

This repository provides a complete implementation of the NGen3 model along with detailed CLI commands to facilitate experimentation and research.
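To make the tokenization step concrete, here is a minimal sketch of turning raw text into the kind of flat binary token file a training loop can read back. It is illustrative only: the GPT-2 tokenizer, the `input.txt` path, and the uint16 on-disk layout are assumptions, not NGen3's documented format.

```python
# Illustrative tokenization sketch; NGen3's actual CLI and on-disk
# format may differ. Assumes GPT-2 BPE, whose 50,257-id vocabulary
# fits in uint16.
import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

with open("input.txt", "r", encoding="utf-8") as f:  # hypothetical local file
    text = f.read()

ids = tokenizer.encode(text)           # raw text -> list of token ids
arr = np.array(ids, dtype=np.uint16)   # compact fixed-width encoding
arr.tofile("train.bin")                # flat binary consumed by training
print(f"wrote {arr.size:,} tokens to train.bin")
```

The same idea extends to URLs or Hugging Face datasets: fetch the text, encode it, and append the resulting ids to the same binary stream.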
---

## Table of Contents

- [Model Overview](#model-overview)
- [Architecture](#architecture)
- [Installation](#installation)
- [Usage](#usage)
  - [Tokenization](#tokenization)
  - [Training](#training)
  - [Sampling](#sampling)
  - [Exporting](#exporting)
  - [Knowledge Distillation](#knowledge-distillation)
  - [Fine-Tuning](#fine-tuning)
    - [Local Fine-Tuning](#local-fine-tuning)
    - [Hugging Face Fine-Tuning](#hugging-face-fine-tuning)
- [Hyperparameters](#hyperparameters)
- [Contributing](#contributing)
- [License](#license)
- [Acknowledgements](#acknowledgements)

---

## Model Overview

NGen3 is designed for rapid development and deployment of foundational language models. Its flexible CLI allows users to:

- **Tokenize Text:** Convert raw text or datasets into a tokenized binary format.
- **Train Models:** Use various hyperparameter configurations based on the desired model size.
- **Generate Samples:** Evaluate model performance and generate text samples.
- **Export Models:** Easily export models as `safetensors` weights plus JSON configurations for integration with Hugging Face tools.
- **Distill Models:** Leverage knowledge distillation to compress larger models into efficient student variants (a loss sketch follows this list).
- **Fine-Tune on Conversations:** Adapt models to conversational data using both local and Hugging Face datasets.
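To illustrate the distillation objective, the standard recipe (after Hinton et al.) blends a temperature-softened KL term against the teacher's logits with the ordinary hard-label cross-entropy. The following is a minimal sketch of that standard formulation, not necessarily NGen3's exact recipe; `temperature` and `alpha` are placeholder values.

```python
# Standard knowledge-distillation loss (sketch of the usual formulation,
# not necessarily NGen3's exact recipe). Logits have shape (B, T, V).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      targets: torch.Tensor,
                      temperature: float = 2.0,  # placeholder value
                      alpha: float = 0.5) -> torch.Tensor:  # placeholder weight
    # Soft term: KL between temperature-softened teacher and student
    # distributions; the T^2 factor keeps gradient scales comparable.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard term: ordinary next-token cross-entropy against the labels.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        targets.view(-1),
    )
    return alpha * soft + (1.0 - alpha) * hard
```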
---

## Architecture

NGen3’s architecture is built upon the transformer decoder design. Key components include:

- **Token and Positional Embeddings:** Learnable embeddings that encode input tokens and their positions.
- **Stack of Transformer Blocks:** Each block contains:
  - **Causal Self-Attention:** Multi-head attention with causal masking to prevent information leakage from future tokens.
  - **MLP (Feed-Forward Network):** Utilizes GELU activation for non-linearity.
  - **Residual Connections and Layer Normalization:** Stabilize training and improve convergence.
- **Final Projection Layer:** Maps embeddings to logits over the vocabulary.

The model supports variants with parameter counts ranging from 7M to 1B, making it adaptable for various research and production needs. A minimal code sketch of a single block follows.
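The bullet points above map almost line-for-line onto PyTorch. Below is a minimal single-block sketch; the pre-norm placement, the 4x MLP expansion, and the use of `nn.MultiheadAttention` are common-default assumptions rather than confirmed NGen3 internals.

```python
# Minimal transformer decoder block matching the description above.
# Pre-norm layout and 4x MLP width are assumptions, not confirmed details.
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(              # feed-forward network
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),                         # GELU non-linearity
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        T = x.size(1)
        # Causal mask: True entries are disallowed, so each position
        # attends only to itself and earlier positions.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device),
                          diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out                       # residual connection
        x = x + self.mlp(self.ln2(x))          # residual connection
        return x

x = torch.randn(2, 16, 128)                    # (batch, sequence, embedding)
print(Block(n_embd=128, n_head=4)(x).shape)    # torch.Size([2, 16, 128])
```

A full model stacks such blocks between the embedding layers and the final vocabulary projection described above.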
---

## Installation

Ensure you have Python 3.8+ installed along with the following packages:

- PyTorch
- transformers
- datasets
- tqdm
- safetensors (for export functionality)

Install the required packages using pip:

```bash
pip install torch transformers datasets tqdm safetensors
```
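Optionally, a quick sanity check confirms that the dependencies import cleanly and reports whether a CUDA device is visible for training:

```python
# Optional post-install check: verify the imports resolve and report
# GPU visibility.
import torch
import transformers
import datasets
import tqdm         # imported only to confirm installation
import safetensors  # imported only to confirm installation

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("datasets:", datasets.__version__)
```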