# NGen3: Next-Generation Foundational Model

NGen3 is a production-level foundational language model inspired by state-of-the-art architectures such as GPT-4, Claude 3, and Llama 2. It is designed to be highly modular, efficient, and accessible via a flexible command-line interface (CLI). NGen3 supports multiple model variants, from 7M to 1B parameters, and offers a comprehensive suite of tools for:

- **Tokenization:** Process text from local files, URLs, or Hugging Face datasets.
- **Training:** Train the model on tokenized data.
- **Sampling:** Generate text from trained models.
- **Exporting:** Save models and minimal tokenizer configurations in formats compatible with Hugging Face.
- **Knowledge Distillation:** Train a smaller student model using a larger teacher model.
- **Fine-Tuning:** Adapt a distilled model on conversational data (from local sources or directly from Hugging Face).

This repository provides a complete implementation of the NGen3 model along with detailed CLI commands to facilitate experimentation and research.
---

## Table of Contents

- [Model Overview](#model-overview)
- [Architecture](#architecture)
- [Installation](#installation)
- [Usage](#usage)
  - [Tokenization](#tokenization)
  - [Training](#training)
  - [Sampling](#sampling)
  - [Exporting](#exporting)
  - [Knowledge Distillation](#knowledge-distillation)
  - [Fine-Tuning](#fine-tuning)
    - [Local Fine-Tuning](#local-fine-tuning)
    - [Hugging Face Fine-Tuning](#hugging-face-fine-tuning)
- [Hyperparameters](#hyperparameters)
- [Contributing](#contributing)
- [License](#license)
- [Acknowledgements](#acknowledgements)
---

## Model Overview

NGen3 is designed for rapid development and deployment of foundational language models. Its flexible CLI allows users to:

- **Tokenize Text:** Convert raw text or datasets into a tokenized binary format.
- **Train Models:** Use various hyperparameter configurations based on the desired model size.
- **Generate Samples:** Evaluate model performance and generate text samples.
- **Export Models:** Export models as `safetensors` weights with JSON configurations for integration with Hugging Face tools.
- **Distill Models:** Leverage knowledge distillation to compress larger models into efficient student variants.
- **Fine-Tune on Conversations:** Adapt models to conversational data using both local and Hugging Face datasets.
---
## Architecture

NGen3’s architecture is built upon the transformer decoder design. Key components include:

- **Token and Positional Embeddings:** Learnable embeddings that encode input tokens and their positions.
- **Stack of Transformer Blocks:** Each block contains:
  - **Causal Self-Attention:** Multi-head attention with masking to prevent information leaking from future tokens.
  - **MLP (Feed-Forward Network):** Uses GELU activation for non-linearity.
  - **Residual Connections and Layer Normalization:** Stabilize training and improve convergence.
- **Final Projection Layer:** Maps embeddings to logits over the vocabulary.

The model supports variants with parameter counts ranging from 7M to 1B, making it adaptable for various research and production needs.
---
## Installation

Ensure you have Python 3.8+ installed along with the following packages:

- PyTorch
- transformers
- datasets
- tqdm
- safetensors (for export functionality)

Install the required packages using pip:

```bash
pip install torch transformers datasets tqdm safetensors