PulkundwarP committed on
Commit 7d55a31 · verified · 1 Parent(s): 6cce8c6

Update README.md

Files changed (1)
  1. README.md +91 -1
README.md CHANGED
@@ -1 +1,91 @@
- # This is a pre-trained version of Leap - A small language model based on GPT2 and Llama architecture. It is trained on TinyStories, and evaluated on GPTEval. More updates soon!
+ # LangPWT
+
+ This repository contains the implementation of **LangPWT**, a lightweight, modified version of the GPT architecture, trained from scratch on FineWeb-Edu, an open-source dataset. The project demonstrates the design, training, and optimization of a custom natural language model on local hardware.
+
+ ## Features
+ - **Custom GPT Architecture**: A miniaturized version of the GPT model, tailored for efficient training on limited hardware.
+ - **Local Training**: Model training executed entirely on local resources, enabling cost-effective development.
+ - **Open-Source Dataset**: Trained on the publicly available FineWeb-Edu dataset to ensure accessibility and reproducibility.
+ - **Scalable Design**: Architecture optimized for experimentation and scalability while maintaining resource efficiency.
+
+ <div align="center">
+ <img src="LLM.drawio.png" alt="LangPWT architecture diagram" width="300">
+ <p><strong>Figure 1: Architecture of LangPWT</strong></p>
+ </div>
+
+ ## Implementation Details
+ 1. **Model Architecture**
+    - A streamlined GPT-based architecture designed for reduced complexity and improved training efficiency.
+    - Incorporates modifications to parameter scaling to suit resource-constrained environments.
+
+ 2. **Training**
+    - Training executed locally on an NVIDIA GeForce RTX 3050 (Laptop) GPU with 4 GB of VRAM, using PyTorch.
+
+ 3. **Testing**
+    - A simple Streamlit UI for testing the model's text-generation capability (sketched below).
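+
+ A minimal sketch of what such a testing UI could look like (the `generate` stub below is a placeholder, not the actual sampling code in `trial_pwt.py`):
+
+ ```python
+ import streamlit as st
+
+ def generate(prompt: str, max_new_tokens: int = 100) -> str:
+     """Placeholder: the real app would load the trained checkpoint
+     and decode tokens autoregressively from the prompt."""
+     return prompt + " ..."
+
+ st.title("LangPWT Text Generation")
+ prompt = st.text_area("Enter your prompt:")
+ if st.button("Generate"):
+     st.write(generate(prompt))
+ ```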
+
+ ## Model Architecture
+
+ ### Configuration
+ - **Sequence Length:** 512 tokens
+ - **Vocabulary Size:** 48,951 tokens
+   - Includes 50,000 BPE merges, 256 special byte tokens, and 1 `<|endoftext|>` token.
+ - **Number of Layers:** 4 transformer blocks
+ - **Attention Heads:** 8 per block
+ - **Embedding Dimension:** 512
+ - **Dropout:** 0.1
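+
+ As a reference point, this configuration can be written down as a small Python dataclass (the field names follow common GPT-style conventions such as `n_embd` and `block_size`, which this README also uses; the dataclass itself is illustrative, not the repository's actual config object):
+
+ ```python
+ from dataclasses import dataclass
+
+ @dataclass
+ class LangPWTConfig:
+     block_size: int = 512      # maximum sequence length
+     vocab_size: int = 48951    # BPE merges + byte tokens + <|endoftext|>
+     n_layer: int = 4           # number of transformer blocks
+     n_head: int = 8            # attention heads per block
+     n_embd: int = 512          # embedding dimension
+     dropout: float = 0.1
+ ```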
+
+ ### Components
+ 1. **Embeddings:**
+    - **Word Embeddings (`wte`):** Learnable token embeddings of size `n_embd`.
+    - **Position Embeddings (`wpe`):** Learnable positional embeddings for sequences up to `block_size`.
+
+ 2. **Transformer Blocks:**
+    - A stack of 4 transformer blocks, each comprising:
+      - Multi-head self-attention mechanisms.
+      - Feedforward networks for feature transformation.
+
+ 3. **Output Head:**
+    - **Linear Layer (`lm_head`):** Maps hidden states to logits for token prediction.
+    - Implements weight sharing between the token embeddings (`wte`) and the output projection for parameter efficiency.
+
+ 4. **Layer Normalization:**
+    - A final layer normalization (`ln_f`) ensures stable optimization (see the sketch after this list).
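+
+ As a rough illustration of how these components fit together, here is a minimal GPT-style skeleton assuming the hypothetical `LangPWTConfig` above (pre-norm residual blocks and `nn.MultiheadAttention` are assumptions; the repository's actual implementation may differ):
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ class Block(nn.Module):
+     """One transformer block: multi-head self-attention + feedforward."""
+     def __init__(self, config):
+         super().__init__()
+         self.ln_1 = nn.LayerNorm(config.n_embd)
+         self.attn = nn.MultiheadAttention(config.n_embd, config.n_head,
+                                           dropout=config.dropout, batch_first=True)
+         self.ln_2 = nn.LayerNorm(config.n_embd)
+         self.mlp = nn.Sequential(                      # feedforward network
+             nn.Linear(config.n_embd, 4 * config.n_embd),
+             nn.GELU(),
+             nn.Linear(4 * config.n_embd, config.n_embd),
+             nn.Dropout(config.dropout),
+         )
+
+     def forward(self, x):
+         T = x.size(1)
+         causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
+         h = self.ln_1(x)
+         a, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
+         x = x + a                                      # attention residual
+         return x + self.mlp(self.ln_2(x))              # feedforward residual
+
+ class LangPWT(nn.Module):
+     def __init__(self, config):
+         super().__init__()
+         self.wte = nn.Embedding(config.vocab_size, config.n_embd)  # token embeddings
+         self.wpe = nn.Embedding(config.block_size, config.n_embd)  # position embeddings
+         self.drop = nn.Dropout(config.dropout)
+         self.blocks = nn.ModuleList(Block(config) for _ in range(config.n_layer))
+         self.ln_f = nn.LayerNorm(config.n_embd)                    # final layer norm
+         self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)
+         self.lm_head.weight = self.wte.weight          # weight sharing with wte
+
+     def forward(self, idx):
+         pos = torch.arange(idx.size(1), device=idx.device)
+         x = self.drop(self.wte(idx) + self.wpe(pos))
+         for block in self.blocks:
+             x = block(x)
+         return self.lm_head(self.ln_f(x))              # logits over the vocabulary
+ ```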
+
+ ## Current Status
+ 1. Dataset used: the full FineWeb-Edu dataset (18.5 GB).
+ 2. Training steps: 5,000
+ 3. Time taken: ~7 hours
+ 4. Checkpoint format: `.pt`
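+
+ Since the checkpoint is a plain `.pt` file, it can be loaded along these lines (the file name `checkpoint.pt` and the assumption that it holds a `state_dict` are illustrative; the repository may store its checkpoint differently):
+
+ ```python
+ import torch
+
+ state = torch.load("checkpoint.pt", map_location="cpu")  # illustrative file name
+
+ model = LangPWT(LangPWTConfig())  # classes from the sketches above
+ model.load_state_dict(state)
+ model.eval()                      # disable dropout for generation
+ ```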
+
+ ## Requirements
+ - Python 3.8+
+ - PyTorch 2.0+ or TensorFlow 2.10+
+ - CUDA-enabled GPU with at least 4 GB VRAM (recommended)
+ - Dependencies listed in `requirements.txt`
+ - **Note**: Different operating systems support different PyTorch/TensorFlow versions for CUDA (local GPU) use. Verify compatibility with your OS before installing.
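+
+ Before training, it is worth confirming that the installed PyTorch build can actually see the GPU:
+
+ ```python
+ import torch
+
+ print(torch.__version__)
+ print(torch.cuda.is_available())           # True if a CUDA build found the GPU
+ if torch.cuda.is_available():
+     print(torch.cuda.get_device_name(0))   # e.g. the RTX 3050 used for this project
+ ```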
+
+ ## Usage
+ 1. Clone the repository:
+    ```bash
+    git clone https://github.com/pulkundwar29/LangPWT
+    cd LangPWT
+    ```
+ 2. Create and activate a virtual environment (Windows shown; on Linux/macOS use `source env/bin/activate`):
+    ```bash
+    python -m venv env
+    env\Scripts\activate
+    ```
+ 3. Install dependencies:
+    ```bash
+    pip install -r requirements.txt
+    ```
+ 4. Run the training script **trainpwt.py** (e.g., `python trainpwt.py`).
+ 5. Launch the Streamlit app: `streamlit run trial_pwt.py`
+ 6. Enter your prompt and hit the Generate button.
+
+ <div align="center">
+ <img src="ex1.png" alt="Example of generated text">
+ <p><strong>Figure 2: Example of Text Generated using LangPWT</strong></p>
+ </div>