Commit dd70b62 by hemantvirmani · verified · 1 parent: d30893b

Delete README.md

Files changed (1)
  1. README.md +0 -210
README.md DELETED
@@ -1,210 +0,0 @@
---
license: mit
---
# TinyGPT — GPT-2 Style LM (~163M) trained on FineWeb-Edu

A GPT-2 style decoder-only transformer pretrained from scratch on ~43B tokens
of the FineWeb-Edu dataset, reaching a validation loss of **2.84**.

I built this project to develop hands-on intuition for LLMs. It is inspired by
Andrej Karpathy's nanoGPT.

---

## Model Details

| Parameter | Value |
|-----------|-------|
| Architecture | Decoder-only Transformer (GPT-2 style) |
| Parameters | ~163M |
| Layers | 12 |
| Attention heads | 12 |
| Embedding dim | 768 |
| Context length | 1024 tokens |
| Vocab size | 50,257 |
| Tokenizer | GPT-2 BPE via `tiktoken` |
| Attention | Causal self-attention (Flash Attention via `F.scaled_dot_product_attention`) |
| LM head | Separate linear layer with bias (not weight-tied) |

> **Why ~163M and not 124M?** Standard GPT-2 124M ties the LM head weights
> to the token embedding table, so the 50,257 × 768 ≈ 38.6M head parameters
> are shared with the embeddings. TinyGPT uses a separate `nn.Linear` head
> (weight plus bias), which adds those parameters back for ~163M in total.
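
As a sanity check on the ~163M figure, here is a minimal parameter count. It is a sketch that assumes the standard GPT-2/nanoGPT layout (learned position embeddings, a fused QKV projection, a 4× MLP, biases on every linear layer, two LayerNorms per block plus a final one); `tinygpt.py` may differ in small details, and the helper name `gpt2_param_count` is hypothetical.

```python
def gpt2_param_count(n_layer=12, n_embd=768, block_size=1024,
                     vocab_size=50257, tied_head=True, head_bias=False):
    """Count parameters of a GPT-2 style decoder (assumed nanoGPT-like layout)."""
    d = n_embd
    wte = vocab_size * d                          # token embedding table
    wpe = block_size * d                          # learned position embeddings
    layernorm = 2 * d                             # LayerNorm weight + bias
    attn = (d * 3 * d + 3 * d) + (d * d + d)      # fused QKV proj + output proj
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)   # up proj + down proj
    per_block = 2 * layernorm + attn + mlp
    total = wte + wpe + n_layer * per_block + layernorm  # + final LayerNorm
    if not tied_head:
        total += vocab_size * d                   # separate LM head weight
        if head_bias:
            total += vocab_size                   # LM head bias
    return total


tied = gpt2_param_count()
untied = gpt2_param_count(tied_head=False, head_bias=True)
print(f"tied head:   {tied:,}")           # 124,439,808 (GPT-2 "124M")
print(f"untied head: {untied:,}")         # 163,087,441 (TinyGPT-style "~163M")
print(f"difference:  {untied - tied:,}")  # 38,647,633
```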

---

## Training Details

| Detail | Value |
|--------|-------|
| Dataset | [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) (`sample-100BT` subset) |
| Tokens trained | ~43B |
| Validation loss | 2.84 |
| Optimizer | AdamW (betas=(0.9, 0.95), eps=1e-8) |
| Peak learning rate | 6e-4 |
| LR schedule | Linear warmup over 4,000 steps, then cosine decay to 6e-5 |
| Effective batch size | 512 sequences (micro-batch of 16 × 32 gradient accumulation steps) |
| Weight decay | 0.1 |
| Gradient clipping | 1.0 |
| Precision | bfloat16 (bf16) |
| Max iterations | 600,000 |
| Dropout | 0.0 |
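
The warmup-then-cosine schedule in the table can be sketched as below. This is a stdlib-only illustration of a common formulation, not the repo's code. It assumes the cosine decay spans the full 600,000 configured iterations and that every one of the 16 × 32 = 512 sequences per optimizer step fills the 1,024-token context; adjust `DECAY_STEPS` if the actual run used a different horizon.

```python
import math

# Values from the Training Details table.
MAX_LR = 6e-4
MIN_LR = 6e-5
WARMUP_STEPS = 4000
# Assumption: cosine decay runs until the configured max iterations.
DECAY_STEPS = 600_000

# Assumption: each sequence fills the 1,024-token context.
TOKENS_PER_STEP = 16 * 32 * 1024  # micro-batch x grad-accum steps x context


def get_lr(step: int) -> float:
    """Linear warmup to MAX_LR, then cosine decay down to MIN_LR (sketch)."""
    if step < WARMUP_STEPS:
        # Ramps linearly and reaches MAX_LR on the last warmup step.
        return MAX_LR * (step + 1) / WARMUP_STEPS
    if step >= DECAY_STEPS:
        return MIN_LR
    progress = (step - WARMUP_STEPS) / (DECAY_STEPS - WARMUP_STEPS)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 to 0
    return MIN_LR + coeff * (MAX_LR - MIN_LR)


for step in (0, 3999, 4000, 302_000, 600_000):
    print(f"step {step:>7}: lr = {get_lr(step):.2e}")
print(f"tokens per optimizer step: {TOKENS_PER_STEP:,}")
```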

---

## Format

Weights are saved in **PyTorch native format**: a plain state dict saved with
`torch.save()`, containing only model weights (no optimizer or scheduler
state). The file is ~670MB.

To load it, you need the `TinyGPT` model class from the GitHub repo (see Usage).

The model is also available in **Hugging Face Transformers format** in this
repository. The HF-format files include:

- `model.safetensors`
- `config.json`
- `generation_config.json`
- `tokenizer.json`
- `tokenizer_config.json`

The HF-format model can be loaded with `transformers` and is useful for standard
Hugging Face workflows. Note that TinyGPT was trained with a separate,
non-weight-tied LM head that includes a trained bias. Standard
`GPT2LMHeadModel.from_pretrained()` loads the main model weights but reports
`lm_head.bias` as an unexpected key and discards it, because the default GPT-2
head has no bias. For exact TinyGPT inference, restore the LM-head bias as shown
in step 4 of Usage, or use `infer_hf.py` from the GitHub repo.

---

## Usage

### 1. Install dependencies

Clone the repo and install requirements:

```bash
git clone https://github.com/hemantvirmani/tinygpt
cd tinygpt
pip install -r requirements.txt
```

### 2. Get the model class

The `TinyGPT` model class is available at:
**[https://github.com/hemantvirmani/tinygpt](https://github.com/hemantvirmani/tinygpt)**

Clone the repo (step 1) or download `tinygpt.py` directly, and place it in your working directory.

### 3. Load weights and run inference

```python
import tinygpt

model = tinygpt.load_model_for_inference()

prompts = [
    "Hello, I'm a language model,",
    "The human brain contains approximately",
    "Photosynthesis is the process by which plants",
    "The theory of relativity states that ",
    "The Roman Empire fell due to several factors including",
    "During the Industrial Revolution, workers ",
    "To solve a quadratic equation, you must first",
    "The key differences between mitosis and meiosis are ",
    "Once upon a time in ancient India, there lived a king who ",
]

for prompt in prompts:
    print(f"\n{'='*60}")
    print(f"PROMPT: {prompt}")
    print(f"{'='*60}")
    print(model.generate_text(start_text=prompt, max_tokens=500, temperature=0.7))
```
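
`generate_text` is defined in `tinygpt.py` and is not reproduced here. As a generic, stdlib-only illustration (not the repo's implementation), temperature sampling divides the logits by `temperature` before the softmax, so `temperature=0.7` sharpens the distribution toward the most likely tokens. The function names and logits below are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits into probabilities after temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)                        # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature=0.7, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]         # hypothetical scores for 4 tokens
for t in (1.0, 0.7):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])

rng = random.Random(0)
print([sample_next_token(toy_logits, 0.7, rng) for _ in range(10)])
```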

### 4. Load the Hugging Face format model

```bash
pip install torch transformers safetensors huggingface_hub
```

```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from transformers import AutoTokenizer, GPT2LMHeadModel

model_id = "hemantvirmani/tinyGPT"

# AutoTokenizer reads tokenizer.json (a fast tokenizer); the slow
# GPT2Tokenizer would instead need vocab.json and merges.txt.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = GPT2LMHeadModel.from_pretrained(model_id)

# Restore TinyGPT's trained LM-head bias for exact inference.
weights_path = hf_hub_download(repo_id=model_id, filename="model.safetensors")
state_dict = load_file(weights_path, device="cpu")
if "lm_head.bias" in state_dict:
    lm_head = torch.nn.Linear(model.config.n_embd, model.config.vocab_size, bias=True)
    lm_head.weight = torch.nn.Parameter(state_dict["lm_head.weight"])
    lm_head.bias = torch.nn.Parameter(state_dict["lm_head.bias"])
    model.lm_head = lm_head

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()

prompt = "Photosynthesis is the process by which plants"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=500,
        do_sample=True,
        temperature=0.7,
        top_k=0,       # 0 disables top-k filtering
        top_p=1.0,     # 1.0 disables nucleus filtering
        repetition_penalty=1.3,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

You can also run the helper script from the GitHub repo:

```bash
python infer_hf.py --model_dir hemantvirmani/tinyGPT --prompt "Photosynthesis is the process by which plants"
```

---

## Sample Outputs (temperature=0.7, 500 tokens)

**Prompt:** `Photosynthesis is the process by which plants`
> Photosynthesis is the process by which plants take in sunlight, water,
> carbon dioxide and nutrients to produce energy for their cells. Humans
> depend on photosynthesis to provide their own energy, but many plants
> also use the energy of other organisms to produce food. The five types of...

**Prompt:** `The Roman Empire fell due to several factors including`
> The Roman Empire fell due to several factors including the decline of the
> Roman army, the rise of the Papacy, and the threat of the Islamic invasion.
> The fall of the Roman Empire was the result of a series of civil wars in
> the late fourth century, and was led by the first emperor of the Roman
> Empire, Constantine the Great.

---

## Limitations

- This is a **base language model**: it completes text; it does not follow
  instructions or answer questions.
- Prone to repetition loops, especially at low temperature.
- Fine-tuning is required for instruction-following or domain-specific tasks.
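
The HF example in step 4 passes `repetition_penalty=1.3` to `generate()` to counter repetition loops. In `transformers`, this penalty is applied to the logits of tokens already present in the sequence: positive logits are divided by the penalty and negative ones are multiplied by it, so a repeated token always becomes less likely. The stdlib-only sketch below mirrors that rule on made-up logits; the function name is hypothetical and this is an illustration, not the library's code.

```python
def apply_repetition_penalty(logits, previous_token_ids, penalty=1.3):
    """Penalize tokens that already occurred (illustrative sketch).

    Each previously seen token is penalized once, however often it appeared.
    """
    out = list(logits)
    for tok in set(previous_token_ids):
        score = out[tok]
        # With penalty > 1, dividing a positive logit or multiplying a
        # negative one both lower that token's score.
        out[tok] = score / penalty if score > 0 else score * penalty
    return out


logits = [3.0, 1.5, -0.5, 0.2]   # hypothetical scores for a 4-token vocab
history = [0, 2, 0]              # tokens 0 and 2 were already generated
print(apply_repetition_penalty(logits, history))
```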

---

## Thanks to

- Andrej Karpathy's nanoGPT (video and code)
- Dataset: Hugging Face [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu)