Commit 65b2306 (verified), committed by hemantvirmani · 1 parent: 6d9f60e

pretrained and finetuned tinyGPT dataset

.gitattributes ADDED
@@ -0,0 +1,5 @@
+ finetuning[[:space:]]alpaca/checkpoint/tinygpt_finetuned_checkpoint_alpaca.pt filter=lfs diff=lfs merge=lfs -text
+ finetuning[[:space:]]alpaca/huggingface/model.safetensors filter=lfs diff=lfs merge=lfs -text
+ pretraining/checkpoint/tinygpt_pretrained_checkpoint_438k.pt filter=lfs diff=lfs merge=lfs -text
+ pretraining/PyTorch[[:space:]]native/tinygpt_pretrained_weights.pt filter=lfs diff=lfs merge=lfs -text
+ pretraining/tinygpt[[:space:]]huggingface/model.safetensors filter=lfs diff=lfs merge=lfs -text
finetuning alpaca/README.md ADDED
@@ -0,0 +1,230 @@
+ ---
+ license: mit
+ ---
+ # TinyGPT-Alpaca: Instruction-Tuned GPT-2 Style LM (~163M)
+
+ TinyGPT pretrained base model (~163M params, val loss 2.84), instruction
+ fine-tuned on the [Alpaca Cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned)
+ dataset (52K examples). Trained with a custom PyTorch loop: no LoRA, no PEFT,
+ full fine-tune.
+
+ I built this project to develop hands-on intuition for LLMs, inspired by Andrej Karpathy's nanoGPT.
+
+ ---
+
+ ## Model Details
+
+ | Parameter | Value |
+ | --- | --- |
+ | Architecture | Decoder-only Transformer (GPT-2 style) |
+ | Parameters | ~163M |
+ | Layers | 12 |
+ | Attention heads | 12 |
+ | Embedding dim | 768 |
+ | Context length | 1024 tokens (512 used during fine-tuning) |
+ | Vocab size | 50,257 |
+ | Tokenizer | GPT-2 BPE via `tiktoken` |
+ | Attention | Causal self-attention (Flash Attention via `F.scaled_dot_product_attention`) |
+ | LM head | Separate linear layer with bias (not weight-tied) |
+ | Base model | TinyGPT pretrained on FineWeb-Edu `sample-100BT` (val loss 2.84) |
+
+ ---
+
+ ## Fine-Tuning Details
+
+ | Detail | Value |
+ | --- | --- |
+ | Dataset | [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned) (52K instruction-response pairs) |
+ | Prompt template | `### Instruction / ### Input (optional) / ### Response` |
+ | Max sequence length | 512 tokens |
+ | Val split | 10% (held out from the 52K) |
+ | Best val loss | **1.8405** (step 3,600 of 5,000) |
+ | Optimizer | AdamW (betas=(0.9, 0.95), eps=1e-8) |
+ | Learning rate | 1e-4 with linear warmup (100 steps) → cosine decay |
+ | Effective batch size | 64 (4 micro-batch × 16 gradient accumulation steps) |
+ | Weight decay | 0.01 |
+ | Gradient clipping | 1.0 |
+ | Dropout | **0.1** (critical: without it, the train/val gap exceeded 0.80 within 2,000 steps) |
+ | Precision | bfloat16 (bf16) |
+ | Hardware | Kaggle T4 GPU |
+
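The figures in the table imply a rough epoch count, which helps put "step 3,600 of 5,000" in context. The sketch below works it out from the stated numbers only. It assumes the dataset holds exactly 52,000 examples (the table says "52K") and that each training example is seen once per epoch; the function name `finetune_epochs` is illustrative and not part of the TinyGPT repo.

```python
import math

def finetune_epochs(num_examples, val_fraction, micro_batch, grad_accum, steps):
    """Estimate how many passes over the training split a run makes."""
    # Examples left for training after holding out the validation split.
    train_examples = round(num_examples * (1 - val_fraction))
    # Sequences consumed per optimizer step.
    effective_batch = micro_batch * grad_accum
    steps_per_epoch = math.ceil(train_examples / effective_batch)
    epochs = steps * effective_batch / train_examples
    return train_examples, effective_batch, steps_per_epoch, epochs

# Assumed inputs: 52,000 examples, 10% validation, 4 x 16 batch, 5,000 steps.
train, batch, per_epoch, total_epochs = finetune_epochs(52_000, 0.10, 4, 16, 5_000)
_, _, _, best_epochs = finetune_epochs(52_000, 0.10, 4, 16, 3_600)

print(f"train examples: {train}, effective batch: {batch}")
print(f"steps per epoch: {per_epoch}")
print(f"epochs at step 5,000: {total_epochs:.1f}; at step 3,600: {best_epochs:.1f}")
```

Under these assumptions the best checkpoint lands a little under five epochs in, and the full run covers just under seven.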
+ ---
+
+ ## Format
+
+ Two formats are provided:
+
+ **1. Full training checkpoint** (`tinygpt_finetuned_checkpoint_alpaca.pt`)
+ A dict with keys: `model_state`, `optimizer_state`, `scheduler_state`, `step`, `val_loss`.
+ Useful if you want to resume training or inspect training metadata.
+ The file is ~2 GB (includes optimizer state).
+
+ **2. Hugging Face format** (`model.safetensors` + `config.json`)
+ Exported via `export_to_hf_alpaca.py` from the GitHub repo. Loadable with
+ `transformers`. The same `lm_head.bias` caveat as for the pretrained model applies here
+ (see Usage below).
+
+ ---
+
+ ## Prompt Template
+
+ This model was trained on the Alpaca instruction format. Always wrap prompts in
+ this template, because the model has learned to respond after `### Response:`.
+
+ **Without input context:**
+ ```
+ ### Instruction:
+ {your instruction here}
+
+ ### Response:
+ ```
+
+ **With input context:**
+ ```
+ ### Instruction:
+ {your instruction here}
+
+ ### Input:
+ {additional context here}
+
+ ### Response:
+ ```
+
+ ---
+
+ ## Usage
+
+ ### 1. Install dependencies
+
+ ```bash
+ git clone https://github.com/hemantvirmani/tinygpt
+ cd tinygpt
+ pip install torch tiktoken
+ ```
+
+ ### 2. Load PyTorch checkpoint and run inference
+
+ ```python
+ import torch
+ import tiktoken
+ import tinygpt
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ # Load the full checkpoint and extract model weights
+ ckpt = torch.load("tinygpt_finetuned_checkpoint_alpaca.pt", map_location=device, weights_only=False)
+ state_dict = ckpt["model_state"]
+ print(f"Loaded checkpoint - step: {ckpt['step']} | val loss: {ckpt['val_loss']:.4f}")
+
+ # Strip _orig_mod. prefix if checkpoint came from a torch.compile() run
+ if any(k.startswith("_orig_mod.") for k in state_dict):
+     state_dict = {k.removeprefix("_orig_mod."): v for k, v in state_dict.items()}
+
+ enc = tiktoken.get_encoding("gpt2")
+ state = tinygpt.State(tokenizer=enc, train_data=None, val_data=None, vocab_size=enc.n_vocab)
+ model = tinygpt.TinyGPT(state).to(device)
+ model.load_state_dict(state_dict)
+ model.eval()
+
+ # Run inference with instruction template
+ def ask(instruction, input_text="", max_tokens=200, temperature=0.7):
+     if input_text:
+         prompt = f"### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:\n"
+     else:
+         prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
+     return model.generate_text(start_text=prompt, max_tokens=max_tokens, temperature=temperature)
+
+ print(ask("What is photosynthesis?"))
+ print(ask("Explain the water cycle in simple terms."))
+ print(ask("Summarize the following text.", input_text="The moon orbits Earth once every 27 days."))
+ ```
+
+ ### 3. Load the Hugging Face format model
+
+ ```bash
+ pip install torch transformers safetensors huggingface_hub
+ ```
+
+ ```python
+ import torch
+ from huggingface_hub import hf_hub_download
+ from safetensors.torch import load_file
+ from transformers import GPT2LMHeadModel, GPT2Tokenizer
+
+ model_id = "hemantvirmani/tinyGPT-alpaca"
+
+ tokenizer = GPT2Tokenizer.from_pretrained(model_id)
+ model = GPT2LMHeadModel.from_pretrained(model_id)
+
+ # Restore TinyGPT's trained LM-head bias for exact inference
+ weights_path = hf_hub_download(repo_id=model_id, filename="model.safetensors")
+ state_dict = load_file(weights_path, device="cpu")
+ if "lm_head.bias" in state_dict:
+     lm_head = torch.nn.Linear(model.config.n_embd, model.config.vocab_size, bias=True)
+     lm_head.weight = torch.nn.Parameter(state_dict["lm_head.weight"])
+     lm_head.bias = torch.nn.Parameter(state_dict["lm_head.bias"])
+     model.lm_head = lm_head
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model = model.to(device).eval()
+
+ instruction = "What is photosynthesis?"
+ prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
+ inputs = tokenizer(prompt, return_tensors="pt").to(device)
+
+ with torch.no_grad():
+     output_ids = model.generate(
+         **inputs,
+         max_new_tokens=200,
+         do_sample=True,
+         temperature=0.7,
+         top_k=50,
+         repetition_penalty=1.3,
+         pad_token_id=tokenizer.eos_token_id,
+     )
+
+ print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
+ ```
+
+ ---
+
+ ## Sample Outputs (temperature=0.7, after fine-tuning)
+
+ **Prompt:** `What is photosynthesis?`
+ > Photosynthesis is the process by which plants, algae, and some bacteria
+ > convert sunlight, water, and carbon dioxide into glucose, oxygen, and other
+ > chemical compounds. It is a crucial process for the survival and reproduction
+ > of life on Earth and is critical for the greenhouse effect, water cycle, and
+ > the carbon cycle.
+
+ **Prompt:** `Explain the water cycle in simple terms.`
+ > The water cycle, also known as the hydrologic cycle, is the continuous
+ > movement of water on, above, and below the surface of the earth. It starts
+ > with the evaporation of water from the ground and rises into the atmosphere
+ > through the process of precipitation. The water in the oceans and other bodies
+ > of water evaporates from the surface of the earth in order to be returned to
+ > the earth's surface through precipitation.
+
+ ---
+
+ ## Limitations
+
+ - **163M parameters**: factual accuracy is limited. The model learns the
+   instruction-response *format* quickly (within the first 100 steps), but
+   factual depth is constrained by model capacity.
+ - **Not RLHF-tuned**: no safety guardrails, no preference alignment.
+ - **Trained on Alpaca Cleaned (52K)**: may not generalize well to complex,
+   multi-step, or domain-specific instructions.
+ - Can degenerate on some questions (e.g., repeating `### Response:` headers).
+   Use `repetition_penalty=1.3` to mitigate.
+ - The base model was trained on formal educational text (FineWeb-Edu); that
+   bias carries through to instruction-following.
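The repeated-header degeneration listed above can also be trimmed after generation. The following is a minimal sketch and not part of the TinyGPT repo: `extract_response` is a hypothetical helper that assumes the generated string still contains the prompt, and it cuts the answer at the first later `###` section header.

```python
RESPONSE_MARKER = "### Response:"

def extract_response(generated: str) -> str:
    """Return only the first answer from text that follows the Alpaca template."""
    # Keep what comes after the first response marker (drops the echoed prompt).
    _, found, answer = generated.partition(RESPONSE_MARKER)
    if not found:
        # No marker present: assume the text is already a bare answer.
        answer = generated
    # Cut at the next template header, if the model started a new section.
    answer = answer.split("\n###", 1)[0]
    return answer.strip()

# Made-up generation that repeats the response header.
sample = (
    "### Instruction:\nName a primary color.\n\n"
    "### Response:\nRed is a primary color.\n\n"
    "### Response:\nRed is a primary color."
)
print(extract_response(sample))
```

This only hides the repetition in the returned text; the model still spends tokens generating it, so it complements `repetition_penalty` instead of replacing it.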
+
+ ---
+
+ ## Thanks to
+
+ - Andrej Karpathy's nanoGPT: architecture inspiration
+ - Dataset: [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned)
+ - Base model: [hemantvirmani/tinyGPT](https://huggingface.co/hemantvirmani/tinyGPT)
+ - Compute: Kaggle (T4 GPU)
finetuning alpaca/checkpoint/tinygpt_finetuned_checkpoint_alpaca.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b07938beb5b22b699314cb101ec6ac101f48fa8ae47355e1ee1907d31f4ac9b1
+ size 2006993967
finetuning alpaca/huggingface/config.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "activation_function": "gelu",
+   "add_cross_attention": false,
+   "architectures": [
+     "GPT2LMHeadModel"
+   ],
+   "attn_pdrop": 0.0,
+   "bos_token_id": 50256,
+   "dtype": "float32",
+   "embd_pdrop": 0.0,
+   "eos_token_id": 50256,
+   "initializer_range": 0.02,
+   "layer_norm_epsilon": 1e-05,
+   "model_type": "gpt2",
+   "n_embd": 768,
+   "n_head": 12,
+   "n_inner": null,
+   "n_layer": 12,
+   "n_positions": 1024,
+   "pad_token_id": null,
+   "reorder_and_upcast_attn": false,
+   "resid_pdrop": 0.0,
+   "scale_attn_by_inverse_layer_idx": false,
+   "scale_attn_weights": true,
+   "summary_activation": null,
+   "summary_first_dropout": 0.1,
+   "summary_proj_to_labels": true,
+   "summary_type": "cls_index",
+   "summary_use_proj": true,
+   "tie_word_embeddings": false,
+   "transformers_version": "5.3.0",
+   "use_cache": true,
+   "vocab_size": 50257
+ }
finetuning alpaca/huggingface/generation_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 50256,
+   "eos_token_id": 50256,
+   "output_attentions": false,
+   "output_hidden_states": false,
+   "transformers_version": "5.3.0",
+   "use_cache": true
+ }
finetuning alpaca/huggingface/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:19486203f84dd502fc571085190e6f90794de219677cd7ecfd00c46d39c24011
+ size 652365020
finetuning alpaca/huggingface/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
finetuning alpaca/huggingface/tokenizer_config.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "add_prefix_space": false,
+   "backend": "tokenizers",
+   "bos_token": "<|endoftext|>",
+   "eos_token": "<|endoftext|>",
+   "errors": "replace",
+   "is_local": false,
+   "model_max_length": 1024,
+   "pad_token": null,
+   "tokenizer_class": "GPT2Tokenizer",
+   "unk_token": "<|endoftext|>"
+ }
pretraining/PyTorch native/tinygpt_pretrained_weights.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5519339ae282c0a32db9934589a501908e5acb498047397558767e17f9a9856e
+ size 702547947
pretraining/README.md ADDED
@@ -0,0 +1,210 @@
+ ---
+ license: mit
+ ---
+ # TinyGPT: GPT-2 Style LM (~163M) trained on FineWeb-Edu
+
+ A GPT-2 style decoder-only transformer pretrained from scratch on ~43B tokens
+ of the FineWeb-Edu dataset, achieving a validation loss of **2.84**.
+
+ I built this project to develop hands-on intuition for LLMs, inspired by Andrej Karpathy's nanoGPT.
+
+ ---
+
+ ## Model Details
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Architecture | Decoder-only Transformer (GPT-2 style) |
+ | Parameters | ~163M |
+ | Layers | 12 |
+ | Attention heads | 12 |
+ | Embedding dim | 768 |
+ | Context length | 1024 tokens |
+ | Vocab size | 50,257 |
+ | Tokenizer | GPT-2 BPE via `tiktoken` |
+ | Attention | Causal self-attention (Flash Attention via `F.scaled_dot_product_attention`) |
+ | LM head | Separate linear layer (not weight-tied) |
+
+ > **Why ~163M and not 124M?** Standard GPT-2 124M ties the LM head weights
+ > with the token embedding table, saving ~38M parameters. TinyGPT uses a
+ > separate `nn.Linear` head, resulting in ~163M total parameters.
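The 124M versus ~163M gap can be checked with a few lines of arithmetic. The sketch below counts parameters for a GPT-2 style decoder using the dimensions from the table. It assumes the standard GPT-2 layout (learned position embeddings, biases on every linear layer and LayerNorm, a 4x MLP expansion); TinyGPT's actual layer definitions are not shown here, so treat `gpt2_param_count` as an illustrative helper, not the repo's code.

```python
def gpt2_param_count(n_layer=12, n_embd=768, vocab_size=50257, n_positions=1024,
                     tied_head=True, head_bias=False):
    """Count parameters for a GPT-2 style decoder (assumed standard layout)."""
    token_emb = vocab_size * n_embd
    pos_emb = n_positions * n_embd
    layer_norm = 2 * n_embd                       # weight + bias
    attn = n_embd * 3 * n_embd + 3 * n_embd       # fused q, k, v projection
    attn += n_embd * n_embd + n_embd              # attention output projection
    mlp = n_embd * 4 * n_embd + 4 * n_embd        # expand to 4 * n_embd
    mlp += 4 * n_embd * n_embd + n_embd           # project back to n_embd
    block = attn + mlp + 2 * layer_norm           # two LayerNorms per block
    total = token_emb + pos_emb + n_layer * block + layer_norm  # + final LayerNorm
    if not tied_head:
        total += vocab_size * n_embd              # separate LM-head weight matrix
        if head_bias:
            total += vocab_size                   # LM-head bias vector
    return total

tied = gpt2_param_count(tied_head=True)
untied = gpt2_param_count(tied_head=False, head_bias=True)
print(f"tied head:   {tied:,}")
print(f"untied head: {untied:,}")
print(f"difference:  {untied - tied:,}")
```

Under these assumptions the untied head adds 50,257 × 768 weights plus a 50,257-entry bias, which accounts for the roughly 38M difference quoted above.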
+
+ ---
+
+ ## Training Details
+
+ | Detail | Value |
+ |--------|-------|
+ | Dataset | [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) (`sample-100BT` subset) |
+ | Tokens trained | ~43B |
+ | Validation loss | 2.84 |
+ | Optimizer | AdamW (betas=(0.9, 0.95), eps=1e-8) |
+ | Learning rate | 6e-4 |
+ | LR schedule | Linear warmup (4000 steps) -> cosine decay to 6e-5 |
+ | Effective batch size | 512 (16 micro-batch × 32 gradient accumulation steps) |
+ | Weight decay | 0.1 |
+ | Gradient clipping | 1.0 |
+ | Precision | bfloat16 (bf16) |
+ | Max iterations | 600,000 |
+ | Dropout | 0.0 |
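The learning-rate schedule in the table can be written as a small function. This is a sketch under stated assumptions, not the training script itself: it assumes the cosine decay runs from the end of warmup to the 600,000-iteration limit, that warmup ramps linearly from near zero, and that the rate is held at 6e-5 afterwards. The name `learning_rate` and the `step + 1` warmup convention are illustrative choices.

```python
import math

def learning_rate(step, peak_lr=6e-4, min_lr=6e-5,
                  warmup_steps=4000, max_steps=600_000):
    """Linear warmup to peak_lr, then cosine decay to min_lr at max_steps."""
    if step < warmup_steps:
        # Linear ramp; step + 1 avoids a zero learning rate on the first step.
        return peak_lr * (step + 1) / warmup_steps
    if step >= max_steps:
        return min_lr
    # Fraction of the decay phase completed, in [0, 1).
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 down to 0
    return min_lr + cosine * (peak_lr - min_lr)

for step in (0, 2000, 4000, 302_000, 600_000):
    print(f"step {step:>7}: lr = {learning_rate(step):.3e}")
```

The same shape applies to the fine-tuning run with a different peak and warmup length; only the arguments change.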
+
+ ---
+
+ ## Format
+
+ Weights are saved in **PyTorch native format**: a plain state dict saved with
+ `torch.save()`, containing only model weights (no optimizer state, no
+ scheduler). The file is ~670 MB.
+
+ To load, you need the `TinyGPT` model class (available from the GitHub repo; see Usage, step 2).
+
+ The model is also available in **Hugging Face Transformers format** in this
+ repository. The HF-format files include:
+
+ - `model.safetensors`
+ - `config.json`
+ - `generation_config.json`
+ - `tokenizer.json`
+ - `tokenizer_config.json`
+
+ The HF-format model can be loaded with `transformers` and is useful for standard
+ Hugging Face workflows. Note that TinyGPT was trained with a separate,
+ non-weight-tied LM head that includes a trained bias. Standard
+ `GPT2LMHeadModel.from_pretrained()` loads the main model weights but treats
+ `lm_head.bias` as an unexpected key because the default GPT-2 head is biasless.
+ For exact TinyGPT inference, restore the LM-head bias as shown below or use
+ `infer_hf.py` from the GitHub repo.
+
+ ---
+
+ ## Usage
+
+ ### 1. Install dependencies
+
+ Clone the repo and install requirements:
+
+ ```bash
+ git clone https://github.com/hemantvirmani/tinygpt
+ cd tinygpt
+ pip install -r requirements.txt
+ ```
+
+ ### 2. Get the model class
+
+ The `TinyGPT` model class is available at:
+ **[https://github.com/hemantvirmani/tinygpt](https://github.com/hemantvirmani/tinygpt)**
+
+ Clone or download `tinygpt.py` and place it in your working directory.
+
+ ### 3. Load weights and run inference
+
+ ```python
+ import tinygpt
+
+ model = tinygpt.load_model_for_inference()
+
+ prompts = [
+     "Hello, I'm a language model,",
+     "The human brain contains approximately",
+     "Photosynthesis is the process by which plants",
+     "The theory of relativity states that ",
+     "The Roman Empire fell due to several factors including",
+     "During the Industrial Revolution, workers ",
+     "To solve a quadratic equation, you must first",
+     "The key differences between mitosis and meiosis are ",
+     "Once upon a time in ancient India, there lived a king who ",
+ ]
+
+ for prompt in prompts:
+     print(f"\n{'='*60}")
+     print(f"PROMPT: {prompt}")
+     print(f"{'='*60}")
+     print(model.generate_text(start_text=prompt, max_tokens=500, temperature=0.7))
+ ```
+
+ ### 4. Load the Hugging Face format model
+
+ ```bash
+ pip install torch transformers safetensors huggingface_hub
+ ```
+
+ ```python
+ import torch
+ from huggingface_hub import hf_hub_download
+ from safetensors.torch import load_file
+ from transformers import GPT2LMHeadModel, GPT2Tokenizer
+
+ model_id = "hemantvirmani/tinyGPT"
+
+ tokenizer = GPT2Tokenizer.from_pretrained(model_id)
+ model = GPT2LMHeadModel.from_pretrained(model_id)
+
+ # Restore TinyGPT's trained LM-head bias for exact inference.
+ weights_path = hf_hub_download(repo_id=model_id, filename="model.safetensors")
+ state_dict = load_file(weights_path, device="cpu")
+ if "lm_head.bias" in state_dict:
+     lm_head = torch.nn.Linear(model.config.n_embd, model.config.vocab_size, bias=True)
+     lm_head.weight = torch.nn.Parameter(state_dict["lm_head.weight"])
+     lm_head.bias = torch.nn.Parameter(state_dict["lm_head.bias"])
+     model.lm_head = lm_head
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model = model.to(device)
+ model.eval()
+
+ prompt = "Photosynthesis is the process by which plants"
+ inputs = tokenizer(prompt, return_tensors="pt").to(device)
+
+ with torch.no_grad():
+     output_ids = model.generate(
+         **inputs,
+         max_new_tokens=500,
+         do_sample=True,
+         temperature=0.7,
+         top_k=0,
+         top_p=1.0,
+         repetition_penalty=1.3,
+         pad_token_id=tokenizer.eos_token_id,
+     )
+
+ print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
+ ```
+
+ You can also run the helper script from the GitHub repo:
+
+ ```bash
+ python infer_hf.py --model_dir hemantvirmani/tinyGPT --prompt "Photosynthesis is the process by which plants"
+ ```
+
+ ---
+
+ ## Sample Outputs (temperature=0.7, 500 tokens)
+
+ **Prompt:** `Photosynthesis is the process by which plants`
+ > Photosynthesis is the process by which plants take in sunlight, water,
+ > carbon dioxide and nutrients to produce energy for their cells. Humans
+ > depend on photosynthesis to provide their own energy, but many plants
+ > also use the energy of other organisms to produce food. The five types of...
+
+ **Prompt:** `The Roman Empire fell due to several factors including`
+ > The Roman Empire fell due to several factors including the decline of the
+ > Roman army, the rise of the Papacy, and the threat of the Islamic invasion.
+ > The fall of the Roman Empire was the result of a series of civil wars in
+ > the late fourth century, and was led by the first emperor of the Roman
+ > Empire, Constantine the Great.
+
+ ---
+
+ ## Limitations
+
+ - This is a **base language model**: it completes text; it does not follow
+   instructions or answer questions.
+ - Prone to repetition loops, especially at low temperature.
+ - Fine-tuning required for instruction-following or domain-specific tasks.
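The link between low temperature and repetition is easier to see with numbers. Lower temperature concentrates probability on the top-scoring token, which pushes sampling toward greedy decoding, where loops tend to appear. The sketch below is a plain-Python illustration with made-up logits; `softmax_with_temperature` is a hypothetical helper and not how TinyGPT's `generate_text` is necessarily implemented.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into sampling probabilities at a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                        # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                      # made-up scores for three candidate tokens
for t in (1.0, 0.7, 0.2):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature {t}: {[round(p, 3) for p in probs]}")
```

As the temperature drops, the first token's share of the probability mass grows, so the sampler picks it almost every time.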
+
+ ---
+
+ ## Thanks to
+
+ - Andrej Karpathy's nanoGPT - Video and Code
+ - Dataset: Hugging Face [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu)
pretraining/checkpoint/tinygpt_pretrained_checkpoint_438k.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:653308efbfbf616df128e652c48c3eac1ba72694d4cafed2aaae07e415c0a045
+ size 2006991266
pretraining/tinygpt huggingface/config.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "activation_function": "gelu",
+   "add_cross_attention": false,
+   "architectures": [
+     "GPT2LMHeadModel"
+   ],
+   "attn_pdrop": 0.0,
+   "bos_token_id": 50256,
+   "dtype": "float32",
+   "embd_pdrop": 0.0,
+   "eos_token_id": 50256,
+   "initializer_range": 0.02,
+   "layer_norm_epsilon": 1e-05,
+   "model_type": "gpt2",
+   "n_embd": 768,
+   "n_head": 12,
+   "n_inner": null,
+   "n_layer": 12,
+   "n_positions": 1024,
+   "pad_token_id": null,
+   "reorder_and_upcast_attn": false,
+   "resid_pdrop": 0.0,
+   "scale_attn_by_inverse_layer_idx": false,
+   "scale_attn_weights": true,
+   "summary_activation": null,
+   "summary_first_dropout": 0.1,
+   "summary_proj_to_labels": true,
+   "summary_type": "cls_index",
+   "summary_use_proj": true,
+   "tie_word_embeddings": false,
+   "transformers_version": "5.3.0",
+   "use_cache": true,
+   "vocab_size": 50257
+ }
pretraining/tinygpt huggingface/generation_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 50256,
+   "eos_token_id": 50256,
+   "output_attentions": false,
+   "output_hidden_states": false,
+   "transformers_version": "5.3.0",
+   "use_cache": true
+ }
pretraining/tinygpt huggingface/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d89305ea93a964f09e6ed382eb3f24726997bf564601fa46e2f6d226cfc0cf53
+ size 652365020
pretraining/tinygpt huggingface/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
pretraining/tinygpt huggingface/tokenizer_config.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "add_prefix_space": false,
+   "backend": "tokenizers",
+   "bos_token": "<|endoftext|>",
+   "eos_token": "<|endoftext|>",
+   "errors": "replace",
+   "is_local": false,
+   "model_max_length": 1024,
+   "pad_token": null,
+   "tokenizer_class": "GPT2Tokenizer",
+   "unk_token": "<|endoftext|>"
+ }