ayayayya committed on
Commit e600974 · verified · 1 Parent(s): c79af25

VerySmollGPT

Files changed (6)
  1. README.md +216 -3
  2. config.json +21 -0
  3. model.safetensors +3 -0
  4. special_tokens_map.json +6 -0
  5. tokenizer.json +122 -0
  6. tokenizer_config.json +13 -0
README.md CHANGED
@@ -1,3 +1,216 @@
- ---
- license: mit
- ---
+ ---
+ language:
+ - en
+ license: mit
+ tags:
+ - text-generation
+ - character-level
+ - tiny-stories
+ - raspberry-pi
+ - gpt
+ - decoder-only
+ datasets:
+ - roneneldan/TinyStories
+ metrics:
+ - perplexity
+ model-index:
+ - name: VerySmollGPT
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TinyStories
+       type: roneneldan/TinyStories
+     metrics:
+     - type: loss
+       value: 0.6777
+       name: Training Loss (Final)
+       verified: false
+     - type: loss
+       value: 0.7028
+       name: Validation Loss (Final)
+       verified: false
+     - type: loss
+       value: 0.6924
+       name: Validation Loss (Best)
+       verified: false
+ ---
+
+ # VerySmollGPT
+
+ A lightweight character-level GPT model trained entirely on a **Raspberry Pi 5**. This model demonstrates that capable language models can be trained on consumer hardware with limited resources.
+
+ ## Model Description
+
+ VerySmollGPT is a decoder-only transformer model (GPT-style architecture) designed for character-level text generation. It was trained on the TinyStories dataset to generate coherent short stories.
+
+ - **Developed by:** Kittykat924
+ - **Model type:** Decoder-only Transformer (GPT)
+ - **Language:** English
+ - **License:** MIT
+ - **Trained on:** Raspberry Pi 5 (CPU only)
+ - **Training duration:** ~9 days
+ - **Parameters:** 4.80M unique (4.83M if the tied output head is counted separately)
+
+ ## Model Architecture
+
+ | Component | Value |
+ |-----------|-------|
+ | Vocabulary Size | 104 characters |
+ | Embedding Dimension | 256 |
+ | Layers | 6 |
+ | Attention Heads | 8 |
+ | Feed-forward Dimension | 1024 |
+ | Context Window | 128 tokens |
+ | Dropout | 0.1 |
+ | Weight Tying | Yes (token embeddings ↔ output layer) |
+
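+ The repository does not include modeling code; the exact module layout lives in the original `model.py`. As a reference only, a minimal pre-LN decoder-only sketch matching the table above could look like this (class and attribute names here are illustrative assumptions, not necessarily the checkpoint's parameter keys):
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ class Block(nn.Module):
+     """One pre-LN transformer block (assumed layout)."""
+     def __init__(self, d_model=256, n_heads=8, d_ff=1024, dropout=0.1):
+         super().__init__()
+         self.ln1 = nn.LayerNorm(d_model)
+         self.attn = nn.MultiheadAttention(d_model, n_heads,
+                                           dropout=dropout, batch_first=True)
+         self.ln2 = nn.LayerNorm(d_model)
+         self.mlp = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
+                                  nn.Linear(d_ff, d_model), nn.Dropout(dropout))
+
+     def forward(self, x, mask):
+         h = self.ln1(x)
+         a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
+         x = x + a
+         return x + self.mlp(self.ln2(x))
+
+ class VerySmollGPT(nn.Module):
+     def __init__(self, vocab_size=104, d_model=256, n_layers=6,
+                  n_heads=8, d_ff=1024, max_seq_len=128, dropout=0.1):
+         super().__init__()
+         self.tok_emb = nn.Embedding(vocab_size, d_model)
+         self.pos_emb = nn.Embedding(max_seq_len, d_model)
+         self.drop = nn.Dropout(dropout)
+         self.blocks = nn.ModuleList(Block(d_model, n_heads, d_ff, dropout)
+                                     for _ in range(n_layers))
+         self.ln_f = nn.LayerNorm(d_model)
+         self.head = nn.Linear(d_model, vocab_size, bias=False)
+         self.head.weight = self.tok_emb.weight  # weight tying
+
+     def forward(self, idx):
+         T = idx.size(1)
+         pos = torch.arange(T, device=idx.device)
+         x = self.drop(self.tok_emb(idx) + self.pos_emb(pos))
+         # Causal mask: True marks positions a token may NOT attend to
+         mask = torch.triu(torch.ones(T, T, dtype=torch.bool,
+                                      device=idx.device), diagonal=1)
+         for blk in self.blocks:
+             x = blk(x, mask)
+         return self.head(self.ln_f(x))  # logits over the 104-char vocab
+ ```
+
+ With these dimensions the sketch comes out to ~4.80M unique parameters (~4.83M counting the tied output head separately), consistent with the figures above.
+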
+ ## Training Details
+
+ ### Training Data
+
+ - **Dataset:** [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)
+ - **Dataset Size:** ~25 MB (a subset sized for Raspberry Pi training)
+ - **Total Tokens:** ~25M characters
+ - **Train/Val Split:** 90/10
+
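+ The preprocessing script is not part of this repo; given the stats above, the subset and split could be prepared roughly as follows (a sketch assuming the Hugging Face `datasets` library and its streaming API):
+
+ ```python
+ from datasets import load_dataset
+
+ # Stream TinyStories and keep roughly 25 MB of raw text
+ stream = load_dataset("roneneldan/TinyStories", split="train", streaming=True)
+ parts, size, budget = [], 0, 25 * 1024 * 1024
+ for example in stream:
+     s = example["text"] + "\n"
+     parts.append(s)
+     size += len(s.encode("utf-8"))
+     if size >= budget:
+         break
+ corpus = "".join(parts)
+
+ # 90/10 train/validation split over the character stream
+ cut = int(0.9 * len(corpus))
+ train_text, val_text = corpus[:cut], corpus[cut:]
+ ```
+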
+ ### Training Procedure
+
+ **Hardware:**
+ - Raspberry Pi 5
+ - CPU-only training (no GPU)
+ - Training time: ~9 days
+
+ **Hyperparameters:**
+ - Epochs: 3
+ - Batch Size: 16
+ - Learning Rate: 3e-4 (initial)
+ - Min Learning Rate: 1e-4 (cosine annealing)
+ - Optimizer: AdamW (β₁=0.9, β₂=0.95)
+ - Weight Decay: 0.01
+ - Gradient Clipping: 1.0
+ - Max Batches per Epoch: 130,000
+ - Context Window: 128 tokens
+
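+ Wired up, these settings would look roughly like the following (a sketch reusing the hypothetical `VerySmollGPT` class above; `CosineAnnealingLR` would reproduce the 3e-4 → 1e-4 decay over all 390,000 steps):
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ model = VerySmollGPT()  # hypothetical sketch class from Model Architecture
+
+ opt = torch.optim.AdamW(model.parameters(), lr=3e-4,
+                         betas=(0.9, 0.95), weight_decay=0.01)
+ sched = torch.optim.lr_scheduler.CosineAnnealingLR(
+     opt, T_max=3 * 130_000, eta_min=1e-4)  # 3 epochs x 130k batches
+
+ def random_batches(n):
+     """Placeholder batch source: random ids standing in for TinyStories text."""
+     for _ in range(n):
+         x = torch.randint(0, 104, (16, 128))   # batch_size=16, context=128
+         yield x[:, :-1], x[:, 1:]              # next-character targets
+
+ for x, y in random_batches(10):
+     logits = model(x)
+     loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
+     opt.zero_grad()
+     loss.backward()
+     torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip at 1.0
+     opt.step()
+     sched.step()
+ ```
+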
+ **Training Stats:**
+ - Final Epoch: 2 (checkpoint from epoch 3)
+ - Global Steps: 390,000 (3 × 130,000 batches)
+ - Best Validation Loss: 0.6924
+
+ ### Tokenization
+
+ Character-level tokenization with 104 unique tokens:
+ - 100 regular characters (letters, digits, punctuation, assorted symbols)
+ - 4 special tokens: `<PAD>`, `<UNK>`, `<BOS>`, `<EOS>`
+
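+ The full vocabulary ships in `tokenizer.json` (added in this commit), so the `char_to_idx` / `idx_to_char` maps used in the examples below can be rebuilt directly from it:
+
+ ```python
+ import json
+
+ with open("tokenizer.json", "r", encoding="utf-8") as f:
+     tok = json.load(f)
+
+ char_to_idx = tok["model"]["vocab"]              # {"<PAD>": 0, ..., "a": 59, ...}
+ idx_to_char = {i: c for c, i in char_to_idx.items()}
+ unk_id = char_to_idx["<UNK>"]
+
+ def encode(text):
+     """Map each character to its id, falling back to <UNK>."""
+     return [char_to_idx.get(ch, unk_id) for ch in text]
+
+ def decode(ids):
+     return "".join(idx_to_char[i] for i in ids)
+ ```
+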
+ ## Usage
+
+ ### Installation
+
+ ```bash
+ pip install torch safetensors
+ ```
+
+ ### Loading the Model
+
+ ```python
+ import json
+
+ import torch
+ from safetensors.torch import load_file
+
+ # Load the trained weights
+ state_dict = load_file('model.safetensors')
+
+ # Load the architecture/config values
+ with open('config.json', 'r') as f:
+     config = json.load(f)
+
+ # Note: You'll need to implement the VerySmollGPT architecture
+ # (e.g. the sketch under Model Architecture above) or use the
+ # original model.py from the repository.
+ ```
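+
+ With an architecture class in hand (for instance the sketch under Model Architecture), loading can continue like this. Note that the checkpoint's parameter names must line up with your module names, so some key remapping may be needed:
+
+ ```python
+ model = VerySmollGPT(
+     vocab_size=config["vocab_size"],
+     d_model=config["d_model"],
+     n_layers=config["n_layers"],
+     n_heads=config["n_heads"],
+     d_ff=config["d_ff"],
+     max_seq_len=config["max_seq_len"],
+     dropout=config["dropout"],
+ )
+ # strict=False surfaces any name mismatches instead of raising
+ missing, unexpected = model.load_state_dict(state_dict, strict=False)
+ print("missing:", missing, "unexpected:", unexpected)
+ model.eval()
+ ```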
+
+ ### Text Generation Example
+
+ ```python
+ # Assumes the model is loaded and char_to_idx / idx_to_char are built
+ # from tokenizer.json (see Tokenization above)
+ model.eval()
+
+ # Encode the prompt (character-level)
+ prompt = "Once upon a time"
+ input_ids = [char_to_idx[c] for c in prompt]
+ input_tensor = torch.tensor([input_ids], dtype=torch.long)
+
+ # Generate
+ with torch.no_grad():
+     output_ids = model.generate(
+         input_tensor,
+         max_new_tokens=200,
+         temperature=0.8,
+         top_k=40,
+     )
+
+ # Decode the output ids back to text
+ generated_text = ''.join(idx_to_char[i] for i in output_ids[0].tolist())
+ print(generated_text)
+ ```
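+
+ The `generate` call above assumes the original model class provides one. If yours does not, a minimal temperature + top-k sampling loop gives equivalent behavior (a sketch; it clips the context to the 128-token window):
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ @torch.no_grad()
+ def generate(model, idx, max_new_tokens=200, temperature=0.8, top_k=40):
+     for _ in range(max_new_tokens):
+         logits = model(idx[:, -128:])[:, -1, :] / temperature  # last position
+         v, _ = torch.topk(logits, top_k)
+         logits[logits < v[:, [-1]]] = float("-inf")  # drop all below top-k
+         probs = F.softmax(logits, dim=-1)
+         next_id = torch.multinomial(probs, num_samples=1)
+         idx = torch.cat([idx, next_id], dim=1)
+     return idx
+
+ # e.g. output_ids = generate(model, input_tensor)
+ ```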
+
+ ## Example Outputs
+
+ **Prompt:** "Once upon a time"
+
+ **Generated:**
+ > Once upon a time, there was a little girl named Lily. She loved to play with her toys and her favorite was a penguin that had a shiny metal box on it. Timmy liked to...
+
+ **Prompt:** "The quick brown fox"
+
+ **Generated:**
+ > The quick brown fox wanted to play with him again. The fox said he was not fair anymore. He said he was sorry and that he learned his lesson...
+
+ ## Limitations and Bias
+
+ - **Character-level tokenization:** Less efficient than BPE/WordPiece for longer texts
+ - **Small context window:** 128 tokens limits long-range dependencies
+ - **Training data:** Limited to the TinyStories style (simple children's stories)
+ - **Vocabulary:** Only 104 characters; anything outside it maps to `<UNK>`
+ - **Coherence:** Best suited to short-form generation (stories, snippets)
+
+ ## Environmental Impact
+
+ This model was intentionally trained on a Raspberry Pi 5 to demonstrate low-power AI training:
+
+ - **Hardware:** Raspberry Pi 5 (CPU only, ~15 W power draw)
+ - **Training Duration:** ~9 days (~216 hours)
+ - **Estimated Energy:** ~3.24 kWh total (15 W × 216 h)
+ - **Carbon Footprint:** Minimal compared to GPU-based training
+
+ ## Technical Specifications
+
+ - **Model Size:** 19 MB (safetensors; ~4.8M FP32 parameters × 4 bytes)
+ - **Inference Memory:** ~200-300 MB RAM
+ - **Training Memory:** ~1-2 GB RAM (batch_size=16)
+ - **Precision:** FP32
+
+ ## Citation
+
+ If you use this model, please cite:
+
+ ```bibtex
+ @misc{verysmollgpt,
+   title={VerySmollGPT: A Character-Level GPT Trained on Raspberry Pi},
+   author={[Your Name]},
+   year={2024},
+   howpublished={\url{https://huggingface.co/[your-username]/VerySmollGPT}}
+ }
+ ```
+
+ ## Acknowledgments
+
+ - Architecture inspired by [Andrej Karpathy's nanoGPT](https://github.com/karpathy/nanoGPT)
+ - Dataset: [TinyStories by Ronen Eldan and Yuanzhi Li](https://huggingface.co/datasets/roneneldan/TinyStories)
+ - Trained on a Raspberry Pi 5 to demonstrate accessible AI training
+
+ ## Model Card Contact
+
+ [Your contact information or GitHub repository link]
config.json ADDED
@@ -0,0 +1,21 @@
+ {
+   "model_type": "VerySmollGPT",
+   "architectures": [
+     "VerySmollGPT"
+   ],
+   "vocab_size": 104,
+   "d_model": 256,
+   "n_layers": 6,
+   "n_heads": 8,
+   "d_ff": 1024,
+   "max_seq_len": 128,
+   "dropout": 0.1,
+   "block_size": 128,
+   "tie_word_embeddings": true,
+   "training_config": {
+     "num_epochs": 3,
+     "batch_size": 16,
+     "learning_rate": 0.0003,
+     "weight_decay": 0.01
+   }
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:21a099127121614540ca3287fa6062bdc91c5838bcdb960c936d05ea413ca82e
+ size 19203248
special_tokens_map.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "bos_token": "<BOS>",
+   "eos_token": "<EOS>",
+   "unk_token": "<UNK>",
+   "pad_token": "<PAD>"
+ }
tokenizer.json ADDED
@@ -0,0 +1,122 @@
+ {
+   "version": "1.0",
+   "truncation": null,
+   "padding": null,
+   "added_tokens": [],
+   "normalizer": null,
+   "pre_tokenizer": {
+     "type": "CharacterLevel"
+   },
+   "post_processor": null,
+   "decoder": null,
+   "model": {
+     "type": "CharacterLevel",
+     "vocab": {
+       "<PAD>": 0,
+       "<UNK>": 1,
+       "<BOS>": 2,
+       "<EOS>": 3,
+       "\t": 4,
+       "\n": 5,
+       " ": 6,
+       "!": 7,
+       "\"": 8,
+       "$": 9,
+       "&": 10,
+       "'": 11,
+       "(": 12,
+       ")": 13,
+       "*": 14,
+       "+": 15,
+       ",": 16,
+       "-": 17,
+       ".": 18,
+       "/": 19,
+       "0": 20,
+       "1": 21,
+       "2": 22,
+       "3": 23,
+       "4": 24,
+       "5": 25,
+       "6": 26,
+       "7": 27,
+       "8": 28,
+       "9": 29,
+       ":": 30,
+       ";": 31,
+       "?": 32,
+       "A": 33,
+       "B": 34,
+       "C": 35,
+       "D": 36,
+       "E": 37,
+       "F": 38,
+       "G": 39,
+       "H": 40,
+       "I": 41,
+       "J": 42,
+       "K": 43,
+       "L": 44,
+       "M": 45,
+       "N": 46,
+       "O": 47,
+       "P": 48,
+       "Q": 49,
+       "R": 50,
+       "S": 51,
+       "T": 52,
+       "U": 53,
+       "V": 54,
+       "W": 55,
+       "X": 56,
+       "Y": 57,
+       "Z": 58,
+       "a": 59,
+       "b": 60,
+       "c": 61,
+       "d": 62,
+       "e": 63,
+       "f": 64,
+       "g": 65,
+       "h": 66,
+       "i": 67,
+       "j": 68,
+       "k": 69,
+       "l": 70,
+       "m": 71,
+       "n": 72,
+       "o": 73,
+       "p": 74,
+       "q": 75,
+       "r": 76,
+       "s": 77,
+       "t": 78,
+       "u": 79,
+       "v": 80,
+       "w": 81,
+       "x": 82,
+       "y": 83,
+       "z": 84,
+       "\u00a0": 85,
+       "¡": 86,
+       "¦": 87,
+       "©": 88,
+       "«": 89,
+       "±": 90,
+       "³": 91,
+       "´": 92,
+       "»": 93,
+       "Â": 94,
+       "Ã": 95,
+       "â": 96,
+       "œ": 97,
+       "˜": 98,
+       "“": 99,
+       "”": 100,
+       "‰": 101,
+       "€": 102,
+       "™": 103
+     },
+     "unk_token": "<UNK>"
+   }
+ }
tokenizer_config.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "tokenizer_class": "CharTokenizer",
+   "model_type": "VerySmollGPT",
+   "vocab_size": 104,
+   "clean_up_tokenization_spaces": true,
+   "bos_token": "<BOS>",
+   "eos_token": "<EOS>",
+   "unk_token": "<UNK>",
+   "pad_token": "<PAD>",
+   "add_prefix_space": false,
+   "add_bos_token": false,
+   "add_eos_token": false
+ }