---
language:
- en
license: mit
tags:
- text-generation
- character-level
- tiny-stories
- raspberry-pi
- gpt
- decoder-only
datasets:
- roneneldan/TinyStories
metrics:
- perplexity
model-index:
- name: VerySmollGPT
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TinyStories
      type: roneneldan/TinyStories
    metrics:
    - type: loss
      value: 0.6777
      name: Training Loss (Final)
      verified: false
    - type: loss
      value: 0.7028
      name: Validation Loss (Final)
      verified: false
    - type: loss
      value: 0.6924
      name: Validation Loss (Best)
      verified: false
---

# VerySmollGPT

A lightweight character-level GPT model trained entirely on a **Raspberry Pi 5**, demonstrating that a small but coherent language model can be trained end to end on low-cost consumer hardware.

## Model Description

VerySmollGPT is a decoder-only transformer model (GPT-style architecture) designed for character-level text generation. It was trained on the TinyStories dataset to generate coherent short stories.

- **Developed by:** Kittykat924
- **Model type:** Decoder-only Transformer (GPT)
- **Language:** English
- **License:** MIT
- **Trained on:** Raspberry Pi 5 (CPU only)
- **Training duration:** ~9 days
- **Parameters:** 4.80M unique (4.83M if the tied input/output embeddings are counted separately)

## Model Architecture

| Component | Value |
|-----------|-------|
| Vocabulary Size | 104 characters |
| Embedding Dimension | 256 |
| Layers | 6 |
| Attention Heads | 8 |
| Feed-forward Dimension | 1024 |
| Context Window | 128 tokens |
| Dropout | 0.1 |
| Weight Tying | Yes (token embeddings ↔ output layer) |

## Training Details

### Training Data

- **Dataset:** [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)
- **Dataset Size:** ~25MB (optimized for Raspberry Pi)
- **Total Tokens:** ~25M characters
- **Train/Val Split:** 90/10
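
A minimal sketch of this preparation, assuming the dataset has been flattened into a single plain-text file (the filename here is hypothetical):

```python
# Hypothetical preparation: read the ~25MB character stream and
# take the 90/10 train/validation split described above.
with open('tinystories.txt', 'r', encoding='utf-8') as f:
    text = f.read()

split = int(0.9 * len(text))
train_text, val_text = text[:split], text[split:]
print(f"train: {len(train_text):,} chars | val: {len(val_text):,} chars")
```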

### Training Procedure

**Hardware:**
- Raspberry Pi 5
- CPU-only training (no GPU)
- Training time: ~9 days

**Hyperparameters:**
- Epochs: 3
- Batch Size: 16
- Learning Rate: 3e-4 (initial)
- Min Learning Rate: 1e-4 (cosine annealing)
- Optimizer: AdamW (β₁=0.9, β₂=0.95)
- Weight Decay: 0.01
- Gradient Clipping: 1.0
- Max Batches per Epoch: 130,000
- Context Window: 128 tokens
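
As a rough sketch (the actual training script lives in the repository), these hyperparameters correspond to a standard PyTorch setup; the `T_max` value is an assumption based on the global step count reported below:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # stand-in; use the real VerySmollGPT model here

# Hypothetical optimizer/scheduler setup matching the hyperparameters above.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4,
                              betas=(0.9, 0.95), weight_decay=0.01)
# Cosine annealing from 3e-4 down to the 1e-4 floor; T_max assumed to be
# the 390,000 global steps reported below.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=390_000, eta_min=1e-4)

# Each training step (gradient clipping at 1.0 as listed above):
# loss.backward()
# torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
# optimizer.step(); scheduler.step(); optimizer.zero_grad()
```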

**Training Stats:**
- Final epoch: 2 (zero-indexed; the checkpoint is from the third and final epoch)
- Global Steps: 390,000
- Best Validation Loss: 0.692

### Tokenization

Character-level tokenization with 104 unique tokens:
- 100 regular characters (letters, numbers, punctuation, special characters)
- 4 special tokens: `<PAD>`, `<UNK>`, `<BOS>`, `<EOS>`
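
A minimal sketch of how such a vocabulary can be built; the repository's actual token ordering and vocabulary file format may differ:

```python
# Hypothetical character-level vocabulary with the 4 special tokens.
SPECIALS = ['<PAD>', '<UNK>', '<BOS>', '<EOS>']

def build_vocab(text):
    chars = sorted(set(text))          # unique characters in the corpus
    tokens = SPECIALS + chars          # specials first, then characters
    char_to_idx = {t: i for i, t in enumerate(tokens)}
    idx_to_char = {i: t for i, t in enumerate(tokens)}
    return char_to_idx, idx_to_char

def encode(text, char_to_idx):
    unk = char_to_idx['<UNK>']
    return [char_to_idx.get(c, unk) for c in text]

def decode(ids, idx_to_char):
    return ''.join(idx_to_char[i] for i in ids)
```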

## Usage

### Installation

```bash
pip install torch safetensors
```

### Loading the Model

```python
import json

import torch
from safetensors.torch import load_file

# Load the configuration (architecture hyperparameters)
with open('config.json', 'r') as f:
    config = json.load(f)

# Load the trained weights
state_dict = load_file('model.safetensors')

# Note: you'll need the VerySmollGPT architecture definition
# (model.py from the repository) to instantiate the model, e.g.:
#   model = VerySmollGPT(**config)
#   model.load_state_dict(state_dict)
```
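
For reference, the architecture table above corresponds to a nanoGPT-style decoder roughly like the sketch below. This is illustrative only: the repository's `model.py` is authoritative, and the class and parameter names here will not match the checkpoint keys.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm transformer decoder block."""
    def __init__(self, d_model=256, n_heads=8, d_ff=1024, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(),
            nn.Linear(d_ff, d_model), nn.Dropout(dropout),
        )

    def forward(self, x, causal_mask):
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=causal_mask, need_weights=False)
        x = x + a
        x = x + self.mlp(self.ln2(x))
        return x

class TinyGPT(nn.Module):
    """Hypothetical stand-in for the VerySmollGPT architecture."""
    def __init__(self, vocab_size=104, d_model=256, n_layers=6,
                 n_heads=8, d_ff=1024, max_len=128, dropout=0.1):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.drop = nn.Dropout(dropout)
        self.blocks = nn.ModuleList(
            [Block(d_model, n_heads, d_ff, dropout) for _ in range(n_layers)]
        )
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)
        self.head.weight = self.tok_emb.weight  # weight tying

    def forward(self, idx):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.drop(self.tok_emb(idx) + self.pos_emb(pos))
        # Boolean causal mask: True = position may not be attended to
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool,
                                     device=idx.device), diagonal=1)
        for block in self.blocks:
            x = block(x, mask)
        return self.head(self.ln_f(x))
```

The weight tying on the last line of `__init__` is why the unique parameter count (4.80M) is lower than the nominal total (4.83M).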

### Text Generation Example

```python
# Assumes the model has been loaded as above and that the vocabulary
# mappings (char_to_idx / idx_to_char) are available, e.g. from the
# repository's tokenizer utilities.
model.eval()

# Encode the prompt at the character level
prompt = "Once upon a time"
input_ids = [char_to_idx[c] for c in prompt]
input_tensor = torch.tensor([input_ids], dtype=torch.long)

# Sample a continuation (generate() per the repository's model class)
with torch.no_grad():
    output_ids = model.generate(
        input_tensor,
        max_new_tokens=200,
        temperature=0.8,
        top_k=40
    )

# Decode the generated ids back to text
generated_text = ''.join([idx_to_char[i] for i in output_ids[0].tolist()])
print(generated_text)
```
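
The `generate()` call above assumes the repository's model class. If your reconstruction lacks one, a minimal top-k/temperature sampling loop like this reproduces the same behavior (illustrative, not the repository's implementation):

```python
import torch

@torch.no_grad()
def generate(model, idx, max_new_tokens=200, temperature=0.8,
             top_k=40, context_len=128):
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -context_len:]       # crop to the context window
        logits = model(idx_cond)[:, -1, :]     # logits for the last position
        logits = logits / temperature
        v, _ = torch.topk(logits, top_k)
        logits[logits < v[:, [-1]]] = float('-inf')  # keep only the top k
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)
    return idx
```

Cropping the input to the last 128 tokens keeps the running sequence within the model's context window.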

## Example Outputs

**Prompt:** "Once upon a time"

**Generated:**
> Once upon a time, there was a little girl named Lily. She loved to play with her toys and her favorite was a penguin that had a shiny metal box on it. Timmy liked to...

**Prompt:** "The quick brown fox"

**Generated:**
> The quick brown fox wanted to play with him again. The fox said he was not fair anymore. He said he was sorry and that he learned his lesson...

## Limitations and Bias

- **Character-level tokenization:** Less efficient than BPE/WordPiece for longer texts
- **Small context window:** 128 tokens limits long-range dependencies
- **Training data:** Limited to TinyStories dataset style (simple children's stories)
- **Vocabulary:** Only 104 characters, may not handle all Unicode characters
- **Coherence:** Best for short-form text generation (stories, snippets)

## Environmental Impact

This model was intentionally trained on a Raspberry Pi 5 to demonstrate low-power AI training:

- **Hardware:** Raspberry Pi 5 (CPU only, ~15W power consumption)
- **Training Duration:** ~9 days
- **Estimated Energy:** ~3.24 kWh total (≈15 W × 216 h)
- **Carbon Footprint:** Minimal compared to GPU-based training

## Technical Specifications

- **Model Size:** 19 MB (safetensors format)
- **Inference Memory:** ~200-300 MB RAM
- **Training Memory:** ~1-2 GB RAM (batch_size=16)
- **Precision:** FP32


## Acknowledgments

- Architecture inspired by [Andrej Karpathy's nanoGPT](https://github.com/karpathy/nanoGPT)
- Dataset: [TinyStories by Ronen Eldan and Yuanzhi Li](https://huggingface.co/datasets/roneneldan/TinyStories)
- Trained on Raspberry Pi 5 to demonstrate accessible AI training


[GitHub](https://github.com/Igidn/VerySmollGPT)