---
language:
- en
license: mit
tags:
- text-generation
- character-level
- tiny-stories
- raspberry-pi
- gpt
- decoder-only
datasets:
- roneneldan/TinyStories
metrics:
- perplexity
model-index:
- name: VerySmollGPT
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TinyStories
      type: roneneldan/TinyStories
    metrics:
    - type: loss
      value: 0.6777
      name: Training Loss (Final)
      verified: false
    - type: loss
      value: 0.7028
      name: Validation Loss (Final)
      verified: false
    - type: loss
      value: 0.6924
      name: Validation Loss (Best)
      verified: false
---
# VerySmollGPT
A lightweight character-level GPT model trained entirely on a **Raspberry Pi 5**. This model demonstrates that capable language models can be trained on consumer hardware with limited resources.
## Model Description
VerySmollGPT is a decoder-only transformer model (GPT-style architecture) designed for character-level text generation. It was trained on the TinyStories dataset to generate coherent short stories.
- **Developed by:** Kittykat924
- **Model type:** Decoder-only Transformer (GPT)
- **Language:** English
- **License:** MIT
- **Trained on:** Raspberry Pi 5 (CPU only)
- **Training duration:** ~9 days
- **Parameters:** 4.80M unique (4.83M if the tied output head is counted separately)
## Model Architecture
| Component | Value |
|-----------|-------|
| Vocabulary Size | 104 characters |
| Embedding Dimension | 256 |
| Layers | 6 |
| Attention Heads | 8 |
| Feed-forward Dimension | 1024 |
| Context Window | 128 tokens |
| Dropout | 0.1 |
| Weight Tying | Yes (token embeddings ↔ output layer) |
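As a sanity check, the parameter counts quoted above can be reproduced from this table. A minimal sketch, assuming the usual nanoGPT-style layout (biased attention and feed-forward linears, learned positional embeddings, two LayerNorms per block plus a final one):

```python
# Back-of-the-envelope parameter count from the architecture table.
# The layout (biases, LayerNorm placement) is an assumption, not the repo's code.
vocab, d, layers, d_ff, ctx = 104, 256, 6, 1024, 128

tok_emb = vocab * d                            # token embedding table
pos_emb = ctx * d                              # learned positional embeddings
attn    = 4 * (d * d + d)                      # Q, K, V, output projections (+bias)
ffn     = (d * d_ff + d_ff) + (d_ff * d + d)   # two feed-forward linears (+bias)
norms   = 2 * 2 * d                            # two LayerNorms per block
block   = attn + ffn + norms

tied = tok_emb + pos_emb + layers * block + 2 * d   # + final LayerNorm
print(f"{tied:,}")              # 4,798,464 -> ~4.80M with weight tying
print(f"{tied + vocab * d:,}")  # 4,825,088 -> ~4.83M if the head is counted separately
```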
## Training Details
### Training Data
- **Dataset:** [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)
- **Dataset Size:** ~25 MB subset (sized to fit the Raspberry Pi)
- **Total Tokens:** ~25M characters
- **Train/Val Split:** 90/10
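A minimal sketch of how a character-level split like this can be prepared (the filename and preprocessing are assumptions, not the repository's actual script):

```python
# Hypothetical data preparation; the repo's real pipeline may differ.
with open("tinystories.txt", "r", encoding="utf-8") as f:  # assumed filename
    text = f.read()                                        # ~25M characters

split = int(0.9 * len(text))                               # 90/10 split
train_text, val_text = text[:split], text[split:]
print(f"train: {len(train_text):,} chars, val: {len(val_text):,} chars")
```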
### Training Procedure
**Hardware:**
- Raspberry Pi 5
- CPU-only training (no GPU)
- Training time: ~9 days
**Hyperparameters:**
- Epochs: 3
- Batch Size: 16
- Learning Rate: 3e-4 (initial)
- Min Learning Rate: 1e-4 (cosine annealing)
- Optimizer: AdamW (β₁=0.9, β₂=0.95)
- Weight Decay: 0.01
- Gradient Clipping: 1.0
- Max Batches per Epoch: 130,000
- Context Window: 128 tokens
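A minimal PyTorch sketch of this optimizer setup (the scheduler wiring and step counts in the repository may differ):

```python
import torch

# Assumes `model` is an instantiated VerySmollGPT module.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=3e-4,                  # initial learning rate
    betas=(0.9, 0.95),
    weight_decay=0.01,
)
# Cosine annealing from 3e-4 down to the 1e-4 floor over all training steps.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=390_000, eta_min=1e-4
)

# Inside the training loop, gradients are clipped before each step:
#   loss.backward()
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
#   optimizer.step(); scheduler.step(); optimizer.zero_grad()
```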
**Training Stats:**
- Final Epoch: 2 (zero-indexed; the shipped checkpoint is from the third and final epoch)
- Global Steps: 390,000
- Best Validation Loss: 0.692
### Tokenization
Character-level tokenization with 104 unique tokens:
- 100 regular characters (letters, numbers, punctuation, special characters)
- 4 special tokens: `<PAD>`, `<UNK>`, `<BOS>`, `<EOS>`
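A character-level tokenizer along these lines can be sketched as follows (the special-token ordering and exact vocabulary are assumptions; the authoritative mapping ships with the model's config):

```python
# Hypothetical reconstruction of the tokenizer; read the real vocabulary
# from the files distributed with the model rather than rebuilding it.
SPECIALS = ["<PAD>", "<UNK>", "<BOS>", "<EOS>"]

def build_vocab(corpus):
    """Map the 4 special tokens plus each unique character to an integer id."""
    chars = sorted(set(corpus))                  # ~100 regular characters
    vocab = SPECIALS + chars                     # 104 tokens total
    char_to_idx = {tok: i for i, tok in enumerate(vocab)}
    idx_to_char = {i: tok for tok, i in char_to_idx.items()}
    return char_to_idx, idx_to_char

def encode(text, char_to_idx):
    unk = char_to_idx["<UNK>"]
    return [char_to_idx.get(c, unk) for c in text]

def decode(ids, idx_to_char):
    return "".join(idx_to_char[i] for i in ids if idx_to_char[i] not in SPECIALS)
```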
## Usage
### Installation
```bash
pip install torch safetensors
```
### Loading the Model
```python
import json

import torch
from safetensors.torch import load_file

# Load the model weights
state_dict = load_file('model.safetensors')

# Load the configuration (vocabulary size, dimensions, etc.)
with open('config.json', 'r') as f:
    config = json.load(f)

# Note: You'll need to implement the VerySmollGPT architecture
# or use the original model.py from the repository, then load
# the weights with model.load_state_dict(state_dict)
```
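If you are re-implementing the architecture rather than using the repository's `model.py`, a minimal decoder-only module matching the table above might look like this. This is a sketch in the nanoGPT style, not the repo's actual code; its layer names will not match the shipped `state_dict` keys, so treat it as a starting point only:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Pre-norm transformer decoder block (sketch, not the repo's model.py)."""
    def __init__(self, d=256, heads=8, d_ff=1024, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, heads, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(d)
        self.ffn = nn.Sequential(
            nn.Linear(d, d_ff), nn.GELU(), nn.Linear(d_ff, d), nn.Dropout(dropout)
        )

    def forward(self, x, mask):
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a
        return x + self.ffn(self.ln2(x))

class VerySmollGPT(nn.Module):
    def __init__(self, vocab=104, d=256, layers=6, heads=8,
                 d_ff=1024, ctx=128, dropout=0.1):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, d)
        self.pos_emb = nn.Embedding(ctx, d)
        self.drop = nn.Dropout(dropout)
        self.blocks = nn.ModuleList(Block(d, heads, d_ff, dropout) for _ in range(layers))
        self.ln_f = nn.LayerNorm(d)
        self.head = nn.Linear(d, vocab, bias=False)
        self.head.weight = self.tok_emb.weight    # weight tying

    def forward(self, idx):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.drop(self.tok_emb(idx) + self.pos_emb(pos))
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=idx.device), 1)
        for blk in self.blocks:
            x = blk(x, causal)
        return self.head(self.ln_f(x))            # (B, T, vocab) logits
```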
### Text Generation Example
```python
# Assumes the model is loaded, along with the char_to_idx / idx_to_char
# mappings from the tokenizer
model.eval()

# Encode the prompt (character-level)
prompt = "Once upon a time"
input_ids = [char_to_idx[c] for c in prompt]
input_tensor = torch.tensor([input_ids], dtype=torch.long)

# Generate
with torch.no_grad():
    output_ids = model.generate(
        input_tensor,
        max_new_tokens=200,
        temperature=0.8,
        top_k=40,
    )

# Decode the output
generated_text = ''.join(idx_to_char[i] for i in output_ids[0].tolist())
print(generated_text)
```
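The `generate` method above comes from the repository's model class. If you are working from the architecture sketch instead, a standard temperature/top-k sampling loop looks roughly like this (again a sketch, not the repo's exact code):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens=200, temperature=0.8, top_k=40, ctx=128):
    """Autoregressive sampling; assumes model(idx) returns (B, T, vocab) logits."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -ctx:]                          # crop to the context window
        logits = model(idx_cond)[:, -1, :] / temperature  # logits for the last position
        if top_k is not None:
            v, _ = torch.topk(logits, top_k)
            logits[logits < v[:, [-1]]] = -float('inf')   # mask everything below top-k
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)
    return idx
```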
## Example Outputs
**Prompt:** "Once upon a time"
**Generated:**
> Once upon a time, there was a little girl named Lily. She loved to play with her toys and her favorite was a penguin that had a shiny metal box on it. Timmy liked to...
**Prompt:** "The quick brown fox"
**Generated:**
> The quick brown fox wanted to play with him again. The fox said he was not fair anymore. He said he was sorry and that he learned his lesson...
## Limitations and Bias
- **Character-level tokenization:** Less efficient than BPE/WordPiece for longer texts
- **Small context window:** 128 tokens limits long-range dependencies
- **Training data:** Limited to TinyStories dataset style (simple children's stories)
- **Vocabulary:** Only 104 characters, may not handle all Unicode characters
- **Coherence:** Best for short-form text generation (stories, snippets)
## Environmental Impact
This model was intentionally trained on a Raspberry Pi 5 to demonstrate low-power AI training:
- **Hardware:** Raspberry Pi 5 (CPU only, ~15W power consumption)
- **Training Duration:** ~9 days
- **Estimated Energy:** ~3.24 kWh total (15 W × ~216 hours)
- **Carbon Footprint:** Minimal compared to GPU-based training
## Technical Specifications
- **Model Size:** 19 MB (safetensors format)
- **Inference Memory:** ~200-300 MB RAM
- **Training Memory:** ~1-2 GB RAM (batch_size=16)
- **Precision:** FP32
## Acknowledgments
- Architecture inspired by [Andrej Karpathy's nanoGPT](https://github.com/karpathy/nanoGPT)
- Dataset: [TinyStories by Ronen Eldan and Yuanzhi Li](https://huggingface.co/datasets/roneneldan/TinyStories)
- Trained on Raspberry Pi 5 to demonstrate accessible AI training
[GitHub](https://github.com/Igidn/VerySmollGPT)