VerySmollGPT

Files changed (6)

README.md +216 -3
config.json +21 -0
model.safetensors +3 -0
special_tokens_map.json +6 -0
tokenizer.json +122 -0
tokenizer_config.json +13 -0

README.md CHANGED

@@ -1,3 +1,216 @@
----
-license: mit
----

+---
+language:
+- en
+license: mit
+tags:
+- text-generation
+- character-level
+- tiny-stories
+- raspberry-pi
+- gpt
+- decoder-only
+datasets:
+- roneneldan/TinyStories
+metrics:
+- perplexity
+model-index:
+- name: VerySmollGPT
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TinyStories
+      type: roneneldan/TinyStories
+    metrics:
+    - type: loss
+      value: 0.6777
+      name: Training Loss (Final)
+      verified: false
+    - type: loss
+      value: 0.7028
+      name: Validation Loss (Final)
+      verified: false
+    - type: loss
+      value: 0.6924
+      name: Validation Loss (Best)
+      verified: false
+---
+# VerySmollGPT
+A lightweight character-level GPT model trained entirely on a **Raspberry Pi 5**. It demonstrates that a small language model able to generate coherent short stories can be trained on low-power consumer hardware.
+## Model Description
+VerySmollGPT is a decoder-only transformer model (GPT-style architecture) designed for character-level text generation. It was trained on the TinyStories dataset to generate coherent short stories.
+- **Developed by:** Kittykat924
+- **Model type:** Decoder-only Transformer (GPT)
+- **Language:** English
+- **License:** MIT
+- **Trained on:** Raspberry Pi 5 (CPU only)
+- **Training duration:** ~9 days
+- **Parameters:** 4.80M unique (with weight tying); 4.83M if the tied output layer is counted separately
+## Model Architecture
+| Component | Value |
+|-----------|-------|
+| Vocabulary Size | 104 characters |
+| Embedding Dimension | 256 |
+| Layers | 6 |
+| Attention Heads | 8 |
+| Feed-forward Dimension | 1024 |
+| Context Window | 128 tokens |
+| Dropout | 0.1 |
+| Weight Tying | Yes (token embeddings ↔ output layer) |
+## Training Details
+### Training Data
+- **Dataset:** [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)
+- **Dataset Size:** ~25 MB (reduced so that training fits on a Raspberry Pi)
+- **Total Tokens:** ~25M characters
+- **Train/Val Split:** 90/10
+### Training Procedure
+**Hardware:**
+- Raspberry Pi 5
+- CPU-only training (no GPU)
+- Training time: ~9 days
+**Hyperparameters:**
+- Epochs: 3
+- Batch Size: 16
+- Learning Rate: 3e-4 (initial)
+- Min Learning Rate: 1e-4 (cosine annealing)
+- Optimizer: AdamW (β₁=0.9, β₂=0.95)
+- Weight Decay: 0.01
+- Gradient Clipping: 1.0
+- Max Batches per Epoch: 130,000
+- Context Window: 128 tokens
+**Training Stats:**
+- Final Epoch: 2 (zero-indexed, i.e. the checkpoint saved after the third epoch)
+- Global Steps: 390,000
+- Best Validation Loss: 0.692
+### Tokenization
+Character-level tokenization with 104 unique tokens:
+- 100 regular characters (letters, numbers, punctuation, special characters)
+- 4 special tokens: `<PAD>`, `<UNK>`, `<BOS>`, `<EOS>`
+## Usage
+### Installation
+```bash
+pip install torch safetensors
+```
+### Loading the Model
+```python
+import json
+import torch
+import torch.nn as nn
+from safetensors.torch import load_file
+# Load model weights (a dict mapping parameter names to tensors)
+state_dict = load_file('model.safetensors')
+# Load configuration
+with open('config.json', 'r') as f:
+    config = json.load(f)
+# Note: the file holds weights only. Implement the VerySmollGPT architecture or use
+# the original model.py from the repository, then call model.load_state_dict(state_dict)
+```
+### Text Generation Example
+```python
+# Assumes `model` is loaded and char_to_idx / idx_to_char are built from tokenizer.json ("model" -> "vocab")
+model.eval()
+# Encode your prompt (character-level)
+prompt = "Once upon a time"
+input_ids = [char_to_idx.get(c, char_to_idx['<UNK>']) for c in prompt]
+input_tensor = torch.tensor([input_ids], dtype=torch.long)
+# Generate
+with torch.no_grad():
+    output_ids = model.generate(
+        input_tensor,
+        max_new_tokens=200,
+        temperature=0.8,
+        top_k=40
+    )
+# Decode output
+generated_text = ''.join([idx_to_char[i] for i in output_ids[0].tolist()])
+print(generated_text)
+```
+## Example Outputs
+**Prompt:** "Once upon a time"
+**Generated:**
+> Once upon a time, there was a little girl named Lily. She loved to play with her toys and her favorite was a penguin that had a shiny metal box on it. Timmy liked to...
+**Prompt:** "The quick brown fox"
+**Generated:**
+> The quick brown fox wanted to play with him again. The fox said he was not fair anymore. He said he was sorry and that he learned his lesson...
+## Limitations and Bias
+- **Character-level tokenization:** Less efficient than BPE/WordPiece for longer texts
+- **Small context window:** 128 tokens limits long-range dependencies
+- **Training data:** Limited to TinyStories dataset style (simple children's stories)
+- **Vocabulary:** Only 104 characters, may not handle all Unicode characters
+- **Coherence:** Best for short-form text generation (stories, snippets)
+## Environmental Impact
+This model was intentionally trained on a Raspberry Pi 5 to demonstrate low-power AI training:
+- **Hardware:** Raspberry Pi 5 (CPU only, ~15W power consumption)
+- **Training Duration:** ~9 days
+- **Estimated Energy:** ~3.24 kWh total (15 W × 24 h × 9 days)
+- **Carbon Footprint:** Minimal compared to GPU-based training
+## Technical Specifications
+- **Model Size:** 19 MB (safetensors format)
+- **Inference Memory:** ~200-300 MB RAM
+- **Training Memory:** ~1-2 GB RAM (batch_size=16)
+- **Precision:** FP32
+## Citation
+If you use this model, please cite:
+```bibtex
+@misc{verysmollgpt,
+  title={VerySmollGPT: A Character-Level GPT Trained on Raspberry Pi},
+  author={[Your Name]},
+  year={2024},
+  howpublished={\url{https://huggingface.co/[your-username]/VerySmollGPT}}
+}
+```
+## Acknowledgments
+- Architecture inspired by [Andrej Karpathy's nanoGPT](https://github.com/karpathy/nanoGPT)
+- Dataset: [TinyStories by Ronen Eldan and Yuanzhi Li](https://huggingface.co/datasets/roneneldan/TinyStories)
+- Trained on Raspberry Pi 5 to demonstrate accessible AI training
+## Model Card Contact
+[Your contact information or GitHub repository link]
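Note on the learning-rate settings listed in the README above (initial 3e-4, minimum 1e-4, cosine annealing, 390,000 global steps): the sketch below shows one common way such a schedule is computed. It assumes no warmup phase and a single cosine decay spanning all steps; the README does not confirm either detail, and the function name `cosine_lr` is hypothetical.

```python
import math


def cosine_lr(step, total_steps=390_000, lr_max=3e-4, lr_min=1e-4):
    """Cosine-annealed learning rate for a given optimizer step.

    Assumption: no warmup, one decay from lr_max to lr_min over total_steps.
    """
    # Clamp so that steps outside [0, total_steps] stay at the endpoints.
    progress = min(max(step, 0), total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# Learning rate at the start, midpoint, and end of training.
for s in (0, 195_000, 390_000):
    print(s, cosine_lr(s))
```

Under these assumptions the rate starts at 3e-4, passes through 2e-4 at the midpoint, and ends at 1e-4.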

config.json ADDED

	@@ -0,0 +1,21 @@

+{
+  "model_type": "VerySmollGPT",
+  "architectures": [
+    "VerySmollGPT"
+  ],
+  "vocab_size": 104,
+  "d_model": 256,
+  "n_layers": 6,
+  "n_heads": 8,
+  "d_ff": 1024,
+  "max_seq_len": 128,
+  "dropout": 0.1,
+  "block_size": 128,
+  "tie_word_embeddings": true,
+  "training_config": {
+    "num_epochs": 3,
+    "batch_size": 16,
+    "learning_rate": 0.0003,
+    "weight_decay": 0.01
+  }
+}
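Note on the parameter count: the values in this config can be used to sanity-check the figures quoted in the README. The sketch below assumes a nanoGPT-style layout (learned positional embeddings, biases on all linear layers, two LayerNorms per block plus a final LayerNorm, no bias on the output head). That layout is an assumption, since the model code is not part of this commit, and `count_params` is a hypothetical helper.

```python
def count_params(cfg, tied=True):
    """Estimate the parameter count of a GPT-style decoder from its config.

    Assumes a nanoGPT-like layout; the real model.py may differ.
    """
    d, v, t = cfg["d_model"], cfg["vocab_size"], cfg["max_seq_len"]
    ff, n = cfg["d_ff"], cfg["n_layers"]

    embeddings = v * d + t * d                  # token + learned position embeddings
    attn = (d * 3 * d + 3 * d) + (d * d + d)    # fused QKV projection + output projection
    mlp = (d * ff + ff) + (ff * d + d)          # two feed-forward linear layers
    norms = 2 * 2 * d                           # two LayerNorms (weight + bias) per block
    final_norm = 2 * d
    head = 0 if tied else v * d                 # output layer shares the token embedding when tied

    return embeddings + n * (attn + mlp + norms) + final_norm + head


# Values copied from config.json above.
cfg = {"vocab_size": 104, "d_model": 256, "n_layers": 6,
       "d_ff": 1024, "max_seq_len": 128}

print(count_params(cfg, tied=True))   # 4798464
print(count_params(cfg, tied=False))  # 4825088
```

Under these assumptions the tied count rounds to 4.80M and the untied count to 4.83M, which matches the two figures in the README.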

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:21a099127121614540ca3287fa6062bdc91c5838bcdb960c936d05ea413ca82e
+size 19203248
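Note on the pointer file: a Git LFS pointer records the SHA-256 digest and byte size of the real file, so a downloaded `model.safetensors` can be checked against it. The sketch below is one way to do that with the standard library; the helper name `matches_lfs_pointer` and the local file path are assumptions.

```python
import hashlib
import os


def matches_lfs_pointer(path, oid_hex, size):
    """Return True if the file at `path` has the given byte size and SHA-256 digest."""
    if os.path.getsize(path) != size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == oid_hex.lower()


# Example call with the values from the pointer above
# (assumes the weights were downloaded to ./model.safetensors):
# matches_lfs_pointer(
#     "model.safetensors",
#     "21a099127121614540ca3287fa6062bdc91c5838bcdb960c936d05ea413ca82e",
#     19203248,
# )
```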

special_tokens_map.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "bos_token": "<BOS>",
+  "eos_token": "<EOS>",
+  "unk_token": "<UNK>",
+  "pad_token": "<PAD>"
+}

tokenizer.json ADDED

	@@ -0,0 +1,122 @@

+{
+  "version": "1.0",
+  "truncation": null,
+  "padding": null,
+  "added_tokens": [],
+  "normalizer": null,
+  "pre_tokenizer": {
+    "type": "CharacterLevel"
+  },
+  "post_processor": null,
+  "decoder": null,
+  "model": {
+    "type": "CharacterLevel",
+    "vocab": {
+      "<PAD>": 0,
+      "<UNK>": 1,
+      "<BOS>": 2,
+      "<EOS>": 3,
+      "\t": 4,
+      "\n": 5,
+      " ": 6,
+      "!": 7,
+      "\"": 8,
+      "$": 9,
+      "&": 10,
+      "'": 11,
+      "(": 12,
+      ")": 13,
+      "*": 14,
+      "+": 15,
+      ",": 16,
+      "-": 17,
+      ".": 18,
+      "/": 19,
+      "0": 20,
+      "1": 21,
+      "2": 22,
+      "3": 23,
+      "4": 24,
+      "5": 25,
+      "6": 26,
+      "7": 27,
+      "8": 28,
+      "9": 29,
+      ":": 30,
+      ";": 31,
+      "?": 32,
+      "A": 33,
+      "B": 34,
+      "C": 35,
+      "D": 36,
+      "E": 37,
+      "F": 38,
+      "G": 39,
+      "H": 40,
+      "I": 41,
+      "J": 42,
+      "K": 43,
+      "L": 44,
+      "M": 45,
+      "N": 46,
+      "O": 47,
+      "P": 48,
+      "Q": 49,
+      "R": 50,
+      "S": 51,
+      "T": 52,
+      "U": 53,
+      "V": 54,
+      "W": 55,
+      "X": 56,
+      "Y": 57,
+      "Z": 58,
+      "a": 59,
+      "b": 60,
+      "c": 61,
+      "d": 62,
+      "e": 63,
+      "f": 64,
+      "g": 65,
+      "h": 66,
+      "i": 67,
+      "j": 68,
+      "k": 69,
+      "l": 70,
+      "m": 71,
+      "n": 72,
+      "o": 73,
+      "p": 74,
+      "q": 75,
+      "r": 76,
+      "s": 77,
+      "t": 78,
+      "u": 79,
+      "v": 80,
+      "w": 81,
+      "x": 82,
+      "y": 83,
+      "z": 84,
+      " ": 85,
+      "¡": 86,
+      "¦": 87,
+      "©": 88,
+      "«": 89,
+      "±": 90,
+      "³": 91,
+      "´": 92,
+      "»": 93,
+      "Â": 94,
+      "Ã": 95,
+      "â": 96,
+      "œ": 97,
+      "˜": 98,
+      "“": 99,
+      "”": 100,
+      "‰": 101,
+      "€": 102,
+      "™": 103
+    },
+    "unk_token": "<UNK>"
+  }
+}
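Note on using this file: the README's generation example relies on `char_to_idx` and `idx_to_char` tables that it does not define. The sketch below shows one way to build them from the `model.vocab` mapping above. The helper names are hypothetical, and skipping special tokens during decoding is a design choice made here, not something the repository confirms.

```python
import json


def load_vocab(path="tokenizer.json"):
    """Read the character -> id table stored under model.vocab."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["model"]["vocab"]


def encode(text, vocab, unk="<UNK>"):
    """Map each character to its id, falling back to the <UNK> id."""
    unk_id = vocab[unk]
    return [vocab.get(ch, unk_id) for ch in text]


def decode(ids, vocab, skip=("<PAD>", "<UNK>", "<BOS>", "<EOS>")):
    """Map ids back to characters, dropping special tokens (an assumption)."""
    id_to_char = {i: ch for ch, i in vocab.items()}
    return "".join(id_to_char[i] for i in ids if id_to_char[i] not in skip)


# Toy subset of the vocabulary above, so the example runs without the file.
toy_vocab = {"<PAD>": 0, "<UNK>": 1, "<BOS>": 2, "<EOS>": 3,
             " ": 6, "O": 47, "c": 61, "e": 63, "n": 72}

ids = encode("Once", toy_vocab)
print(ids)                     # [47, 72, 61, 63]
print(decode(ids, toy_vocab))  # Once
```

With the real file, `vocab = load_vocab()` yields the `char_to_idx` table, and inverting it gives `idx_to_char`.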

tokenizer_config.json ADDED

	@@ -0,0 +1,13 @@

+{
+  "tokenizer_class": "CharTokenizer",
+  "model_type": "VerySmollGPT",
+  "vocab_size": 104,
+  "clean_up_tokenization_spaces": true,
+  "bos_token": "<BOS>",
+  "eos_token": "<EOS>",
+  "unk_token": "<UNK>",
+  "pad_token": "<PAD>",
+  "add_prefix_space": false,
+  "add_bos_token": false,
+  "add_eos_token": false
+}