	---
	language:
	- en
	license: mit
	tags:
	- text-generation
	- character-level
	- tiny-stories
	- raspberry-pi
	- gpt
	- decoder-only
	datasets:
	- roneneldan/TinyStories
	metrics:
	- perplexity
	model-index:
	- name: VerySmollGPT
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: TinyStories
	      type: roneneldan/TinyStories
	    metrics:
	    - type: loss
	      value: 0.6777
	      name: Training Loss (Final)
	      verified: false
	    - type: loss
	      value: 0.7028
	      name: Validation Loss (Final)
	      verified: false
	    - type: loss
	      value: 0.6924
	      name: Validation Loss (Best)
	      verified: false
	---

	# VerySmollGPT

	A lightweight character-level GPT model trained entirely on a Raspberry Pi 5. It demonstrates that a small language model able to generate coherent short stories can be trained on low-power consumer hardware.

	## Model Description

	VerySmollGPT is a decoder-only transformer model (GPT-style architecture) designed for character-level text generation. It was trained on the TinyStories dataset to generate coherent short stories.

	- Developed by: Kittykat924
	- Model type: Decoder-only Transformer (GPT)
	- Language: English
	- License: MIT
	- Trained on: Raspberry Pi 5 (CPU only)
	- Training duration: ~9 days
	- Parameters: 4.80M unique (with weight tying); 4.83M if the tied output layer is counted separately

	## Model Architecture

	| Component | Value |
	|-----------|-------|
	| Vocabulary Size | 104 characters |
	| Embedding Dimension | 256 |
	| Layers | 6 |
	| Attention Heads | 8 |
	| Feed-forward Dimension | 1024 |
	| Context Window | 128 tokens |
	| Dropout | 0.1 |
	| Weight Tying | Yes (token embeddings ↔ output layer) |
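
The parameter count can be roughly reproduced from the values in this table. The sketch below assumes a standard GPT-2-style layout: bias terms on every linear layer, two LayerNorms per block, a final LayerNorm, and learned positional embeddings. These layout details and the helper name `count_gpt_params` are assumptions for illustration, not taken from the repository, so the true count may differ slightly.

```python
def count_gpt_params(vocab=104, d_model=256, n_layers=6, d_ff=1024, ctx=128,
                     tie_weights=True):
    """Estimate parameters for an assumed GPT-2-style layout (hypothetical helper)."""
    tok_emb = vocab * d_model                    # token embedding table
    pos_emb = ctx * d_model                      # learned positional embeddings (assumed)
    attn = 4 * (d_model * d_model + d_model)     # Q, K, V and output projections, with biases
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)  # two linear layers, with biases
    norms = 2 * 2 * d_model                      # two LayerNorms (weight + bias) per block
    block = attn + mlp + norms
    final_norm = 2 * d_model                     # final LayerNorm (weight + bias)
    # With weight tying the output layer reuses tok_emb and adds nothing.
    head = 0 if tie_weights else vocab * d_model
    return tok_emb + pos_emb + n_layers * block + final_norm + head


tied = count_gpt_params(tie_weights=True)
untied = count_gpt_params(tie_weights=False)
print(f"tied:      {tied:,} (~{tied / 1e6:.2f}M)")
print(f"untied:    {untied:,} (~{untied / 1e6:.2f}M)")
print(f"FP32 size: ~{tied * 4 / 1e6:.1f} MB")
```

Under these assumptions the count comes to 4,798,464 with tying and 4,825,088 without, which is consistent with the 4.80M and 4.83M figures listed in the model description and with the 19 MB FP32 file size.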

	## Training Details

	### Training Data

	- Dataset: [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)
	- Dataset Size: ~25 MB (optimized for Raspberry Pi)
	- Total Tokens: ~25M (one token per character)
	- Train/Val Split: 90/10

	### Training Procedure

	Hardware:
	- Raspberry Pi 5
	- CPU-only training (no GPU)
	- Training time: ~9 days

	Hyperparameters:
	- Epochs: 3
	- Batch Size: 16
	- Learning Rate: 3e-4 (initial)
	- Min Learning Rate: 1e-4 (cosine annealing)
	- Optimizer: AdamW (β₁=0.9, β₂=0.95)
	- Weight Decay: 0.01
	- Gradient Clipping: 1.0
	- Max Batches per Epoch: 130,000
	- Context Window: 128 tokens
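
The learning-rate schedule implied by these hyperparameters can be sketched as a cosine decay from 3e-4 down to 1e-4. The card does not say whether warmup or per-epoch restarts were used, so the sketch assumes a single decay without warmup over all 3 × 130,000 steps; `cosine_lr` is a hypothetical helper, not a function from the repository.

```python
import math

MAX_LR = 3e-4                 # initial learning rate
MIN_LR = 1e-4                 # floor reached at the end of training
TOTAL_STEPS = 3 * 130_000     # epochs x max batches per epoch


def cosine_lr(step, total_steps=TOTAL_STEPS, max_lr=MAX_LR, min_lr=MIN_LR):
    """Cosine decay from max_lr to min_lr (hypothetical helper, no warmup assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)      # clamp to [0, 1]
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))    # falls from 1 to 0
    return min_lr + (max_lr - min_lr) * cosine


for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{cosine_lr(step):.2e}")
```

In a PyTorch training loop, a value like this would typically be written into each optimizer parameter group before the step, alongside `torch.optim.AdamW(params, lr=3e-4, betas=(0.9, 0.95), weight_decay=0.01)` and `torch.nn.utils.clip_grad_norm_(params, 1.0)`. That the clipping is by gradient norm is also an assumption.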

	Training Stats:
	- Final Epoch: 2, zero-indexed (checkpoint from the third and last epoch)
	- Global Steps: 390,000
	- Best Validation Loss: 0.692
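
Because the metadata lists perplexity as the metric while the card reports losses, the conversion may be useful. The sketch assumes the loss is the mean cross-entropy per character in nats (the PyTorch default); the card does not state this explicitly.

```python
import math

best_val_loss = 0.6924   # best validation loss reported in this card

perplexity = math.exp(best_val_loss)           # per-character perplexity
bits_per_char = best_val_loss / math.log(2)    # the same loss expressed in bits

print(f"perplexity:    {perplexity:.3f}")
print(f"bits per char: {bits_per_char:.3f}")
```

Under that assumption the model sits just below a per-character perplexity of 2, or roughly one bit per character.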

	### Tokenization

	Character-level tokenization with 104 unique tokens:
	- 100 regular characters (letters, numbers, punctuation, special characters)
	- 4 special tokens: `<PAD>`, `<UNK>`, `<BOS>`, `<EOS>`
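
A character-level tokenizer of this kind can be sketched in a few lines. The class name `CharTokenizer` and the choice to give the special tokens the first four IDs are assumptions for illustration; the actual mapping lives in the repository, and the vocabulary here is built from a short sample string rather than the real 104-token set.

```python
class CharTokenizer:
    """Minimal character-level tokenizer (hypothetical; not the repository's class)."""

    SPECIALS = ["<PAD>", "<UNK>", "<BOS>", "<EOS>"]

    def __init__(self, text):
        # Special tokens take the first IDs (an assumed ordering), then sorted characters.
        chars = sorted(set(text))
        self.idx_to_char = self.SPECIALS + chars
        self.char_to_idx = {tok: i for i, tok in enumerate(self.idx_to_char)}
        self.unk_id = self.char_to_idx["<UNK>"]

    def encode(self, text, add_special=False):
        # Characters outside the vocabulary fall back to <UNK>.
        ids = [self.char_to_idx.get(c, self.unk_id) for c in text]
        if add_special:
            ids = [self.char_to_idx["<BOS>"]] + ids + [self.char_to_idx["<EOS>"]]
        return ids

    def decode(self, ids):
        # Drop special tokens so only real characters are returned.
        n_special = len(self.SPECIALS)
        return "".join(self.idx_to_char[i] for i in ids if i >= n_special)


tok = CharTokenizer("Once upon a time, there was a little girl.")
ids = tok.encode("Once upon a time")
print(ids)
print(tok.decode(ids))
print(tok.encode("€"))  # a character not in the vocabulary maps to the <UNK> id
```

The `char_to_idx` and `idx_to_char` attributes mirror the names used in the generation example in this card, but the IDs will only match the trained model if they are loaded from the repository's own vocabulary.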

	## Usage

	### Installation

	```bash
	pip install torch safetensors
	```

	### Loading the Model

	```python
	import json

	import torch
	import torch.nn as nn
	from safetensors.torch import load_file

	# Load the model weights as a dict mapping parameter names to tensors
	state_dict = load_file('model.safetensors')

	# Load the model configuration
	with open('config.json', 'r') as f:
	    config = json.load(f)

	# Note: You'll need to implement the VerySmollGPT architecture
	# or use the original model.py from the repository, then call
	# model.load_state_dict(state_dict) on the instantiated model.
	```

	### Text Generation Example

	```python
	# Assumes `model` is a loaded VerySmollGPT instance and that `char_to_idx` /
	# `idx_to_char` are the vocabulary mappings used during training
	model.eval()

	# Encode your prompt (character-level)
	prompt = "Once upon a time"
	input_ids = [char_to_idx[c] for c in prompt]
	input_tensor = torch.tensor([input_ids], dtype=torch.long)

	# Generate
	with torch.no_grad():
	    output_ids = model.generate(
	        input_tensor,
	        max_new_tokens=200,
	        temperature=0.8,
	        top_k=40,
	    )

	# Decode output
	generated_text = ''.join(idx_to_char[i] for i in output_ids[0].tolist())
	print(generated_text)
	```
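
The `temperature` and `top_k` arguments usually have the meaning they have in nanoGPT-style `generate` methods. How the repository's `generate` works internally is not shown in this card, so the following is only a dependency-free sketch of the common technique, with `sample_top_k` as a hypothetical helper name.

```python
import math
import random


def sample_top_k(logits, temperature=0.8, top_k=40, rng=random):
    """Sample one token id from raw logits using temperature and top-k filtering."""
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [x / temperature for x in logits]
    # Keep only the indices of the top_k highest-scoring tokens.
    k = min(top_k, len(scaled))
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # Unnormalized softmax over the survivors (subtract the max for stability).
    m = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - m) for i in top]
    # Draw one index in proportion to its weight.
    return rng.choices(top, weights=weights, k=1)[0]


random.seed(0)
logits = [2.0, 1.0, 0.5, -1.0, -3.0]
samples = [sample_top_k(logits, temperature=0.8, top_k=2) for _ in range(1000)]
print({i: samples.count(i) for i in sorted(set(samples))})
```

With `top_k=2` only the two highest-scoring token ids can ever be drawn, which is why a small `top_k` keeps a tiny model from wandering into very unlikely characters.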

	## Example Outputs

	Prompt: "Once upon a time"

	Generated:
	> Once upon a time, there was a little girl named Lily. She loved to play with her toys and her favorite was a penguin that had a shiny metal box on it. Timmy liked to...

	Prompt: "The quick brown fox"

	Generated:
	> The quick brown fox wanted to play with him again. The fox said he was not fair anymore. He said he was sorry and that he learned his lesson...

	## Limitations and Bias

	- Character-level tokenization: Less efficient than BPE/WordPiece for longer texts
	- Small context window: 128 tokens limits long-range dependencies
	- Training data: Limited to TinyStories dataset style (simple children's stories)
	- Vocabulary: Only 104 characters, may not handle all Unicode characters
	- Coherence: Best for short-form text generation (stories, snippets)

	## Environmental Impact

	This model was intentionally trained on a Raspberry Pi 5 to demonstrate low-power AI training:

	- Hardware: Raspberry Pi 5 (CPU only, ~15W power consumption)
	- Training Duration: ~9 days
	- Estimated Energy: ~3.24 kWh total
	- Carbon Footprint: Minimal compared to GPU-based training
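
The energy estimate follows from power multiplied by time. The short check below assumes a constant draw of 15 W for the full 9 days; both inputs are the approximate figures from the list above, not measured values.

```python
power_watts = 15      # approximate Raspberry Pi 5 power draw
training_days = 9     # approximate training duration

hours = training_days * 24
energy_kwh = power_watts * hours / 1000   # watt-hours -> kilowatt-hours

print(f"{hours} h x {power_watts} W = {energy_kwh:.2f} kWh")
```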

	## Technical Specifications

	- Model Size: 19 MB (safetensors format)
	- Inference Memory: ~200-300 MB RAM
	- Training Memory: ~1-2 GB RAM (batch_size=16)
	- Precision: FP32


	## Acknowledgments

	- Architecture inspired by [Andrej Karpathy's nanoGPT](https://github.com/karpathy/nanoGPT)
	- Dataset: [TinyStories by Ronen Eldan and Yuanzhi Li](https://huggingface.co/datasets/roneneldan/TinyStories)
	- Trained on Raspberry Pi 5 to demonstrate accessible AI training


	Source code: [GitHub](https://github.com/Igidn/VerySmollGPT)