Update model card with documentation and examples
README.md

---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- tiny
- from-scratch
- educational
- causal-lm
- personal-llm
model-index:
- name: tiny-llm-54m
  results: []
---

# Tiny-LLM 54M

A small transformer language model (~54.93M parameters) trained from scratch for educational and experimental purposes.

## Model Description

This is a decoder-only transformer trained from scratch on Wikipedia text. It demonstrates that meaningful language models can be trained on consumer hardware with modest compute budgets.

### Architecture

| Component | Value |
|-----------|-------|
| Parameters | **54.93M** |
| Layers | 12 |
| Hidden Size | 512 |
| Attention Heads | 8 |
| Intermediate (FFN) | 1408 |
| Vocab Size | 32,000 |
| Max Sequence Length | 512 |
| Position Encoding | RoPE |
| Normalization | RMSNorm |
| Activation | SwiGLU |
| Weight Tying | Yes |

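As a sanity check, the headline figure can be recomputed from the table, assuming LLaMA-style blocks (no bias terms, two RMSNorms per layer, three SwiGLU projections, embeddings tied with the LM head):

```python
vocab, d_model, n_layers, d_ffn = 32_000, 512, 12, 1408

embed  = vocab * d_model           # token embeddings, tied with the LM head
attn   = 4 * d_model * d_model     # q, k, v, o projections per layer
swiglu = 3 * d_model * d_ffn       # gate, up, down projections per layer
norms  = 2 * d_model               # two RMSNorm weight vectors per layer
total  = embed + n_layers * (attn + swiglu + norms) + d_model  # + final norm
print(f"{total / 1e6:.2f}M parameters")   # 54.93M parameters
```

The sum lands exactly on 54.93M, which suggests this no-bias, tied-embedding layout matches the actual implementation.
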
### Training Details

| Parameter | Value |
|-----------|-------|
| Training Steps | 50,000 |
| Tokens | ~100M |
| Batch Size | 32 |
| Learning Rate | 3e-4 |
| Warmup Steps | 2,000 |
| Weight Decay | 0.1 |
| Hardware | NVIDIA RTX 5090 (32GB) |
| Training Time | ~3 hours |

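The card does not state the decay schedule, so the cosine tail in the sketch below is an assumption; the warmup and peak values come straight from the table:

```python
import math

PEAK_LR, WARMUP_STEPS, TOTAL_STEPS = 3e-4, 2_000, 50_000

def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS   # linear warmup to the peak
    # Cosine decay to zero over the remaining steps (assumed, not stated above).
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```
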
## Usage

```python
from transformers import AutoTokenizer

# Load the tokenizer (32K-vocabulary SentencePiece, shipped with this repo)
tokenizer = AutoTokenizer.from_pretrained("jonmabe/tiny-llm-54m")

# The model itself uses a custom architecture; see scripts/ in the
# repository for the matching model-loading and inference code.
```

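If the repository also publishes its modeling code in Hugging Face remote-code form, the standard auto-class route below may work; that is an assumption, not something this card confirms, so fall back to the scripts in `scripts/` if it fails:

```python
from transformers import AutoModelForCausalLM

# Assumes custom modeling code is published alongside the weights;
# otherwise use the inference scripts in scripts/.
model = AutoModelForCausalLM.from_pretrained(
    "jonmabe/tiny-llm-54m", trust_remote_code=True
)
```
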
### Generation Example

```python
# Full inference code lives in the repository; this is a minimal sketch,
# assuming `model` was loaded via one of the routes above.
prompt = "The history of artificial intelligence"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
# The continuation reflects the model's Wikipedia training data.
```

## Intended Use

- **Educational**: Understanding transformer training from scratch
- **Experimental**: Testing fine-tuning approaches on small models
- **Personal LLM**: Base for personal voice/style fine-tuning
- **Research**: Lightweight model for NLP experiments

## Limitations

- Small model size limits knowledge and capabilities
- Trained only on Wikipedia, so domain coverage is narrow
- Not suitable for production use cases requiring high quality
- May generate factually incorrect information
- No RLHF or instruction tuning

## Training Data

- **Source**: English Wikipedia
- **Processing**: Tokenized with a 32K-vocabulary SentencePiece tokenizer
- **Format**: Standard causal language modeling (next-token prediction); see the sketch below

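For concreteness, the training format amounts to shifting the token stream one position to obtain the labels:

```python
import torch

# Next-token prediction: the model at position t is trained to emit token t+1.
token_ids = torch.tensor([[312, 88, 1907, 45, 2]])  # hypothetical token ids
inputs = token_ids[:, :-1]   # what the model sees
labels = token_ids[:, 1:]    # what it learns to predict
```
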
## Future Work

This model is intended as a base for:

1. **Personal Fine-tuning**: Adapt to individual writing style using personal data
2. **Domain Adaptation**: Specialize for specific topics or tasks
3. **Instruction Tuning**: Add instruction-following capabilities

## Hardware Requirements

- **Inference**: ~300MB GPU memory; runs on any modern GPU or Apple Silicon (see the estimate below)
- **Fine-tuning**: ~2GB GPU memory recommended

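Those figures are consistent with the raw weight sizes (a rough estimate only; real usage adds activations, KV cache, and framework overhead):

```python
params = 54_931_968                                     # from the Architecture section
print(f"fp16 weights: {params * 2 / 1024**2:.0f} MB")   # ~105 MB
print(f"fp32 weights: {params * 4 / 1024**2:.0f} MB")   # ~210 MB
```
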
## Related Work

Inspired by:

- Andrej Karpathy's nanoGPT
- Geddy Duke's small LLM experiments
- LLaMA architecture design choices

## Citation

```bibtex
@misc{tiny-llm-54m,
  author    = {jonmabe},
  title     = {Tiny-LLM: A 54M Parameter Language Model},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/jonmabe/tiny-llm-54m}
}
```