---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- tiny
- from-scratch
- educational
- causal-lm
- personal-llm
model-index:
- name: tiny-llm-54m
  results: []
---

# Tiny-LLM 54M

A small transformer language model (~54.93M parameters) trained from scratch for educational and experimental purposes.

## Model Description

This is a decoder-only transformer trained from scratch on Wikipedia text. It demonstrates that a meaningful language model can be trained on consumer hardware with a modest compute budget.

### Architecture

| Component | Value |
|-----------|-------|
| Parameters | **54.93M** |
| Layers | 12 |
| Hidden Size | 512 |
| Attention Heads | 8 |
| Intermediate (FFN) Size | 1408 |
| Vocab Size | 32,000 |
| Max Sequence Length | 512 |
| Position Encoding | RoPE |
| Normalization | RMSNorm |
| Activation | SwiGLU |
| Weight Tying | Yes |

### Training Details

| Parameter | Value |
|-----------|-------|
| Training Steps | 50,000 |
| Tokens | ~100M |
| Batch Size | 32 |
| Learning Rate | 3e-4 |
| Warmup Steps | 2,000 |
| Weight Decay | 0.1 |
| Hardware | NVIDIA RTX 5090 (32 GB) |
| Training Time | ~3 hours |

## Usage

```python
from transformers import AutoTokenizer

# Load the tokenizer (32K-vocabulary SentencePiece)
tokenizer = AutoTokenizer.from_pretrained("jonmabe/tiny-llm-54m")

# The model itself uses a custom architecture; see scripts/ in the
# repository for model loading and inference code.
```

### Generation Example

```python
# Note: this model uses a custom architecture; full inference code is
# available in the repository.
prompt = "The history of artificial intelligence"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Feed input_ids to the model loaded via the repository scripts; the model
# generates a continuation based on learned Wikipedia patterns.
```

## Intended Use

- **Educational**: Understanding transformer training from scratch
- **Experimental**: Testing fine-tuning approaches on small models
- **Personal LLM**: Base for personal voice/style fine-tuning
- **Research**: Lightweight model for NLP experiments

## Limitations

- Small model size limits knowledge and capabilities
- Trained only on Wikipedia, so domain coverage is limited
- Not suitable for production use cases requiring high quality
- May generate factually incorrect information
- No RLHF or instruction tuning

## Training Data

- **Source**: Wikipedia (English)
- **Processing**: Tokenized with a 32K-vocabulary SentencePiece tokenizer
- **Format**: Standard causal language modeling (next-token prediction)

## Future Work

This model is intended as a base for:

1. **Personal Fine-tuning**: Adapting to an individual writing style using personal data
2. **Domain Adaptation**: Specializing for specific topics or tasks
3. **Instruction Tuning**: Adding instruction-following capabilities

## Hardware Requirements

- **Inference**: ~300 MB of GPU memory; runs on any modern GPU or Apple Silicon
- **Fine-tuning**: ~2 GB of GPU memory recommended

## Related Work

Inspired by:

- Andrej Karpathy's nanoGPT
- Geddy Duke's small LLM experiments
- LLaMA architecture design choices

## Citation

```bibtex
@misc{tiny-llm-54m,
  author = {jonmabe},
  title = {Tiny-LLM: A 54M Parameter Language Model},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/jonmabe/tiny-llm-54m}
}
```
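
## Appendix: Parameter Count Check

As a rough sanity check that the architecture table accounts for the stated ~54.93M parameters, the sketch below tallies the weight matrices implied by the table. It assumes a LLaMA-style layout (bias-free linear layers, two RMSNorms per layer plus a final RMSNorm, and the LM head tied to the token embeddings); the actual implementation in the repository may differ slightly.

```python
# Hypothetical parameter tally based on the architecture table above.
# Assumes LLaMA-style blocks: no biases, two RMSNorm weight vectors per
# layer, a final RMSNorm, and the LM head tied to the token embeddings.
vocab, hidden, ffn, layers = 32_000, 512, 1408, 12

embeddings = vocab * hidden       # tied with the LM head, so counted once
attention  = 4 * hidden * hidden  # Q, K, V, and output projections
swiglu_ffn = 3 * hidden * ffn     # gate, up, and down projections
norms      = 2 * hidden           # two RMSNorm weight vectors per layer

per_layer = attention + swiglu_ffn + norms
total = embeddings + layers * per_layer + hidden  # + final RMSNorm

print(f"{total:,} parameters (~{total / 1e6:.2f}M)")
# -> 54,931,968 parameters (~54.93M), matching the figure in the card
```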