---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- tiny
- from-scratch
- educational
- causal-lm
- personal-llm
model-index:
- name: tiny-llm-54m
results: []
---
# Tiny-LLM 54M
A small transformer language model (~54.93M parameters) trained from scratch for educational and experimental purposes.
## Model Description
This is a decoder-only transformer trained from scratch on Wikipedia text. It demonstrates that meaningful language models can be trained on consumer hardware with modest compute budgets.
### Architecture
| Component | Value |
|-----------|-------|
| Parameters | **54.93M** |
| Layers | 12 |
| Hidden Size | 512 |
| Attention Heads | 8 |
| Intermediate (FFN) | 1408 |
| Vocab Size | 32,000 |
| Max Sequence Length | 512 |
| Position Encoding | RoPE |
| Normalization | RMSNorm |
| Activation | SwiGLU |
| Weight Tying | Yes |
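The parameter count in the table can be cross-checked from the other rows (a back-of-the-envelope sketch assuming standard LLaMA-style shapes: separate Q/K/V/O projections of size d×d, a three-matrix SwiGLU FFN, two RMSNorm weight vectors per layer plus a final one, and an output head tied to the embedding matrix):

```python
# Rough parameter count derived from the architecture table above.
# Assumes LLaMA-style layer shapes; the released checkpoint may
# differ slightly in the small terms (norms, biases).
d, ffn, layers, vocab = 512, 1408, 12, 32000

embedding = vocab * d        # tied with the output head, counted once
attention = 4 * d * d        # Q, K, V, O projections
swiglu    = 3 * d * ffn      # gate, up, and down matrices
norms     = 2 * d            # two RMSNorm weight vectors per layer

per_layer = attention + swiglu + norms
total = embedding + layers * per_layer + d   # + final RMSNorm

print(f"{total / 1e6:.2f}M parameters")      # → 54.93M
```

The result lands exactly on the reported 54.93M, which suggests the table is internally consistent.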
### Training Details
| Parameter | Value |
|-----------|-------|
| Training Steps | 50,000 |
| Tokens | ~100M |
| Batch Size | 32 |
| Learning Rate | 3e-4 |
| Warmup Steps | 2,000 |
| Weight Decay | 0.1 |
| Hardware | NVIDIA RTX 5090 (32GB) |
| Training Time | ~3 hours |
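One note on the token count: at batch size 32 and a 512-token context, 50,000 steps process well over 100M tokens, so the "~100M" row most plausibly refers to the unique tokens in the dataset, seen over multiple epochs (a rough calculation, assuming every step uses full-length sequences):

```python
# Tokens processed during training, assuming each of the 50,000 steps
# consumes a full batch of 32 sequences at 512 tokens each.
steps, batch, seq_len = 50_000, 32, 512
tokens_processed = steps * batch * seq_len

print(f"{tokens_processed / 1e6:.0f}M tokens processed")            # 819M
print(f"~{tokens_processed / 100e6:.1f} passes over a 100M-token corpus")
```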
## Usage
```python
from transformers import AutoTokenizer
# Load tokenizer (32K-vocabulary SentencePiece tokenizer)
tokenizer = AutoTokenizer.from_pretrained("jonmabe/tiny-llm-54m")
# For custom model loading, see the model files
# This model uses a custom architecture - see scripts/ for inference code
```
### Generation Example
```python
# Note: This model uses a custom architecture
# Full inference code available in the repository
prompt = "The history of artificial intelligence"
# Model generates continuation based on learned Wikipedia patterns
```
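Since the checkpoint uses a custom architecture, the exact loading call lives in the repository's scripts. The decoding loop itself is generic, though; a minimal greedy-decoding sketch, where `model` is a hypothetical stand-in for the repository's forward pass (any callable mapping a token-id list to next-token logits works):

```python
# Greedy decoding sketch. `model` is a stand-in interface, not the
# repository's actual API: a callable that takes a list of token ids
# and returns one logit per vocabulary entry. Swap in the real
# forward pass from scripts/ when using the checkpoint.
def greedy_generate(model, input_ids, max_new_tokens=20, eos_id=None):
    ids = list(input_ids)
    for _ in range(max_new_tokens):
        logits = model(ids)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids

# Toy stand-in model: always prefers token (last_id + 1) % vocab,
# so generation counts upward.
def toy_model(ids, vocab=8):
    return [1.0 if i == (ids[-1] + 1) % vocab else 0.0 for i in range(vocab)]

print(greedy_generate(toy_model, [0], max_new_tokens=5))  # [0, 1, 2, 3, 4, 5]
```

With the real model, `input_ids` would come from `tokenizer(prompt)["input_ids"]` and the result would be passed back through `tokenizer.decode`.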
## Intended Use
- **Educational**: Understanding transformer training from scratch
- **Experimental**: Testing fine-tuning approaches on small models
- **Personal LLM**: Base for personal voice/style fine-tuning
- **Research**: Lightweight model for NLP experiments
## Limitations
- Small model size limits knowledge and capabilities
- Trained only on Wikipedia - limited domain coverage
- Not suitable for production use cases requiring high quality
- May generate factually incorrect information
- No RLHF or instruction tuning
## Training Data
- **Source**: Wikipedia (English)
- **Processing**: Tokenized with 32K vocabulary SentencePiece tokenizer
- **Format**: Standard causal language modeling (next token prediction)
## Future Work
This model is intended as a base for:
1. **Personal Fine-tuning**: Adapt to individual writing style using personal data
2. **Domain Adaptation**: Specialize for specific topics or tasks
3. **Instruction Tuning**: Add instruction-following capabilities
## Hardware Requirements
- **Inference**: ~300MB GPU memory, runs on any modern GPU or Apple Silicon
- **Fine-tuning**: ~2GB GPU memory recommended
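The ~300MB inference figure is consistent with the raw weight footprint (a rough estimate; actual usage also depends on dtype, KV cache size, and framework overhead):

```python
# Weight memory for 54.93M parameters at common precisions.
params = 54.93e6
for name, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2)]:
    print(f"{name}: {params * bytes_per_param / 1e6:.0f} MB of weights")
# fp32: 220 MB, fp16/bf16: 110 MB — the gap up to ~300MB is
# activations, KV cache, and framework overhead.
```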
## Related Work
Inspired by:
- Andrej Karpathy's nanoGPT
- Geddy Duke's small LLM experiments
- LLaMA architecture design choices
## Citation
```bibtex
@misc{tiny-llm-54m,
author = {jonmabe},
title = {Tiny-LLM: A 54M Parameter Language Model},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/jonmabe/tiny-llm-54m}
}
```