---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- tiny
- from-scratch
- educational
- causal-lm
- personal-llm
model-index:
- name: tiny-llm-54m
results: []
---
# Tiny-LLM 54M
A small transformer language model (~54.93M parameters) trained from scratch for educational and experimental purposes.
## Model Description
This is a decoder-only transformer trained from scratch on Wikipedia text. It demonstrates that meaningful language models can be trained on consumer hardware with modest compute budgets.
### Architecture
| Component | Value |
|-----------|-------|
| Parameters | **54.93M** |
| Layers | 12 |
| Hidden Size | 512 |
| Attention Heads | 8 |
| Intermediate (FFN) | 1408 |
| Vocab Size | 32,000 |
| Max Sequence Length | 512 |
| Position Encoding | RoPE |
| Normalization | RMSNorm |
| Activation | SwiGLU |
| Weight Tying | Yes |
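The 54.93M figure can be reproduced from the table above. This is a back-of-the-envelope sketch assuming a LLaMA-style block layout (Q/K/V/O attention projections, a three-matrix SwiGLU FFN, one RMSNorm weight vector before attention and before the FFN, plus a final norm); the exact layer composition may differ slightly:

```python
# Rough parameter count from the architecture table.
hidden = 512
layers = 12
ffn = 1408
vocab = 32_000

embedding = vocab * hidden          # tied with the output head, counted once
attention = 4 * hidden * hidden     # Wq, Wk, Wv, Wo
swiglu = 3 * hidden * ffn           # gate, up, and down projections
norms = 2 * hidden                  # pre-attention + pre-FFN RMSNorm
per_layer = attention + swiglu + norms

total = embedding + layers * per_layer + hidden  # + final RMSNorm
print(f"{total / 1e6:.2f}M parameters")          # 54.93M
```

Note that weight tying means the 16.38M embedding matrix is shared with the LM head, which is why a 12-layer model stays this small.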
### Training Details
| Parameter | Value |
|-----------|-------|
| Training Steps | 50,000 |
| Tokens | ~100M |
| Batch Size | 32 |
| Learning Rate | 3e-4 |
| Warmup Steps | 2,000 |
| Weight Decay | 0.1 |
| Hardware | NVIDIA RTX 5090 (32GB) |
| Training Time | ~3 hours |
## Usage
```python
import torch
from transformers import AutoTokenizer
# Load tokenizer (32K-vocabulary SentencePiece tokenizer)
tokenizer = AutoTokenizer.from_pretrained("jonmabe/tiny-llm-54m")
# The model itself uses a custom architecture, so it cannot be loaded
# with AutoModel directly - see scripts/ in the repository for
# model-loading and inference code
```
### Generation Example
```python
# This model uses a custom architecture, so generation runs through
# the inference code in scripts/ rather than transformers' generate()
prompt = "The history of artificial intelligence"
# The model continues the prompt in the style of its Wikipedia training data
```
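Whatever the backbone, generation follows the standard autoregressive loop: feed the tokens so far, take the next-token logits, pick a token, append, repeat. A minimal sketch with a stand-in `next_token_logits` function (hypothetical, for illustration only; the real model's forward pass lives in scripts/):

```python
def next_token_logits(token_ids):
    # Stand-in for the real model's forward pass (hypothetical toy rule):
    # returns one logit per entry of a 10-token vocabulary.
    return [float((t * 31 + len(token_ids)) % 7) for t in range(10)]

def greedy_generate(prompt_ids, max_new_tokens=5):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)
        # Greedy decoding: append the highest-logit token each step.
        ids.append(max(range(len(logits)), key=logits.__getitem__))
    return ids

out = greedy_generate([1, 2, 3])
print(out)
```

Swapping greedy argmax for temperature sampling or top-k changes only the token-selection line.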
## Intended Use
- **Educational**: Understanding transformer training from scratch
- **Experimental**: Testing fine-tuning approaches on small models
- **Personal LLM**: Base for personal voice/style fine-tuning
- **Research**: Lightweight model for NLP experiments
## Limitations
- Small model size limits knowledge and capabilities
- Trained only on Wikipedia - limited domain coverage
- Not suitable for production use cases requiring high quality
- May generate factually incorrect information
- No RLHF or instruction tuning
## Training Data
- **Source**: Wikipedia (English)
- **Processing**: Tokenized with 32K vocabulary SentencePiece tokenizer
- **Format**: Standard causal language modeling (next token prediction)
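Causal language modeling means the inputs and targets are the same token sequence offset by one position: at each position the model predicts the next token. A minimal sketch of that shift:

```python
# Standard causal LM setup: predict token t+1 from tokens 0..t.
token_ids = [5, 17, 42, 8, 99]   # example token ids (illustrative)

inputs = token_ids[:-1]    # what the model sees
targets = token_ids[1:]    # what it must predict at each position

for pos, (x, y) in enumerate(zip(inputs, targets)):
    print(f"position {pos}: given {x}, predict {y}")
```

The training loss is then just cross-entropy between the model's logits at each position and the shifted targets.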
## Future Work
This model is intended as a base for:
1. **Personal Fine-tuning**: Adapt to individual writing style using personal data
2. **Domain Adaptation**: Specialize for specific topics or tasks
3. **Instruction Tuning**: Add instruction-following capabilities
## Hardware Requirements
- **Inference**: ~300MB GPU memory, runs on any modern GPU or Apple Silicon
- **Fine-tuning**: ~2GB GPU memory recommended
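The ~300MB inference figure is consistent with a rough estimate from the parameter count (weights alone, before activations, KV cache, and framework overhead; numbers are approximate):

```python
params = 54.93e6  # parameter count from the architecture table

fp32_weights_mb = params * 4 / 2**20   # 4 bytes per fp32 parameter
fp16_weights_mb = params * 2 / 2**20   # 2 bytes per fp16 parameter

print(f"fp32 weights: ~{fp32_weights_mb:.0f} MB")
print(f"fp16 weights: ~{fp16_weights_mb:.0f} MB")
# Activations, KV cache, and runtime overhead account for the rest
# of the ~300 MB figure quoted above.
```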
## Related Work
Inspired by:
- Andrej Karpathy's nanoGPT
- Geddy Duke's small LLM experiments
- LLaMA architecture design choices
## Citation
```bibtex
@misc{tiny-llm-54m,
  author    = {jonmabe},
  title     = {Tiny-LLM: A 54M Parameter Language Model},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/jonmabe/tiny-llm-54m}
}
```