---
|
|
license: apache-2.0 |
|
|
language: |
|
|
- en |
|
|
pipeline_tag: text-generation |
|
|
tags: |
|
|
- tiny |
|
|
- from-scratch |
|
|
- educational |
|
|
- causal-lm |
|
|
- personal-llm |
|
|
model-index: |
|
|
- name: tiny-llm-54m |
|
|
  results: []
|
|
--- |
|
|
|
|
|
# Tiny-LLM 54M |
|
|
|
|
|
A small transformer language model (~54.93M parameters) trained from scratch for educational and experimental purposes. |
|
|
|
|
|
## Model Description |
|
|
|
|
|
This is a decoder-only transformer trained from scratch on Wikipedia text. It demonstrates that meaningful language models can be trained on consumer hardware with modest compute budgets. |
|
|
|
|
|
### Architecture |
|
|
|
|
|
| Component | Value | |
|
|
|-----------|-------| |
|
|
| Parameters | **54.93M** | |
|
|
| Layers | 12 | |
|
|
| Hidden Size | 512 | |
|
|
| Attention Heads | 8 | |
|
|
| Intermediate (FFN) | 1408 | |
|
|
| Vocab Size | 32,000 | |
|
|
| Max Sequence Length | 512 | |
|
|
| Position Encoding | RoPE | |
|
|
| Normalization | RMSNorm | |
|
|
| Activation | SwiGLU | |
|
|
| Weight Tying | Yes | |
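
The parameter count in the table can be reproduced from these dimensions. A minimal sketch, assuming biasless linear layers, a three-projection SwiGLU FFN (gate, up, down), and a tied output head, as is standard for LLaMA-style models:

```python
# Architecture dimensions from the table above
vocab, d_model, n_layers, d_ff = 32_000, 512, 12, 1408

embedding = vocab * d_model        # input embedding (output head is tied, so counted once)
attention = 4 * d_model * d_model  # Q, K, V, O projections, no biases
ffn       = 3 * d_model * d_ff     # SwiGLU: gate, up, and down projections
norms     = 2 * d_model            # two RMSNorm weight vectors per layer
per_layer = attention + ffn + norms

total = embedding + n_layers * per_layer + d_model  # + final RMSNorm
print(f"{total / 1e6:.2f}M parameters")  # 54.93M
```

The count lands exactly on the advertised 54.93M, which is a good sanity check that the table is internally consistent.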
|
|
|
|
|
### Training Details |
|
|
|
|
|
| Parameter | Value | |
|
|
|-----------|-------| |
|
|
| Training Steps | 50,000 | |
|
|
| Tokens | ~100M | |
|
|
| Batch Size | 32 | |
|
|
| Learning Rate | 3e-4 | |
|
|
| Warmup Steps | 2,000 | |
|
|
| Weight Decay | 0.1 | |
|
|
| Hardware | NVIDIA RTX 5090 (32GB) | |
|
|
| Training Time | ~3 hours | |
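
The card specifies 2,000 warmup steps but not the decay schedule. Assuming linear warmup to the 3e-4 peak followed by cosine decay to zero (a common choice for runs like this, not confirmed by the card), the schedule looks like:

```python
import math

peak_lr, warmup, total_steps = 3e-4, 2_000, 50_000

def lr_at(step):
    """Linear warmup to peak_lr, then cosine decay to zero (assumed schedule)."""
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / (total_steps - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(1_000))   # mid-warmup: 1.5e-4
print(lr_at(2_000))   # peak: 3e-4
print(lr_at(50_000))  # end of training: ~0
```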
|
|
|
|
|
## Usage |
|
|
|
|
|
```python
from transformers import AutoTokenizer

# Load the tokenizer (a 32K-vocabulary SentencePiece tokenizer shipped with the repo)
tokenizer = AutoTokenizer.from_pretrained("jonmabe/tiny-llm-54m")

# The model itself uses a custom architecture and cannot be loaded with
# AutoModelForCausalLM; see scripts/ in the repository for model loading
# and inference code.
```
|
|
|
|
|
### Generation Example |
|
|
|
|
|
```python
# This model uses a custom architecture; full inference code is in the repository.
prompt = "The history of artificial intelligence"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Pass input_ids to the model's generation loop (see scripts/) to sample a
# continuation based on the patterns learned from Wikipedia.
```
|
|
|
|
|
## Intended Use |
|
|
|
|
|
- **Educational**: Understanding transformer training from scratch |
|
|
- **Experimental**: Testing fine-tuning approaches on small models |
|
|
- **Personal LLM**: Base for personal voice/style fine-tuning |
|
|
- **Research**: Lightweight model for NLP experiments |
|
|
|
|
|
## Limitations |
|
|
|
|
|
- Small model size limits knowledge and capabilities |
|
|
- Trained only on Wikipedia - limited domain coverage |
|
|
- Not suitable for production use cases requiring high quality |
|
|
- May generate factually incorrect information |
|
|
- No RLHF or instruction tuning |
|
|
|
|
|
## Training Data |
|
|
|
|
|
- **Source**: Wikipedia (English) |
|
|
- **Processing**: Tokenized with 32K vocabulary SentencePiece tokenizer |
|
|
- **Format**: Standard causal language modeling (next token prediction) |
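
The causal-LM format simply shifts the token stream by one position, so each token's training target is the token that follows it. A minimal sketch of how training pairs are formed (the token IDs below are made up for illustration, not output of the real tokenizer):

```python
# A tokenized Wikipedia passage (illustrative IDs)
tokens = [101, 874, 22, 5903, 17, 3004]

inputs  = tokens[:-1]  # what the model sees at each position
targets = tokens[1:]   # what it must predict at each position

for x, y in zip(inputs, targets):
    print(f"given ...{x} -> predict {y}")
```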
|
|
|
|
|
## Future Work |
|
|
|
|
|
This model is intended as a base for: |
|
|
1. **Personal Fine-tuning**: Adapt to individual writing style using personal data |
|
|
2. **Domain Adaptation**: Specialize for specific topics or tasks |
|
|
3. **Instruction Tuning**: Add instruction-following capabilities |
|
|
|
|
|
## Hardware Requirements |
|
|
|
|
|
- **Inference**: ~300MB GPU memory, runs on any modern GPU or Apple Silicon |
|
|
- **Fine-tuning**: ~2GB GPU memory recommended |
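
A back-of-the-envelope check on the inference figure (a sketch only; actual usage depends on precision, batch size, and sequence length):

```python
params = 54.93e6  # parameter count from the architecture table

for name, bytes_per in [("fp32", 4), ("fp16/bf16", 2)]:
    mb = params * bytes_per / 1e6
    print(f"{name}: ~{mb:.0f} MB of weights")

# fp32 weights alone come to ~220 MB, so the ~300 MB figure leaves
# headroom for activations and the KV cache.
```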
|
|
|
|
|
## Related Work |
|
|
|
|
|
Inspired by: |
|
|
- Andrej Karpathy's nanoGPT |
|
|
- Geddy Duke's small LLM experiments |
|
|
- LLaMA architecture design choices |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@misc{tiny-llm-54m,
  author    = {jonmabe},
  title     = {Tiny-LLM: A 54M Parameter Language Model},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/jonmabe/tiny-llm-54m}
}
|
|
``` |
|
|
|