---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- tiny
- from-scratch
- educational
- causal-lm
- personal-llm
model-index:
- name: tiny-llm-54m
results: []
---
# Tiny-LLM 54M
A small transformer language model (~54.93M parameters) trained from scratch for educational and experimental purposes.
## Model Description
This is a decoder-only transformer trained from scratch on Wikipedia text. It demonstrates that meaningful language models can be trained on consumer hardware with modest compute budgets.
### Architecture
| Component | Value |
|-----------|-------|
| Parameters | **54.93M** |
| Layers | 12 |
| Hidden Size | 512 |
| Attention Heads | 8 |
| Intermediate (FFN) | 1408 |
| Vocab Size | 32,000 |
| Max Sequence Length | 512 |
| Position Encoding | RoPE |
| Normalization | RMSNorm |
| Activation | SwiGLU |
| Weight Tying | Yes |
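The parameter count in the table can be cross-checked from the other rows (a back-of-the-envelope sketch assuming standard LLaMA-style shapes: separate Q/K/V/O projections of size d×d, a three-matrix SwiGLU FFN, two RMSNorm weight vectors per layer plus a final one, and an output head tied to the embedding matrix):

```python
# Rough parameter count derived from the architecture table above.
# Assumes LLaMA-style layer shapes; the released checkpoint may
# differ slightly in the small terms (norms, biases).
d, ffn, layers, vocab = 512, 1408, 12, 32000

embedding = vocab * d        # tied with the output head, counted once
attention = 4 * d * d        # Q, K, V, O projections
swiglu    = 3 * d * ffn      # gate, up, and down matrices
norms     = 2 * d            # two RMSNorm weight vectors per layer

per_layer = attention + swiglu + norms
total = embedding + layers * per_layer + d   # + final RMSNorm

print(f"{total / 1e6:.2f}M parameters")      # → 54.93M
```

The result lands exactly on the reported 54.93M, which suggests the table is internally consistent.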
### Training Details
| Parameter | Value |
|-----------|-------|
| Training Steps | 50,000 |
| Tokens | ~100M |
| Batch Size | 32 |
| Learning Rate | 3e-4 |
| Warmup Steps | 2,000 |
| Weight Decay | 0.1 |
| Hardware | NVIDIA RTX 5090 (32GB) |
| Training Time | ~3 hours |
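One note on the token count: at batch size 32 and a 512-token context, 50,000 steps process well over 100M tokens, so the "~100M" row most plausibly refers to the unique tokens in the dataset, seen over multiple epochs (a rough calculation, assuming every step uses full-length sequences):

```python
# Tokens processed during training, assuming each of the 50,000 steps
# consumes a full batch of 32 sequences at 512 tokens each.
steps, batch, seq_len = 50_000, 32, 512
tokens_processed = steps * batch * seq_len

print(f"{tokens_processed / 1e6:.0f}M tokens processed")            # 819M
print(f"~{tokens_processed / 100e6:.1f} passes over a 100M-token corpus")
```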
## Usage
```python
from transformers import AutoTokenizer
# Load tokenizer (32K-vocabulary SentencePiece tokenizer)
tokenizer = AutoTokenizer.from_pretrained("jonmabe/tiny-llm-54m")
# For custom model loading, see the model files
# This model uses a custom architecture - see scripts/ for inference code
```
### Generation Example
```python
# Note: This model uses a custom architecture
# Full inference code available in the repository
prompt = "The history of artificial intelligence"
# Model generates continuation based on learned Wikipedia patterns
```
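Since the checkpoint uses a custom architecture, the exact loading call lives in the repository's scripts. The decoding loop itself is generic, though; a minimal greedy-decoding sketch, where `model` is a hypothetical stand-in for the repository's forward pass (any callable mapping a token-id list to next-token logits works):

```python
# Greedy decoding sketch. `model` is a stand-in interface, not the
# repository's actual API: a callable that takes a list of token ids
# and returns one logit per vocabulary entry. Swap in the real
# forward pass from scripts/ when using the checkpoint.
def greedy_generate(model, input_ids, max_new_tokens=20, eos_id=None):
    ids = list(input_ids)
    for _ in range(max_new_tokens):
        logits = model(ids)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids

# Toy stand-in model: always prefers token (last_id + 1) % vocab,
# so generation counts upward.
def toy_model(ids, vocab=8):
    return [1.0 if i == (ids[-1] + 1) % vocab else 0.0 for i in range(vocab)]

print(greedy_generate(toy_model, [0], max_new_tokens=5))  # [0, 1, 2, 3, 4, 5]
```

With the real model, `input_ids` would come from `tokenizer(prompt)["input_ids"]` and the result would be passed back through `tokenizer.decode`.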
## Intended Use
- **Educational**: Understanding transformer training from scratch
- **Experimental**: Testing fine-tuning approaches on small models
- **Personal LLM**: Base for personal voice/style fine-tuning
- **Research**: Lightweight model for NLP experiments
## Limitations
- Small model size limits knowledge and capabilities
- Trained only on Wikipedia - limited domain coverage
- Not suitable for production use cases requiring high quality
- May generate factually incorrect information
- No RLHF or instruction tuning
## Training Data
- **Source**: Wikipedia (English)
- **Processing**: Tokenized with 32K vocabulary SentencePiece tokenizer
- **Format**: Standard causal language modeling (next token prediction)
## Future Work
This model is intended as a base for:
1. **Personal Fine-tuning**: Adapt to individual writing style using personal data
2. **Domain Adaptation**: Specialize for specific topics or tasks
3. **Instruction Tuning**: Add instruction-following capabilities
## Hardware Requirements
- **Inference**: ~300MB GPU memory, runs on any modern GPU or Apple Silicon
- **Fine-tuning**: ~2GB GPU memory recommended
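The ~300MB inference figure is consistent with the raw weight footprint (a rough estimate; actual usage also depends on dtype, KV cache size, and framework overhead):

```python
# Weight memory for 54.93M parameters at common precisions.
params = 54.93e6
for name, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2)]:
    print(f"{name}: {params * bytes_per_param / 1e6:.0f} MB of weights")
# fp32: 220 MB, fp16/bf16: 110 MB — the gap up to ~300MB is
# activations, KV cache, and framework overhead.
```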
## Related Work
Inspired by:
- Andrej Karpathy's nanoGPT
- Geddy Duke's small LLM experiments
- LLaMA architecture design choices
## Citation
```bibtex
@misc{tiny-llm-54m,
author = {jonmabe},
title = {Tiny-LLM: A 54M Parameter Language Model},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/jonmabe/tiny-llm-54m}
}
```