---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- tiny
- from-scratch
- educational
- causal-lm
- personal-llm
model-index:
- name: tiny-llm-54m
  results: []
---

# Tiny-LLM 54M

A small transformer language model (~54.93M parameters) trained from scratch for educational and experimental purposes.

## Model Description

This is a decoder-only transformer trained from scratch on Wikipedia text. It demonstrates that meaningful language models can be trained on consumer hardware with modest compute budgets.

### Architecture

| Component | Value |
|-----------|-------|
| Parameters | **54.93M** |
| Layers | 12 |
| Hidden Size | 512 |
| Attention Heads | 8 |
| Intermediate (FFN) | 1408 |
| Vocab Size | 32,000 |
| Max Sequence Length | 512 |
| Position Encoding | RoPE |
| Normalization | RMSNorm |
| Activation | SwiGLU |
| Weight Tying | Yes |
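
The parameter count in the table can be sanity-checked from the dimensions above. The sketch below assumes the usual LLaMA-style layout (no biases, three SwiGLU projections, tied embedding/LM head, one final RMSNorm); the exact breakdown lives in the repository's code:

```python
# Back-of-the-envelope parameter count (an estimate from the table above,
# not read from the checkpoint).
vocab, d, n_layers, ffn = 32_000, 512, 12, 1408

emb = vocab * d                    # token embeddings (tied with the LM head)
attn = 4 * d * d                   # Q, K, V, O projections per layer
swiglu = 3 * d * ffn               # gate, up, and down projections per layer
norms = 2 * d                      # two RMSNorm weight vectors per layer
per_layer = attn + swiglu + norms

total = emb + n_layers * per_layer + d   # + final RMSNorm
print(f"{total / 1e6:.2f}M")             # 54.93M, matching the table
```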

### Training Details

| Parameter | Value |
|-----------|-------|
| Training Steps | 50,000 |
| Tokens | ~100M |
| Batch Size | 32 |
| Learning Rate | 3e-4 |
| Warmup Steps | 2,000 |
| Weight Decay | 0.1 |
| Hardware | NVIDIA RTX 5090 (32GB) |
| Training Time | ~3 hours |
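
The card states the peak learning rate and a 2,000-step warmup but not the decay schedule. Linear warmup followed by cosine decay is a common recipe for runs like this one, sketched below; the decay shape and minimum LR are assumptions, not taken from the card:

```python
import math

def lr_at(step, max_lr=3e-4, warmup=2_000, total=50_000, min_lr=3e-5):
    """Linear warmup then cosine decay -- an assumed schedule, not the card's."""
    if step < warmup:
        return max_lr * (step + 1) / warmup
    progress = (step - warmup) / (total - warmup)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))
```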

## Usage

```python
from transformers import AutoTokenizer

# Load the tokenizer (a 32K-vocabulary SentencePiece tokenizer; see Training Data)
tokenizer = AutoTokenizer.from_pretrained("jonmabe/tiny-llm-54m")

# The model itself uses a custom architecture; see scripts/ in the
# repository for the model-loading and inference code.
```

### Generation Example

```python
# Note: this model uses a custom architecture; the full inference
# code is available in the repository (see the sketch after this block).

prompt = "The history of artificial intelligence"
# The model generates a continuation based on patterns learned from Wikipedia.
```
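
Since the model is not loadable through the Hub's auto classes, the loop below only illustrates what sampling from it looks like. `model` stands in for whatever module scripts/ constructs (anything mapping token ids to next-token logits); the function name and defaults are ours, not the repository's:

```python
import torch

@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=50, temperature=0.8):
    # `model`: any nn.Module that maps (batch, seq) token ids to
    # (batch, seq, vocab) logits -- a placeholder for the repo's class.
    ids = torch.tensor([tokenizer.encode(prompt)])
    for _ in range(max_new_tokens):
        logits = model(ids[:, -512:])[:, -1, :]   # stay within the 512-token context
        probs = torch.softmax(logits / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=-1)
    return tokenizer.decode(ids[0].tolist())
```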

## Intended Use

- **Educational**: Understanding transformer training from scratch
- **Experimental**: Testing fine-tuning approaches on small models
- **Personal LLM**: Base for personal voice/style fine-tuning
- **Research**: Lightweight model for NLP experiments

## Limitations

- Small model size limits knowledge and capabilities
- Trained only on Wikipedia, so domain coverage is limited
- Not suitable for production use cases that require high output quality
- May generate factually incorrect information
- No RLHF or instruction tuning

## Training Data

- **Source**: Wikipedia (English)
- **Processing**: Tokenized with 32K vocabulary SentencePiece tokenizer
- **Format**: Standard causal language modeling (next-token prediction; see the sketch below)
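
Concretely, causal language modeling turns one long token stream into input/label pairs shifted by a single position. A minimal sketch follows; the chunking details are illustrative, and the repo's actual data pipeline may differ:

```python
import torch

seq_len = 512
stream = torch.randint(0, 32_000, (100_000,))   # stand-in for the tokenized corpus
n = (len(stream) // (seq_len + 1)) * (seq_len + 1)
chunks = stream[:n].view(-1, seq_len + 1)       # non-overlapping 513-token windows
inputs, labels = chunks[:, :-1], chunks[:, 1:]  # labels are inputs shifted by one
```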

## Future Work

This model is intended as a base for:
1. **Personal Fine-tuning**: Adapt to individual writing style using personal data
2. **Domain Adaptation**: Specialize for specific topics or tasks
3. **Instruction Tuning**: Add instruction-following capabilities

## Hardware Requirements

- **Inference**: ~300MB GPU memory (see the rough check below); runs on any modern GPU or Apple Silicon
- **Fine-tuning**: ~2GB GPU memory recommended
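
The inference figure is consistent with the weight size alone (a rough estimate, not a measurement):

```python
params = 54.93e6
weights_mb = params * 4 / 2**20     # fp32 weights: ~210 MB
print(f"{weights_mb:.0f} MB")       # activations push the total toward the ~300 MB above
```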

## Related Work

Inspired by:
- Andrej Karpathy's nanoGPT
- Geddy Duke's small LLM experiments
- LLaMA architecture design choices

## Citation

```bibtex
@misc{tiny-llm-54m,
  author = {jonmabe},
  title = {Tiny-LLM: A 54M Parameter Language Model},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/jonmabe/tiny-llm-54m}
}
```