---
title: Tiny-LLM Text Generator
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
python_version: "3.11"
app_file: app.py
pinned: false
license: apache-2.0
---

# Tiny-LLM Text Generator

A **54-million-parameter** language model trained **from scratch** on Wikipedia.

## About

This Space demonstrates that a meaningful language model can be trained from scratch on consumer hardware with a modest compute budget!
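
The Space wraps the model in a Gradio interface. Below is a minimal sketch of what `app.py` might look like; the function names, widgets, and sampling defaults here are illustrative assumptions, not the Space's confirmed implementation.

```python
import gradio as gr
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "jonmabe/tiny-llm-54m"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

def generate(prompt: str, max_new_tokens: int, temperature: float) -> str:
    # Tokenize the prompt and sample a continuation.
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=int(max_new_tokens),
            do_sample=True,
            temperature=temperature,
            top_p=0.9,
        )
    return tokenizer.decode(output[0], skip_special_tokens=True)

demo = gr.Interface(
    fn=generate,
    inputs=[
        gr.Textbox(label="Prompt"),
        gr.Slider(16, 256, value=128, step=16, label="Max new tokens"),
        gr.Slider(0.1, 1.5, value=0.8, step=0.1, label="Temperature"),
    ],
    outputs=gr.Textbox(label="Completion"),
    title="Tiny-LLM Text Generator",
)

if __name__ == "__main__":
    demo.launch()
```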

## Architecture

| Component | Value |
|-----------|-------|
| Parameters | 54.93M |
| Layers | 12 |
| Hidden Size | 512 |
| Attention Heads | 8 |
| FFN Intermediate Size | 1,408 |
| Vocab Size | 32,000 |
| Max Sequence Length | 512 |
| Position Encoding | RoPE |
| Normalization | RMSNorm |
| Activation | SwiGLU |
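
These hyperparameters match a LLaMA-style decoder (RoPE, RMSNorm, SwiGLU). Assuming the checkpoint follows that layout, the table maps onto a standard `transformers` config roughly as follows; the choice of `LlamaConfig` is an assumption, not confirmed by the repo.

```python
from transformers import LlamaConfig

# Sketch only: the table above expressed as a LLaMA-style config,
# which uses RoPE, RMSNorm, and a SwiGLU feed-forward by default.
config = LlamaConfig(
    vocab_size=32_000,
    hidden_size=512,
    intermediate_size=1408,
    num_hidden_layers=12,
    num_attention_heads=8,
    max_position_embeddings=512,
    hidden_act="silu",  # SiLU gate -> SwiGLU FFN
)
```

With tied input/output embeddings, this layout reproduces the parameter count in the table: 32,000 × 512 ≈ 16.38M for the embedding, plus 12 × (4 × 512² attention + 3 × 512 × 1408 SwiGLU) ≈ 38.54M for the blocks, plus a handful of RMSNorm weights, for about 54.93M in total.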

## Training

- **Training Steps**: 50,000
- **Tokens**: ~100M
- **Hardware**: NVIDIA RTX 5090 (32GB)
- **Training Time**: ~3 hours
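
Those numbers work out to roughly 100M ÷ 50,000 ≈ 2,000 tokens per optimizer step, i.e. about four 512-token sequences per step (assuming full-length sequence packing and counting any gradient-accumulation micro-batches together).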

## Model

[jonmabe/tiny-llm-54m](https://huggingface.co/jonmabe/tiny-llm-54m)
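
To run the checkpoint outside this Space, a standard `transformers` pipeline should work, assuming the repository ships a compatible config and tokenizer (a sketch, not verified against the repo):

```python
from transformers import pipeline

# Load the checkpoint with the standard text-generation pipeline.
generator = pipeline("text-generation", model="jonmabe/tiny-llm-54m")
print(generator("The history of Rome begins", max_new_tokens=64)[0]["generated_text"])
```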

## Limitations

- Small model size limits knowledge and capabilities
- Trained only on Wikipedia, so domain coverage is limited
- May generate factually incorrect information
- Not instruction-tuned