---
language: en
license: mit
tags:
- tiny
- language-model
- causal-lm
- pytorch
datasets:
- roneneldan/TinyStories
- Skylion007/openwebtext
pipeline_tag: text-generation
library_name: transformers
---

# TinyLM

A 3.4M-parameter causal language model trained from scratch for experimentation.

## Architecture

| Hyperparameter | Value |
|---|---|
| Parameters | 3,403,968 |
| Layers | 4 |
| Hidden size | 64 |
| Attention heads | 4 |
| FFN dim | 192 |
| Embedding rank | 32 |
| Context length | 256 |
| Tokenizer | GPT-2 (50,257-token vocabulary) |

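The model uses a **factored (low-rank) embedding**: tokens are looked up in a 32-dimensional table and then projected up to the 64-dimensional hidden size, which keeps the vocabulary projection from consuming the entire parameter budget. The output head is weight-tied to this embedding.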
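A quick way to see why the factorization matters is to count the embedding parameters with and without it. The sketch below takes the vocabulary size, hidden size, and embedding rank from the table above, and assumes the factored embedding is a 50,257 × 32 lookup table followed by a bias-free 32 × 64 projection (the ALBERT-style layout). The helper names are illustrative, and the exact layout in `modeling_tinylm.py` may differ.

```python
# Hyperparameters taken from the architecture table.
VOCAB_SIZE = 50_257
HIDDEN_SIZE = 64
EMBED_RANK = 32


def full_embedding_params(vocab: int, hidden: int) -> int:
    """Standard embedding: one hidden-size vector per token."""
    return vocab * hidden


def factored_embedding_params(vocab: int, rank: int, hidden: int) -> int:
    """Low-rank embedding: a vocab x rank table plus a rank x hidden projection.

    Assumption: the up-projection has no bias term.
    """
    return vocab * rank + rank * hidden


full = full_embedding_params(VOCAB_SIZE, HIDDEN_SIZE)
factored = factored_embedding_params(VOCAB_SIZE, EMBED_RANK, HIDDEN_SIZE)

print(f"full:     {full:,}")
print(f"factored: {factored:,}")
print(f"saved:    {full - factored:,} ({1 - factored / full:.1%})")
```

Under these assumptions the factorization saves about 1.6M parameters, roughly half of the vocabulary-dependent cost, which is by far the largest component of a model with only 4 layers of width 64.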

## Training

| Setting | Value |
|---|---|
| Datasets | Skylion007/openwebtext (10k samples), roneneldan/TinyStories (10k samples) |
| Optimizer | AdamW (lr=3e-3, weight_decay=0.01) |
| Scheduler | Cosine annealing with warm restarts |
| Mixed precision | fp16 (torch.cuda.amp) |
| Hardware | NVIDIA P100 |
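The scheduler row refers to cosine annealing with warm restarts (SGDR), which PyTorch provides as `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts` (parameters `T_0`, `T_mult`, `eta_min`). The pure-Python sketch below reproduces that per-step formula so the shape of the schedule can be inspected without a GPU. Only the peak rate of 3e-3 comes from the table; the cycle length `t_0`, the multiplier `t_mult`, and the floor `eta_min` are hypothetical values, since the card does not report them, and the helper name `sgdr_lr` is illustrative.

```python
import math


def sgdr_lr(step: int, base_lr: float = 3e-3, t_0: int = 1_000,
            t_mult: int = 1, eta_min: float = 0.0) -> float:
    """Learning rate at `step` under cosine annealing with warm restarts.

    base_lr matches the card; t_0, t_mult and eta_min are illustrative
    assumptions, not values reported for TinyLM.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if t_0 < 1 or t_mult < 1:
        raise ValueError("t_0 and t_mult must be >= 1")

    t_i = t_0      # length of the current cycle, in steps
    t_cur = step   # position within the current cycle
    # Skip over completed cycles; each restart resets t_cur to 0 and
    # multiplies the length of the next cycle by t_mult.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult

    # Half-cosine decay from base_lr (t_cur = 0) towards eta_min (t_cur -> t_i).
    cosine = (1 + math.cos(math.pi * t_cur / t_i)) / 2
    return eta_min + (base_lr - eta_min) * cosine


for step in (0, 250, 500, 999, 1_000, 1_500):
    print(step, f"{sgdr_lr(step):.2e}")
```

With these placeholder settings the rate decays towards zero over the first 1,000 steps and jumps back to the 3e-3 peak at step 1,000, which is the "warm restart"; setting `t_mult=2` would make each subsequent cycle twice as long.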

## Usage
```python
from huggingface_hub import snapshot_download
import importlib.util
import torch

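# Download the model files from the Hub
snapshot_download(repo_id="Fu01978/TinyLM", local_dir="./tinylm")

# Import the model definition shipped in the repository (modeling_tinylm.py)
spec   = importlib.util.spec_from_file_location("modeling_tinylm", "./tinylm/modeling_tinylm.py")
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

# Build the model, tokenizer and config from the downloaded files
model, tokenizer, config = module.load_tinylm("./tinylm")
model.eval()  # switch to inference mode (e.g. disables dropout)

# Generate a continuation; gradients are not needed at inference time
with torch.no_grad():
    output = module.generate(model, tokenizer, "Once upon a time, ")
print(output)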
```