---
datasets:
- roneneldan/TinyStories
---
Some very small and very simple models.

29,960,200 parameters.

"dim":256,"dim_head":32,"headcount":8,"ff_mult":4,
"vocab_size":50304, "num_layers":4.

this is nonstandard for tinystories models: it keeps the full gpt-2 vocabulary size (which bloats the embedding layers) and uses a swiglu activation function (which doubles the width of one of the feedforward layers).
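to make those two choices concrete, here is a small sketch (assumptions on my part, not the actual code from the repo linked below): a minimal swiglu feedforward block, plus a back-of-envelope parameter count that assumes untied embeddings, no biases, and ignores norm parameters, so it lands near (not exactly on) the reported 29,960,200.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# config values from the listing above
dim, vocab, n_layers, ff_mult = 256, 50304, 4, 4

class SwiGLUFeedForward(nn.Module):
    """Minimal SwiGLU sketch: the extra gate projection is what 'doubles
    the width' of the first feedforward matmul versus a plain MLP."""
    def __init__(self, dim: int, ff_mult: int):
        super().__init__()
        hidden = ff_mult * dim
        self.up = nn.Linear(dim, hidden, bias=False)
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

# back-of-envelope parameter count
embed = 2 * vocab * dim                 # token embedding + untied output head
attn = 4 * dim * dim                    # q, k, v, out projections per layer
ff = 3 * dim * ff_mult * dim            # gate + up + down projections per layer
print(embed + n_layers * (attn + ff))   # 29,949,952 -- the embeddings alone are ~86% of it
```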


source for training, inference, dataset preparation, and the network definitions is available at
https://github.com/SQCU/attn_demo


training logs (unprocessed! unfiltered! just a bunch of log prints of train and validation loss!) and the training loader source for each run are included with the demo models.