SQCU committed on
Commit fd3ca39 · verified · 1 Parent(s): 5e8f667

Create README.md

Files changed (1)
  1. README.md +19 -0
README.md ADDED
@@ -0,0 +1,19 @@
+ ---
+ datasets:
+ - roneneldan/TinyStories
+ ---
+ Some very small and very simple models.
+ 29,960,200 parameters.
+ "dim":256,"dim_head":32,"headcount":8,"ff_mult":4,
+ "vocab_size":50304,"num_layers":4.
+ This setup is nonstandard for TinyStories models:
+ it uses a full GPT-2 vocabulary size (bloating the embedding layers)
+ and a SwiGLU activation function
+ (which doubles the width of one of the feedforward layers).
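
To see where the parameters go, here is a rough back-of-the-envelope estimate for the config above. It is a sketch, not the repository's own accounting. It assumes untied input and output embeddings, bias-free linear layers, and it ignores normalization weights; none of these details are stated in the README, and the function name `estimate_params` is made up for illustration.

```python
# Rough parameter estimate for the config above.
# Assumptions (not confirmed by the README): untied input and output
# embeddings, bias-free linear layers, and norm parameters ignored.
config = {"dim": 256, "dim_head": 32, "headcount": 8, "ff_mult": 4,
          "vocab_size": 50304, "num_layers": 4}

def estimate_params(cfg):
    dim = cfg["dim"]
    inner = cfg["dim_head"] * cfg["headcount"]   # attention width, 256 here
    hidden = cfg["ff_mult"] * dim                # feedforward width, 1024 here

    embeddings = 2 * cfg["vocab_size"] * dim     # token embedding + output projection
    attention = 3 * dim * inner + inner * dim    # q, k, v projections + attention output
    # SwiGLU: the input projection emits a gate and a value, so it is 2x wide.
    feedforward = dim * (2 * hidden) + hidden * dim
    blocks = cfg["num_layers"] * (attention + feedforward)
    return {"embeddings": embeddings, "blocks": blocks,
            "total": embeddings + blocks}

estimate = estimate_params(config)
reported = 29_960_200  # figure stated in the README
print(estimate)
print("unaccounted:", reported - estimate["total"])
print("embedding share: {:.1%}".format(estimate["embeddings"] / reported))
```

Under these assumptions the estimate comes to 29,949,952, which is 10,248 short of the reported 29,960,200. The gap is presumably normalization weights or other small terms; the linked repository is the authority. The two embedding matrices account for roughly 86% of the reported total, which is the bloat the README describes.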
+
+ Source for training, inference, dataset preparation, and network definitions is available at
+ https://github.com/SQCU/attn_demo
+
+ Training logs
+ (unprocessed! unfiltered! just a bunch of log prints of train and validation loss!)
+ and the training loader source for each run are included with the demo models.