SQCU/pgptlformer-tinystories


SQCU committed on Jan 26, 2025

Commit fd3ca39 · verified · 1 Parent(s): 5e8f667

Create README.md

Files changed (1)

README.md +19 -0

README.md ADDED

	@@ -0,0 +1,19 @@

+---
+datasets:
+- roneneldan/TinyStories
+---
+Some very small and very simple models.
+29,960,200 parameters.
+"dim":256,"dim_head":32,"headcount":8,"ff_mult":4,
+"vocab_size":50304, "num_layers":4.
+this configuration is nonstandard (for tinystories),
+reflecting a full gpt-2 vocabulary size (bloating the embedding layers)
+and the use of a swiglu activation function
+(which doubles the width of one of the feedforward layers).
+source for training, inference, dataset preparation, and network definitions is available at
+https://github.com/SQCU/attn_demo
+training logs
+(unprocessed! unfiltered! it's a bunch of log prints of train and validation loss!)
+and training loader source for each run are included with the demo models.
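
The parameter count quoted in the README can be roughly reconstructed from the listed config. The sketch below is a back-of-envelope estimate, not the repo's own accounting: it assumes untied input and output embeddings, bias-free linear layers, four `dim`-wide attention projections per layer (q, k, v, out), and a swiglu feedforward whose up-projection is doubled. None of these assumptions is confirmed by the README, and `estimate_params` is a hypothetical helper name; the actual network definitions live in the linked attn_demo repo.

```python
# Rough parameter estimate for the config quoted in the README.
# ASSUMPTIONS (not confirmed by the source): untied input/output embeddings,
# no biases, q/k/v/out attention projections, and a swiglu feedforward whose
# up-projection is twice as wide (value half + gate half).
config = {"dim": 256, "dim_head": 32, "headcount": 8, "ff_mult": 4,
          "vocab_size": 50304, "num_layers": 4}


def estimate_params(cfg):
    dim = cfg["dim"]
    inner = cfg["dim_head"] * cfg["headcount"]    # total attention width
    hidden = cfg["ff_mult"] * dim                 # feedforward hidden width

    embeddings = 2 * cfg["vocab_size"] * dim      # token embedding + output head
    attention = 3 * dim * inner + inner * dim     # q, k, v, then output projection
    feedforward = dim * (2 * hidden) + hidden * dim  # doubled up-proj, then down-proj
    blocks = cfg["num_layers"] * (attention + feedforward)

    return {"embeddings": embeddings, "blocks": blocks,
            "total": embeddings + blocks}


counts = estimate_params(config)
reported = 29_960_200  # figure stated in the README

print(counts)
print("unaccounted for:", reported - counts["total"])
```

Under these assumptions the estimate comes to 29,949,952 parameters, which is 10,248 short of the reported 29,960,200; the remainder would have to come from parameters this sketch does not model. The two embedding matrices account for roughly 86% of the estimate, which is consistent with the README's remark about the vocabulary bloating the embedding layers.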