Add BPE tokenizer (needed for Colab training notebooks) 215b74b verified LisaMegaWatts committed on Mar 3
Add 2MB val text sample for Gemma/HF tokenizer notebooks 588ff7f verified LisaMegaWatts committed on Feb 28
Add 20MB raw text sample for Gemma/HF tokenizer notebooks d9422a5 verified LisaMegaWatts committed on Feb 28
Distillation test winner (scratch, PPL=43.9, 5M params) 49d0c4c verified LisaMegaWatts committed on Feb 27
Upload SymbioGPT-10M teacher (val_ppl=35.3, 13400 steps, A100) 06b4943 verified LisaMegaWatts committed on Feb 27
Add curated training tokens (266M tokens, Chinchilla-optimal) ab3f5b8 verified LisaMegaWatts committed on Feb 27