Commit History

Add 2MB val text sample for Gemma/HF tokenizer notebooks
588ff7f
verified

LisaMegaWatts committed on

Add 20MB raw text sample for Gemma/HF tokenizer notebooks
d9422a5
verified

LisaMegaWatts committed on

Add curated training tokens (266M tokens, Chinchilla-optimal)
ab3f5b8
verified

LisaMegaWatts committed on

Add validation tokens (72M tokens)
ff30a4a
verified

LisaMegaWatts committed on