Add 2MB val text sample for Gemma/HF tokenizer notebooks 588ff7f verified LisaMegaWatts commited on Feb 28
Add 20MB raw text sample for Gemma/HF tokenizer notebooks d9422a5 verified LisaMegaWatts commited on Feb 28
Add curated training tokens (266M tokens, Chinchilla-optimal) ab3f5b8 verified LisaMegaWatts commited on Feb 27