Add 2MB val text sample for Gemma/HF tokenizer notebooks 588ff7f verified LisaMegaWatts committed 2 days ago
Add 20MB raw text sample for Gemma/HF tokenizer notebooks d9422a5 verified LisaMegaWatts committed 2 days ago
Add curated training tokens (266M tokens, Chinchilla-optimal) ab3f5b8 verified LisaMegaWatts committed 3 days ago