- Qwen-Qwen3-8B-textmatched
- common-pile-comma-v0.1-textmatched
- fineweb2_hq_superset_lang_tokenizers
- fineweb2_hq_superset_oracle
- fineweb2_hq_superset_oracle_09
- flexitok_llama
- flexitok_llama_albert
- flexitok_llama_bpe_dropout
- flexitok_subword_regularization
- flexitok_superset_albert
- flexitok_superset_albert_w_xglm
- google-gemma-2-2b-textmatched
- gpt2
- gpt4o
- llama_43k
- meta-llama-Llama-3.2-1B-textmatched
- meta-llama-Llama-3.2-300M
- meta-llama-Llama-3.2-7B