---
library_name: transformers
datasets:
  - HuggingFaceTB/smollm-corpus
---

Doge-tokenizer

Tokenizer for the model trained on smollm-corpus. It was trained on 2M samples drawn from the following sources:

  • FineWeb-Edu 70%
  • Cosmopedia v2 20%
  • Python-Edu 5%
  • FineMath 5%
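The mix above can be turned into per-source sample counts. The following is a minimal Python sketch that assumes the percentages apply exactly to the 2M total. The README does not say how the samples were actually drawn (for example, fixed counts versus weighted random sampling), so `MIX` and `samples_per_source` are hypothetical names used only for illustration.

```python
from fractions import Fraction

# Mixture weights as listed in this README (percent of the 2M training samples).
# Assumption: the percentages are applied as exact per-source counts.
MIX = {
    "FineWeb-Edu": 70,
    "Cosmopedia v2": 20,
    "Python-Edu": 5,
    "FineMath": 5,
}


def samples_per_source(total: int, mix: dict[str, int]) -> dict[str, int]:
    """Split `total` samples across sources in proportion to integer percentages.

    Uses largest-remainder rounding so the counts always sum to `total`.
    """
    if sum(mix.values()) != 100:
        raise ValueError("percentages must sum to 100")
    # Exact (rational) share for each source, then round down.
    exact = {name: Fraction(total * pct, 100) for name, pct in mix.items()}
    counts = {name: int(q) for name, q in exact.items()}
    leftover = total - sum(counts.values())
    # Give the remaining samples to the sources with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - counts[n], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


if __name__ == "__main__":
    for name, n in samples_per_source(2_000_000, MIX).items():
        print(f"{name:>14}: {n:>9,}")
```

With these percentages the split divides evenly: 1,400,000 FineWeb-Edu, 400,000 Cosmopedia v2, and 100,000 each from Python-Edu and FineMath. The largest-remainder step only matters for totals that do not divide evenly.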