---
library_name: transformers
tags: []
---
A slightly modified version of `cl100k_base` that supports the Dolma 1.x special tokens
(`|||PHONE_NUMBER|||`, `|||EMAIL_ADDRESS|||`, `|||IP_ADDRESS|||`) and adds
extra tokens to fill gaps in the tiktoken version of `cl100k_base`.
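
The following is a minimal sketch, not an official usage recipe, of how one might load the tokenizer with `transformers` and check that each Dolma special token is encoded as a single id. The helper names `load_tokenizer` and `single_token_report` are hypothetical, the Hub repository id is left for the caller to supply, and the single-id expectation is an assumption based on the description above rather than a documented guarantee.

```python
from typing import Dict, Iterable

# The three Dolma 1.x special tokens named in the description above.
DOLMA_SPECIAL_TOKENS = (
    "|||PHONE_NUMBER|||",
    "|||EMAIL_ADDRESS|||",
    "|||IP_ADDRESS|||",
)


def load_tokenizer(repo_id: str):
    """Load a tokenizer from the Hugging Face Hub.

    `repo_id` is supplied by the caller (the Hub id of this repository);
    it is deliberately not hard-coded here.
    """
    # Imported lazily so the rest of this sketch works without transformers.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def single_token_report(
    tokenizer, tokens: Iterable[str] = DOLMA_SPECIAL_TOKENS
) -> Dict[str, bool]:
    """Map each token string to True if it encodes to exactly one id.

    Assumes `tokenizer` exposes the usual
    `encode(text, add_special_tokens=False)` method.
    """
    report = {}
    for token in tokens:
        ids = tokenizer.encode(token, add_special_tokens=False)
        report[token] = len(ids) == 1
    return report
```

Calling `single_token_report(load_tokenizer(repo_id))` with this repository's Hub id should then show whether each special token is kept intact; a `False` entry would mean the token is being split into several pieces.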