# Arcade100kTokenizer

Arcade100k is a BPE tokenizer extended from OpenAI’s [`tiktoken.cl100k_base`](https://github.com/openai/tiktoken) to include special tokens for code and to split numbers into individual digits.

```python
from transformers import AutoTokenizer

# trust_remote_code=True allows loading the custom tokenizer class shipped with the repository.
tokenizer = AutoTokenizer.from_pretrained("stabilityai/arcade100k", trust_remote_code=True)
# return_tensors='pt' returns PyTorch tensors, so PyTorch must be installed.
tokenizer("hello, world!", return_tensors='pt')
```
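
As a rough illustration of what individual digit splitting means in practice, the sketch below separates every digit into its own piece before any BPE merging would happen. The regular expression and the `split_digits` helper are hypothetical and written only for this example; they are not Arcade100k's actual pre-tokenization rule, which lives in the tokenizer's own code.

```python
import re

# Hypothetical pattern for illustration only (not the real Arcade100k rule):
# each digit becomes its own piece, while runs of letters, whitespace,
# punctuation, or underscores stay grouped together.
_PIECE_PATTERN = re.compile(r"\d|[^\W\d_]+|\s+|[^\w\s]+|_+")


def split_digits(text: str) -> list[str]:
    """Split text into pieces, isolating every digit as a separate piece."""
    return _PIECE_PATTERN.findall(text)


pieces = split_digits("pi is 3.14159")
print(pieces)
# ['pi', ' ', 'is', ' ', '3', '.', '1', '4', '1', '5', '9']
```

With digits isolated this way, a number such as `2024` always maps to four pieces regardless of which multi-digit strings happened to be frequent in the training data, which is the usual motivation for digit splitting.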

# Citation

```bibtex
@article{bellagente2024stable,
  title={Stable LM 2 1.6B Technical Report},
  author={Bellagente, Marco and Tow, Jonathan and Mahan, Dakota and Phung, Duy and Zhuravinskyi, Maksym and Adithyan, Reshinth and Baicoianu, James and Brooks, Ben and Cooper, Nathan and Datta, Ashish and others},
  journal={arXiv preprint arXiv:2402.17834},
  year={2024}
}
```