# TinyStack Tokenizer

A byte-level BPE tokenizer trained on the fhswf/tiny-stack dataset.
## Usage
```python
from tokenizers.implementations import ByteLevelBPETokenizer
from tokenizers.processors import BertProcessing

# Load the trained vocabulary and merge rules
tokenizer = ByteLevelBPETokenizer("./vocab.json", "./merges.txt")

# Wrap every encoded sequence as <s> ... </s> (RoBERTa-style special tokens)
tokenizer._tokenizer.post_processor = BertProcessing(
    ("</s>", tokenizer.token_to_id("</s>")),
    ("<s>", tokenizer.token_to_id("<s>")),
)

# Cap sequences at 512 tokens
tokenizer.enable_truncation(max_length=512)
```
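The snippet above expects `vocab.json` and `merges.txt` to be present locally. As a self-contained sanity check of the same encode/decode API, you can train a throwaway byte-level BPE tokenizer in memory; the corpus and vocabulary size below are illustrative only, not the released TinyStack files:

```python
from tokenizers.implementations import ByteLevelBPETokenizer

# Train a tiny throwaway tokenizer in memory (illustrative corpus and
# vocab size, not the released TinyStack vocabulary)
tok = ByteLevelBPETokenizer()
tok.train_from_iterator(
    ["def add(a, b):", "    return a + b"] * 20,
    vocab_size=400,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)

# Encode and decode with the same API the released files use
enc = tok.encode("def add(a, b):")
print(enc.tokens)          # byte-level tokens; spaces appear as 'Ġ'
print(tok.decode(enc.ids))
```

Decoding maps the byte-level tokens back to the original string, so the round-trip is lossless.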
Vocabulary size: 52,000