---
tags:
- tokie
library_name: tokie
---

# DeepSeek-R1

Pre-built [tokie](https://github.com/chonkie-inc/tokie) tokenizer for [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1).

## Quick Start (Python)

```bash
pip install tokie
```

```python
import tokie

tokenizer = tokie.Tokenizer.from_pretrained("tokiers/DeepSeek-R1")
encoding = tokenizer.encode("Hello, world!")
print(encoding.ids)
print(encoding.attention_mask)
```

## Quick Start (Rust)

```toml
[dependencies]
tokie = { version = "0.0.7", features = ["hf"] }
```

```rust
use tokie::Tokenizer;

let tokenizer = Tokenizer::from_pretrained("tokiers/DeepSeek-R1").unwrap();
let encoding = tokenizer.encode("Hello, world!", true);
println!("{:?}", encoding.ids);
```

## Files

- `tokenizer.tkz`: tokie binary format (~10x smaller, loads in ~5ms)
- `tokenizer.json`: original HuggingFace tokenizer

## About tokie

**50x faster tokenization, 10x smaller model files, 100% accurate.**

tokie is a drop-in replacement for HuggingFace tokenizers, built in Rust. See [GitHub](https://github.com/chonkie-inc/tokie) for benchmarks and documentation.

## License

MIT OR Apache-2.0 (tokie library). Original model files retain their original license from [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1).