---
tags:
- tokie
library_name: tokie
---
<p align="center">
<img src="tokie-banner.png" alt="tokie" width="600">
</p>

# DeepSeek-R1
Pre-built [tokie](https://github.com/chonkie-inc/tokie) tokenizer for [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1).
## Quick Start (Python)
```bash
pip install tokie
```
```python
import tokie
tokenizer = tokie.Tokenizer.from_pretrained("tokiers/DeepSeek-R1")
encoding = tokenizer.encode("Hello, world!")
print(encoding.ids)
print(encoding.attention_mask)
```
## Quick Start (Rust)
```toml
[dependencies]
tokie = { version = "0.0.7", features = ["hf"] }
```
```rust
use tokie::Tokenizer;

fn main() {
    // Load the pre-built tokenizer by its Hugging Face repo ID.
    let tokenizer = Tokenizer::from_pretrained("tokiers/DeepSeek-R1").unwrap();
    let encoding = tokenizer.encode("Hello, world!", true);
    println!("{:?}", encoding.ids);
}
```
## Files
- `tokenizer.tkz` — tokie binary format (~10x smaller, loads in ~5ms)
- `tokenizer.json` — original HuggingFace tokenizer
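
To check the size difference between the two files yourself, a minimal sketch along these lines may help. It uses only the Python standard library; the `size_ratio` helper and the local paths are illustrative assumptions (both files are assumed to have been downloaded into the current directory), not part of tokie.

```python
from pathlib import Path


def size_ratio(original: Path, compact: Path) -> float:
    """Return how many times larger `original` is than `compact`, in bytes on disk."""
    compact_bytes = compact.stat().st_size
    if compact_bytes == 0:
        raise ValueError(f"{compact} is empty")
    return original.stat().st_size / compact_bytes


if __name__ == "__main__":
    # Assumed local paths: adjust to wherever the repo files were downloaded.
    json_path = Path("tokenizer.json")
    tkz_path = Path("tokenizer.tkz")
    if json_path.exists() and tkz_path.exists():
        ratio = size_ratio(json_path, tkz_path)
        print(f"tokenizer.json is {ratio:.1f}x the size of tokenizer.tkz")
    else:
        print("Download tokenizer.json and tokenizer.tkz first.")
```

The ratio you observe depends on the files actually present, so treat the "~10x" figure as approximate.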
## About tokie
**50x faster tokenization, 10x smaller model files, 100% accurate.**
tokie is a drop-in replacement for HuggingFace tokenizers, built in Rust. See [GitHub](https://github.com/chonkie-inc/tokie) for benchmarks and documentation.
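
For a rough local comparison of encoding speed, a small timing harness such as the following could be used. It is only a sketch: `mean_encode_seconds` is a hypothetical helper, it accepts any callable that takes a string, and it is not the methodology behind the benchmarks published on GitHub.

```python
import time
from typing import Callable, Iterable


def mean_encode_seconds(
    encode: Callable[[str], object],
    texts: Iterable[str],
    repeats: int = 5,
) -> float:
    """Return the mean wall-clock seconds per call of `encode` over `texts`."""
    samples = list(texts)
    if not samples or repeats < 1:
        raise ValueError("need at least one text and one repeat")
    start = time.perf_counter()
    for _ in range(repeats):
        for text in samples:
            encode(text)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(samples))
```

With the tokenizer from the Python Quick Start loaded, `mean_encode_seconds(tokenizer.encode, ["Hello, world!"])` gives a per-call average; passing another tokenizer's encode function with the same texts gives a figure to compare against. Results vary with hardware, input length, and warm-up.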
## License
MIT OR Apache-2.0 (tokie library). Original model files retain their original license from [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1).