---
tags:
- tokie
library_name: tokie
---

<p align="center">
  <img src="tokie-banner.png" alt="tokie" width="600">
</p>

# DeepSeek-R1

Pre-built [tokie](https://github.com/chonkie-inc/tokie) tokenizer for [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1).

## Quick Start (Python)

```bash
pip install tokie
```

```python
import tokie

tokenizer = tokie.Tokenizer.from_pretrained("tokiers/DeepSeek-R1")
encoding = tokenizer.encode("Hello, world!")
print(encoding.ids)
print(encoding.attention_mask)
```
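
When several texts are encoded separately, their id lists usually differ in length. The sketch below shows one common way to right-pad them and build matching attention masks in plain Python, following the usual convention of 1 for real tokens and 0 for padding. It does not call tokie; `pad_batch` and the `pad_id` default are hypothetical names chosen for illustration, and the ids are made up rather than real DeepSeek-R1 token ids.

```python
from typing import List, Tuple


def pad_batch(
    id_lists: List[List[int]], pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad each id list to the longest length and build matching masks.

    `pad_id` is a placeholder; use the padding id that your model expects.
    """
    max_len = max((len(ids) for ids in id_lists), default=0)
    padded, masks = [], []
    for ids in id_lists:
        n_pad = max_len - len(ids)
        padded.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding.
        masks.append([1] * len(ids) + [0] * n_pad)
    return padded, masks


# Made-up ids standing in for `encoding.ids` from two separate encode() calls.
batch_ids, batch_mask = pad_batch([[11, 22, 33], [44]])
print(batch_ids)   # [[11, 22, 33], [44, 0, 0]]
print(batch_mask)  # [[1, 1, 1], [1, 0, 0]]
```

If the library offers its own batching or padding, prefer that; this helper only illustrates how `ids` and `attention_mask` relate.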

## Quick Start (Rust)

```toml
[dependencies]
tokie = { version = "0.0.7", features = ["hf"] }
```

```rust
use tokie::Tokenizer;

fn main() {
    // Load the pre-built tokenizer; `unwrap` panics if loading fails.
    let tokenizer = Tokenizer::from_pretrained("tokiers/DeepSeek-R1").unwrap();
    let encoding = tokenizer.encode("Hello, world!", true);
    println!("{:?}", encoding.ids);
}
```

## Files

- `tokenizer.tkz` — tokie binary format (~10x smaller, loads in ~5ms)
- `tokenizer.json` — original HuggingFace tokenizer
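
To check the size difference between the two files on your own machine, a minimal sketch using only the Python standard library follows. It assumes both files have already been downloaded into the current directory; `size_ratio` is a hypothetical helper, not part of tokie.

```python
import os


def size_ratio(json_path: str, tkz_path: str) -> float:
    """Return how many times larger the JSON file is than the .tkz file."""
    tkz_size = os.path.getsize(tkz_path)
    if tkz_size == 0:
        raise ValueError(f"{tkz_path} is empty")
    return os.path.getsize(json_path) / tkz_size


if __name__ == "__main__":
    # Assumption: both files sit in the current working directory.
    json_file, tkz_file = "tokenizer.json", "tokenizer.tkz"
    if os.path.exists(json_file) and os.path.exists(tkz_file):
        ratio = size_ratio(json_file, tkz_file)
        print(f"{json_file} is {ratio:.1f}x the size of {tkz_file}")
    else:
        print("Download both files first to compare their sizes.")
```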

## About tokie

**50x faster tokenization, 10x smaller model files, 100% accurate.**

tokie is a drop-in replacement for HuggingFace tokenizers, built in Rust. See [GitHub](https://github.com/chonkie-inc/tokie) for benchmarks and documentation.

## License

MIT OR Apache-2.0 (tokie library). Original model files retain their original license from [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1).