File size: 2,084 Bytes
30cf512
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
---
language:
- en
- zh
license: other
license_name: modified-mit
library_name: transformers
tags:
- tokenizer
- tiktoken
- kimi
- moonshot
- deepseek
base_model: moonshotai/Kimi-K2-Thinking
---

# Kimi-K2-Thinking Tokenizer

Tokenizer files for [moonshotai/Kimi-K2-Thinking](https://huggingface.co/moonshotai/Kimi-K2-Thinking) - a trillion-parameter thinking model.

## 📦 What's Included

- `tiktoken.model` - Original tiktoken tokenizer (2.8MB)
- `tokenizer.json` - HuggingFace compatible format
- `tokenization_kimi.py` - Custom tokenization code
- `tokenizer_config.json` - Configuration
- `special_tokens_map.json` - Special tokens
- `chat_template.jinja` - Chat template

## 🚀 Quick Start

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "Zaynoid/Kimi-K2-Thinking-Tokenizer",
    trust_remote_code=True
)

text = "Hello, how are you?"
tokens = tokenizer.encode(text)
print(f"Tokens: {tokens}")
print(f"Decoded: {tokenizer.decode(tokens)}")
```

## 📊 Specifications

- **Vocab Size**: 163,840 tokens
- **Model Type**: Tiktoken (BPE-based)
- **Context Length**: 256K tokens
- **Special Tokens**:
  - BOS: `<|begin▁of▁sentence|>` (ID: 163584)
  - EOS: `<|end▁of▁sentence|>` (ID: 163585)

## 💡 Usage Notes

1. **Recommended**: Use with `trust_remote_code=True` for full compatibility
2. The `tokenizer.json` is provided for tools that require it
3. Original model uses tiktoken format natively

## 🔧 Use with vLLM

```python
from vllm import LLM

llm = LLM(
    model="moonshotai/Kimi-K2-Thinking",
    tokenizer="Zaynoid/Kimi-K2-Thinking-Tokenizer",
    trust_remote_code=True
)
```

## 📝 License

Modified MIT License (same as base model)

## 🙏 Credits

- **Original Model**: [Moonshot AI](https://huggingface.co/moonshotai)
- **Architecture**: Based on DeepSeek-V3
- **Tokenizer Extraction**: Zaynoid

## 🔗 Links

- [Base Model](https://huggingface.co/moonshotai/Kimi-K2-Thinking)
- [Model Card](https://huggingface.co/moonshotai/Kimi-K2-Thinking)
- [Moonshot AI](https://www.moonshot.cn/)