---
language:
- en
- zh
license: other
license_name: modified-mit
library_name: transformers
tags:
- tokenizer
- tiktoken
- kimi
- moonshot
- deepseek
base_model: moonshotai/Kimi-K2-Thinking
---
# Kimi-K2-Thinking Tokenizer
Tokenizer files for [moonshotai/Kimi-K2-Thinking](https://huggingface.co/moonshotai/Kimi-K2-Thinking), a trillion-parameter thinking model.
## 📦 What's Included
- `tiktoken.model` - Original tiktoken tokenizer (2.8MB)
- `tokenizer.json` - HuggingFace compatible format
- `tokenization_kimi.py` - Custom tokenization code
- `tokenizer_config.json` - Configuration
- `special_tokens_map.json` - Special tokens
- `chat_template.jinja` - Chat template
## 🚀 Quick Start
```python
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "Zaynoid/Kimi-K2-Thinking-Tokenizer",
    trust_remote_code=True,
)
text = "Hello, how are you?"
tokens = tokenizer.encode(text)
print(f"Tokens: {tokens}")
print(f"Decoded: {tokenizer.decode(tokens)}")
```
## 📊 Specifications
- **Vocab Size**: 163,840 tokens
- **Model Type**: Tiktoken (BPE-based)
- **Context Length**: 256K tokens
- **Special Tokens**:
  - BOS: `<|begin▁of▁sentence|>` (ID: 163584)
  - EOS: `<|end▁of▁sentence|>` (ID: 163585)
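The special-token IDs listed above can be checked against a loaded tokenizer. The following is a minimal, unofficial sketch: `check_spec` is a hypothetical helper (not part of this repository), it assumes the tokenizer exposes `convert_tokens_to_ids` as Hugging Face tokenizers do, and the expected values are simply copied from the list above.

```python
# Expected values copied from the Specifications list; treat them as claims to verify.
VOCAB_SIZE = 163_840
SPECIAL_TOKENS = {
    "<|begin▁of▁sentence|>": 163_584,  # BOS
    "<|end▁of▁sentence|>": 163_585,    # EOS
}


def check_spec(tokenizer):
    """Return a list of human-readable mismatches (empty if everything agrees).

    Assumes `tokenizer` offers `convert_tokens_to_ids(token)`,
    as Hugging Face tokenizers do.
    """
    problems = []
    for token, expected_id in SPECIAL_TOKENS.items():
        # The listed ID must fall inside the stated vocabulary range.
        if not 0 <= expected_id < VOCAB_SIZE:
            problems.append(f"{token!r}: ID {expected_id} is outside the vocabulary")
        # The tokenizer must map the token string to the listed ID.
        actual_id = tokenizer.convert_tokens_to_ids(token)
        if actual_id != expected_id:
            problems.append(f"{token!r}: expected ID {expected_id}, got {actual_id}")
    return problems
```

With the tokenizer from Quick Start, `check_spec(tokenizer)` returns an empty list when the loaded files agree with the IDs listed here, and a list of mismatch messages otherwise.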
## 💡 Usage Notes
1. **Recommended**: Load with `trust_remote_code=True` for full compatibility; this allows `transformers` to run the custom `tokenization_kimi.py` code
2. The `tokenizer.json` file is provided for tools that require it
3. The original model uses the tiktoken format natively
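For readers who want to inspect the tiktoken file directly, here is a small sketch of a parser. It assumes `tiktoken.model` follows the usual tiktoken BPE rank layout, one `<base64 token> <integer rank>` pair per line; that layout has not been verified against this repository's file, and `parse_tiktoken_ranks` is a hypothetical helper name.

```python
import base64


def parse_tiktoken_ranks(lines):
    """Parse tiktoken-style BPE rank lines into a {token_bytes: rank} dict.

    Assumes each non-empty line is "<base64 token> <integer rank>".
    """
    ranks = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        token_b64, rank = line.split()
        ranks[base64.b64decode(token_b64)] = int(rank)
    return ranks


# Tiny in-memory sample standing in for the real file (made-up contents).
sample = [
    f"{base64.b64encode(token).decode()} {rank}"
    for rank, token in enumerate([b"Hello", b" world"])
]
ranks = parse_tiktoken_ranks(sample)
print(ranks)
```

To try it on the real file, pass an open text-mode file handle for `tiktoken.model` instead of `sample`. The result covers only what is stored in that file and does not replace loading the tokenizer through `transformers`.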
## 🔧 Use with vLLM
```python
from vllm import LLM
llm = LLM(
    model="moonshotai/Kimi-K2-Thinking",
    tokenizer="Zaynoid/Kimi-K2-Thinking-Tokenizer",
    trust_remote_code=True,
)
```
## 📝 License
Modified MIT License (same as base model)
## 🙏 Credits
- **Original Model**: [Moonshot AI](https://huggingface.co/moonshotai)
- **Architecture**: Based on DeepSeek-V3
- **Tokenizer Extraction**: Zaynoid
## 🔗 Links
- [Base Model](https://huggingface.co/moonshotai/Kimi-K2-Thinking)
- [Model Card](https://huggingface.co/moonshotai/Kimi-K2-Thinking)
- [Moonshot AI](https://www.moonshot.cn/)