Instructions to use deepseek-ai/DeepSeek-R1-Distill-Qwen-14B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use deepseek-ai/DeepSeek-R1-Distill-Qwen-14B with Transformers:
# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-14B", dtype="auto") - Notebooks
- Google Colab
- Kaggle
Why change the configuration of the tokenizer?
Why change the configuration of the tokenizer instead of continuing to use Qwen2.5's chat template?
From what I have observed, the Distill model tokenizer has replaced the token IDs that were already trained in the Qwen2.5-Instruct model. I believe these token IDs might have been assigned certain meanings by the model. However, the structure of the Distill chat template could potentially alter the meanings of these token IDs. Could this lead to a decline in performance or make it more difficult to inject new capabilities?
These tokens from Qwen are reserved for multimodal models. We replace them for the reasoning model.
These tokens from Qwen are reserved for multimodal models. We replace them for the reasoning model.
May I ask why you use '<ο½' and 'ο½>' instead of '<|' and '|>'? Not a very common pick.