Why change the configuration of the tokenizer?

by Lingrui - opened Jan 22, 2025


Why change the configuration of the tokenizer instead of continuing to use Qwen2.5's chat template?

From what I have observed, the Distill model's tokenizer has replaced tokens at IDs that were already trained in the Qwen2.5-Instruct model. I believe these token IDs may have acquired certain meanings during training. However, the Distill chat template uses them in a different structure, which could change those meanings. Could this lead to a decline in performance, or make it more difficult to inject new capabilities?
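One way to see exactly which IDs were reassigned is to compare the two vocabularies ID by ID. The sketch below is a minimal, stdlib-only illustration. `changed_token_ids` is a hypothetical helper, and the toy vocabularies are made up for the example; they are not the real Qwen2.5 or Distill tokens. With Hugging Face `transformers`, a real token-to-ID mapping can typically be obtained from a loaded tokenizer via `tokenizer.get_vocab()`.

```python
# Minimal sketch: find token IDs whose string differs between two vocabularies.
# `changed_token_ids` is a hypothetical helper; the toy vocabularies below are
# illustrative only and do not reflect the real Qwen2.5 / Distill tokenizers.


def changed_token_ids(
    old_vocab: dict[str, int], new_vocab: dict[str, int]
) -> dict[int, tuple[str, str]]:
    """Map each ID present in both vocabularies to (old_token, new_token)
    when the token string assigned to that ID differs."""
    # Invert token -> ID mappings into ID -> token.
    old_by_id = {idx: tok for tok, idx in old_vocab.items()}
    new_by_id = {idx: tok for tok, idx in new_vocab.items()}
    shared_ids = old_by_id.keys() & new_by_id.keys()
    return {
        idx: (old_by_id[idx], new_by_id[idx])
        for idx in sorted(shared_ids)
        if old_by_id[idx] != new_by_id[idx]
    }


if __name__ == "__main__":
    # Hypothetical toy vocabularies (token -> ID).
    old = {"hello": 0, "<|reserved_a|>": 1, "<|reserved_b|>": 2}
    new = {"hello": 0, "<｜new_a｜>": 1, "<|reserved_b|>": 2}
    print(changed_token_ids(old, new))
    # {1: ('<|reserved_a|>', '<｜new_a｜>')}
```

This only detects IDs whose string was swapped; it says nothing about whether the model's learned embeddings for those IDs still carry their old meaning.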

GeeeekExplorer

DeepSeek org Jan 22, 2025

These tokens from Qwen are reserved for multimodal models. We replace them for the reasoning model.

CHNtentes

Jan 23, 2025

> These tokens from Qwen are reserved for multimodal models. We replace them for the reasoning model.

May I ask why you use '<｜' and '｜>', with the fullwidth vertical bar (U+FF5C), instead of the usual ASCII '<|' and '|>'? It's not a very common choice.
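The thread does not state the reason for this choice, but the difference itself is easy to verify: '｜' and '|' are different code points, so '<｜' and '<|' are different strings with different UTF-8 byte sequences, which a byte-level tokenizer sees as distinct input. The snippet below only demonstrates that and makes no claim about DeepSeek's rationale; `describe` is a hypothetical helper. It also shows that NFKC normalization folds the fullwidth bar back to ASCII, so the two forms would collapse under any preprocessing step that applies NFKC.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point, Unicode name, and UTF-8 length of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({len(ch.encode('utf-8'))} byte(s))"


ascii_bar = "|"
fullwidth_bar = "\uff5c"  # '｜', as used in '<｜' and '｜>'

print(describe(ascii_bar))      # U+007C VERTICAL LINE (1 byte(s))
print(describe(fullwidth_bar))  # U+FF5C FULLWIDTH VERTICAL LINE (3 byte(s))
print("<|" == "<" + fullwidth_bar)  # False: different strings
# NFKC maps the fullwidth form back to the ASCII vertical line.
print(unicodedata.normalize("NFKC", fullwidth_bar) == ascii_bar)  # True
```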
