kxdw2580/DeepSeek-R1-0528-Qwen3-8B-catgirl-v2.5

#1051
by kxdw2580 - opened

We already tried this on June 6th. The BPE pre-tokenizer of this model is not recognized by llama.cpp, which expects one with hash b0f33aec525001c9de427a8f9958d1c8a3956f476bec64403680521281c032e2. I can only do this model if you tell us which of the llama.cpp-compatible pre-tokenizers we should use instead. The nearest match I can find is likely deepseek-r1-qwen with hash b3f499bb4255f8ca19fccd664443283318f2fd2414d5e0b040fbdd0cc195d6c5, but that was for DeepSeek distilled onto Qwen 2.5 and so is potentially incompatible.
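For context on why these hashes matter: llama.cpp's conversion script fingerprints a pre-tokenizer by encoding a fixed check string and hashing the resulting token-ID list, so any tokenizer-config change in a fine-tune produces an unknown hash. A minimal sketch of that fingerprinting idea (the check string and token IDs here are hypothetical placeholders, not the real values from convert_hf_to_gguf.py):

```python
import hashlib

# Hypothetical stand-in for the conversion script's fixed check text;
# the real script uses its own long multilingual string.
CHECK_TEXT = "Hello, world! ½ ü 汉字 123"

def pretokenizer_hash(token_ids: list[int]) -> str:
    """Fingerprint a tokenization result: sha256 of the token-ID list's
    string form. Two tokenizers that split the check text identically
    yield the same hash; any pre-tokenizer difference changes it."""
    return hashlib.sha256(str(token_ids).encode()).hexdigest()

# With a real tokenizer you would do something like (hypothetical):
#   ids = tokenizer.encode(CHECK_TEXT)
#   print(pretokenizer_hash(ids))
example_ids = [101, 2023, 2003, 102]  # placeholder token IDs
print(pretokenizer_hash(example_ids))
```

This is why a fine-tune that swaps in DeepSeek's tokenizer files on a Qwen3 base can produce a hash that matches neither known entry.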

Maybe you can try the pre-tokenizer configuration of deepseek-r1, as the model is fine-tuned on deepseek-ai/DeepSeek-R1-0528-Qwen3-8B, whose README explicitly states:
"The model architecture of DeepSeek-R1-0528-Qwen3-8B is identical to that of Qwen3-8B, but it shares the same tokenizer configuration as DeepSeek-R1-0528. This model can be run in the same manner as Qwen3-8B, but it is essential to ensure that all configuration files are sourced from our repository rather than the original Qwen3 project."