kxdw2580/DeepSeek-R1-0528-Qwen3-8B-catgirl-v2.5

#1051
by kxdw2580 - opened

We already tried this on June 6th. The BPE pre-tokenizer of this model is not recognized by llama.cpp, which expects one with hash b0f33aec525001c9de427a8f9958d1c8a3956f476bec64403680521281c032e2. I can only do this model if you tell us which of the llama.cpp-compatible pre-tokenizers we should use instead. The nearest match I can find is likely deepseek-r1-qwen with hash b3f499bb4255f8ca19fccd664443283318f2fd2414d5e0b040fbdd0cc195d6c5, but that was for DeepSeek distilled onto Qwen 2.5 and so is potentially incompatible.
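For context on why these hashes matter: llama.cpp's conversion script fingerprints a pre-tokenizer by encoding a fixed check string and hashing the resulting token-ID list, so any tokenizer-config change in a fine-tune produces an unknown hash. A minimal sketch of that fingerprinting idea (the check string and token IDs here are hypothetical placeholders, not the real values from convert_hf_to_gguf.py):

```python
import hashlib

# Hypothetical stand-in for the conversion script's fixed check text;
# the real script uses its own long multilingual string.
CHECK_TEXT = "Hello, world! ½ ü 汉字 123"

def pretokenizer_hash(token_ids: list[int]) -> str:
    """Fingerprint a tokenization result: sha256 of the token-ID list's
    string form. Two tokenizers that split the check text identically
    yield the same hash; any pre-tokenizer difference changes it."""
    return hashlib.sha256(str(token_ids).encode()).hexdigest()

# With a real tokenizer you would do something like (hypothetical):
#   ids = tokenizer.encode(CHECK_TEXT)
#   print(pretokenizer_hash(ids))
example_ids = [101, 2023, 2003, 102]  # placeholder token IDs
print(pretokenizer_hash(example_ids))
```

This is why a fine-tune that swaps in DeepSeek's tokenizer files on a Qwen3 base can produce a hash that matches neither known entry.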

Maybe you can try the pre-tokenizer configuration of deepseek-r1, as the model is fine-tuned on deepseek-ai/DeepSeek-R1-0528-Qwen3-8B, whose README explicitly states:
"The model architecture of DeepSeek-R1-0528-Qwen3-8B is identical to that of Qwen3-8B, but it shares the same tokenizer configuration as DeepSeek-R1-0528. This model can be run in the same manner as Qwen3-8B, but it is essential to ensure that all configuration files are sourced from our repository rather than the original Qwen3 project."