DeepSeek-R1-0528-Qwen3-8B - GGUF Quantized (Q4_K_M)
1. Introduction
This repository contains the DeepSeek-R1-0528-Qwen3-8B model quantized to GGUF format using llama.cpp. The file is in Q4_K_M format, suitable for fast inference on CPU or GPU.
2. Model Info
- Base model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
- Format: GGUF
- Tool: llama.cpp built with -DGGML_ENABLE_ARM=ON
- Conversion: bfloat16 safetensors => bfloat16 GGUF
- Quantization: bfloat16 => Q4_K_M
- File: deepseek-r1-0528-qwen3-8b-q4_k_m.gguf
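The conversion and quantization steps listed above can be sketched with llama.cpp's standard tools. This is a minimal outline, not the exact commands used for this repo; the source checkpoint path is a placeholder, and script locations can vary between llama.cpp versions:

```shell
# Convert the Hugging Face safetensors checkpoint to a bfloat16 GGUF file.
# "path/to/DeepSeek-R1-0528-Qwen3-8B" is a placeholder for the downloaded checkpoint.
python llama.cpp/convert_hf_to_gguf.py \
    path/to/DeepSeek-R1-0528-Qwen3-8B \
    --outtype bf16 \
    --outfile deepseek-r1-0528-qwen3-8b-bf16.gguf

# Quantize the bfloat16 GGUF down to Q4_K_M.
./llama.cpp/build/bin/llama-quantize \
    deepseek-r1-0528-qwen3-8b-bf16.gguf \
    deepseek-r1-0528-qwen3-8b-q4_k_m.gguf \
    Q4_K_M
```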
3. How to Run Locally
Tested on Rockchip RK3588/RK3588S ARM CPU cores.
llama.cpp
After cloning and building llama.cpp with -DGGML_ENABLE_ARM=ON, use the llama-run tool:
```shell
./llama.cpp/build/bin/llama-run ./deepseek-r1-0528-qwen3-8b-q4_k_m.gguf --prompt "What is the capital of France?"
```
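For reference, the clone-and-build step can be sketched with llama.cpp's standard CMake flow (the -DGGML_ENABLE_ARM=ON flag is the one named above; other flags are left at their defaults):

```shell
# Clone and build llama.cpp with the ARM flag used for this repo.
git clone https://github.com/ggerganov/llama.cpp
cmake -B llama.cpp/build -S llama.cpp -DGGML_ENABLE_ARM=ON
cmake --build llama.cpp/build --config Release -j
```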
Ollama
```shell
ollama run hf.co/guynich/DeepSeek-R1-0528-Qwen3-8B_Q4_K_M:Q4_K_M
```
I did observe some looping during the model's thinking phase, as discussed in this post.
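Ollama also exposes a local REST API (default port 11434), which can be used once the model tag above has been pulled. A minimal Python sketch using only the standard library; `build_payload` and `generate` are hypothetical helper names for illustration:

```python
import json
import urllib.request

MODEL = "hf.co/guynich/DeepSeek-R1-0528-Qwen3-8B_Q4_K_M:Q4_K_M"

def build_payload(prompt: str, model: str = MODEL) -> dict:
    """Build a non-streaming request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, host: str = "http://localhost:11434") -> str:
    """Send the prompt to a locally running Ollama server and return the response text."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("What is the capital of France?"))
```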
4. License
Please refer to the original license for terms of use.