DeepSeek-R1-0528-Qwen3-8B - GGUF Quantized (Q4_K_M)

1. Introduction

This repository contains the DeepSeek-R1-0528-Qwen3-8B model quantized to GGUF format using llama.cpp. It is provided in Q4_K_M format, suitable for fast inference on CPU or GPU.

2. Model Info

  • Base model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
  • Architecture: qwen3
  • Format: GGUF
  • Tool: llama.cpp built with -DGGML_ENABLE_ARM=ON
  • Conversion: bfloat16 safetensors => bfloat16 GGUF
  • Quantization: bfloat16 GGUF => Q4_K_M
  • File: deepseek-r1-0528-qwen3-8b-q4_k_m.gguf
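The conversion and quantization steps listed above can be sketched with llama.cpp's bundled tools; the exact paths and intermediate filename below are illustrative assumptions, not the commands used to produce this file:

```shell
# Convert the original bfloat16 safetensors checkpoint to a bf16 GGUF
# (convert_hf_to_gguf.py ships in the llama.cpp repository root).
python llama.cpp/convert_hf_to_gguf.py ./DeepSeek-R1-0528-Qwen3-8B \
  --outtype bf16 \
  --outfile deepseek-r1-0528-qwen3-8b-bf16.gguf

# Quantize the bf16 GGUF down to Q4_K_M with the llama-quantize binary.
./llama.cpp/build/bin/llama-quantize \
  deepseek-r1-0528-qwen3-8b-bf16.gguf \
  deepseek-r1-0528-qwen3-8b-q4_k_m.gguf \
  Q4_K_M
```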

3. How to Run Locally

Tested on Rockchip RK3588/RK3588S devices using their ARM CPU cores.

llama.cpp

After cloning and building llama.cpp with -DGGML_ENABLE_ARM=ON, use the llama-run tool:

./llama.cpp/build/bin/llama-run ./deepseek-r1-0528-qwen3-8b-q4_k_m.gguf --prompt "What is the capital of France?"
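For an interactive session, the llama-cli binary from the same build also works; the flags below are a minimal sketch, and the thread count is an assumption chosen to match the RK3588's four Cortex-A76 performance cores:

```shell
# Interactive chat with the quantized model; -t pins the thread count
# (4 threads targets the Cortex-A76 cluster - an assumption, tune for your board).
./llama.cpp/build/bin/llama-cli \
  -m ./deepseek-r1-0528-qwen3-8b-q4_k_m.gguf \
  -t 4 \
  -p "What is the capital of France?"
```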

Ollama

ollama run hf.co/guynich/DeepSeek-R1-0528-Qwen3-8B_Q4_K_M:Q4_K_M

I did see some looping during the model's thinking phase, as discussed in this post.
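If you prefer to run from a locally downloaded GGUF file rather than the hf.co shorthand above, Ollama can also build a model from a Modelfile; the local model name here is an illustrative choice:

```shell
# Create a local Ollama model from the downloaded GGUF file.
cat > Modelfile <<'EOF'
FROM ./deepseek-r1-0528-qwen3-8b-q4_k_m.gguf
EOF

ollama create deepseek-r1-q4km -f Modelfile
ollama run deepseek-r1-q4km "What is the capital of France?"
```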

4. License

Please refer to the original license for terms of use.
