DeepSeek-R1-0528-Qwen3-8B - GGUF Quantized (Q4_K_M)

1. Introduction

This repository contains the DeepSeek-R1-0528-Qwen3-8B model quantized to GGUF format using llama.cpp. It is provided in Q4_K_M format, suitable for fast inference on CPU or GPU.

2. Model Info

  • Base model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
  • Format: GGUF
  • Tool: llama.cpp built with -DGGML_ENABLE_ARM=ON
  • Conversion: bfloat16 safetensors => bfloat16 GGUF
  • Quantization: bfloat16 => Q4_K_M
  • File: deepseek-r1-0528-qwen3-8b-q4_k_m.gguf
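For reference, the conversion and quantization steps listed above can be reproduced with llama.cpp's own tools. This is a sketch: the local paths and output filenames are assumptions, not the exact commands used for this repository.

```shell
# Convert the original bfloat16 safetensors checkpoint to a bfloat16 GGUF
# (assumes the Hugging Face checkout and llama.cpp clone are in the
# current directory).
python llama.cpp/convert_hf_to_gguf.py \
    ./DeepSeek-R1-0528-Qwen3-8B \
    --outtype bf16 \
    --outfile deepseek-r1-0528-qwen3-8b-bf16.gguf

# Quantize the bfloat16 GGUF down to Q4_K_M.
./llama.cpp/build/bin/llama-quantize \
    deepseek-r1-0528-qwen3-8b-bf16.gguf \
    deepseek-r1-0528-qwen3-8b-q4_k_m.gguf \
    Q4_K_M
```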

3. How to Run Locally

Tested on Rockchip RK3588/RK3588S boards using the ARM CPU cores.

llama.cpp

After cloning and building llama.cpp with -DGGML_ENABLE_ARM=ON, use the llama-run tool:

./llama.cpp/build/bin/llama-run ./deepseek-r1-0528-qwen3-8b-q4_k_m.gguf --prompt "What is the capital of France?"
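Alternatively, llama-cli offers finer control over generation. The thread count below is an assumption: on the RK3588, four threads roughly matches its four Cortex-A76 big cores, but this is worth benchmarking on your own board.

```shell
# llama-cli equivalent of the command above.
# -n caps the number of generated tokens; -t sets the thread count.
./llama.cpp/build/bin/llama-cli \
    -m ./deepseek-r1-0528-qwen3-8b-q4_k_m.gguf \
    -p "What is the capital of France?" \
    -n 256 -t 4
```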

Ollama

ollama run hf.co/guynich/DeepSeek-R1-0528-Qwen3-8B_Q4_K_M:Q4_K_M

I did observe some looping during the thinking phase, as discussed in this post.
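Once the model has been pulled with `ollama run`, it can also be queried programmatically through Ollama's local REST API (default port 11434):

```shell
# Send a single non-streaming generation request to the local Ollama server.
curl http://localhost:11434/api/generate -d '{
  "model": "hf.co/guynich/DeepSeek-R1-0528-Qwen3-8B_Q4_K_M:Q4_K_M",
  "prompt": "What is the capital of France?",
  "stream": false
}'
```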

4. License

Please refer to the original license for terms of use.
