DeepSeek-R1-0528-Qwen3-8B - GGUF Quantized (Q4_K_M)

1. Introduction

This repository contains the DeepSeek-R1-0528-Qwen3-8B model quantized to GGUF format using llama.cpp. It is provided in Q4_K_M format, suitable for fast inference on CPU or GPU.

2. Model Info

  • Base model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
  • Architecture: qwen3
  • Format: GGUF
  • Tool: llama.cpp built with -DGGML_ENABLE_ARM=ON
  • Conversion: bfloat16 safetensors => bfloat16 GGUF
  • Quantization: bfloat16 GGUF => Q4_K_M
  • File: deepseek-r1-0528-qwen3-8b-q4_k_m.gguf
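The conversion and quantization steps listed above can be sketched with llama.cpp's bundled tools; the exact paths and intermediate filename below are illustrative assumptions, not the commands used to produce this file:

```shell
# Convert the original bfloat16 safetensors checkpoint to a bf16 GGUF
# (convert_hf_to_gguf.py ships in the llama.cpp repository root).
python llama.cpp/convert_hf_to_gguf.py ./DeepSeek-R1-0528-Qwen3-8B \
  --outtype bf16 \
  --outfile deepseek-r1-0528-qwen3-8b-bf16.gguf

# Quantize the bf16 GGUF down to Q4_K_M with the llama-quantize binary.
./llama.cpp/build/bin/llama-quantize \
  deepseek-r1-0528-qwen3-8b-bf16.gguf \
  deepseek-r1-0528-qwen3-8b-q4_k_m.gguf \
  Q4_K_M
```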

3. How to Run Locally

Tested on Rockchip RK3588/RK3588S devices using their ARM CPU cores.

llama.cpp

After cloning and building llama.cpp with -DGGML_ENABLE_ARM=ON, use the llama-run tool:

./llama.cpp/build/bin/llama-run ./deepseek-r1-0528-qwen3-8b-q4_k_m.gguf --prompt "What is the capital of France?"
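For an interactive session, the llama-cli binary from the same build also works; the flags below are a minimal sketch, and the thread count is an assumption chosen to match the RK3588's four Cortex-A76 performance cores:

```shell
# Interactive chat with the quantized model; -t pins the thread count
# (4 threads targets the Cortex-A76 cluster - an assumption, tune for your board).
./llama.cpp/build/bin/llama-cli \
  -m ./deepseek-r1-0528-qwen3-8b-q4_k_m.gguf \
  -t 4 \
  -p "What is the capital of France?"
```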

Ollama

ollama run hf.co/guynich/DeepSeek-R1-0528-Qwen3-8B_Q4_K_M:Q4_K_M

I did see some looping during the model's thinking phase, as discussed in this post.
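If you prefer to run from a locally downloaded GGUF file rather than the hf.co shorthand above, Ollama can also build a model from a Modelfile; the local model name here is an illustrative choice:

```shell
# Create a local Ollama model from the downloaded GGUF file.
cat > Modelfile <<'EOF'
FROM ./deepseek-r1-0528-qwen3-8b-q4_k_m.gguf
EOF

ollama create deepseek-r1-q4km -f Modelfile
ollama run deepseek-r1-q4km "What is the capital of France?"
```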

4. License

Please refer to the original license for terms of use.
