DeepSeek-R1-Distill-Qwen-14B

DeepSeek-R1-Distill-Qwen-14B is a reasoning-focused large language model distilled from the DeepSeek-R1 system into a Qwen2.5-14B backbone. It is optimized for structured reasoning, step-by-step problem solving, and instruction-following across complex analytical tasks.

The model is designed to deliver strong logical consistency and improved reasoning efficiency while maintaining the conversational and multilingual strengths of the Qwen architecture. It is suitable for research, experimentation, and production environments requiring reliable reasoning and long-form generation.


Model Overview

  • Model Name: DeepSeek-R1-Distill-Qwen-14B
  • Base Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
  • Architecture: Decoder-only Transformer
  • Parameter Count: 14 Billion
  • Context Window: Implementation dependent
  • Modalities: Text
  • Primary Languages: English, Chinese
  • Developer: DeepSeek AI
  • License: MIT

Quantization Details

Q4_K_M

  • Approx. 71% size reduction (8.37 GB file size)
  • Significant size reduction for efficient deployment
  • Lower memory requirements for CPU and limited-VRAM GPUs
  • Faster inference and token generation
  • Slight reduction in reasoning precision for complex multi-step problems

Q5_K_M

  • Approx. 66% size reduction (9.79 GB file size)
  • Higher fidelity to the original model
  • Improved reasoning stability and coherence
  • Larger memory footprint than Q4 variants
  • Recommended when performance is prioritized over minimal resource usage
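As a rough guide, total memory needed is the quantized file size plus the KV cache for your chosen context length, plus some runtime overhead. A back-of-the-envelope sketch, assuming Qwen2.5-14B architecture numbers (48 layers, 8 grouped-query KV heads, head dimension 128, f16 KV cache; verify these against the metadata llama.cpp prints at load time):

```shell
# Assumed architecture values; confirm against the GGUF metadata.
n_layers=48
n_kv_heads=8
head_dim=128
n_ctx=4096
bytes_per_el=2   # f16 KV cache

# K and V each store n_kv_heads * head_dim values per layer per token.
kv_bytes=$((2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_el))
kv_mib=$((kv_bytes / 1024 / 1024))

echo "KV cache at ${n_ctx} context: ${kv_mib} MiB"
echo "Rule of thumb: model file (8.37 GB for Q4_K_M) + KV cache + ~1 GB overhead"
```

With these assumptions, a 4K context adds roughly 768 MiB on top of the model file, which is why the Q4_K_M variant fits comfortably on 12 GB GPUs while Q5_K_M is a tighter fit.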

Training Overview

Pretraining

The underlying base model is trained on a large multilingual corpus including web data, code, structured documents, and academic material. Training emphasizes language understanding, long-range context modeling, and knowledge representation.

Reasoning Distillation

This model is further refined through knowledge distillation from a stronger reasoning model (DeepSeek-R1). Distillation focuses on transferring:

  • Step-by-step problem solving strategies
  • Logical decomposition of complex tasks
  • Structured reasoning traces
  • Improved mathematical and analytical performance

Key design priorities of the distilled model include:

  • High-quality step-by-step reasoning
  • Strong logical consistency across multi-stage problems
  • Reliable instruction following
  • Efficient reasoning with reduced model size
  • Stable multi-turn conversational behavior
  • Structured and interpretable outputs
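These structured reasoning traces surface in the model's output as an explicit chain of thought wrapped in <think>...</think> tags before the final answer. A minimal post-processing sketch for extracting just the final answer (the sed one-liner assumes the trace fits on a single line; a multi-line trace needs a real parser):

```shell
# Example model output: reasoning trace in <think> tags, then the answer.
response='<think>12 * 13 = 12 * 10 + 12 * 3 = 120 + 36 = 156</think>The answer is 156.'

# Strip the reasoning trace to keep only the final answer.
final=$(printf '%s' "$response" | sed 's/<think>.*<\/think>//')
echo "$final"
```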

Core Capabilities

  • Advanced reasoning: Performs multi-step logical analysis and structured problem solving.

  • Instruction adherence: Executes complex prompts and detailed task specifications.

  • Extended context processing: Maintains coherence across long inputs and multi-turn interactions.

  • Multilingual interaction: Supports multiple languages with strong English and Chinese performance.

  • Structured output generation: Produces organized responses such as stepwise solutions, lists, and formatted data.

  • Conversational consistency: Maintains logical continuity across dialogue sessions.


Example Usage

llama.cpp

# Basic interactive run; context size and temperature are optional tuning knobs.
./llama-cli \
  -m DeepSeek-R1-Distill-Qwen-14B_Q4_K_M.gguf \
  -c 4096 \
  --temp 0.6 \
  -p "Explain how gradient descent works step by step."
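For serving multiple clients, llama.cpp also ships llama-server, which exposes an OpenAI-compatible HTTP API. A hedged sketch (the port, context size, and model filename below are illustrative):

```shell
# Request body for the OpenAI-compatible /v1/chat/completions route.
PAYLOAD='{"model":"deepseek-r1-distill-qwen-14b","messages":[{"role":"user","content":"What is 12 * 13?"}],"temperature":0.6}'

# Start the server (requires the llama-server binary and the GGUF file):
#   ./llama-server -m DeepSeek-R1-Distill-Qwen-14B_Q4_K_M.gguf -c 4096 --port 8080
# Then query it:
#   curl http://localhost:8080/v1/chat/completions \
#     -H "Content-Type: application/json" -d "$PAYLOAD"
printf '%s\n' "$PAYLOAD"
```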

Recommended Use Cases

  • Mathematical reasoning and problem solving
  • Scientific and technical explanation
  • Research assistance and analysis
  • Programming and algorithm design
  • Educational tutoring and step-by-step instruction
  • Long-form structured content generation

Acknowledgments

These quantized models are based on the original work of the DeepSeek AI development team (deepseek-ai).


Contact

For any inquiries or support, please contact us at support@sandlogic.com or visit our website.
