# DeepSeek-R1-Distill-Llama-8B-MNN

Pre-converted DeepSeek R1 Distill Llama 8B in MNN format for on-device inference with TokForge.

Original model by DeepSeek, converted to MNN Q4 for mobile deployment.

## Model Details

| Field | Value |
| --- | --- |
| Architecture | Llama 3.1 (32 layers, GQA attention; distilled from DeepSeek-R1) |
| Parameters | 8B (4-bit quantized) |
| Format | MNN (Alibaba Mobile Neural Network) |
| Quantization | W4A16 (4-bit weights, 16-bit activations, block size 128) |
| Vocab | 128,256 tokens |
| Source | deepseek-ai/DeepSeek-R1-Distill-Llama-8B |
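The W4A16 scheme above amounts to block-wise 4-bit weight quantization: each run of 128 weights shares one scale and offset, while activations stay in 16-bit. A minimal NumPy illustration of the idea (an illustrative scheme, not MNN's exact kernel layout):

```python
import numpy as np

def quantize_w4_block(weights: np.ndarray, block: int = 128):
    """Asymmetric 4-bit block quantization sketch (block size 128, as in this bundle)."""
    assert weights.size % block == 0
    blocks = weights.reshape(-1, block)
    lo = blocks.min(axis=1, keepdims=True)
    hi = blocks.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0                      # 4 bits -> 16 levels (0..15)
    scale = np.where(scale == 0, 1.0, scale)      # guard constant blocks
    codes = np.clip(np.round((blocks - lo) / scale), 0, 15).astype(np.uint8)
    return codes, scale, lo

def dequantize_w4_block(codes, scale, lo):
    # Reconstruct approximate weights from int4 codes and per-block (scale, offset)
    return codes * scale + lo

w = np.random.randn(1024).astype(np.float32)
codes, scale, lo = quantize_w4_block(w)
w_hat = dequantize_w4_block(codes, scale, lo).reshape(-1)
max_err = np.abs(w - w_hat).max()                 # bounded by half a quantization step
```

Storing one (scale, offset) pair per 128 weights is why the quality loss stays small: the reconstruction error of any weight is at most half a quantization step within its block.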

## Description

DeepSeek's R1 reasoning capability distilled into a Llama 3.1 8B backbone, bringing chain-of-thought reasoning to mobile devices. The model shows its thinking step by step, making it well suited to math, logic puzzles, coding, and complex analysis. DeepSeek reports reasoning performance comparable to OpenAI o1 on several benchmarks.

## Files

| File | Description |
| --- | --- |
| llm.mnn | Model computation graph |
| llm.mnn.weight | Quantized weight data (Q4, block=128) |
| llm_config.json | Model config with Jinja chat template |
| tokenizer.txt | Tokenizer vocabulary |
| config.json | MNN runtime config |
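A bundle is only usable when all five files above are present. A minimal integrity check before pointing a runtime at a downloaded directory might look like this (hypothetical helper, not part of TokForge):

```python
from pathlib import Path

# The five files this bundle ships, as listed in the table above
REQUIRED = ["llm.mnn", "llm.mnn.weight", "llm_config.json", "tokenizer.txt", "config.json"]

def missing_bundle_files(bundle_dir: str) -> list:
    """Return the names of required bundle files missing from bundle_dir."""
    root = Path(bundle_dir)
    return [name for name in REQUIRED if not (root / name).is_file()]
```

An empty return value means the bundle is complete; anything else names the files that still need to be downloaded.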

## Usage with TokForge

This model is optimized for TokForge, a free Android app for private, on-device LLM inference.

1. Download TokForge from the Play Store
2. Open the app → Models → Download this model
3. Start chatting: runs 100% locally, no internet required

## Recommended Settings

| Setting | Value |
| --- | --- |
| Backend | OpenCL (Qualcomm) / Vulkan (MediaTek) / CPU (fallback) |
| Precision | Low |
| Threads | 4 |
| Thinking | On for thinking-capable models such as this one; Off otherwise |
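The backend choice above keys off the SoC vendor. A sketch of that selection logic as a runtime config builder (the key names are assumptions modeled on MNN-style LLM configs, not the exact schema TokForge reads):

```python
def make_runtime_config(soc_vendor: str) -> dict:
    """Map an SoC vendor to the recommended settings table above.

    Key names ("backend_type", "precision", "thread_num") are illustrative
    assumptions, not a documented TokForge/MNN schema.
    """
    backend = {"qualcomm": "opencl", "mediatek": "vulkan"}.get(soc_vendor.lower(), "cpu")
    return {
        "backend_type": backend,  # OpenCL on Qualcomm, Vulkan on MediaTek, CPU fallback
        "precision": "low",
        "thread_num": 4,
    }
```

Falling back to CPU on unknown vendors trades speed for reliability: the CPU path works everywhere, while GPU backends depend on driver quality.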

## Performance

Actual speed varies by device, thermal state, and generation length. Typical ranges for this model size:

| Device | SoC | Backend | tok/s |
| --- | --- | --- | --- |
| RedMagic 11 Pro | SM8850 | OpenCL | ~14 |
| Lenovo TB520FU | SM8650 | OpenCL | ~10 |
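These throughput figures translate directly into perceived latency. A back-of-envelope estimate, ignoring prefill time and thermal throttling:

```python
def decode_seconds(n_tokens: int, tok_per_s: float) -> float:
    """Seconds to generate n_tokens at a steady decode rate (decode phase only)."""
    return n_tokens / tok_per_s

# e.g. a 200-token answer at ~14 tok/s takes just over 14 seconds
```

Reasoning models amplify this cost: chain-of-thought output can multiply the token count of an answer, so decode speed matters more here than for a non-thinking model.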

## Attribution

This is an MNN conversion of DeepSeek R1 Distill Llama 8B by DeepSeek. All credit for the model architecture, training, and fine-tuning goes to the original author(s). This conversion only changes the runtime format for mobile deployment.

## Limitations

- Intended for TokForge / MNN on-device inference on Android
- This is a runtime bundle, not a standard Transformers training checkpoint
- Quantization (Q4) may slightly reduce quality compared to the full-precision original


## Export Details

Converted using MNN's llmexport pipeline:

```shell
python llmexport.py --path deepseek-ai/DeepSeek-R1-Distill-Llama-8B --export mnn --quant_bit 4 --quant_block 128
```
