How to use litert-community/DeepSeek-R1-Distill-Qwen-7B with LiteRT-LM:
# LiteRT-LM runs on various platforms (Android, iOS, Windows, Linux, macOS, IoT, Web/WASM)
# and supports many APIs (C++, Python, Kotlin, Swift, JavaScript, Flutter).
# For platform-specific integration guides, please refer to the official developer website:
# https://ai.google.dev/edge/litert-lm
# To try LiteRT-LM, the easiest way is to use our CLI tool.
# 1. Install the LiteRT-LM CLI tool:
pip install litert-lm
# 2. Download and run this model locally:
# See: https://ai.google.dev/edge/litert-lm/cli
litert-lm run \
  --from-huggingface-repo=litert-community/DeepSeek-R1-Distill-Qwen-7B \
  model.litertlm \
  --prompt="Write me a poem"
DeepSeek-R1-Distill-Qwen-7B — LiteRT-LM (blockwise int4)
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
converted to the LiteRT-LM (.litertlm) format for on-device inference with
Google's LiteRT-LM runtime (the
engine behind the official litert-community/* models).
A reasoning model: it emits a <think> … </think> chain before the answer.
MIT-licensed (distilled onto an Apache-2.0 Qwen2.5 base). Converted with the
official upstream litert-torch — no fork, no custom code.
| Property | Value |
|---|---|
| File | DeepSeek-R1-Distill-Qwen-7B_q4_block32_ekv4096.litertlm (~4.2 GB) |
| Quantization | int4 weights: blockwise (block 32) + OCTAV optimal-clipping, symmetric; embedding INT8 |
| Compute | integer |
| Context (KV cache) | 4096 tokens |
| Base model | deepseek-ai/DeepSeek-R1-Distill-Qwen-7B |
| Decode speed | ~67 tok/s (Mac M-series, LiteRT-LM, Metal GPU, greedy) |
| Platforms | Desktop (Mac) ✓ · high-RAM (12 GB+) Android ✓ · iPhone / 8 GB phones ✗ (the ~4.2 GB model exceeds the memory budget) |
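As a rough cross-check on the file size, the sketch below estimates weight storage for blockwise int4. The parameter count (7.6e9) and the 16-bit per-block scale are assumptions made for illustration, not values read from the bundle, and the estimate ignores the INT8 embedding and any bundle metadata.

```python
def blockwise_weight_bytes(n_params: float, weight_bits: int = 4,
                           block_size: int = 32, scale_bits: int = 16) -> float:
    """Approximate storage for blockwise-quantized weights, in bytes.

    Each block of `block_size` weights shares one scale, so the scale
    adds scale_bits / block_size bits of overhead per weight.
    """
    bits_per_weight = weight_bits + scale_bits / block_size
    return n_params * bits_per_weight / 8


# Assumed parameter count and scale width (hypothetical, for illustration only).
estimate = blockwise_weight_bytes(7.6e9)
print(f"estimated weight storage: {estimate / 1e9:.2f} GB")
```

Under these assumptions the estimate lands in the same range as the ~4.2 GB file, which is why devices with 8 GB of total RAM are listed as unsupported.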
Usage
litert_lm_main --model_path DeepSeek-R1-Distill-Qwen-7B_q4_block32_ekv4096.litertlm --backend gpu \
--input_prompt "If a train travels 60 km in 45 minutes, what is its speed in km/h?"
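The same invocation can be scripted. This is a minimal sketch that only assembles the argument list using the three flags shown above; the helper name `build_litert_lm_command` is hypothetical, and actually running the command assumes `litert_lm_main` is on your PATH.

```python
import shlex


def build_litert_lm_command(model_path: str, prompt: str, backend: str = "gpu") -> list:
    """Assemble the litert_lm_main argument list (flags taken from the usage line above)."""
    return [
        "litert_lm_main",
        "--model_path", model_path,
        "--backend", backend,
        "--input_prompt", prompt,
    ]


cmd = build_litert_lm_command(
    "DeepSeek-R1-Distill-Qwen-7B_q4_block32_ekv4096.litertlm",
    "If a train travels 60 km in 45 minutes, what is its speed in km/h?",
)
print(shlex.join(cmd))  # shell-quoted form, safe to paste into a terminal
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems when the prompt contains quotes or special characters.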
The .litertlm bundle carries the tokenizer and the DeepSeek prompt template
(<|User|> / <|Assistant|>, stop token <|end▁of▁sentence|>). The assistant
opens a <think> block, reasons step by step, then gives the final answer
(commonly in \boxed{}).
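Downstream code usually wants the final answer without the reasoning chain. The sketch below separates the two and pulls out the contents of the last \boxed{} group; the function names and the sample output string are hypothetical and only illustrate the output shape described above.

```python
import re
from typing import Optional, Tuple


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (reasoning, answer) from text shaped like '<think>...</think>answer'."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()  # no reasoning block found; treat everything as the answer
    return match.group(1).strip(), text[match.end():].strip()


def extract_boxed(answer: str) -> Optional[str]:
    # Return the contents of the last boxed group, tracking nested braces.
    marker = "\\boxed{"
    start = answer.rfind(marker)
    if start == -1:
        return None
    depth, chars = 1, []
    for ch in answer[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # unbalanced braces


# Made-up sample output, for illustration only.
sample = "<think>45 min is 0.75 h, so 60 / 0.75 = 80.</think>The speed is \\boxed{80} km/h."
reasoning, answer = split_reasoning(sample)
print(extract_boxed(answer))
```

If generation stops at the token limit before the closing tag is emitted, `split_reasoning` returns the whole text as the answer, so callers may want to treat a missing `</think>` as a truncated response.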
Quality — GSM8K parity
GSM8K (n=100, greedy, 0-shot, identical prompt + answer-extraction; max_new_tokens=2048
to fit the reasoning chain).
| Configuration | GSM8K |
|---|---|
| bf16 (reference) | 88.0% |
| This model — LiteRT int4 (BOCTAV4) | 87.0% |
LiteRT int4 is at parity with bf16: the gap is 1.0 pt (87 vs. 88 correct out of 100), which is a single problem at n=100. On this benchmark, the reasoning behavior is preserved through 4-bit quantization; the shallow-wide Qwen2 (28 layers) absorbs int4 rounding cleanly.
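To put the 1.0 pt gap in context, the sketch below computes the binomial standard error of an accuracy measured on 100 problems. It is a back-of-the-envelope check that treats each problem as an independent trial, not part of the original evaluation.

```python
import math


def binomial_std_error(accuracy: float, n: int) -> float:
    """Standard error of an accuracy estimated from n independent problems."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


n = 100
bf16_acc, int4_acc = 0.88, 0.87  # values from the table above

gap = bf16_acc - int4_acc
se = binomial_std_error(bf16_acc, n)
print(f"gap = {gap * 100:.1f} pt, one-sigma noise on a single run = {se * 100:.1f} pt")
```

The gap is well below the one-sigma noise of a single 100-problem run, so the two configurations are not distinguishable at this sample size.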
Conversion
Converted with the official upstream litert-torch
export_hf (clean git worktree at upstream/main, dev-fork patches excluded).
Qwen2ForCausalLM rides the stock converter with no custom code. int4 recipe =
blockwise (block 32) + OCTAV with INT8 embedding (externalized into its own
bundle section); KV cache 4096.
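For readers unfamiliar with blockwise quantization, here is a minimal pure-Python sketch of symmetric int4 with one scale per block of 32 weights. It is not the litert-torch implementation: it uses plain max-abs scaling as a stand-in, whereas the recipe above uses OCTAV to choose the clipping threshold, and all function names are hypothetical.

```python
def quantize_block_int4(block):
    """Symmetric int4 for one block: a shared scale plus codes clamped to [-8, 7]."""
    max_abs = max(abs(x) for x in block)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0  # max-abs scaling (assumption, not OCTAV)
    codes = [max(-8, min(7, round(x / scale))) for x in block]
    return scale, codes


def quantize_blockwise(weights, block_size=32):
    """Split a flat weight list into blocks and quantize each one independently."""
    return [
        quantize_block_int4(weights[i:i + block_size])
        for i in range(0, len(weights), block_size)
    ]


def dequantize_blockwise(blocks):
    """Reconstruct approximate float weights from (scale, codes) pairs."""
    return [scale * code for scale, codes in blocks for code in codes]


# Deterministic toy weights in [-0.5, 0.5]; two blocks of 32.
weights = [((i * 37) % 101 - 50) / 100.0 for i in range(64)]
blocks = quantize_blockwise(weights)
restored = dequantize_blockwise(blocks)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"blocks: {len(blocks)}, max abs error: {max_err:.4f}")
```

Because each block has its own scale, an outlier only degrades the precision of the 32 weights that share its block rather than the whole tensor.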
Training data & PII
This is a weights-exact format conversion of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B; no new training was performed. It is a Qwen2.5-7B-family base supervised-fine-tuned by DeepSeek on ~800K reasoning traces generated by DeepSeek-R1; the distillation set is model-generated and the Qwen base pretraining corpus is web-derived and not fully disclosed. Web-derived data may incidentally contain PII; none was deliberately collected and this format conversion adds none. Apply your own content/PII filtering before deployment. See the base model card for details.
License
MIT (model weights), inherited from deepseek-ai/DeepSeek-R1-Distill-Qwen-7B; the Qwen2.5 base is Apache-2.0. Commercial use and derivatives permitted.