How to use mlboydaisuke/DeepSeek-R1-Distill-Qwen-1.5B-LiteRT with LiteRT-LM:

```shell
# LiteRT-LM runs on various platforms (Android, iOS, Windows, Linux, macOS, IoT, Web/WASM)
# and supports many APIs (C++, Python, Kotlin, Swift, JavaScript, Flutter).
# For platform-specific integration guides, please refer to the official developer website:
# https://ai.google.dev/edge/litert-lm

# To try LiteRT-LM, the easiest way is to use our CLI tool.
# 1. Install the LiteRT-LM CLI tool:
pip install litert-lm

# 2. Download and run this model locally:
# See: https://ai.google.dev/edge/litert-lm/cli
litert-lm run \
  --from-huggingface-repo=mlboydaisuke/DeepSeek-R1-Distill-Qwen-1.5B-LiteRT \
  model.litertlm \
  --prompt="Write me a poem"
```
DeepSeek-R1-Distill-Qwen-1.5B — LiteRT-LM (blockwise int4)
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
converted to the LiteRT-LM (.litertlm) format for on-device inference with
Google's LiteRT-LM runtime.
A mobile-size reasoning model: it emits a <think> … </think> chain before
the answer, and at ~1 GB it runs on a phone. MIT-licensed (Apache-2.0 Qwen2.5
base). Converted with the official upstream litert-torch — no fork.
| Property | Value |
|---|---|
| File | model.litertlm (~1.0 GB) |
| Quantization | int4 weights — blockwise (block 32) + OCTAV optimal-clipping, symmetric; embedding INT8 |
| Compute | integer |
| Context (KV cache) | 4096 |
| Base model | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B |
| Decode speed | ~116 tok/s (Mac M-series, Metal GPU, greedy); runs on 8 GB phones (iPhone / Android) |
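To make the "blockwise (block 32), symmetric" entry concrete, here is a minimal sketch of symmetric per-block int4 quantization in plain Python. It is an illustration only: it uses simple max-abs scaling rather than OCTAV optimal clipping, the function names are hypothetical, and it is not the code path litert-torch uses.

```python
from typing import List, Tuple


def quantize_blockwise_int4(
    weights: List[float], block_size: int = 32
) -> Tuple[List[int], List[float]]:
    """Quantize a flat weight list to symmetric int4 codes, one scale per block.

    Assumption: max-abs scaling onto the symmetric range [-7, 7].
    The actual model uses OCTAV clipping, which picks the scale differently.
    """
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        # An all-zero block gets a dummy scale so the division is safe.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-7, min(7, round(w / scale))) for w in block)
    return codes, scales


def dequantize_blockwise_int4(
    codes: List[int], scales: List[float], block_size: int = 32
) -> List[float]:
    """Reconstruct approximate weights from int4 codes and per-block scales."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


if __name__ == "__main__":
    weights = [i / 10 - 3.2 for i in range(64)]  # two blocks of 32
    codes, scales = quantize_blockwise_int4(weights)
    restored = dequantize_blockwise_int4(codes, scales)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"blocks={len(scales)} worst_abs_error={worst:.4f}")
```

Because each block of 32 weights has its own scale, one outlier only degrades the resolution of its own block, which is the usual motivation for blockwise over per-tensor scaling.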
Run on Android
The easiest way to try this on a phone is the official Google AI Edge Gallery app:
- Install a recent Gallery (com.google.ai.edge.gallery; 1.0.15+ supports .litertlm).
- adb push model.litertlm /sdcard/Download/
- In the app: + → pick the file → CPU or GPU. At ~1 GB this fits comfortably.
- Chat. The bundle carries the tokenizer and DeepSeek prompt template (<|User|> / <|Assistant|>, stop <|end▁of▁sentence|>). The model opens a <think> block, reasons, then answers.
To embed it in your own app, use the LiteRT-LM Kotlin API
(com.google.ai.edge.litertlm:litertlm-android).
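Since the model emits its reasoning inside a <think> block before the answer, an app will usually want to separate the two before display. The following is a minimal sketch of that post-processing step on the raw generated text. The function name is hypothetical and it is not part of the LiteRT-LM API; it assumes the tags appear literally in the decoded output.

```python
from typing import Tuple

THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    Handles three cases:
    - a closed think block: reasoning is before </think>, answer is after;
    - an unclosed think block (e.g. generation hit the token limit):
      everything after <think> is reasoning and the answer is empty;
    - no think tags at all: the whole text is the answer.
    """
    before, sep, after = text.partition(THINK_CLOSE)
    if sep:
        # Drop the opening tag if present; some templates put it in the prompt.
        reasoning = before.replace(THINK_OPEN, "", 1).strip()
        return reasoning, after.strip()
    if THINK_OPEN in text:
        return text.partition(THINK_OPEN)[2].strip(), ""
    return "", text.strip()


if __name__ == "__main__":
    raw = "<think>48 / 2 = 24, then 48 + 24 = 72.</think>The answer is 72."
    reasoning, answer = split_reasoning(raw)
    print("reasoning:", reasoning)
    print("answer:", answer)
```

Treating an unclosed block as reasoning with an empty answer lets the caller detect truncated generations and, for example, retry with a larger token budget.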
Quality — GSM8K
GSM8K (n=100, greedy, 0-shot, identical prompt + extraction; max_new_tokens=2048).
| Configuration | GSM8K |
|---|---|
| bf16 (reference) | 81.0% |
| This model — LiteRT int4 (BOCTAV4) | 73.0% |
73% is a coherent, non-degenerate score for a 1.5B reasoning model that
fits on a phone, and the <think> reasoning survives 4-bit quantization. At 1.5B,
int4 costs ~8 points vs bf16 (81.0% to 73.0%). Small models are more sensitive to
4-bit quantization: a 1.5B model has less redundancy than the 7B sibling, whose
int4 build is within ~1 point of bf16. Shipped as int4 for the best on-device
size/speed trade-off.
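With n=100, each GSM8K score carries noticeable sampling noise. As a rough aid for reading the table, the sketch below computes the binomial standard error of each reported accuracy (roughly 4 percentage points at this sample size). It assumes each problem is an independent pass/fail trial, which is a simplification: both configurations were scored on the same problems, so this is not a formal test of the gap.

```python
import math


def binomial_standard_error(accuracy: float, n: int) -> float:
    """Standard error of a pass rate measured on n independent problems."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


N_PROBLEMS = 100   # from the evaluation setup above
BF16_ACC = 0.81    # bf16 reference
INT4_ACC = 0.73    # this model (LiteRT int4)

gap = BF16_ACC - INT4_ACC
se_bf16 = binomial_standard_error(BF16_ACC, N_PROBLEMS)
se_int4 = binomial_standard_error(INT4_ACC, N_PROBLEMS)

if __name__ == "__main__":
    print(f"bf16: {BF16_ACC:.1%} +/- {se_bf16:.1%}")
    print(f"int4: {INT4_ACC:.1%} +/- {se_int4:.1%}")
    print(f"gap:  {gap:.1%}")
```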
Conversion
Converted with the official upstream litert-torch export_hf (clean worktree at
upstream/main, no fork). Architecture: Qwen2ForCausalLM, no custom code.
int4 = blockwise-32 + OCTAV, INT8 embedding, KV cache 4096.
License
MIT (model weights); Qwen2.5 base is Apache-2.0. Commercial use and derivatives permitted.