# DictaLM-3.0-1.7B – Android / Google AI Edge Bundles
On-device Hebrew LLM for Android using Google AI Edge.
Converts DictaLM-3.0-1.7B-Thinking and DictaLM-3.0-1.7B-Instruct (Hebrew LLMs fine-tuned from Qwen3-1.7B by Dicta) into the `.litertlm` format, which runs locally on Android; no internet connection or cloud API required.
## Files
| File | Size | Variant | Quant | Template |
|---|---|---|---|---|
| `dictalm-3.0-1.7b-thinking-float16-thinking.litertlm` | 3.3 GB | Thinking | float16 | thinking enabled |
| `dictalm-3.0-1.7b-thinking-dynamic_int8.litertlm` | 1.7 GB | Thinking | dynamic INT8 | no-think (fast) |
| `dictalm-3.0-1.7b-thinking-int4.litertlm` | 852 MB | Thinking | INT4 | no-think (fast) |
| `dictalm-3.0-1.7b-thinking-float16.litertlm` | 3.3 GB | Thinking | float16 | no-think (fast) |
| `dictalm-3.0-1.7b-instruct-float16.litertlm` | 3.3 GB | Instruct | float16 | – |
| `dictalm-3.0-1.7b-instruct-dynamic_int8.litertlm` | 1.7 GB | Instruct | dynamic INT8 | – |
| `dictalm-3.0-1.7b-instruct-int4.litertlm` | 852 MB | Instruct | INT4 | – |
## Which to use?

- **Best quality + visible reasoning:** `thinking-float16-thinking` – the model shows its thought process before answering
- **Best quality, no reasoning overhead:** `thinking-float16` or `instruct-float16`
- **Balanced (recommended):** `thinking-dynamic_int8` – good Hebrew quality, fits comfortably in RAM
- **Fastest / smallest:** `thinking-int4` or `instruct-int4`
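The guidance above boils down to two questions: how much free RAM the device has and whether visible reasoning is wanted. A minimal helper sketching that decision (thresholds taken from the Device Requirements table; the function name is illustrative, not part of this repo):

```python
def pick_bundle(free_ram_gb: float, want_reasoning: bool) -> str:
    """Suggest a .litertlm bundle given free RAM (GB) and whether visible
    <think> reasoning is desired. Thresholds follow the Device Requirements
    table: 3 GB free (INT4), 5 GB (INT8), 8 GB (float16)."""
    if free_ram_gb >= 8:
        # Only the float16 Thinking bundle ships with the reasoning template.
        return ("dictalm-3.0-1.7b-thinking-float16-thinking.litertlm"
                if want_reasoning
                else "dictalm-3.0-1.7b-thinking-float16.litertlm")
    if free_ram_gb >= 5:
        return "dictalm-3.0-1.7b-thinking-dynamic_int8.litertlm"
    return "dictalm-3.0-1.7b-thinking-int4.litertlm"

print(pick_bundle(12, True))   # float16 thinking bundle
print(pick_bundle(4, False))   # INT4 bundle
```

Note that reasoning mode is only available in float16, so on low-RAM devices the helper falls back to a no-think bundle regardless of preference.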
## Quick Start: Google AI Edge Gallery (Android)

1. Download a `.litertlm` file from this repo
2. Install Google AI Edge Gallery on your Android device
3. Open the app → Add Model → select your `.litertlm` file
4. Start chatting in Hebrew
## Device Requirements
| Spec | Minimum |
|---|---|
| RAM | 3 GB free (INT4) / 5 GB free (INT8) / 8 GB free (float16) |
| Storage | 1 GB (INT4) / 2 GB (INT8) / 4 GB (float16) |
| OS | Android 10+ |
| Runtime | Google AI Edge Gallery |
Tested on Pixel 10 (12 GB RAM).
## Thinking Mode
The `*-thinking-float16-thinking.litertlm` bundle embeds a Jinja chat template that starts generation with `<|im_start|>assistant\n<think>\n`, which prompts the model to reason before answering. The `<think>...</think>` block appears as raw text in the app output.
The other Thinking-variant bundles embed `<think>\n\n</think>` in the generation prompt to suppress thinking for faster responses.
The Instruct variant has no thinking capability.
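The effect of the two templates can be illustrated with plain string building; this is a sketch of Qwen3's ChatML-style prompt format, not the exact Jinja template embedded in the bundles:

```python
def generation_prompt(user_msg: str, system: str, thinking: bool) -> str:
    """Build a ChatML-style generation prompt. With thinking=True the
    assistant turn is opened with <think>, inviting the model to reason
    first; with thinking=False an empty <think></think> block is
    pre-filled so the model skips straight to the answer."""
    prompt = (f"<|im_start|>system\n{system}<|im_end|>\n"
              f"<|im_start|>user\n{user_msg}<|im_end|>\n"
              "<|im_start|>assistant\n")
    if thinking:
        prompt += "<think>\n"                 # model continues its reasoning here
    else:
        prompt += "<think>\n\n</think>\n\n"   # empty block suppresses reasoning
    return prompt
```

In thinking mode the model completes the open `<think>` block before emitting `</think>` and the final answer, which is why the raw reasoning text shows up in the Gallery app.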
## Hebrew Language
All bundles embed a bilingual system prompt:

> You are a helpful Hebrew AI assistant named DictaLM. Respond in Hebrew unless explicitly asked otherwise.
> אתה עוזר AI שימושי בעברית בשם DictaLM. הגב בעברית אלא אם ביקשו אחרת.
## How These Were Built
### Architecture
DictaLM-3.0-1.7B is a fine-tune of Qwen3-1.7B:
| Property | Value |
|---|---|
| Architecture | Qwen3ForCausalLM |
| Layers | 28 |
| Hidden size | 2048 |
| Heads (Q/KV) | 16 / 8 (GQA) |
| Head dim | 128 |
| Intermediate size | 6144 |
| Vocab size | 151,936 |
| Tokenizer | HuggingFace tiktoken |
| Tie embeddings | True |
litert-torch 0.8.0 already includes `qwen3.build_1_7b_model()`, which exactly matches this config, so no custom model builder was needed.
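The config table above can be sanity-checked by counting parameters; a rough calculation (ignoring the small RMSNorm vectors, and counting the tied embedding once) lands at about 1.72 B:

```python
# Config values from the architecture table above
hidden, layers, inter = 2048, 28, 6144
q_heads, kv_heads, head_dim = 16, 8, 128
vocab = 151_936

attn = hidden * (q_heads * head_dim)          # q_proj
attn += hidden * (kv_heads * head_dim) * 2    # k_proj + v_proj (GQA: 8 KV heads)
attn += (q_heads * head_dim) * hidden         # o_proj
mlp = 3 * hidden * inter                      # gate, up, down projections
embed = vocab * hidden                        # shared with lm_head (tied)

total = layers * (attn + mlp) + embed
print(f"{total / 1e9:.2f} B parameters")      # prints "1.72 B parameters"
```

The tied embedding alone accounts for roughly 311 M of those parameters, which is why the INT4 bundle is still close to 850 MB.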
### Conversion Pipeline
```
HuggingFace safetensors
        ↓
litert-torch (Strategy 1: native Qwen3 builder)
        ↓
TFLite with prefill/decode KV-cache signatures
        ↓
bundle_litertlm.py (LlmMetadata proto + HF tokenizer + Jinja template)
        ↓
.litertlm (Google AI Edge runtime format)
```
Key detail: `lm_head.weight` is absent from the checkpoint (`tie_word_embeddings=True`); a custom weight loader copies it from `model.embed_tokens.weight` before conversion.
### Reproduce a Build

Requirements: ~32 GB RAM, Python 3.12, `litert-torch==0.8.0`, the LiteRT-LM builder
```bash
# Install dependencies
pip install litert-torch==0.8.0 mediapipe transformers safetensors

# Clone LiteRT-LM (needed by bundle_litertlm.py)
git clone --depth=1 https://github.com/google-ai-edge/LiteRT-LM /tmp/litert-lm
# Build flatbuffer + proto bindings; see the LiteRT-LM docs

# Download model
huggingface-cli download dicta-il/DictaLM-3.0-1.7B-Thinking \
  --local-dir ./dictalm-3.0-1.7b-thinking

# Convert to TFLite (Strategy 1: KV-cache prefill/decode)
python scripts/convert_dictalm_android.py \
  --model-dir ./dictalm-3.0-1.7b-thinking \
  --tflite-dir ./tflite_output/thinking_fp16 \
  --quantize float16 \
  --prefill-seq-len 1024 --kv-cache-max-len 1024 \
  --skip-task-bundle

# Bundle as .litertlm with thinking enabled
python scripts/bundle_litertlm.py \
  --tflite ./tflite_output/thinking_fp16/*.tflite \
  --tokenizer ./dictalm-3.0-1.7b-thinking/tokenizer.json \
  --tokenizer-type hf \
  --model-type qwen3 \
  --thinking \
  --output ./dictalm-3.0-1.7b-thinking-float16-thinking.litertlm \
  --quant float16

# Or without thinking (no-think mode, faster):
python scripts/bundle_litertlm.py \
  --tflite ./tflite_output/thinking_fp16/*.tflite \
  --tokenizer ./dictalm-3.0-1.7b-thinking/tokenizer.json \
  --tokenizer-type hf \
  --model-type qwen3 \
  --output ./dictalm-3.0-1.7b-thinking-float16.litertlm \
  --quant float16
```
### Quantization Options

| `--quantize` | Converter method | Approx. size |
|---|---|---|
| `float16` | `fp16` | 3.3 GB |
| `dynamic_int8` | `dynamic_int8` (weights + activations) | 1.7 GB |
| `int8` | `weight_only_int8` | 1.7 GB |
| `int4` | `dynamic_int4_block128` | 852 MB |
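These sizes line up with a simple bytes-per-weight estimate over the ~1.72 B parameters. This is a back-of-the-envelope sketch: real bundles add tokenizer and metadata overhead, and the INT4/INT8 formats store extra per-block scale factors.

```python
params = 1.72e9  # approximate parameter count of DictaLM-3.0-1.7B

for name, bytes_per_weight in [("float16", 2.0),
                               ("dynamic_int8", 1.0),
                               ("int4", 0.5)]:
    gib = params * bytes_per_weight / 1024**3
    print(f"{name:>12}: ~{gib:.1f} GiB")
# Roughly 3.2 / 1.6 / 0.8 GiB, close to the listed
# 3.3 GB, 1.7 GB, and 852 MB bundle sizes.
```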
## Scripts

See the `scripts/` folder:

| Script | Purpose |
|---|---|
| `convert_dictalm_android.py` | Convert DictaLM safetensors → TFLite with KV cache |
| `bundle_litertlm.py` | Bundle TFLite + HF tokenizer + LlmMetadata → `.litertlm` |
| `bundle_and_upload_dictalm.sh` | Automation: monitor conversions, bundle, upload to HF |
## License

Model weights: Qwen License / DictaLM terms. Conversion scripts: Apache 2.0.
Model tree for `barakplasma/dictalm-3.0-1.7b-thinking-android`:
base model `dicta-il/DictaLM-3.0-1.7B-Instruct`