# adreno-llms-weights
Pre-converted fp16 weights for the model ports in adreno-llms: small language models hand-tuned for Adreno 6xx GPUs on non-flagship Android phones.
These binaries are NOT directly compatible with HuggingFace transformers or PyTorch. They use a custom layout produced by NNOpt and are consumed by the C++/OpenCL inference binaries in the GitHub repo above.
## Usage
```shell
git clone https://github.com/a8nova/adreno-llms.git
cd adreno-llms
./scripts/fetch_weights.sh smollm2-135m-instruct  # pulls from this repo
cd src/models/smollm2-135m-instruct
NNOPT_DTYPE=fp16 ./scripts/build.sh --release
NNOPT_DTYPE=fp16 ./scripts/deploy_android.sh
NNOPT_DTYPE=fp16 ./scripts/run_android.sh "Once upon a time" 64
```
See the GitHub repo README for full setup, hardware requirements, and per-model performance numbers (5-run warm median on Motorola Razr 2020 / Adreno 618).
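After fetching, a quick sanity check is to compare the `.bin` file size against the tensor layout it is supposed to contain. The sketch below is hypothetical: it assumes a simple `model.fp16.meta.json` schema of the form `{"tensors": [{"name": ..., "shape": [...]}]}`, which may not match the runtime's actual metadata format.

```python
import json

def expected_bin_size(meta: dict) -> int:
    """Sum the fp16 byte sizes of all tensors in an (assumed)
    model.fp16.meta.json-style layout: each element is 2 bytes."""
    total = 0
    for t in meta["tensors"]:
        n = 1
        for d in t["shape"]:
            n *= d
        total += 2 * n  # fp16 = 2 bytes per element
    return total

# Usage (hypothetical file names):
# meta = json.load(open("model.fp16.meta.json"))
# assert os.path.getsize("model.fp16.bin") == expected_bin_size(meta)
```

A mismatch here would indicate a truncated download or a dtype other than fp16.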
## Models in this repo
Decode tok/s = 5-run warm median, fp16, greedy (temperature=0, seed=42), 32-token generation, on Motorola Razr 2020 (Adreno 618), measured 2026-05-06.
| Path | Upstream | Params | Decode tok/s | License of upstream weights |
|---|---|---|---|---|
| `mamba2-130m/model.fp16.bin` | state-spaces/mamba2-130m | 130M | 23.18 | Apache 2.0 |
| `mamba-130m/model.fp16.bin` | state-spaces/mamba-130m-hf | 130M | 22.15 | Apache 2.0 |
| `smollm2-135m-instruct/model.fp16.bin` | HuggingFaceTB/SmolLM2-135M-Instruct | 135M | 14.57 | Apache 2.0 |
| `lfm2-5-350m/model.fp16.bin` | LiquidAI/LFM2.5-350M-Base | 350M | 10.20 | Liquid AI Open License |
| `qwen2-5-0-5b/model.fp16.bin` | Qwen/Qwen2.5-0.5B | 500M | 8.45 | Apache 2.0 |
| `openelm-270m/` (companion files only) | apple/OpenELM-270M | 270M | 4.47 | Apple ASCL (fetch + convert locally) |
OpenELM-270M is partially hosted here. Under openelm-270m/ you'll find only the small companion files:
```
openelm-270m/model.fp16.meta.json   # tensor layout for the C++ runtime
openelm-270m/tokenizer.json         # HuggingFace tokenizer config
openelm-270m/tokenizer_vocab.bin    # vocab + merges (binary)
```
The actual `model.fp16.bin` is NOT redistributed: the Apple Sample Code License (ASCL) restricts that. Instead, `scripts/fetch_openelm_weights.sh` in the GitHub repo pulls the apple/OpenELM-270M safetensors directly from Apple's Hugging Face repo and runs `scripts/convert_openelm_weights.py` to produce the binary locally, using the layout described in `model.fp16.meta.json`.
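The core of such a conversion can be approximated in a few lines: cast each upstream tensor to fp16 and write them back-to-back in the order the metadata prescribes. This is a hypothetical sketch, not the actual `convert_openelm_weights.py`, and the flat-concatenation layout is an assumption.

```python
import numpy as np

def convert_to_fp16_blob(tensors: dict, order: list[str]) -> bytes:
    """Cast each named tensor to fp16 and concatenate row-major bytes
    in the given layout order (assumed layout, for illustration only)."""
    parts = []
    for name in order:
        arr = np.asarray(tensors[name], dtype=np.float16)
        parts.append(arr.tobytes())  # contiguous row-major fp16 bytes
    return b"".join(parts)

# Usage: tensors would come from the upstream safetensors checkpoint,
# and `order` from the tensor list in model.fp16.meta.json.
```

The real converter likely also handles layout transforms for the OpenCL kernels; this only shows the dtype cast and serialization step.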
## How were these produced?
Every binary in this repo was generated by NNOpt, a coding agent for porting and optimizing neural networks for Android embedded targets. None of the kernels, layouts, or build tooling in the consumer repo was hand-written.
If you have a model you want running on Adreno, Snapdragon, Mali, or any Android device with this kind of polish, email a8nova@gmail.com for early access.
## License
- These conversion artifacts: Apache 2.0 (re-publish freely, attribute the upstream model).
- Underlying model weights: each carries its upstream license (see the table above). Users are responsible for compliance.