adreno-llms-weights

Pre-converted fp16 weights for the model ports in adreno-llms: small language models hand-tuned for Adreno 6xx GPUs on non-flagship Android phones.

These binaries are NOT directly compatible with HuggingFace transformers or PyTorch. They use a custom layout produced by NNOpt and are consumed by the C++/OpenCL inference binaries in the GitHub repo above.
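As a rough illustration of what "custom layout" means here, a flat fp16 blob plus a JSON sidecar can be sliced back into named tensors. The schema below is hypothetical; the real layout is whatever NNOpt emits in each model's `model.fp16.meta.json`, and the field names are assumptions for this sketch only:

```python
import struct

# Hypothetical sidecar describing tensor order in the flat fp16 blob.
# The real model.fp16.meta.json schema is defined by NNOpt and may differ.
meta = {
    "tensors": [
        {"name": "embed.weight", "shape": [4, 2], "offset": 0},
        {"name": "lm_head.weight", "shape": [2, 4], "offset": 16},
    ]
}

def load_fp16_blob(blob: bytes, meta: dict) -> dict:
    """Slice each tensor out of a flat fp16 buffer using the sidecar."""
    tensors = {}
    for t in meta["tensors"]:
        n = 1
        for d in t["shape"]:
            n *= d
        start = t["offset"]
        raw = blob[start : start + 2 * n]                  # 2 bytes per fp16 value
        tensors[t["name"]] = struct.unpack(f"<{n}e", raw)  # 'e' = IEEE half
    return tensors

# Tiny fake blob standing in for model.fp16.bin: 8 + 8 fp16 values.
blob = struct.pack("<16e", *[float(i) for i in range(16)])
loaded = load_fp16_blob(blob, meta)
```

The point is only that the binary carries no self-describing header a generic loader could parse: without the sidecar (or the C++/OpenCL runtime that embeds the same layout logic), the blob is opaque, which is why HuggingFace transformers and PyTorch cannot open these files directly.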

Usage

git clone https://github.com/a8nova/adreno-llms.git
cd adreno-llms
./scripts/fetch_weights.sh smollm2-135m-instruct   # pulls from this repo
cd src/models/smollm2-135m-instruct
NNOPT_DTYPE=fp16 ./scripts/build.sh --release
NNOPT_DTYPE=fp16 ./scripts/deploy_android.sh
NNOPT_DTYPE=fp16 ./scripts/run_android.sh "Once upon a time" 64

See the GitHub repo README for full setup, hardware requirements, and per-model performance numbers (5-run warm median on Motorola Razr 2020 / Adreno 618).

Models in this repo

Decode tok/s = 5-run warm median, fp16, greedy (temperature=0, seed=42), 32-token generation, on Motorola Razr 2020 (Adreno 618), measured 2026-05-06.

| Path | Upstream | Params | Decode tok/s | License of upstream weights |
|------|----------|--------|--------------|------------------------------|
| mamba2-130m/model.fp16.bin | state-spaces/mamba2-130m | 130M | 23.18 | Apache 2.0 |
| mamba-130m/model.fp16.bin | state-spaces/mamba-130m-hf | 130M | 22.15 | Apache 2.0 |
| smollm2-135m-instruct/model.fp16.bin | HuggingFaceTB/SmolLM2-135M-Instruct | 135M | 14.57 | Apache 2.0 |
| lfm2-5-350m/model.fp16.bin | LiquidAI/LFM2.5-350M-Base | 350M | 10.20 | Liquid AI Open License |
| qwen2-5-0-5b/model.fp16.bin | Qwen/Qwen2.5-0.5B | 500M | 8.45 | Apache 2.0 |
| openelm-270m/ (companion files only) | apple/OpenELM-270M | 270M | 4.47 | Apple ASCL (fetch + convert locally) |
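The "5-run warm median" figure above is straightforward to reproduce from raw timings: time each warm decode of a fixed-length generation, take the median, and divide tokens by seconds. A sketch with made-up timings (the numbers are illustrative, not measurements from the table):

```python
import statistics

def decode_toks_per_s(run_times_s, tokens_generated=32, warmup_runs=0):
    """Median tokens/second over warm runs (any warm-up runs excluded)."""
    warm = run_times_s[warmup_runs:]
    return tokens_generated / statistics.median(warm)

# Five illustrative warm decode timings (seconds) for a 32-token generation.
times = [2.21, 2.18, 2.25, 2.19, 2.20]
rate = decode_toks_per_s(times)  # 32 tokens / median(times)
```

A median over warm runs is less noisy than a mean here: it discards outlier runs caused by thermal throttling or background load, which are common on phones.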

OpenELM-270M is partially hosted here. Under openelm-270m/ you'll find only the small companion files:

openelm-270m/model.fp16.meta.json     # tensor layout for the C++ runtime
openelm-270m/tokenizer.json           # HuggingFace tokenizer config
openelm-270m/tokenizer_vocab.bin      # vocab + merges (binary)

The actual model.fp16.bin is NOT redistributed here; the Apple Sample Code License (ASCL) restricts that. Instead, scripts/fetch_openelm_weights.sh in the GitHub repo pulls apple/OpenELM-270M's safetensors directly from Apple's Hugging Face repo and runs scripts/convert_openelm_weights.py to produce the binary locally, using the layout described in model.fp16.meta.json.
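The general shape of such a conversion is: cast each upstream tensor to fp16 and concatenate them into one flat blob, recording name/shape/offset in the meta sidecar. The real scripts/convert_openelm_weights.py lives in the GitHub repo and may work differently; this is a minimal sketch with a hypothetical meta schema and toy data in place of real safetensors weights:

```python
import struct

def convert_to_flat_fp16(tensors: dict) -> tuple:
    """Pack float tensors into one flat fp16 blob, recording
    name/shape/offset in a meta dict (hypothetical schema)."""
    blob = bytearray()
    meta = {"tensors": []}
    for name, (shape, values) in tensors.items():
        meta["tensors"].append(
            {"name": name, "shape": shape, "offset": len(blob)}
        )
        blob += struct.pack(f"<{len(values)}e", *values)  # 'e' = IEEE fp16
    return bytes(blob), meta

# Toy stand-in for weights loaded from the upstream safetensors file.
weights = {"attn.q_proj": ([2, 2], [0.5, -1.0, 2.0, 0.25])}
blob, meta = convert_to_flat_fp16(weights)
```

Because the conversion runs locally, only Apple's upstream download terms apply to the resulting binary; nothing ASCL-restricted ever ships from this repo.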

How were these produced?

Every binary in this repo was generated by NNOpt, a coding agent for porting and optimizing neural networks for Android embedded targets. None of the kernels, layouts, or build tooling in the consumer repo was hand-written.

If you have a model you want running on Adreno, Snapdragon, Mali, or any Android device with this kind of polish, email a8nova@gmail.com for early access.

License

  • These conversion artifacts: Apache 2.0 (re-publish freely, attribute the upstream model).
  • Underlying model weights: each carries its upstream license (see the table above). Users are responsible for compliance.