---
license: apache-2.0
tags:
- adreno
- android
- opencl
- mobile
- on-device-inference
---

# adreno-llms-weights

Pre-converted **fp16 weights** for the model ports in [adreno-llms](https://github.com/a8nova/adreno-llms): small language models hand-tuned for **Adreno 6xx GPUs on non-flagship Android phones**.

These binaries are NOT directly compatible with HuggingFace `transformers` or PyTorch. They use a custom layout produced by [NNOpt](mailto:a8nova@gmail.com) and are consumed by the C++/OpenCL inference binaries in the GitHub repo above.

## Usage

```bash
git clone https://github.com/a8nova/adreno-llms.git
cd adreno-llms
./scripts/fetch_weights.sh smollm2-135m-instruct   # pulls from this repo
cd src/models/smollm2-135m-instruct
NNOPT_DTYPE=fp16 ./scripts/build.sh --release
NNOPT_DTYPE=fp16 ./scripts/deploy_android.sh
NNOPT_DTYPE=fp16 ./scripts/run_android.sh "Once upon a time" 64
```

See [the GitHub repo README](https://github.com/a8nova/adreno-llms) for full setup, hardware requirements, and per-model performance numbers (5-run warm median on Motorola Razr 2020 / Adreno 618).

## Models in this repo

Decode tok/s = 5-run warm median, fp16, greedy (`temperature=0, seed=42`), 32-token generation, on Motorola Razr 2020 (Adreno 618), measured 2026-05-06.

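The "5-run warm median" statistic used for the decode figures can be sketched roughly as follows. This is only an illustration, not the repo's benchmark script: it assumes "warm" means that one untimed warm-up run is discarded before the five timed runs, and that each run reports the number of generated tokens and the elapsed decode time. The helper `warm_median_tok_s` and the sample timings are hypothetical.

```python
from statistics import median


def warm_median_tok_s(runs, warmup=1):
    """Median decode tokens/second after dropping the leading warm-up runs.

    runs:   list of (tokens_generated, decode_seconds) tuples, in execution order.
    warmup: number of leading runs to discard (assumed meaning of "warm").
    """
    timed = runs[warmup:]
    if not timed:
        raise ValueError("no timed runs left after discarding warm-up runs")
    return median(tokens / seconds for tokens, seconds in timed)


# Hypothetical timings: one cold run followed by five warm 32-token runs.
sample_runs = [(32, 4.0), (32, 2.0), (32, 2.5), (32, 1.6), (32, 2.0), (32, 3.2)]
print(f"{warm_median_tok_s(sample_runs):.2f} tok/s")
```

A median is less sensitive than a mean to a single slow run (for example, one affected by thermal throttling), which is a plausible reason to prefer it for on-device measurements.
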
| Path | Upstream | Params | Decode tok/s | License of upstream weights |
|---|---|---:|---:|---|
| `mamba2-130m/model.fp16.bin` | [state-spaces/mamba2-130m](https://huggingface.co/state-spaces/mamba2-130m) | 130M | 23.18 | Apache 2.0 |
| `mamba-130m/model.fp16.bin` | [state-spaces/mamba-130m-hf](https://huggingface.co/state-spaces/mamba-130m-hf) | 130M | 22.15 | Apache 2.0 |
| `smollm2-135m-instruct/model.fp16.bin` | [HuggingFaceTB/SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) | 135M | 14.57 | Apache 2.0 |
| `lfm2-5-350m/model.fp16.bin` | [LiquidAI/LFM2.5-350M-Base](https://huggingface.co/LiquidAI/LFM2.5-350M-Base) | 350M | 10.20 | Liquid AI Open License |
| `qwen2-5-0-5b/model.fp16.bin` | [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) | 500M | 8.45 | Apache 2.0 |
| `openelm-270m/` (companion files only) | [apple/OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 270M | 4.47 | Apple ASCL (fetch and convert locally) |

**OpenELM-270M is partially hosted here.** Under `openelm-270m/` you'll find only the small companion files:

```
openelm-270m/model.fp16.meta.json   # tensor layout for the C++ runtime
openelm-270m/tokenizer.json         # HuggingFace tokenizer config
openelm-270m/tokenizer_vocab.bin    # vocab + merges (binary)
```

The actual `model.fp16.bin` is **NOT redistributed** because Apple's [Apple Sample Code License](https://huggingface.co/apple/OpenELM-270M/blob/main/LICENSE) restricts redistribution. Instead, `scripts/fetch_openelm_weights.sh` in the GitHub repo pulls the safetensors for `apple/OpenELM-270M` directly from Apple's Hugging Face repo and runs `scripts/convert_openelm_weights.py` to produce the binary locally, using the layout described in `model.fp16.meta.json`.

## License

- These conversion artifacts: Apache 2.0 (re-publish freely, attribute the upstream model).
- Underlying model weights: each carries its upstream license (see the table above). Users are responsible for compliance.