---
license: apache-2.0
tags:
  - adreno
  - android
  - opencl
  - mobile
  - on-device-inference
---

# adreno-llms-weights

Pre-converted **fp16 weights** for the model ports in [adreno-llms](https://github.com/a8nova/adreno-llms) — small language models hand-tuned for **Adreno 6xx GPUs on non-flagship Android phones**.

These binaries are NOT directly compatible with HuggingFace `transformers` or PyTorch. They use a custom layout produced by [NNOpt](mailto:a8nova@gmail.com) and are consumed by the C++/OpenCL inference binaries in the GitHub repo above.

## Usage

```bash
git clone https://github.com/a8nova/adreno-llms.git
cd adreno-llms
./scripts/fetch_weights.sh smollm2-135m-instruct   # pulls from this repo
cd src/models/smollm2-135m-instruct
NNOPT_DTYPE=fp16 ./scripts/build.sh --release
NNOPT_DTYPE=fp16 ./scripts/deploy_android.sh
NNOPT_DTYPE=fp16 ./scripts/run_android.sh "Once upon a time" 64
```
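To fetch every hosted model in one pass, the single-model command above can be wrapped in a loop. This is a minimal sketch, not something the repo provides: it assumes `fetch_weights.sh` accepts each directory name from the table in "Models in this repo" the same way it accepts `smollm2-135m-instruct`. The `FETCH` variable and the `fetch_all` function are hypothetical names introduced here. OpenELM-270M is left out because its binary is not hosted in this repo.

```bash
#!/usr/bin/env bash
# Sketch: fetch all hosted models. Run from the adreno-llms checkout.
# ASSUMPTION: fetch_weights.sh takes each model directory name as its argument,
# as it does for smollm2-135m-instruct. Check the script before relying on this.
FETCH="${FETCH:-./scripts/fetch_weights.sh}"   # hypothetical override for dry runs

MODELS=(mamba2-130m mamba-130m smollm2-135m-instruct lfm2-5-350m qwen2-5-0-5b)

fetch_all() {
  local m
  for m in "${MODELS[@]}"; do
    # Stop at the first failure so a partial download is noticed.
    "$FETCH" "$m" || { echo "fetch failed for $m" >&2; return 1; }
  done
}

# fetch_all              # uncomment to download everything
# FETCH=echo fetch_all   # dry run: prints the model names instead of fetching
```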

See [the GitHub repo README](https://github.com/a8nova/adreno-llms) for full setup, hardware requirements, and per-model performance numbers (5-run warm median on Motorola Razr 2020 / Adreno 618).

## Models in this repo

Decode tok/s is the median of 5 warm runs: fp16, greedy decoding (`temperature=0, seed=42`), 32-token generation, on a Motorola Razr 2020 (Adreno 618), measured 2026-05-06.

| Path | Upstream | Params | Decode tok/s | License of upstream weights |
|---|---|---:|---:|---|
| `mamba2-130m/model.fp16.bin` | [state-spaces/mamba2-130m](https://huggingface.co/state-spaces/mamba2-130m) | 130M | 23.18 | Apache 2.0 |
| `mamba-130m/model.fp16.bin` | [state-spaces/mamba-130m-hf](https://huggingface.co/state-spaces/mamba-130m-hf) | 130M | 22.15 | Apache 2.0 |
| `smollm2-135m-instruct/model.fp16.bin` | [HuggingFaceTB/SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) | 135M | 14.57 | Apache 2.0 |
| `lfm2-5-350m/model.fp16.bin` | [LiquidAI/LFM2.5-350M-Base](https://huggingface.co/LiquidAI/LFM2.5-350M-Base) | 350M | 10.20 | Liquid AI Open License |
| `qwen2-5-0-5b/model.fp16.bin` | [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) | 500M | 8.45 | Apache 2.0 |
| `openelm-270m/` (companion files only) | [apple/OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 270M | 4.47 | Apple Sample Code License (fetch and convert locally) |

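To compare your own device against the table, you can take the median of several decode-speed readings in the same way. The sketch below only illustrates the median step; it is not the benchmark harness used for the numbers above, and `median` is a hypothetical helper. It assumes you have already collected tok/s values from warm runs, with any warm-up run discarded.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the median of the numbers passed as arguments.
# For an odd count (such as 5 runs) this is the middle value after sorting;
# for an even count it is the mean of the two middle values.
median() {
  local n=$#
  (( n > 0 )) || { echo "median: no values" >&2; return 1; }
  printf '%s\n' "$@" | LC_ALL=C sort -n | awk -v n="$n" '
    { v[NR] = $1 }
    END {
      if (n % 2) print v[(n + 1) / 2]
      else       print (v[n / 2] + v[n / 2 + 1]) / 2
    }'
}

# Example with five made-up tok/s readings (not measurements from this repo):
median 14.2 14.9 14.6 14.1 14.5   # prints 14.5
```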
**OpenELM-270M is partially hosted here.** Under `openelm-270m/` you'll find only the small companion files:

```
openelm-270m/model.fp16.meta.json     # tensor layout for the C++ runtime
openelm-270m/tokenizer.json           # HuggingFace tokenizer config
openelm-270m/tokenizer_vocab.bin      # vocab + merges (binary)
```

The actual `model.fp16.bin` is **NOT redistributed** because the [Apple Sample Code License](https://huggingface.co/apple/OpenELM-270M/blob/main/LICENSE) restricts redistribution. Instead, `scripts/fetch_openelm_weights.sh` in the GitHub repo pulls the safetensors for `apple/OpenELM-270M` directly from Apple's Hugging Face repository and runs `scripts/convert_openelm_weights.py` to produce the binary locally, using the layout described in `model.fp16.meta.json`.

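After the local conversion, it can be worth checking that the model directory is complete before building. The sketch below looks for the binary plus the three companion files listed above. `check_model_dir` is a hypothetical helper, and the directory you pass to it depends on where the repo's scripts write their output, which this README does not specify.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm a model directory holds model.fp16.bin and the
# three companion files. Reports each missing file on stderr and returns
# non-zero if any is absent.
check_model_dir() {
  local dir=$1 f missing=0
  for f in model.fp16.bin model.fp16.meta.json tokenizer.json tokenizer_vocab.bin; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (replace the placeholder with the directory your checkout uses):
#   check_model_dir path/to/openelm-270m && echo "OpenELM-270M ready"
```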
## License

- These conversion artifacts: Apache 2.0 (republish freely, with attribution to the upstream model).
- Underlying model weights: each carries its upstream license (see the table above). Users are responsible for compliance.