How to use mlboydaisuke/OLMo-2-1B-Instruct-LiteRT with LiteRT-LM:
# LiteRT-LM runs on various platforms (Android, iOS, Windows, Linux, macOS, IoT, Web/WASM)
# and supports many APIs (C++, Python, Kotlin, Swift, JavaScript, Flutter).
# For platform-specific integration guides, please refer to the official developer website:
# https://ai.google.dev/edge/litert-lm
# To try LiteRT-LM, the easiest way is to use our CLI tool.
# 1. Install the LiteRT-LM CLI tool:
pip install litert-lm
# 2. Download and run this model locally:
# See: https://ai.google.dev/edge/litert-lm/cli
litert-lm run \
  --from-huggingface-repo=mlboydaisuke/OLMo-2-1B-Instruct-LiteRT \
  model.litertlm \
  --prompt="Write me a poem"
OLMo-2-1B-Instruct: LiteRT-LM (blockwise int4)
allenai/OLMo-2-0425-1B-Instruct converted to the LiteRT-LM (.litertlm) format for
on-device inference with Google's LiteRT-LM runtime (the engine behind the official
litert-community/* models).
OLMo-2 is AllenAI's fully-open model family (Apache-2.0; open weights, data,
and training code). This 1B variant is small enough to run on a phone (verified on
iPhone 17 Pro). Converted with the official upstream litert-torch, not a fork.
| Property | Value |
|---|---|
| File | model.litertlm (~0.93 GB) |
| Quantization | int4 weights: blockwise (block 32) + OCTAV optimal clipping, symmetric; embedding INT8 |
| Compute | integer |
| Context (KV cache) | 4096 |
| Base model | allenai/OLMo-2-0425-1B-Instruct |
| Decode speed | ~24 tok/s (iPhone 17 Pro; load time 5.2 s, ~1.2 GB footprint) · ~138 tok/s (Mac M-series, Metal GPU) |
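
To make the "blockwise (block 32), symmetric" entry in the quantization row concrete, here is a minimal pure-Python sketch of blockwise symmetric int4 quantization. It is an illustration only, not the converter's implementation: it assumes a code range of [-7, 7] and uses a plain max-abs scale per block, whereas the recipe on this card uses OCTAV optimal clipping to choose the clipping point. The function names are hypothetical.

```python
from typing import List, Tuple

BLOCK_SIZE = 32  # block size stated on this card
QMAX = 7         # assumed symmetric int4 range [-7, 7]; real runtimes may use [-8, 7]


def quantize_blockwise(
    weights: List[float], block: int = BLOCK_SIZE
) -> Tuple[List[int], List[float]]:
    """Quantize weights in independent blocks, one scale per block.

    Uses a max-abs scale (no clipping search), which is simpler than OCTAV.
    """
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), block):
        chunk = weights[start:start + block]
        max_abs = max(abs(w) for w in chunk)
        # An all-zero block gets a dummy scale so the division below is safe.
        scale = max_abs / QMAX if max_abs > 0 else 1.0
        scales.append(scale)
        for w in chunk:
            q = round(w / scale)
            codes.append(max(-QMAX, min(QMAX, q)))
    return codes, scales


def dequantize_blockwise(
    codes: List[int], scales: List[float], block: int = BLOCK_SIZE
) -> List[float]:
    """Map int4 codes back to floats using the scale of their block."""
    return [q * scales[i // block] for i, q in enumerate(codes)]


if __name__ == "__main__":
    import random

    random.seed(0)
    w = [random.gauss(0.0, 0.02) for _ in range(64)]
    codes, scales = quantize_blockwise(w)
    w_hat = dequantize_blockwise(codes, scales)
    print("blocks:", len(scales))
    print("max abs error:", max(abs(a - b) for a, b in zip(w, w_hat)))
```

Because each block carries its own scale, one outlier weight only degrades the resolution of the 32 values that share its block rather than the whole tensor.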
Usage
Run with the LiteRT-LM runtime:
litert_lm_main \
--model_path model.litertlm \
--backend gpu \
--input_prompt "Explain on-device AI in one sentence."
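
If you want to script the same invocation, the sketch below builds the argument list in Python. It only reuses the binary name and the three flags shown in the command above; the helper name `build_command` is hypothetical, and backend values other than `gpu` are not confirmed by this card. It assumes `litert_lm_main` is on your `PATH`.

```python
import shlex
from typing import List


def build_command(model_path: str, prompt: str, backend: str = "gpu") -> List[str]:
    """Build the litert_lm_main argument list from the flags shown on this card."""
    return [
        "litert_lm_main",
        "--model_path", model_path,
        "--backend", backend,
        "--input_prompt", prompt,
    ]


if __name__ == "__main__":
    cmd = build_command("model.litertlm", "Explain on-device AI in one sentence.")
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
    # To actually run it (requires the binary to be installed):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems when the prompt contains quotes or spaces.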
The .litertlm bundle carries the tokenizer and the prompt template (OLMo-2's
native Tülu format: <|user|> / <|assistant|>, stop token <|endoftext|>),
so no separate tokenizer files are needed.
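
For reference, the template described above looks roughly like the sketch below. The runtime applies the bundled template for you, so this is purely illustrative; the three tags come from this card, but the exact newline placement is an assumption, and the helper names are hypothetical.

```python
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"
STOP_TOKEN = "<|endoftext|>"


def format_prompt(user_message: str) -> str:
    """Wrap a single user turn and leave the assistant turn open for generation.

    Newline placement is assumed, not taken from the bundle.
    """
    return f"{USER_TAG}\n{user_message}\n{ASSISTANT_TAG}\n"


def trim_at_stop(generated: str) -> str:
    """Cut generated text at the first stop token, if one is present."""
    return generated.split(STOP_TOKEN, 1)[0]


if __name__ == "__main__":
    print(format_prompt("Explain on-device AI in one sentence."))
    print(trim_at_stop("It runs models locally on your device.<|endoftext|>extra"))
```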
Run on Android
The easiest way to try this model on a phone is the official Google AI Edge Gallery app:
- Install a recent Gallery (package com.google.ai.edge.gallery, APK from the repo's releases; 1.0.15+ supports .litertlm).
- Download model.litertlm and push it to the device: adb push model.litertlm /sdcard/Download/
- In the app, tap + (bottom-right), pick the file, and choose CPU or GPU. At ~0.93 GB this 1B model fits comfortably on an 8 GB phone.
- Chat: the bundle already carries the tokenizer and OLMo-2 prompt template.
See the Gallery Importing Local Models guide for details. To embed it in your own
Android app, use the LiteRT-LM Kotlin API (com.google.ai.edge.litertlm:litertlm-android).
Quality: GSM8K
Measured on GSM8K (n=100, greedy, 0-shot chain-of-thought, identical prompt and answer-extraction for every row).
| Configuration | GSM8K |
|---|---|
| bf16 (reference) | 72.0% |
| This model: LiteRT int4 (BOCTAV4) | 63.0% |
63% is a strong, coherent, non-degenerate score for a 1B model (the \boxed{}-style answers
terminate cleanly at <|endoftext|>). At 1B, 4-bit quantization costs ~9 pt vs. bf16:
a small model has less redundancy to absorb int4 rounding than a 3B+ model (where the same
recipe is at parity). An int8 build recovers only ~2 pt (65%) for +60% size, so int4
is shipped as the best size/quality trade-off for on-device use.
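
Since the scores above come from n=100 questions, it can help to see how much sampling noise that implies. The sketch below computes a binomial standard error and an approximate 95% interval for each reported accuracy. It is a rough reader-side check, not part of the original evaluation: it uses the normal approximation and treats the two runs as independent, even though they share the same questions, so a paired analysis could be tighter.

```python
import math


def accuracy_standard_error(accuracy: float, n: int) -> float:
    """Binomial standard error of an accuracy measured on n independent items."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


if __name__ == "__main__":
    n = 100  # sample size reported on this card
    for label, acc in [("bf16", 0.72), ("int4", 0.63)]:
        se = accuracy_standard_error(acc, n)
        half_width = 1.96 * se  # normal-approximation 95% half-width
        print(f"{label}: {acc:.0%} +/- {half_width:.1%}")
```

With a sample this small, each figure carries an uncertainty of several points, so the reported gaps are best read as indicative rather than precise.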
Conversion
Converted with the official upstream litert-torch export_hf (clean git worktree at
upstream/main, dev-fork patches excluded).
Olmo2ForCausalLM rides the stock converter with no custom code: QK-norm and OLMo-2's
reordered post-norm lower to generic ops. The int4 recipe is blockwise (block 32) +
OCTAV with the embedding at INT8.
License
Apache-2.0, inherited from the base model allenai/OLMo-2-0425-1B-Instruct.