How to use mlboydaisuke/Falcon3-3B-Instruct-LiteRT with LiteRT-LM:
# LiteRT-LM runs on various platforms (Android, iOS, Windows, Linux, macOS, IoT, Web/WASM)
# and supports many APIs (C++, Python, Kotlin, Swift, JavaScript, Flutter).
# For platform-specific integration guides, please refer to the official developer website:
# https://ai.google.dev/edge/litert-lm
# To try LiteRT-LM, the easiest way is to use our CLI tool.
# 1. Install the LiteRT-LM CLI tool:
pip install litert-lm
# 2. Download and run this model locally:
# See: https://ai.google.dev/edge/litert-lm/cli
litert-lm run \
  --from-huggingface-repo=mlboydaisuke/Falcon3-3B-Instruct-LiteRT \
  model.litertlm \
  --prompt="Write me a poem"
Falcon3-3B-Instruct - LiteRT-LM (blockwise int4)
tiiuae/Falcon3-3B-Instruct
converted to the LiteRT-LM (.litertlm) format for on-device inference with
Google's LiteRT-LM runtime (the
engine behind the official litert-community/* models).
Text-only conversion (the Falcon3 decoder; no vision/audio towers).
| Property | Value |
|---|---|
| File | model.litertlm (~1.74 GB) |
| Quantization | int4 weights, blockwise (block 128), symmetric; embeddings INT8 |
| Compute | integer |
| Context (KV cache) | 2048 |
| Base model | tiiuae/Falcon3-3B-Instruct |
| Decode speed | ~27 tok/s (iPhone 17 Pro, Metal GPU) · ~89 tok/s (Mac M4 Max, LiteRT-LM, greedy) |
Usage
Run with the LiteRT-LM runtime:
# build litert-lm from https://github.com/google-ai-edge/litert-lm, then:
litert_lm_main \
--model_path model.litertlm \
--backend gpu \
--input_prompt "Explain on-device AI in one sentence."
The .litertlm bundle carries the tokenizer and the prompt template (Falcon3's
native <|user|> / <|assistant|> format, stop token <|endoftext|>), so no
separate tokenizer files are needed.
Quality: GSM8K parity
Measured on GSM8K (n=100, greedy, 0-shot chain-of-thought asking for #### <n>,
identical prompt and answer-extraction for every row). The 4-bit MLX build is the
known-good 4-bit control:
| Configuration | GSM8K |
|---|---|
| bf16 (reference) | 75% |
| MLX 4-bit (control) | 76% |
| This model (LiteRT int4) | 77% |
LiteRT int4 is at parity: it scores within 2 points of both the 4-bit control
and bf16, and a spread that small is within sampling noise at n=100. This is a
direct-answering instruct model (no <think> block) and terminates cleanly at
<|endoftext|>.
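As a rough check on the sampling-noise remark, the sketch below computes the binomial standard error of each accuracy at n=100. It treats each row as an independent sample of 100 questions, which is a simplifying assumption (all rows share the same questions, so a paired comparison would be the more precise tool); the function name is illustrative.

```python
import math


def binomial_standard_error(accuracy: float, n: int) -> float:
    """Standard error of a proportion measured on n independent items."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


n = 100
# Accuracies copied from the table above.
scores = {"bf16": 0.75, "MLX 4-bit": 0.76, "LiteRT int4": 0.77}

for name, acc in scores.items():
    se = binomial_standard_error(acc, n)
    # 1.96 * SE is the half-width of a 95% interval (normal approximation).
    print(f"{name}: {acc:.0%}, standard error {se:.1%}, 95% half-width {1.96 * se:.1%}")

# Gap between the best and worst row, to compare against one standard error.
spread = max(scores.values()) - min(scores.values())
print(f"spread across rows: {spread:.1%}")
```

One standard error at this sample size is roughly 4 percentage points, so the 2-point spread between rows cannot be distinguished from noise without a larger evaluation set.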
Conversion
Converted with litert-torch using a
blockwise int4 recipe (INT4 weights, block size 128, symmetric) with embeddings
kept at INT8, KV cache 2048, and Falcon3's native chat template. Falcon3-3B is a
standard LlamaForCausalLM architecture, so it rides the existing converter and
runtime with no custom code. Blockwise (not channelwise) int4 is what preserves
reasoning accuracy.
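To illustrate why the granularity of the scale matters, here is a minimal NumPy sketch of symmetric int4 quantization with one scale per row (channelwise) versus one scale per block of 128 values. It is not the litert-torch or ai_edge_quantizer implementation; the function name, the synthetic weight matrix, and the injected outlier column are all assumptions made for the example.

```python
import numpy as np


def quantize_int4_symmetric(w: np.ndarray, group_size: int) -> np.ndarray:
    """Quantize to symmetric int4 and dequantize, one scale per group.

    Each row is split into groups of `group_size` values; every group gets
    its own scale. group_size == row length corresponds to channelwise.
    """
    rows, cols = w.shape
    assert cols % group_size == 0, "row length must be a multiple of group_size"
    groups = w.reshape(rows, cols // group_size, group_size)
    max_abs = np.abs(groups).max(axis=-1, keepdims=True)
    # Map the largest magnitude in each group to the int4 value 7.
    scale = np.where(max_abs > 0, max_abs / 7.0, 1.0)
    q = np.clip(np.round(groups / scale), -8, 7)
    return (q * scale).reshape(rows, cols)


rng = np.random.default_rng(0)
# Synthetic weights: small Gaussian values plus one large outlier column.
w = rng.normal(0.0, 0.02, size=(8, 1024)).astype(np.float32)
w[:, 5] = 0.5

channelwise = quantize_int4_symmetric(w, group_size=w.shape[1])
blockwise = quantize_int4_symmetric(w, group_size=128)

err_channel = float(np.mean((w - channelwise) ** 2))
err_block = float(np.mean((w - blockwise) ** 2))
print(f"channelwise MSE: {err_channel:.3e}")
print(f"blockwise-128 MSE: {err_block:.3e}")
```

With a single scale per row, one outlier stretches the quantization grid for all 1024 values in that row. With blocks of 128, the outlier only coarsens its own block, and the remaining blocks keep a fine grid.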
Reproduce (official tools only)
Built with stock litert-torch, with no custom code and no graph patches. The
only non-default choice is the int4 recipe. The tool's default named int4 recipe
is channelwise (which degrades small models), so this build uses blockwise-128
(the scheme the official models ship), passed as a recipe file to the standard export:
from litert_torch.generative.export_hf.export import export

export(
    model="tiiuae/Falcon3-3B-Instruct",
    output_dir="out",
    quantization_recipe="falcon_int4_block128.json",  # included in this repo
    cache_length=2048,
    trust_remote_code=True,
)
falcon_int4_block128.json is included in this repo. (If the export errors with a
missing ai_edge_quantizer/recipes/ directory, create it empty. This is a packaging
gap in some releases that trips the .json-recipe path.)
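The workaround above can be scripted. The sketch below is a hypothetical helper (not part of litert-torch or ai_edge_quantizer) that finds the installed package directory and creates an empty recipes/ folder inside it. It assumes the missing directory is expected directly under the package root, as the path in the error suggests.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def ensure_recipes_dir(package: str = "ai_edge_quantizer") -> Optional[Path]:
    """Create an empty `recipes/` directory inside an installed package.

    Returns the directory path, or None if the package is not installed.
    Hypothetical helper: the location under the package root is an assumption.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    package_root = Path(next(iter(spec.submodule_search_locations)))
    recipes = package_root / "recipes"
    recipes.mkdir(exist_ok=True)  # no-op if the directory already exists
    return recipes


if __name__ == "__main__":
    print(ensure_recipes_dir())
```

Run it once in the same environment as the export. It leaves an existing directory untouched and returns None when the package is not installed.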
License
Falcon LLM License (TII), inherited from the base model tiiuae/Falcon3-3B-Instruct. See https://falconllm.tii.ae/falcon-terms-and-conditions.html