dispatchAI SDK
Small. Mobile. Free. UAE-built.
pip install dispatchai: run mobile-optimized LLMs on your phone, edge device, or laptop. 39 models, all tested on real Snapdragon hardware, all free.
Quick Start
pip install dispatchai
Chat with a model
from dispatchai import load_model
model = load_model("SmolLM2-135M-Instruct-mobile")
response = model.chat("What is the capital of France?")
print(response)
Use GGUF/llama.cpp backend
from dispatchai import load_model
model = load_model("Llama-3.2-1B-Instruct-Q4-mobile", backend="gguf")
print(model.chat("Write a haiku about the desert."))
Find the best model for your phone
from dispatchai import recommend
rec = recommend(ram_mb=2048, task="chat")
print(f"Best model: {rec['recommended']['name']}")
print(f"Size: {rec['recommended']['size_mb']}MB")
print(f"Speed: {rec['recommended']['speed_tps']} tokens/sec")
List all models
from dispatchai import list_models
for m in list_models(task="chat"):
    print(f" {m['name']}: {m['size_mb']}MB, {m['speed_tps']} t/s")
Estimate latency
from dispatchai import estimate_latency
lat = estimate_latency("1B", "Q4_K_M")
print(f"{lat['tokens_per_sec']} tokens/sec on Snapdragon 865")
Calculate cost savings
from dispatchai import calculate_cost
result = calculate_cost(daily_queries=10000, cloud_cost_per_1k=0.50)
print(f"Annual savings: ${result['savings']}")
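For intuition, the arithmetic behind a savings figure like this can be sketched in plain Python. This is not the dispatchai implementation: the helper name annual_cloud_cost is hypothetical, and the sketch assumes on-device inference has no per-query cost, so savings equal the avoided cloud spend.

```python
def annual_cloud_cost(daily_queries: int, cloud_cost_per_1k: float, days: int = 365) -> float:
    """Hypothetical helper: yearly cloud spend for a given query volume."""
    return daily_queries * days / 1000 * cloud_cost_per_1k


# Assumption: on-device inference is free per query, so the whole cloud bill is saved.
savings = annual_cloud_cost(daily_queries=10_000, cloud_cost_per_1k=0.50)
print(f"Annual savings: ${savings:,.2f}")
```

A real estimate would also account for device, battery, and engineering costs, which this sketch leaves out.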
Installation Options
pip install dispatchai # Core (model catalog, recommendations)
pip install "dispatchai[torch]" # + transformers/torch backend
pip install "dispatchai[gguf]" # + llama.cpp GGUF backend
pip install "dispatchai[full]" # + everything (torch, gguf, sentence-transformers)
Available Models
| Model | Params | Size | Speed | Task |
|---|---|---|---|---|
| SmolLM2-135M-Instruct-mobile | 135M | 270MB | 25.5 t/s | Chat |
| SmolLM2-360M-Instruct-mobile | 360M | 720MB | 21.0 t/s | Chat |
| Qwen2.5-0.5B-Instruct-mobile-int4 | 500M | 350MB | 20.0 t/s | Chat |
| Llama-3.2-1B-Instruct-Q4-mobile | 1B | 700MB | 18.2 t/s | Chat |
| Llama-3.2-1B-FunctionCall-mobile | 1B | 2.5GB | 12.0 t/s | Function Call |
| Qwen2.5-Coder-1.5B-mobile | 1.5B | 3.0GB | 10.5 t/s | Code |
| Gemma-2B-Arabic-mobile | 2B | 5.0GB | 8.0 t/s | Arabic |
| Llama-3.2-3B-Instruct-Q5-mobile | 3B | 2.1GB | 8.5 t/s | Chat |
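To illustrate how a size budget narrows this table, here is a minimal sketch that filters the rows by task and size and then picks the largest model that fits. It is not how recommend() is implemented. The CATALOG list, the pick_model helper, the "largest parameter count wins" rule, and the 1 GB = 1000 MB conversion are all assumptions made for the example. It also treats file size as a stand-in for memory footprint, which understates real RAM use.

```python
from typing import Optional

# Rows copied from the table above; sizes listed in GB are converted at 1 GB = 1000 MB (assumption).
CATALOG = [
    {"name": "SmolLM2-135M-Instruct-mobile", "params_m": 135, "size_mb": 270, "speed_tps": 25.5, "task": "Chat"},
    {"name": "SmolLM2-360M-Instruct-mobile", "params_m": 360, "size_mb": 720, "speed_tps": 21.0, "task": "Chat"},
    {"name": "Qwen2.5-0.5B-Instruct-mobile-int4", "params_m": 500, "size_mb": 350, "speed_tps": 20.0, "task": "Chat"},
    {"name": "Llama-3.2-1B-Instruct-Q4-mobile", "params_m": 1000, "size_mb": 700, "speed_tps": 18.2, "task": "Chat"},
    {"name": "Llama-3.2-1B-FunctionCall-mobile", "params_m": 1000, "size_mb": 2500, "speed_tps": 12.0, "task": "Function Call"},
    {"name": "Qwen2.5-Coder-1.5B-mobile", "params_m": 1500, "size_mb": 3000, "speed_tps": 10.5, "task": "Code"},
    {"name": "Gemma-2B-Arabic-mobile", "params_m": 2000, "size_mb": 5000, "speed_tps": 8.0, "task": "Arabic"},
    {"name": "Llama-3.2-3B-Instruct-Q5-mobile", "params_m": 3000, "size_mb": 2100, "speed_tps": 8.5, "task": "Chat"},
]


def pick_model(budget_mb: int, task: str) -> Optional[dict]:
    """Hypothetical helper: largest model for `task` whose size fits within `budget_mb`."""
    candidates = [m for m in CATALOG if m["task"] == task and m["size_mb"] <= budget_mb]
    if not candidates:
        return None
    # Assumption: more parameters is a reasonable proxy for quality.
    return max(candidates, key=lambda m: m["params_m"])


choice = pick_model(budget_mb=2048, task="Chat")
print(choice["name"] if choice else "No model fits this budget")
```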
Hardware Targets
All benchmarks were measured on a Snapdragon 865 (Samsung S20 FE, 8GB RAM) using llama.cpp.
The estimate_latency() function supports the following targets, with speed multipliers relative to the Snapdragon 865 baseline:
- Snapdragon 865 (baseline)
- Snapdragon 8 Gen 2 (1.8x)
- Snapdragon 8 Gen 3 (2.2x)
- Apple A17 Pro (2.5x)
- Apple M2 (3.0x)
- Snapdragon 778G mid-range (0.7x)
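The multipliers above can be read as simple scaling factors on a Snapdragon 865 measurement. The sketch below applies them to one baseline figure from the model table. The DEVICE_MULTIPLIERS dictionary and scale_speed helper are illustrative names, not part of the dispatchai API, and whether estimate_latency() applies the multipliers in exactly this way is an assumption.

```python
# Multipliers copied from the list above, relative to the Snapdragon 865 baseline.
DEVICE_MULTIPLIERS = {
    "Snapdragon 865": 1.0,
    "Snapdragon 8 Gen 2": 1.8,
    "Snapdragon 8 Gen 3": 2.2,
    "Apple A17 Pro": 2.5,
    "Apple M2": 3.0,
    "Snapdragon 778G": 0.7,
}


def scale_speed(baseline_tps: float, device: str) -> float:
    """Hypothetical helper: scale a Snapdragon 865 throughput figure to another device."""
    return round(baseline_tps * DEVICE_MULTIPLIERS[device], 1)


# 18.2 t/s is the table entry for Llama-3.2-1B-Instruct-Q4-mobile on Snapdragon 865.
baseline = 18.2
for device in DEVICE_MULTIPLIERS:
    print(f"{device}: {scale_speed(baseline, device)} t/s")
```

Linear scaling is a rough approximation; memory bandwidth and thermal throttling can make real devices deviate from it.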
The Thesis
The best model is the one that runs.
We're building the AI layer for a billion phones that can't afford cloud inference. Every model is free, open-source, and tested on real hardware.
About
Dispatch AI (FZE), Sharjah Free Zone, UAE. License No. 10818.
dispatchai.ai | huggingface.co/dispatchAI | @DispatchAIdev
I think, therefore I ship.