dispatchAI SDK

Small. Mobile. Free. UAE-built.

pip install dispatchai — Run mobile-optimized LLMs on your phone, edge device, or laptop. 39 models, all tested on real Snapdragon hardware, all free.

Quick Start

pip install dispatchai

Chat with a model

from dispatchai import load_model

model = load_model("SmolLM2-135M-Instruct-mobile")
response = model.chat("What is the capital of France?")
print(response)

Use GGUF/llama.cpp backend

from dispatchai import load_model

model = load_model("Llama-3.2-1B-Instruct-Q4-mobile", backend="gguf")
print(model.chat("Write a haiku about the desert."))

Find the best model for your phone

from dispatchai import recommend

rec = recommend(ram_mb=2048, task="chat")
print(f"Best model: {rec['recommended']['name']}")
print(f"Size: {rec['recommended']['size_mb']}MB")
print(f"Speed: {rec['recommended']['speed_tps']} tokens/sec")
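
To make the idea behind a RAM-based recommendation concrete, here is a minimal standalone sketch. It is not the SDK's actual selection logic, which this README does not describe. The catalog entries are copied from the Available Models table (GB sizes converted to approximate MB). The `pick_model` helper, the `params_m` field, the 50% RAM headroom rule, the "largest model that fits" tie-break, and every task string other than "chat" are assumptions made for illustration.

```python
# Entries copied from the Available Models table; params_m is parameters in millions.
CATALOG = [
    {"name": "SmolLM2-135M-Instruct-mobile", "params_m": 135, "size_mb": 270, "speed_tps": 25.5, "task": "chat"},
    {"name": "SmolLM2-360M-Instruct-mobile", "params_m": 360, "size_mb": 720, "speed_tps": 21.0, "task": "chat"},
    {"name": "Qwen2.5-0.5B-Instruct-mobile-int4", "params_m": 500, "size_mb": 350, "speed_tps": 20.0, "task": "chat"},
    {"name": "Llama-3.2-1B-Instruct-Q4-mobile", "params_m": 1000, "size_mb": 700, "speed_tps": 18.2, "task": "chat"},
    {"name": "Llama-3.2-1B-FunctionCall-mobile", "params_m": 1000, "size_mb": 2500, "speed_tps": 12.0, "task": "function_call"},
    {"name": "Qwen2.5-Coder-1.5B-mobile", "params_m": 1500, "size_mb": 3000, "speed_tps": 10.5, "task": "code"},
    {"name": "Gemma-2B-Arabic-mobile", "params_m": 2000, "size_mb": 5000, "speed_tps": 8.0, "task": "arabic"},
    {"name": "Llama-3.2-3B-Instruct-Q5-mobile", "params_m": 3000, "size_mb": 2100, "speed_tps": 8.5, "task": "chat"},
]


def pick_model(ram_mb, task, headroom=0.5):
    """Pick the largest model for `task` whose file fits in a fraction of RAM.

    `headroom` is an assumed safety factor: only that share of total RAM
    is treated as available for model weights.
    """
    budget_mb = ram_mb * headroom
    candidates = [
        m for m in CATALOG
        if m["task"] == task and m["size_mb"] <= budget_mb
    ]
    if not candidates:
        return None
    # Prefer more parameters; break ties by measured speed.
    return max(candidates, key=lambda m: (m["params_m"], m["speed_tps"]))


best = pick_model(ram_mb=2048, task="chat")
print(best["name"] if best else "No model fits")
```

Under these assumptions a 2 GB phone gets the 1B Q4 chat model. The real recommend() call may weigh speed, quality, or device type differently.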

List all models

from dispatchai import list_models

for m in list_models(task="chat"):
    print(f"  {m['name']}: {m['size_mb']}MB, {m['speed_tps']} t/s")

Estimate latency

from dispatchai import estimate_latency

lat = estimate_latency("1B", "Q4_K_M")
print(f"{lat['tokens_per_sec']} tokens/sec on Snapdragon 865")

Calculate cost savings

from dispatchai import calculate_cost

result = calculate_cost(daily_queries=10000, cloud_cost_per_1k=0.50)
print(f"Annual savings: ${result['savings']}")
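
The arithmetic behind a figure like this can be sketched by hand. The sketch below assumes that cloud_cost_per_1k is the cloud price in dollars per 1,000 queries and that on-device inference has no marginal cost. The README confirms neither, so calculate_cost() may return a different number. The `annual_cloud_cost` helper is hypothetical and not part of the SDK.

```python
def annual_cloud_cost(daily_queries, cloud_cost_per_1k, days_per_year=365):
    """Yearly cloud spend, assuming a flat price per 1,000 queries."""
    return daily_queries / 1000 * cloud_cost_per_1k * days_per_year


# Same inputs as the calculate_cost() example above.
cost = annual_cloud_cost(daily_queries=10000, cloud_cost_per_1k=0.50)
print(f"Cloud cost avoided per year: ${cost:,.2f}")  # $1,825.00
```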

Installation Options

pip install dispatchai                    # Core (model catalog, recommendations)
pip install "dispatchai[torch]"           # + transformers/torch backend
pip install "dispatchai[gguf]"            # + llama.cpp GGUF backend
pip install "dispatchai[full]"            # + everything (torch, gguf, sentence-transformers)
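
Since the backends are optional, it can help to check which ones are importable before choosing one. The sketch below uses only the Python standard library. The mapping from extras to import names is an assumption: `torch` and `transformers` are the usual import names for those packages, and `llama_cpp` is the import name of the llama-cpp-python bindings, but the README does not say which packages each extra actually installs.

```python
from importlib.util import find_spec

# Assumed import names for each optional extra; not confirmed by the SDK.
EXTRA_MODULES = {
    "torch": ["torch", "transformers"],
    "gguf": ["llama_cpp"],
}


def available_extras(extra_modules=EXTRA_MODULES):
    """Return the extras whose modules can all be found on the import path."""
    return [
        extra
        for extra, modules in extra_modules.items()
        if all(find_spec(name) is not None for name in modules)
    ]


print("Installed backends:", available_extras())
```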

Available Models

Model	Params	Size	Speed	Task
SmolLM2-135M-Instruct-mobile	135M	270MB	25.5 t/s	Chat
SmolLM2-360M-Instruct-mobile	360M	720MB	21.0 t/s	Chat
Qwen2.5-0.5B-Instruct-mobile-int4	500M	350MB	20.0 t/s	Chat
Llama-3.2-1B-Instruct-Q4-mobile	1B	700MB	18.2 t/s	Chat
Llama-3.2-1B-FunctionCall-mobile	1B	2.5GB	12.0 t/s	Function Call
Qwen2.5-Coder-1.5B-mobile	1.5B	3.0GB	10.5 t/s	Code
Gemma-2B-Arabic-mobile	2B	5.0GB	8.0 t/s	Arabic
Llama-3.2-3B-Instruct-Q5-mobile	3B	2.1GB	8.5 t/s	Chat

Browse all 39 models →

Hardware Targets

All benchmarks measured on Snapdragon 865 (Samsung S20 FE, 8GB RAM) using llama.cpp.

The estimate_latency() function supports:

Snapdragon 865 (baseline)
Snapdragon 8 Gen 2 (1.8x)
Snapdragon 8 Gen 3 (2.2x)
Apple A17 Pro (2.5x)
Apple M2 (3.0x)
Snapdragon 778G mid-range (0.7x)
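
As an illustration of how these multipliers could be applied, the sketch below scales a Snapdragon 865 baseline speed to another device. It assumes each factor is a throughput multiplier relative to the 865 baseline. The `scale_speed` helper and the device-name keys are hypothetical, and the README does not show how estimate_latency() selects a device.

```python
# Multipliers copied from the list above, relative to Snapdragon 865.
DEVICE_MULTIPLIERS = {
    "Snapdragon 865": 1.0,
    "Snapdragon 8 Gen 2": 1.8,
    "Snapdragon 8 Gen 3": 2.2,
    "Apple A17 Pro": 2.5,
    "Apple M2": 3.0,
    "Snapdragon 778G": 0.7,
}


def scale_speed(baseline_tps, device):
    """Scale a Snapdragon 865 tokens/sec figure to another device."""
    return round(baseline_tps * DEVICE_MULTIPLIERS[device], 1)


# 18.2 t/s is the table's baseline for Llama-3.2-1B-Instruct-Q4-mobile.
for device in DEVICE_MULTIPLIERS:
    print(f"{device}: {scale_speed(18.2, device)} t/s")
```

These are rough projections from a single measured device, not benchmarks on the other chips.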

The Thesis

The best model is the one that runs.

We're building the AI layer for a billion phones that can't afford cloud inference. Every model is free, open-source, and tested on real hardware.

About

Dispatch AI (FZE) — Sharjah Free Zone, UAE. License No. 10818.

🌐 dispatchai.ai | 🤗 huggingface.co/dispatchAI | 𝕏 @DispatchAIdev

I think, therefore I ship.
