dispatchAI SDK
Small. Mobile. Free. UAE-built.
pip install dispatchai → run mobile-optimized LLMs on your phone, edge device, or laptop. 31 verified models, all tested on real Snapdragon hardware, all free.
Quick Start
pip install dispatchai[gguf]
Chat with a model
from dispatchai import load_model
model = load_model("SmolLM2-135M-Instruct-mobile", backend="gguf")
response = model.chat("What is the capital of France?")
print(response)
# → "The capital of France is Paris."
Inference API
Use dispatchAI models via REST API (OpenAI-compatible):
import openai

client = openai.OpenAI(
    base_url="https://api.dispatchai.ai/v1",
    api_key="da-demo-key-0001",
)
response = client.chat.completions.create(
    model="dispatchAI/SmolLM2-135M-Instruct-mobile",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
# → "The capital of France is Paris."
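If you would rather not depend on the openai package, an OpenAI-compatible endpoint can usually be called with a plain HTTP POST to the `/chat/completions` path under the base URL. The sketch below only builds the request with the standard library and does not send it. It assumes the API follows the standard OpenAI request shape (Bearer token, JSON body with `model` and `messages`); `build_chat_request` and the `DISPATCHAI_API_KEY` environment variable are hypothetical names used for illustration.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.dispatchai.ai/v1"  # endpoint listed in this README


def build_chat_request(prompt: str, model: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# DISPATCHAI_API_KEY is a hypothetical variable name; the fallback is the demo key above.
api_key = os.environ.get("DISPATCHAI_API_KEY", "da-demo-key-0001")
req = build_chat_request(
    "What is the capital of France?",
    "dispatchAI/SmolLM2-135M-Instruct-mobile",
    api_key,
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```

Reading the key from the environment keeps it out of source control, which matters once you move past the demo key.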
Pricing: $0.001/1K input tokens, $0.002/1K output tokens (10x cheaper than OpenAI)
Endpoint: https://api.dispatchai.ai/v1
Available Models:
- dispatchAI/SmolLM2-135M-Instruct-mobile (101MB, 46 t/s on phone)
- dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4 (469MB, 23 t/s on phone)
- dispatchAI/Llama-3.2-1B-Instruct-Q4-mobile (770MB, 5.4 t/s on phone)
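To get a feel for the pricing above, the cost of a single request can be worked out directly from the listed rates. This is plain arithmetic, not part of the SDK: `estimate_api_cost` is a hypothetical helper, and the token counts are whatever your application measures.

```python
# Rates quoted in the Pricing line above, in USD per 1,000 tokens.
INPUT_USD_PER_1K = 0.001
OUTPUT_USD_PER_1K = 0.002


def estimate_api_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1000 * INPUT_USD_PER_1K
    output_cost = output_tokens / 1000 * OUTPUT_USD_PER_1K
    return input_cost + output_cost


# Example: a 500-token prompt with a 200-token reply.
cost = estimate_api_cost(500, 200)
print(f"${cost:.6f}")  # → $0.000900
```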
Local Inference
Find the best model for your phone
from dispatchai import recommend
rec = recommend(ram_mb=2048, task="chat")
print(f"Best model: {rec['recommended']['name']}")
List all models
from dispatchai import list_models
for m in list_models(task="chat"):
    print(f" {m['name']}: {m['size_mb']}MB, {m['speed_tps']} t/s")
Estimate latency
from dispatchai import estimate_latency
lat = estimate_latency("1B", "Q4_K_M")
print(f"{lat['tokens_per_sec']} t/s on Snapdragon 865")
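A tokens-per-second figure is easier to judge once converted into waiting time. The sketch below divides a reply length by the phone throughput numbers listed in this README. It is a rough estimate only: it assumes steady decoding speed and ignores model load and prompt-processing time. `generation_seconds` is a hypothetical helper, not an SDK function.

```python
# Phone throughput figures listed in this README (tokens per second).
PHONE_TPS = {
    "SmolLM2-135M": 46.0,
    "Qwen2.5-0.5B-int4": 23.2,
    "Llama-3.2-1B-Q4": 5.4,
}


def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Seconds needed to generate num_tokens at a constant decode speed."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


for name, tps in PHONE_TPS.items():
    print(f"{name}: {generation_seconds(100, tps):.1f} s for 100 tokens")
# → SmolLM2-135M: 2.2 s for 100 tokens
# → Qwen2.5-0.5B-int4: 4.3 s for 100 tokens
# → Llama-3.2-1B-Q4: 18.5 s for 100 tokens
```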
Calculate cost savings
from dispatchai import calculate_cost
result = calculate_cost(daily_queries=10000, cloud_cost_per_1k=0.50)
print(f"Annual savings: ${result['savings']}")
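The README does not spell out the formula behind `calculate_cost`, so the following is only a back-of-the-envelope version of the same idea, not the SDK's implementation. It assumes `cloud_cost_per_1k` means USD per 1,000 queries and that on-device inference has no per-query fee; `annual_cloud_cost` is a hypothetical helper.

```python
DAYS_PER_YEAR = 365


def annual_cloud_cost(daily_queries: int, cloud_cost_per_1k: float) -> float:
    """Yearly cloud spend, ASSUMING cloud_cost_per_1k is USD per 1,000 queries."""
    return daily_queries / 1000 * cloud_cost_per_1k * DAYS_PER_YEAR


cloud = annual_cloud_cost(daily_queries=10_000, cloud_cost_per_1k=0.50)
on_device = 0.0  # assumption: local inference carries no per-query fee
print(f"Annual savings under these assumptions: ${cloud - on_device:,.2f}")
# → Annual savings under these assumptions: $1,825.00
```

Device, power, and maintenance costs are left out here, so treat the result as an upper bound on savings.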
Installation Options
pip install dispatchai # Core (model catalog, recommendations)
pip install dispatchai[torch] # + transformers/torch backend
pip install dispatchai[gguf] # + llama.cpp GGUF backend
pip install dispatchai[full] # + everything
Verified Models (June 2026)
- 31 models fully working (0 broken, 0 partial)
- 24 models phone-verified on Snapdragon 865
- All have correct chat formats documented
Top 3 Models
| Model | Size | Phone Speed | Use Case |
|---|---|---|---|
| SmolLM2-135M | 101MB | 46.0 t/s | Ultra-fast, budget phones |
| Qwen2.5-0.5B-int4 | 469MB | 23.2 t/s | Best balance for mobile |
| Llama-3.2-1B-Q4 | 770MB | 5.4 t/s | Best quality under 1GB |
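The table trades size against speed, so one simple way to read it is: take the largest model that fits your size budget and still meets a minimum speed. The sketch below hard-codes the three rows above and applies that rule. It is not how `recommend` works internally (that is not documented here), and it assumes that a larger model in this table means better output quality; `largest_model_within` is a hypothetical helper.

```python
# (name, size in MB, phone speed in tokens/second), copied from the table above.
TOP_MODELS = [
    ("SmolLM2-135M", 101, 46.0),
    ("Qwen2.5-0.5B-int4", 469, 23.2),
    ("Llama-3.2-1B-Q4", 770, 5.4),
]


def largest_model_within(budget_mb: float, min_tps: float = 0.0):
    """Return the largest listed model that fits the budget, or None."""
    candidates = [
        model for model in TOP_MODELS
        if model[1] <= budget_mb and model[2] >= min_tps
    ]
    # Pick by size (index 1); default=None covers the "nothing fits" case.
    return max(candidates, key=lambda model: model[1], default=None)


print(largest_model_within(1024))              # fits all three, picks the 770MB model
print(largest_model_within(1024, min_tps=20))  # speed floor rules out the 1B model
print(largest_model_within(50))                # nothing fits
```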
About
Dispatch AI (FZE), Sharjah Free Zone, UAE. License No. 10818.
Website: dispatchai.ai | Hugging Face: huggingface.co/dispatchAI | API: api.dispatchai.ai
I think, therefore I ship.