| # dispatchAI SDK |
|
|
| **Small. Mobile. Free. UAE-built.** |
|
|
| `pip install dispatchai`: run mobile-optimized LLMs on your phone, edge device, or laptop. 31 verified models, all tested on real Snapdragon hardware, all free. |
|
|
| ## Quick Start |
|
|
| ```bash |
| pip install "dispatchai[gguf]"  # quotes keep shells such as zsh from expanding the brackets |
| ``` |
|
|
| ### Chat with a model |
|
|
| ```python |
| from dispatchai import load_model |
| |
| model = load_model("SmolLM2-135M-Instruct-mobile", backend="gguf") |
| response = model.chat("What is the capital of France?") |
| print(response) |
| # → "The capital of France is Paris." |
| ``` |
|
|
| ## Inference API |
|
|
| Use dispatchAI models through an OpenAI-compatible REST API: |
|
|
| ```python |
| import openai |
| |
| client = openai.OpenAI( |
|     base_url="https://api.dispatchai.ai/v1", |
|     api_key="da-demo-key-0001", |
| ) |
| |
| response = client.chat.completions.create( |
|     model="dispatchAI/SmolLM2-135M-Instruct-mobile", |
|     messages=[{"role": "user", "content": "What is the capital of France?"}], |
| ) |
| print(response.choices[0].message.content) |
| # → "The capital of France is Paris." |
| ``` |
|
|
| **Pricing:** $0.001/1K input tokens, $0.002/1K output tokens (10x cheaper than OpenAI) |
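To make the per-token pricing concrete, here is a minimal sketch of the arithmetic. The helper `estimate_api_cost` and the request sizes are hypothetical and not part of the dispatchAI SDK; only the two rates come from the pricing line above.

```python
# Rates copied from the pricing line above (USD per 1,000 tokens).
INPUT_RATE_PER_1K = 0.001
OUTPUT_RATE_PER_1K = 0.002


def estimate_api_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper, not an SDK call)."""
    input_cost = input_tokens / 1000 * INPUT_RATE_PER_1K
    output_cost = output_tokens / 1000 * OUTPUT_RATE_PER_1K
    return input_cost + output_cost


# Illustrative workload: 200 input tokens and 100 output tokens per request.
per_request = estimate_api_cost(input_tokens=200, output_tokens=100)
print(f"Per request: ${per_request:.6f}")
print(f"Per 10,000 requests: ${per_request * 10_000:.2f}")
```

With these assumed request sizes, the cost works out to $0.0004 per request, or about $4 per 10,000 requests.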
|
|
| **Endpoint:** `https://api.dispatchai.ai/v1` |
|
|
| **Available Models:** |
| - dispatchAI/SmolLM2-135M-Instruct-mobile (101MB, 46 t/s on phone) |
| - dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4 (469MB, 23 t/s on phone) |
| - dispatchAI/Llama-3.2-1B-Instruct-Q4-mobile (770MB, 5.4 t/s on phone) |
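The tokens-per-second figures above translate directly into how long a reply takes to generate. The sketch below is a rough estimate only: `generation_seconds` is a hypothetical helper, and it ignores prompt processing and model load time, which the list does not report.

```python
# Speeds copied from the model list above (tokens per second on a phone).
PHONE_SPEED_TPS = {
    "dispatchAI/SmolLM2-135M-Instruct-mobile": 46.0,
    "dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4": 23.0,
    "dispatchAI/Llama-3.2-1B-Instruct-Q4-mobile": 5.4,
}


def generation_seconds(model: str, output_tokens: int) -> float:
    """Approximate seconds needed to generate `output_tokens` tokens (hypothetical helper)."""
    return output_tokens / PHONE_SPEED_TPS[model]


# Compare the three models on a 100-token reply.
for name in PHONE_SPEED_TPS:
    print(f"{name}: {generation_seconds(name, 100):.1f} s for 100 tokens")
```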
|
|
| ## Local Inference |
|
|
| ### Find the best model for your phone |
|
|
| ```python |
| from dispatchai import recommend |
| |
| rec = recommend(ram_mb=2048, task="chat") |
| print(f"Best model: {rec['recommended']['name']}") |
| ``` |
|
|
| ### List all models |
|
|
| ```python |
| from dispatchai import list_models |
| |
| for m in list_models(task="chat"): |
|     print(f" {m['name']}: {m['size_mb']}MB, {m['speed_tps']} t/s") |
| ``` |
|
|
| ### Estimate latency |
|
|
| ```python |
| from dispatchai import estimate_latency |
| |
| lat = estimate_latency("1B", "Q4_K_M") |
| print(f"{lat['tokens_per_sec']} t/s on Snapdragon 865") |
| ``` |
|
|
| ### Calculate cost savings |
|
|
| ```python |
| from dispatchai import calculate_cost |
| |
| result = calculate_cost(daily_queries=10000, cloud_cost_per_1k=0.50) |
| print(f"Annual savings: ${result['savings']}") |
| ``` |
|
|
| ## Installation Options |
|
|
| ```bash |
| pip install dispatchai            # Core (model catalog, recommendations) |
| pip install "dispatchai[torch]"   # + transformers/torch backend |
| pip install "dispatchai[gguf]"    # + llama.cpp GGUF backend |
| pip install "dispatchai[full]"    # + everything |
| ``` |
|
|
| ## Verified Models (June 2026) |
|
|
| - 31 models fully working (0 broken, 0 partial) |
| - 24 models phone-verified on Snapdragon 865 |
| - All models have documented chat formats |
|
|
| ## Top 3 Models |
|
|
| | Model | Size | Phone Speed | Use Case | |
| |-------|------|-------------|----------| |
| | SmolLM2-135M | 101MB | 46.0 t/s | Ultra-fast, budget phones | |
| | Qwen2.5-0.5B-int4 | 469MB | 23.2 t/s | Best balance for mobile | |
| | Llama-3.2-1B-Q4 | 770MB | 5.4 t/s | Best quality under 1GB | |
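The table above can also be read as a simple selection rule: take the largest model that fits your size budget and still meets a minimum speed. The sketch below hard-codes the three rows and is not the SDK's `recommend()` logic; `pick_model` is a hypothetical helper, and using model size as a stand-in for quality is an assumption.

```python
from typing import Optional

# Rows copied from the "Top 3 Models" table above.
TOP_MODELS = [
    {"name": "SmolLM2-135M", "size_mb": 101, "speed_tps": 46.0},
    {"name": "Qwen2.5-0.5B-int4", "size_mb": 469, "speed_tps": 23.2},
    {"name": "Llama-3.2-1B-Q4", "size_mb": 770, "speed_tps": 5.4},
]


def pick_model(max_size_mb: int, min_speed_tps: float = 0.0) -> Optional[dict]:
    """Return the largest listed model within the size and speed limits, or None."""
    candidates = [
        m for m in TOP_MODELS
        if m["size_mb"] <= max_size_mb and m["speed_tps"] >= min_speed_tps
    ]
    # Assumption: a larger model gives better quality, so prefer the biggest that fits.
    return max(candidates, key=lambda m: m["size_mb"], default=None)


choice = pick_model(max_size_mb=500)
print(choice["name"] if choice else "No listed model fits")
```

Raising `min_speed_tps` trades quality for responsiveness: with a 1000 MB budget and a 20 t/s floor, the 1B model is excluded and the 0.5B model is chosen instead.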
|
|
| ## About |
|
|
| Dispatch AI (FZE), Sharjah Free Zone, UAE. License No. 10818. |
|
|
| [dispatchai.ai](https://www.dispatchai.ai) | [huggingface.co/dispatchAI](https://huggingface.co/dispatchAI) | API: [api.dispatchai.ai](https://api.dispatchai.ai) |
|
|
| *I think, therefore I ship.* |
|
|