dispatchAI SDK

Small. Mobile. Free. UAE-built.

pip install dispatchai - Run mobile-optimized LLMs on your phone, edge device, or laptop. 39 models, all tested on real Snapdragon hardware, all free.

Quick Start

pip install dispatchai

Chat with a model

from dispatchai import load_model

model = load_model("SmolLM2-135M-Instruct-mobile")
response = model.chat("What is the capital of France?")
print(response)

Use GGUF/llama.cpp backend

from dispatchai import load_model

model = load_model("Llama-3.2-1B-Instruct-Q4-mobile", backend="gguf")
print(model.chat("Write a haiku about the desert."))

Find the best model for your phone

from dispatchai import recommend

rec = recommend(ram_mb=2048, task="chat")
print(f"Best model: {rec['recommended']['name']}")
print(f"Size: {rec['recommended']['size_mb']}MB")
print(f"Speed: {rec['recommended']['speed_tps']} tokens/sec")
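
To make the idea of a RAM-based recommendation concrete, here is a minimal standalone sketch. It does not call dispatchai and is not how recommend() is implemented; the fits_in_ram helper, the headroom factor of 0.5, and the rule "keep models whose file size is within the budget" are all assumptions made for illustration. The catalog rows are the chat models from the Available Models table.

```python
# Hypothetical sketch, NOT the dispatchai implementation of recommend().
# Rows are the chat models listed in the "Available Models" table.
CATALOG = [
    {"name": "SmolLM2-135M-Instruct-mobile", "size_mb": 270, "speed_tps": 25.5},
    {"name": "SmolLM2-360M-Instruct-mobile", "size_mb": 720, "speed_tps": 21.0},
    {"name": "Qwen2.5-0.5B-Instruct-mobile-int4", "size_mb": 350, "speed_tps": 20.0},
    {"name": "Llama-3.2-1B-Instruct-Q4-mobile", "size_mb": 700, "speed_tps": 18.2},
    {"name": "Llama-3.2-3B-Instruct-Q5-mobile", "size_mb": 2100, "speed_tps": 8.5},
]


def fits_in_ram(ram_mb, headroom=0.5):
    """Return catalog entries whose size is at most headroom * ram_mb.

    The headroom factor is an assumed safety margin that leaves memory
    for the OS and other apps. Results are sorted fastest first.
    """
    budget_mb = ram_mb * headroom
    fitting = [m for m in CATALOG if m["size_mb"] <= budget_mb]
    return sorted(fitting, key=lambda m: m["speed_tps"], reverse=True)


for m in fits_in_ram(2048):
    print(f"{m['name']}: {m['size_mb']}MB, {m['speed_tps']} t/s")
```

A real recommender would likely also weigh the task and output quality; this sketch only filters by size.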

List all models

from dispatchai import list_models

for m in list_models(task="chat"):
    print(f"  {m['name']}: {m['size_mb']}MB, {m['speed_tps']} t/s")

Estimate latency

from dispatchai import estimate_latency

lat = estimate_latency("1B", "Q4_K_M")
print(f"{lat['tokens_per_sec']} tokens/sec on Snapdragon 865")

Calculate cost savings

from dispatchai import calculate_cost

result = calculate_cost(daily_queries=10000, cloud_cost_per_1k=0.50)
print(f"Annual savings: ${result['savings']}")
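
For a sense of scale, the arithmetic behind such an estimate can be sketched by hand. This is an assumption about the calculation, not the formula calculate_cost() uses: it treats on-device inference as having zero marginal cost, so the saving equals the yearly cloud bill. The annual_cloud_cost helper is hypothetical.

```python
# Hypothetical helper, not part of dispatchai.
# Assumes on-device inference costs nothing per query, so the saving
# is simply the cloud spend that is avoided.
def annual_cloud_cost(daily_queries, cloud_cost_per_1k, days_per_year=365):
    """Yearly cloud spend in dollars for a fixed daily query volume."""
    return daily_queries / 1000 * cloud_cost_per_1k * days_per_year


print(annual_cloud_cost(10_000, 0.50))  # 1825.0
```

With 10,000 queries a day at $0.50 per 1,000 queries, the cloud bill under these assumptions is $5 a day, or $1,825 a year.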

Installation Options

pip install dispatchai                    # Core (model catalog, recommendations)
pip install "dispatchai[torch]"           # + transformers/torch backend
pip install "dispatchai[gguf]"            # + llama.cpp GGUF backend
pip install "dispatchai[full]"            # + everything (torch, gguf, sentence-transformers)

Available Models

| Model | Params | Size | Speed | Task |
|---|---|---|---|---|
| SmolLM2-135M-Instruct-mobile | 135M | 270MB | 25.5 t/s | Chat |
| SmolLM2-360M-Instruct-mobile | 360M | 720MB | 21.0 t/s | Chat |
| Qwen2.5-0.5B-Instruct-mobile-int4 | 500M | 350MB | 20.0 t/s | Chat |
| Llama-3.2-1B-Instruct-Q4-mobile | 1B | 700MB | 18.2 t/s | Chat |
| Llama-3.2-1B-FunctionCall-mobile | 1B | 2.5GB | 12.0 t/s | Function Call |
| Qwen2.5-Coder-1.5B-mobile | 1.5B | 3.0GB | 10.5 t/s | Code |
| Gemma-2B-Arabic-mobile | 2B | 5.0GB | 8.0 t/s | Arabic |
| Llama-3.2-3B-Instruct-Q5-mobile | 3B | 2.1GB | 8.5 t/s | Chat |

Browse all 39 models →

Hardware Targets

All benchmarks were measured on a Snapdragon 865 (Samsung S20 FE, 8GB RAM) using llama.cpp.

The estimate_latency() function supports the following targets (speed multipliers are relative to the Snapdragon 865 baseline):

  • Snapdragon 865 (baseline)
  • Snapdragon 8 Gen 2 (1.8x)
  • Snapdragon 8 Gen 3 (2.2x)
  • Apple A17 Pro (2.5x)
  • Apple M2 (3.0x)
  • Snapdragon 778G mid-range (0.7x)
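
The multipliers above can be applied by hand to any speed from the model table. The sketch below assumes they scale tokens per second linearly, which is a simplification; the SPEEDUP_VS_SD865 table and scale_tps helper are hypothetical names and do not reflect how estimate_latency() selects a device.

```python
# Hypothetical sketch: multipliers copied from the list above,
# applied linearly to a Snapdragon 865 baseline speed.
SPEEDUP_VS_SD865 = {
    "Snapdragon 865": 1.0,
    "Snapdragon 8 Gen 2": 1.8,
    "Snapdragon 8 Gen 3": 2.2,
    "Apple A17 Pro": 2.5,
    "Apple M2": 3.0,
    "Snapdragon 778G": 0.7,
}


def scale_tps(baseline_tps, device):
    """Scale a Snapdragon 865 tokens/sec figure to another device."""
    return round(baseline_tps * SPEEDUP_VS_SD865[device], 1)


# Llama-3.2-1B-Instruct-Q4-mobile is listed at 18.2 t/s on the baseline.
for device in SPEEDUP_VS_SD865:
    print(f"{device}: {scale_tps(18.2, device)} t/s")
```

Real throughput also depends on memory bandwidth, thermal throttling, and backend build flags, so treat scaled numbers as rough estimates.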

The Thesis

The best model is the one that runs.

We're building the AI layer for a billion phones that can't afford cloud inference. Every model is free, open-source, and tested on real hardware.

About

Dispatch AI (FZE), Sharjah Free Zone, UAE. License No. 10818.

🌐 dispatchai.ai | 🤗 huggingface.co/dispatchAI | 𝕏 @DispatchAIdev

I think, therefore I ship.
