	# dispatchAI SDK

	Small. Mobile. Free. UAE-built.

`pip install dispatchai` and run mobile-optimized LLMs on your phone, edge device, or laptop. 31 verified models, 24 of them tested on real Snapdragon 865 hardware, all free.

	## Quick Start

	```bash
pip install "dispatchai[gguf]"  # quotes keep zsh from globbing the brackets
	```

	### Chat with a model

	```python
	from dispatchai import load_model

	model = load_model("SmolLM2-135M-Instruct-mobile", backend="gguf")
	response = model.chat("What is the capital of France?")
	print(response)
	# → "The capital of France is Paris."
	```

	## 🌐 Inference API

	Use dispatchAI models via REST API (OpenAI-compatible):

```python
import openai

client = openai.OpenAI(
    base_url="https://api.dispatchai.ai/v1",
    api_key="da-demo-key-0001",
)

response = client.chat.completions.create(
    model="dispatchAI/SmolLM2-135M-Instruct-mobile",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
# → "The capital of France is Paris."
```
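
Because the endpoint is described as OpenAI-compatible, the same call should also work without the `openai` package. The sketch below assumes the standard OpenAI wire format (`POST {base_url}/chat/completions` with a bearer token); that path and header layout are the OpenAI convention and are not documented here for dispatchAI specifically. It only builds the request, so it runs offline.

```python
import json
import urllib.request

# Values taken from the example above; the demo key may not be valid.
BASE_URL = "https://api.dispatchai.ai/v1"
API_KEY = "da-demo-key-0001"


def build_chat_request(prompt: str, model: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request.

    Assumes dispatchAI follows the OpenAI /chat/completions wire format.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "What is the capital of France?",
    "dispatchAI/SmolLM2-135M-Instruct-mobile",
)

# To actually send it (needs network access and a valid key):
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])
```

Using only the standard library keeps the dependency footprint small on edge devices where installing extra packages is inconvenient.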

Pricing: $0.001 per 1K input tokens and $0.002 per 1K output tokens (dispatchAI cites this as roughly 10x cheaper than OpenAI).
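
To see what a single request costs at these rates, multiply each token count by its per-1K price. This sketch assumes billing is strictly linear in tokens, with no minimum charge or rounding; the helper name `request_cost` is hypothetical.

```python
# Per-1K-token prices as listed in this README (USD).
INPUT_PRICE_PER_1K = 0.001
OUTPUT_PRICE_PER_1K = 0.002


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request, assuming linear per-token billing."""
    input_cost = (input_tokens / 1000) * INPUT_PRICE_PER_1K
    output_cost = (output_tokens / 1000) * OUTPUT_PRICE_PER_1K
    return input_cost + output_cost


# Example: a 500-token prompt with a 200-token reply.
cost = request_cost(500, 200)
print(f"${cost:.6f}")  # → $0.000900
```

Output tokens cost twice as much as input tokens here, so long replies dominate the bill faster than long prompts.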

	Endpoint: `https://api.dispatchai.ai/v1`

	Available Models:
	- dispatchAI/SmolLM2-135M-Instruct-mobile (101MB, 46 t/s on phone)
	- dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4 (469MB, 23 t/s on phone)
	- dispatchAI/Llama-3.2-1B-Instruct-Q4-mobile (770MB, 5.4 t/s on phone)

	## Local Inference

	### Find the best model for your phone

	```python
	from dispatchai import recommend

	rec = recommend(ram_mb=2048, task="chat")
	print(f"Best model: {rec['recommended']['name']}")
	```
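
The README does not say how `recommend()` chooses a model. As a rough illustration only, the sketch below applies a hypothetical rule: keep models whose file size fits in half the device RAM (the `RAM_BUDGET_FRACTION` value is an assumption), optionally require a minimum speed, then take the largest remaining model as a proxy for quality. Sizes and speeds come from the "Top 3 Models" table; `pick_model` is not part of the SDK.

```python
from typing import Optional

# Sizes and phone speeds copied from the "Top 3 Models" table in this README.
CATALOG = [
    {"name": "SmolLM2-135M-Instruct-mobile", "size_mb": 101, "speed_tps": 46.0},
    {"name": "Qwen2.5-0.5B-Instruct-mobile-int4", "size_mb": 469, "speed_tps": 23.2},
    {"name": "Llama-3.2-1B-Instruct-Q4-mobile", "size_mb": 770, "speed_tps": 5.4},
]

# Hypothetical rule of thumb: leave half of RAM for the OS and other apps.
RAM_BUDGET_FRACTION = 0.5


def pick_model(ram_mb: int, min_tps: float = 0.0) -> Optional[dict]:
    """Return the largest model that fits the RAM budget and speed floor, or None."""
    budget = ram_mb * RAM_BUDGET_FRACTION
    candidates = [
        m for m in CATALOG
        if m["size_mb"] <= budget and m["speed_tps"] >= min_tps
    ]
    # Larger models are treated as higher quality; default=None if nothing fits.
    return max(candidates, key=lambda m: m["size_mb"], default=None)


print(pick_model(2048)["name"])              # → Llama-3.2-1B-Instruct-Q4-mobile
print(pick_model(2048, min_tps=20)["name"])  # → Qwen2.5-0.5B-Instruct-mobile-int4
print(pick_model(512)["name"])               # → SmolLM2-135M-Instruct-mobile
print(pick_model(128))                       # → None
```

The speed floor matters on 2 GB phones: the 1B model fits in memory, but at 5.4 t/s it may feel slow for interactive chat.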

	### List all models

```python
from dispatchai import list_models

# Print every chat-capable model with its size and on-phone speed
for m in list_models(task="chat"):
    print(f" {m['name']}: {m['size_mb']}MB, {m['speed_tps']} t/s")
```

	### Estimate latency

	```python
	from dispatchai import estimate_latency

	lat = estimate_latency("1B", "Q4_K_M")
	print(f"{lat['tokens_per_sec']} t/s on Snapdragon 865")
	```
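
A tokens-per-second figure converts directly into wait time: generating N tokens at a steady rate takes N / rate seconds. This sketch ignores prompt processing (prefill) and model load time, so real latency will be somewhat higher; the helper `generation_seconds` is hypothetical and the speeds come from the "Top 3 Models" table.

```python
# Decode speeds (tokens/s) from the "Top 3 Models" table in this README.
PHONE_TPS = {
    "SmolLM2-135M": 46.0,
    "Qwen2.5-0.5B-int4": 23.2,
    "Llama-3.2-1B-Q4": 5.4,
}


def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to generate num_tokens at a steady decode rate (ignores prefill)."""
    return num_tokens / tokens_per_sec


for name, tps in PHONE_TPS.items():
    print(f"{name}: {generation_seconds(100, tps):.1f} s for 100 tokens")
```

For a 100-token reply this works out to roughly 2.2 s, 4.3 s, and 18.5 s respectively.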

	### Calculate cost savings

	```python
	from dispatchai import calculate_cost

	result = calculate_cost(daily_queries=10000, cloud_cost_per_1k=0.50)
	print(f"Annual savings: ${result['savings']}")
	```
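
The arithmetic behind a savings estimate like this can be written out by hand. The sketch assumes `cloud_cost_per_1k` means USD per 1,000 queries, that the year has 365 days, and that on-device inference has no per-query cost; `calculate_cost()` may use a different formula, and `annual_cloud_cost` is a hypothetical helper.

```python
def annual_cloud_cost(daily_queries: int, cloud_cost_per_1k: float,
                      days: int = 365) -> float:
    """Yearly spend if every query goes to a cloud API billed per 1,000 queries."""
    return daily_queries / 1000 * cloud_cost_per_1k * days


# Same inputs as the calculate_cost() example above.
cloud = annual_cloud_cost(10_000, 0.50)
on_device = 0.0  # assumption: no per-query cost once the model runs locally
print(f"Annual savings: ${cloud - on_device:,.2f}")  # → Annual savings: $1,825.00
```

Here 10,000 queries a day is 10 blocks of 1,000 at $0.50 each, or $5 per day, which is $1,825 over 365 days.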

	## Installation Options

	```bash
pip install dispatchai            # Core (model catalog, recommendations)
pip install "dispatchai[torch]"   # + transformers/torch backend
pip install "dispatchai[gguf]"    # + llama.cpp GGUF backend
pip install "dispatchai[full]"    # + everything
	```
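
To check which optional backends an environment already has, you can probe for their import names without importing them. The mapping below is an assumption: it guesses that `[torch]` brings in `torch` and `transformers` and that `[gguf]` brings in llama-cpp-python (import name `llama_cpp`); the actual dependencies of each extra are not documented here.

```python
import importlib.util

# Hypothetical mapping from pip extra to the modules it is assumed to provide.
BACKEND_MODULES = {
    "torch": ["torch", "transformers"],
    "gguf": ["llama_cpp"],
}


def available_backends() -> dict:
    """Report which optional backends look importable in this environment."""
    return {
        extra: all(importlib.util.find_spec(mod) is not None for mod in modules)
        for extra, modules in BACKEND_MODULES.items()
    }


print(available_backends())
```

`importlib.util.find_spec` only locates a module, so the check is fast and avoids the cost of actually loading heavy packages such as `torch`.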

	## Verified Models (June 2026)

- ✅ 31 models fully working (0 broken, 0 partial)
- 📱 24 of them phone-verified on a Snapdragon 865
- All 31 have their chat formats documented

	## Top 3 Models

| Model | Size | Phone Speed | Use Case |
|-------|------|-------------|----------|
| SmolLM2-135M | 101MB | 46.0 t/s | Ultra-fast, budget phones |
| Qwen2.5-0.5B-int4 | 469MB | 23.2 t/s | Best balance for mobile |
| Llama-3.2-1B-Q4 | 770MB | 5.4 t/s | Best quality under 1GB |

	## About

	Dispatch AI (FZE) — Sharjah Free Zone, UAE. License No. 10818.

🌐 [dispatchai.ai](https://www.dispatchai.ai) | 🤗 [huggingface.co/dispatchAI](https://huggingface.co/dispatchAI) | API: [api.dispatchai.ai](https://api.dispatchai.ai)

	I think, therefore I ship.