Upload folder using huggingface_hub
README.md
CHANGED
|
@@ -2,12 +2,12 @@
|
|
| 2 |
|
| 3 |
**Small. Mobile. Free. UAE-built.**
|
| 4 |
|
| 5 |
-
`pip install dispatchai` - Run mobile-optimized LLMs on your phone, edge device, or laptop.
|
| 6 |
|
| 7 |
## Quick Start
|
| 8 |
|
| 9 |
```bash
|
| 10 |
-
pip install dispatchai
|
| 11 |
```
|
| 12 |
|
| 13 |
### Chat with a model
|
|
@@ -15,18 +15,43 @@ pip install dispatchai
|
|
| 15 |
```python
|
| 16 |
from dispatchai import load_model
|
| 17 |
|
| 18 |
-
model = load_model("SmolLM2-135M-Instruct-mobile")
|
| 19 |
response = model.chat("What is the capital of France?")
|
| 20 |
print(response)
|
|
|
|
| 21 |
```
|
| 22 |
|
| 23 |
-
##
|
|
|
|
|
|
|
| 24 |
|
| 25 |
```python
|
| 26 |
-
|
| 27 |
-
|
| 28 |
```
|
| 29 |
|
| 30 |
### Find the best model for your phone
|
| 31 |
|
| 32 |
```python
|
|
@@ -34,8 +59,6 @@ from dispatchai import recommend
|
|
| 34 |
|
| 35 |
rec = recommend(ram_mb=2048, task="chat")
|
| 36 |
print(f"Best model: {rec['recommended']['name']}")
|
| 37 |
-
print(f"Size: {rec['recommended']['size_mb']}MB")
|
| 38 |
-
print(f"Speed: {rec['recommended']['speed_tps']} tokens/sec")
|
| 39 |
```
|
| 40 |
|
| 41 |
### List all models
|
|
@@ -53,7 +76,7 @@ for m in list_models(task="chat"):
|
|
| 53 |
from dispatchai import estimate_latency
|
| 54 |
|
| 55 |
lat = estimate_latency("1B", "Q4_K_M")
|
| 56 |
-
print(f"{lat['tokens_per_sec']}
|
| 57 |
```
|
| 58 |
|
| 59 |
### Calculate cost savings
|
|
@@ -71,46 +94,27 @@ print(f"Annual savings: ${result['savings']}")
|
|
| 71 |
pip install dispatchai # Core (model catalog, recommendations)
|
| 72 |
pip install dispatchai[torch] # + transformers/torch backend
|
| 73 |
pip install dispatchai[gguf] # + llama.cpp GGUF backend
|
| 74 |
-
pip install dispatchai[full] # + everything
|
| 75 |
```
|
| 76 |
|
| 77 |
-
##
|
| 78 |
-
|
| 79 |
-
| Model | Params | Size | Speed | Task |
|
| 80 |
-
|-------|--------|------|-------|------|
|
| 81 |
-
| SmolLM2-135M-Instruct-mobile | 135M | 270MB | 25.5 t/s | Chat |
|
| 82 |
-
| SmolLM2-360M-Instruct-mobile | 360M | 720MB | 21.0 t/s | Chat |
|
| 83 |
-
| Qwen2.5-0.5B-Instruct-mobile-int4 | 500M | 350MB | 20.0 t/s | Chat |
|
| 84 |
-
| Llama-3.2-1B-Instruct-Q4-mobile | 1B | 700MB | 18.2 t/s | Chat |
|
| 85 |
-
| Llama-3.2-1B-FunctionCall-mobile | 1B | 2.5GB | 12.0 t/s | Function Call |
|
| 86 |
-
| Qwen2.5-Coder-1.5B-mobile | 1.5B | 3.0GB | 10.5 t/s | Code |
|
| 87 |
-
| Gemma-2B-Arabic-mobile | 2B | 5.0GB | 8.0 t/s | Arabic |
|
| 88 |
-
| Llama-3.2-3B-Instruct-Q5-mobile | 3B | 2.1GB | 8.5 t/s | Chat |
|
| 89 |
-
|
| 90 |
-
[Browse all 39 models →](https://huggingface.co/dispatchAI)
|
| 91 |
-
|
| 92 |
-
## Hardware Targets
|
| 93 |
-
|
| 94 |
-
All benchmarks measured on **Snapdragon 865 (Samsung S20 FE, 8GB RAM)** using llama.cpp.
|
| 95 |
-
|
| 96 |
-
The `estimate_latency()` function supports:
|
| 97 |
-
- Snapdragon 865 (baseline)
|
| 98 |
-
- Snapdragon 8 Gen 2 (1.8x)
|
| 99 |
-
- Snapdragon 8 Gen 3 (2.2x)
|
| 100 |
-
- Apple A17 Pro (2.5x)
|
| 101 |
-
- Apple M2 (3.0x)
|
| 102 |
-
- Snapdragon 778G mid-range (0.7x)
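
The multipliers in this list read as scaling factors relative to the Snapdragon 865 baseline. A minimal sketch of that arithmetic follows, assuming throughput scales linearly with the listed factor. `DEVICE_SPEEDUP` and `scale_tokens_per_sec` are hypothetical names for illustration and are not part of the `dispatchai` API.

```python
# Multipliers relative to the Snapdragon 865 baseline, copied from the list above.
DEVICE_SPEEDUP = {
    "Snapdragon 865": 1.0,
    "Snapdragon 8 Gen 2": 1.8,
    "Snapdragon 8 Gen 3": 2.2,
    "Apple A17 Pro": 2.5,
    "Apple M2": 3.0,
    "Snapdragon 778G": 0.7,
}


def scale_tokens_per_sec(baseline_tps: float, device: str) -> float:
    """Scale a Snapdragon 865 measurement to another device (hypothetical helper).

    Assumes simple linear scaling, which is only a rough approximation.
    """
    if device not in DEVICE_SPEEDUP:
        raise ValueError(f"unknown device: {device!r}")
    return baseline_tps * DEVICE_SPEEDUP[device]


if __name__ == "__main__":
    # 18.2 t/s is the baseline figure listed for Llama-3.2-1B-Instruct-Q4-mobile.
    for device in DEVICE_SPEEDUP:
        print(f"{device}: {scale_tokens_per_sec(18.2, device):.1f} t/s")
```

Real devices will deviate from a single multiplier depending on quantization, thermal throttling, and memory bandwidth, so treat the result as an estimate.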
|
| 103 |
|
| 104 |
-
|
|
|
|
|
|
|
| 105 |
|
| 106 |
-
|
| 107 |
|
| 108 |
-
|
| 109 |
|
| 110 |
## About
|
| 111 |
|
| 112 |
Dispatch AI (FZE) - Sharjah Free Zone, UAE. License No. 10818.
|
| 113 |
|
| 114 |
-
🌐 [dispatchai.ai](https://www.dispatchai.ai) | 🤗 [huggingface.co/dispatchAI](https://huggingface.co/dispatchAI) |
|
| 115 |
|
| 116 |
*I think, therefore I ship.*
|
|
|
|
| 2 |
|
| 3 |
**Small. Mobile. Free. UAE-built.**
|
| 4 |
|
| 5 |
+
`pip install dispatchai` - Run mobile-optimized LLMs on your phone, edge device, or laptop. 31 verified models, all tested on real Snapdragon hardware, all free.
|
| 6 |
|
| 7 |
## Quick Start
|
| 8 |
|
| 9 |
```bash
|
| 10 |
+
pip install dispatchai[gguf]
|
| 11 |
```
|
| 12 |
|
| 13 |
### Chat with a model
|
|
|
|
| 15 |
```python
|
| 16 |
from dispatchai import load_model
|
| 17 |
|
| 18 |
+
model = load_model("SmolLM2-135M-Instruct-mobile", backend="gguf")
|
| 19 |
response = model.chat("What is the capital of France?")
|
| 20 |
print(response)
|
| 21 |
+
# → "The capital of France is Paris."
|
| 22 |
```
|
| 23 |
|
| 24 |
+
## Inference API
|
| 25 |
+
|
| 26 |
+
Use dispatchAI models via REST API (OpenAI-compatible):
|
| 27 |
|
| 28 |
```python
|
| 29 |
+
import openai
|
| 30 |
+
|
| 31 |
+
client = openai.OpenAI(
|
| 32 |
+
base_url="https://api.dispatchai.ai/v1",
|
| 33 |
+
api_key="da-demo-key-0001"
|
| 34 |
+
)
|
| 35 |
+
|
| 36 |
+
response = client.chat.completions.create(
|
| 37 |
+
model="dispatchAI/SmolLM2-135M-Instruct-mobile",
|
| 38 |
+
messages=[{"role": "user", "content": "What is the capital of France?"}]
|
| 39 |
+
)
|
| 40 |
+
print(response.choices[0].message.content)
|
| 41 |
+
# → "The capital of France is Paris."
|
| 42 |
```
|
| 43 |
|
| 44 |
+
**Pricing:** $0.001/1K input tokens, $0.002/1K output tokens (10x cheaper than OpenAI)
|
| 45 |
+
|
| 46 |
+
**Endpoint:** `https://api.dispatchai.ai/v1`
|
| 47 |
+
|
| 48 |
+
**Available Models:**
|
| 49 |
+
- dispatchAI/SmolLM2-135M-Instruct-mobile (101MB, 46 t/s on phone)
|
| 50 |
+
- dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4 (469MB, 23 t/s on phone)
|
| 51 |
+
- dispatchAI/Llama-3.2-1B-Instruct-Q4-mobile (770MB, 5.4 t/s on phone)
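
To make the per-token pricing quoted above concrete, here is a small sketch that turns token counts into an estimated request cost. The rates come from the Pricing line. `estimate_request_cost` is a hypothetical helper, not part of `dispatchai` or the API.

```python
# Rates quoted in the Pricing line above, in USD per 1,000 tokens.
INPUT_USD_PER_1K = 0.001
OUTPUT_USD_PER_1K = 0.002


def estimate_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = (input_tokens / 1000) * INPUT_USD_PER_1K
    output_cost = (output_tokens / 1000) * OUTPUT_USD_PER_1K
    return input_cost + output_cost


if __name__ == "__main__":
    # 500 prompt tokens and 1,500 completion tokens.
    print(f"${estimate_request_cost(500, 1500):.4f}")  # prints $0.0035
```

With the OpenAI client, `response.usage.prompt_tokens` and `response.usage.completion_tokens` would supply the counts, assuming the endpoint reports usage the way OpenAI's does.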
|
| 52 |
+
|
| 53 |
+
## Local Inference
|
| 54 |
+
|
| 55 |
### Find the best model for your phone
|
| 56 |
|
| 57 |
```python
|
|
|
|
| 59 |
|
| 60 |
rec = recommend(ram_mb=2048, task="chat")
|
| 61 |
print(f"Best model: {rec['recommended']['name']}")
|
|
|
|
|
|
|
| 62 |
```
|
| 63 |
|
| 64 |
### List all models
|
|
|
|
| 76 |
from dispatchai import estimate_latency
|
| 77 |
|
| 78 |
lat = estimate_latency("1B", "Q4_K_M")
|
| 79 |
+
print(f"{lat['tokens_per_sec']} t/s on Snapdragon 865")
|
| 80 |
```
|
| 81 |
|
| 82 |
### Calculate cost savings
|
|
|
|
| 94 |
pip install dispatchai # Core (model catalog, recommendations)
|
| 95 |
pip install dispatchai[torch] # + transformers/torch backend
|
| 96 |
pip install dispatchai[gguf] # + llama.cpp GGUF backend
|
| 97 |
+
pip install dispatchai[full] # + everything
|
| 98 |
```
|
| 99 |
|
| 100 |
+
## Verified Models (June 2026)
|
| 101 |
|
| 102 |
+
- ✅ 31 models fully working (0 broken, 0 partial)
|
| 103 |
+
- 📱 24 models phone-verified on Snapdragon 865
|
| 104 |
+
- All have correct chat formats documented
|
| 105 |
|
| 106 |
+
## Top 3 Models
|
| 107 |
|
| 108 |
+
| Model | Size | Phone Speed | Use Case |
|
| 109 |
+
|-------|------|-------------|----------|
|
| 110 |
+
| SmolLM2-135M | 101MB | 46.0 t/s | Ultra-fast, budget phones |
|
| 111 |
+
| Qwen2.5-0.5B-int4 | 469MB | 23.2 t/s | Best balance for mobile |
|
| 112 |
+
| Llama-3.2-1B-Q4 | 770MB | 5.4 t/s | Best quality under 1GB |
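
The table trades size against speed, so a quick way to read it is to pick the largest model that fits a storage budget and then estimate how long a reply would take. The sketch below does that using only the figures in the table. It is not how `dispatchai.recommend()` works internally; `ModelInfo`, `largest_model_within`, and `seconds_for_reply` are hypothetical names, and using size as a stand-in for quality is an assumption.

```python
from typing import NamedTuple, Optional


class ModelInfo(NamedTuple):
    name: str
    size_mb: int
    tokens_per_sec: float


# Figures copied from the Top 3 table (phone speeds on Snapdragon 865).
TOP_MODELS = [
    ModelInfo("SmolLM2-135M", 101, 46.0),
    ModelInfo("Qwen2.5-0.5B-int4", 469, 23.2),
    ModelInfo("Llama-3.2-1B-Q4", 770, 5.4),
]


def largest_model_within(budget_mb: int) -> Optional[ModelInfo]:
    """Pick the largest listed model whose file size fits the budget."""
    fitting = [m for m in TOP_MODELS if m.size_mb <= budget_mb]
    return max(fitting, key=lambda m: m.size_mb) if fitting else None


def seconds_for_reply(model: ModelInfo, reply_tokens: int) -> float:
    """Decode time only; ignores prompt processing and model load time."""
    return reply_tokens / model.tokens_per_sec


if __name__ == "__main__":
    choice = largest_model_within(500)
    if choice is not None:
        print(f"{choice.name}: ~{seconds_for_reply(choice, 100):.1f} s for 100 tokens")
```

For a 500MB budget this selects Qwen2.5-0.5B-int4, which at 23.2 t/s needs a little over four seconds to generate 100 tokens.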
|
| 113 |
|
| 114 |
## About
|
| 115 |
|
| 116 |
Dispatch AI (FZE) - Sharjah Free Zone, UAE. License No. 10818.
|
| 117 |
|
| 118 |
+
🌐 [dispatchai.ai](https://www.dispatchai.ai) | 🤗 [huggingface.co/dispatchAI](https://huggingface.co/dispatchAI) | API: [api.dispatchai.ai](https://api.dispatchai.ai)
|
| 119 |
|
| 120 |
*I think, therefore I ship.*
|