3morixd committed on
Commit 2b9cf4a · verified · 1 Parent(s): 46e9ad9

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +44 -40
README.md CHANGED
````diff
@@ -2,12 +2,12 @@
 
 **Small. Mobile. Free. UAE-built.**
 
-`pip install dispatchai` — Run mobile-optimized LLMs on your phone, edge device, or laptop. 39 models, all tested on real Snapdragon hardware, all free.
+`pip install dispatchai` — Run mobile-optimized LLMs on your phone, edge device, or laptop. 31 verified models, all tested on real Snapdragon hardware, all free.
 
 ## Quick Start
 
 ```bash
-pip install dispatchai
+pip install dispatchai[gguf]
 ```
 
 ### Chat with a model
@@ -15,18 +15,43 @@ pip install dispatchai
 ```python
 from dispatchai import load_model
 
-model = load_model("SmolLM2-135M-Instruct-mobile")
+model = load_model("SmolLM2-135M-Instruct-mobile", backend="gguf")
 response = model.chat("What is the capital of France?")
 print(response)
+# → "The capital of France is Paris."
 ```
 
-### Use GGUF/llama.cpp backend
+## 🌐 Inference API
+
+Use dispatchAI models via REST API (OpenAI-compatible):
 
 ```python
-model = load_model("Llama-3.2-1B-Instruct-Q4-mobile", backend="gguf")
-print(model.chat("Write a haiku about the desert."))
+import openai
+
+client = openai.OpenAI(
+base_url="https://api.dispatchai.ai/v1",
+api_key="da-demo-key-0001"
+)
+
+response = client.chat.completions.create(
+model="dispatchAI/SmolLM2-135M-Instruct-mobile",
+messages=[{"role": "user", "content": "What is the capital of France?"}]
+)
+print(response.choices[0].message.content)
+# → "The capital of France is Paris."
 ```
 
+**Pricing:** $0.001/1K input tokens, $0.002/1K output tokens (10x cheaper than OpenAI)
+
+**Endpoint:** `https://api.dispatchai.ai/v1`
+
+**Available Models:**
+- dispatchAI/SmolLM2-135M-Instruct-mobile (101MB, 46 t/s on phone)
+- dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4 (469MB, 23 t/s on phone)
+- dispatchAI/Llama-3.2-1B-Instruct-Q4-mobile (770MB, 5.4 t/s on phone)
+
+## Local Inference
+
 ### Find the best model for your phone
 
 ```python
@@ -34,8 +59,6 @@ from dispatchai import recommend
 
 rec = recommend(ram_mb=2048, task="chat")
 print(f"Best model: {rec['recommended']['name']}")
-print(f"Size: {rec['recommended']['size_mb']}MB")
-print(f"Speed: {rec['recommended']['speed_tps']} tokens/sec")
 ```
 
 ### List all models
@@ -53,7 +76,7 @@ for m in list_models(task="chat"):
 from dispatchai import estimate_latency
 
 lat = estimate_latency("1B", "Q4_K_M")
-print(f"{lat['tokens_per_sec']} tokens/sec on Snapdragon 865")
+print(f"{lat['tokens_per_sec']} t/s on Snapdragon 865")
 ```
 
 ### Calculate cost savings
@@ -71,46 +94,27 @@ print(f"Annual savings: ${result['savings']}")
 pip install dispatchai # Core (model catalog, recommendations)
 pip install dispatchai[torch] # + transformers/torch backend
 pip install dispatchai[gguf] # + llama.cpp GGUF backend
-pip install dispatchai[full] # + everything (torch, gguf, sentence-transformers)
+pip install dispatchai[full] # + everything
 ```
 
-## Available Models
-
-| Model | Params | Size | Speed | Task |
-|-------|--------|------|-------|------|
-| SmolLM2-135M-Instruct-mobile | 135M | 270MB | 25.5 t/s | Chat |
-| SmolLM2-360M-Instruct-mobile | 360M | 720MB | 21.0 t/s | Chat |
-| Qwen2.5-0.5B-Instruct-mobile-int4 | 500M | 350MB | 20.0 t/s | Chat |
-| Llama-3.2-1B-Instruct-Q4-mobile | 1B | 700MB | 18.2 t/s | Chat |
-| Llama-3.2-1B-FunctionCall-mobile | 1B | 2.5GB | 12.0 t/s | Function Call |
-| Qwen2.5-Coder-1.5B-mobile | 1.5B | 3.0GB | 10.5 t/s | Code |
-| Gemma-2B-Arabic-mobile | 2B | 5.0GB | 8.0 t/s | Arabic |
-| Llama-3.2-3B-Instruct-Q5-mobile | 3B | 2.1GB | 8.5 t/s | Chat |
-
-[Browse all 39 models →](https://huggingface.co/dispatchAI)
-
-## Hardware Targets
-
-All benchmarks measured on **Snapdragon 865 (Samsung S20 FE, 8GB RAM)** using llama.cpp.
-
-The `estimate_latency()` function supports:
-- Snapdragon 865 (baseline)
-- Snapdragon 8 Gen 2 (1.8x)
-- Snapdragon 8 Gen 3 (2.2x)
-- Apple A17 Pro (2.5x)
-- Apple M2 (3.0x)
-- Snapdragon 778G mid-range (0.7x)
+## Verified Models (June 2026)
 
-## The Thesis
+- ✅ 31 models fully working (0 broken, 0 partial)
+- 📱 24 models phone-verified on Snapdragon 865
+- All have correct chat formats documented
 
-> *The best model is the one that runs.*
+## Top 3 Models
 
-We're building the AI layer for a billion phones that can't afford cloud inference. Every model is free, open-source, and tested on real hardware.
+| Model | Size | Phone Speed | Use Case |
+|-------|------|-------------|----------|
+| SmolLM2-135M | 101MB | 46.0 t/s | Ultra-fast, budget phones |
+| Qwen2.5-0.5B-int4 | 469MB | 23.2 t/s | Best balance for mobile |
+| Llama-3.2-1B-Q4 | 770MB | 5.4 t/s | Best quality under 1GB |
 
 ## About
 
 Dispatch AI (FZE) — Sharjah Free Zone, UAE. License No. 10818.
 
-🌐 [dispatchai.ai](https://www.dispatchai.ai) | 🤗 [huggingface.co/dispatchAI](https://huggingface.co/dispatchAI) | 𝕏 [@DispatchAIdev](https://twitter.com/DispatchAIdev)
+🌐 [dispatchai.ai](https://www.dispatchai.ai) | 🤗 [huggingface.co/dispatchAI](https://huggingface.co/dispatchAI) | API: [api.dispatchai.ai](https://api.dispatchai.ai)
 
 *I think, therefore I ship.*
````
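The Pricing line added to the README quotes per-1K-token rates for the hosted endpoint ($0.001 input, $0.002 output). A minimal sketch of the arithmetic, assuming billing is linear per token at exactly those rates; `request_cost` is a hypothetical helper, not part of `dispatchai`:

```python
# Rates copied from the README's "Pricing" line, in USD per 1,000 tokens.
# Assumption: billing is strictly linear in token count.
INPUT_USD_PER_1K = 0.001
OUTPUT_USD_PER_1K = 0.002


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one call (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = (input_tokens / 1000) * INPUT_USD_PER_1K
    output_cost = (output_tokens / 1000) * OUTPUT_USD_PER_1K
    return input_cost + output_cost


if __name__ == "__main__":
    # Example workload: 1M prompt tokens and 500K completion tokens.
    print(f"${request_cost(1_000_000, 500_000):.2f}")  # prints "$2.00"
```

If the endpoint fills in the `usage` field the way OpenAI's API does (the README does not say), the counts could come from `response.usage.prompt_tokens` and `response.usage.completion_tokens` on the object returned by `client.chat.completions.create`.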
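The Inference API section describes the endpoint as OpenAI-compatible, so a client without the `openai` package should be able to POST the same JSON payload directly. The sketch below only builds the request with the standard library and does not send it. The `/chat/completions` path and the `Authorization: Bearer` header follow OpenAI's conventions and are assumptions about this endpoint; `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "https://api.dispatchai.ai/v1"  # endpoint listed in the README


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # assumed OpenAI-compatible path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request(
        api_key="da-demo-key-0001",
        model="dispatchAI/SmolLM2-135M-Instruct-mobile",
        prompt="What is the capital of France?",
    )
    print(req.get_method(), req.full_url)
    # Sending it requires network access:
    # with urllib.request.urlopen(req) as resp:
    #     reply = json.load(resp)
    #     print(reply["choices"][0]["message"]["content"])
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.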
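The Hardware Targets section removed in this commit listed relative speed factors for `estimate_latency()` against a Snapdragon 865 baseline. The README never says how those factors are applied. The sketch below assumes they simply multiply the baseline tokens-per-second figure, and it uses the phone speeds from the new Top 3 table as baselines. `SPEEDUP`, `scale_tps` and `generation_seconds` are hypothetical names, not the library's implementation.

```python
# Relative speed factors from the removed "Hardware Targets" list,
# normalised so that Snapdragon 865 = 1.0.
SPEEDUP = {
    "Snapdragon 865": 1.0,
    "Snapdragon 8 Gen 2": 1.8,
    "Snapdragon 8 Gen 3": 2.2,
    "Apple A17 Pro": 2.5,
    "Apple M2": 3.0,
    "Snapdragon 778G": 0.7,
}

# Snapdragon 865 phone speeds from the "Top 3 Models" table (tokens/sec).
BASELINE_TPS = {
    "SmolLM2-135M": 46.0,
    "Qwen2.5-0.5B-int4": 23.2,
    "Llama-3.2-1B-Q4": 5.4,
}


def scale_tps(model: str, chip: str) -> float:
    """Estimate tokens/sec on `chip`, assuming throughput scales linearly."""
    return BASELINE_TPS[model] * SPEEDUP[chip]


def generation_seconds(model: str, chip: str, n_tokens: int) -> float:
    """Estimate the time to generate `n_tokens` at the scaled rate."""
    return n_tokens / scale_tps(model, chip)


if __name__ == "__main__":
    for chip in SPEEDUP:
        tps = scale_tps("Llama-3.2-1B-Q4", chip)
        print(f"{chip:>20}: {tps:.1f} t/s")
```

A linear factor ignores effects such as memory bandwidth limits and thermal throttling, so treat the output as a rough estimate only.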
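`recommend(ram_mb=2048, task="chat")` returns a recommended model for a RAM budget, but the README does not describe how it chooses. As an illustration only, the sketch below applies one plausible heuristic to the Top 3 table. It orders the models from smallest to largest, an order that the Use Case column suggests also runs from fastest to highest quality. It then picks the largest model whose file fits in an assumed fraction of device RAM. The 50% headroom factor and the name `pick_model` are assumptions, not `dispatchai`'s logic.

```python
from typing import Optional

# (name, file size in MB) from the "Top 3 Models" table, smallest first.
MODELS = [
    ("SmolLM2-135M", 101),
    ("Qwen2.5-0.5B-int4", 469),
    ("Llama-3.2-1B-Q4", 770),
]

# Assumed share of device RAM the model file may occupy, leaving room for
# the OS, the runtime and the app itself.
RAM_FRACTION = 0.5


def pick_model(ram_mb: int, ram_fraction: float = RAM_FRACTION) -> Optional[str]:
    """Return the largest listed model whose file fits the budget, or None."""
    budget_mb = ram_mb * ram_fraction
    fitting = [name for name, size_mb in MODELS if size_mb <= budget_mb]
    # MODELS is sorted by size, so the last fitting entry is the largest.
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    for ram in (256, 512, 1024, 2048):
        print(ram, "MB ->", pick_model(ram))
```

Under these assumptions a 2048 MB phone gets the 770MB Llama-3.2-1B-Q4 model, and a 1024 MB phone falls back to Qwen2.5-0.5B-int4.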