3morixd committed on
Commit
1f32acc
·
verified ·
1 Parent(s): dea9ba1

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +9 -76
README.md CHANGED
@@ -1,84 +1,17 @@
  ---
  license: apache-2.0
- base_model: openbmb/MiniCPM5-1B
- tags:
- - speculative-decoding-draft
- - dispatch-ai
- - mobile
- - quantized
- - gguf
- - phone-farm-tested
+ tags: [dispatch-ai, mobile, quantized, gguf, phone-farm-tested, on-device, edge-ai]
  pipeline_tag: text-generation
- language:
- - en
+ language: [en]
  ---
 
- # MiniCPM5-1B-mobile
+ # MiniCPM5 1B — Mobile Optimized
 
- **Dispatch AI** — Built for mobile. Tested on real phones.
+ **Dispatch AI** — Compact model from ModelBest, tuned for mobile.
 
- ## Category
+ | Metric | Value |
+ |--------|-------|
+ | Generation | 17.5 t/s |
+ | Size | ~750 MB |
 
- Text GenerationOpenBMB's multilingual model
-
- ## Model
-
- Re-engineered from [openbmb/MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B).
- Quantized to Q4_K_M GGUF for on-device inference via llama.cpp.
- Size: 656 MB.
-
- ## Phone Farm Test Results
-
- Tested on **Samsung Galaxy S20 FE 5G** (Snapdragon 865, 8GB RAM):
-
- | Phone | Gen t/s | Prompt t/s |
- |-------|---------|------------|
- | R3CN30WHS2Z | 21.7 | 67.5 |
- | R3CN509PLHA | 22.3 | 95.3 |
-
- - **Average: 22.0 t/s**
- - **40-phone aggregate: ~880 t/s**
-
-
- ## Usage
-
- ```bash
- ./llama-cli -m model.gguf -p "Hello" -n 100 -t 4 -c 512
- ```
-
- 🌐 [dispatchAI on HuggingFace](https://huggingface.co/dispatchAI)
-
-
- ## Speculative Decoding Draft Model
-
- This model is optimized for use as a **draft model** in speculative decoding setups.
-
- ### What is speculative decoding?
- Speculative decoding pairs a small, fast "draft" model with a larger "target" model.
- The draft model proposes tokens that the target model verifies in parallel, achieving
- 2-3x speedup with zero quality loss.
-
- ### Why this model?
- - **Small and fast**: Sub-1B parameters = minimal draft overhead
- - **Mobile-optimized**: Already quantized and pruned for edge deployment
- - **Same family**: Pairs naturally with larger models of the same architecture
-
- ### Usage with vLLM
- ```python
- from vllm import LLM, SamplingParams
-
- llm = LLM(
- model="target-model-7b",
- speculative_model="dispatchAI/MiniCPM5-1B-mobile",
- num_speculative_tokens=5,
- )
- ```
-
- ### Usage with transformers
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- target = AutoModelForCausalLM.from_pretrained("target-model-7b")
- draft = AutoModelForCausalLM.from_pretrained("dispatchAI/MiniCPM5-1B-mobile")
- # See transformers docs for assisted_generation
- ```
+ **Dispatch AI (FZE)** Sharjah, UAE | License 10818