3morixd committed on
Commit
dea9ba1
·
verified ·
1 Parent(s): 394694a

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +36 -0
README.md CHANGED
@@ -2,6 +2,7 @@
2
  license: apache-2.0
3
  base_model: openbmb/MiniCPM5-1B
4
  tags:
 
5
  - dispatch-ai
6
  - mobile
7
  - quantized
@@ -46,3 +47,38 @@ Tested on **Samsung Galaxy S20 FE 5G** (Snapdragon 865, 8GB RAM):
46
  ```
47
 
48
  🌐 [dispatchAI on HuggingFace](https://huggingface.co/dispatchAI)
2
  license: apache-2.0
3
  base_model: openbmb/MiniCPM5-1B
4
  tags:
5
+ - speculative-decoding-draft
6
  - dispatch-ai
7
  - mobile
8
  - quantized
 
47
  ```
48
 
49
  🌐 [dispatchAI on HuggingFace](https://huggingface.co/dispatchAI)
50
+
51
+
52
+ ## Speculative Decoding Draft Model
53
+
54
+ This model is optimized for use as a **draft model** in speculative decoding setups.
55
+
56
+ ### What is speculative decoding?
57
+ Speculative decoding pairs a small, fast "draft" model with a larger "target" model.
58
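+ The draft model proposes several tokens that the target model then verifies in a single parallel pass.
59
+ Speedups of roughly 2-3x are commonly reported, depending on how often drafts are accepted, and standard verification leaves the target model's output distribution unchanged.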
60
+
61
+ ### Why this model?
62
+ - **Small and fast**: Sub-1B parameters keep draft overhead low
63
+ - **Mobile-optimized**: Already quantized and pruned for edge deployment
64
+ - **Same family**: Pairs naturally with larger models that share its architecture and tokenizer
65
+
66
+ ### Usage with vLLM
67
+ ```python
68
+ from vllm import LLM, SamplingParams
69
+
70
+ llm = LLM(
71
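+     model="target-model-7b",  # placeholder name for the larger target model
72
+     speculative_model="dispatchAI/MiniCPM5-1B-mobile",  # draft model; newer vLLM releases may expect speculative_config={...} instead
73
+     num_speculative_tokens=5,  # tokens the draft proposes per verification step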
74
+ )
75
+ ```
76
+
77
+ ### Usage with transformers
78
+ ```python
79
+ from transformers import AutoModelForCausalLM, AutoTokenizer
80
+
81
+ target = AutoModelForCausalLM.from_pretrained("target-model-7b")
82
+ draft = AutoModelForCausalLM.from_pretrained("dispatchAI/MiniCPM5-1B-mobile")
83
+ # Pass the draft at generation time: target.generate(..., assistant_model=draft); see the transformers docs on assisted generation
84
+ ```
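
The draft-then-verify loop described under "What is speculative decoding?" can be sketched in plain Python. This is a minimal illustration of the greedy variant only, not the vLLM or transformers implementation: `speculative_decode`, `greedy_decode`, and the `NextToken` callable type are hypothetical names, and each "model" is assumed to be a function that returns the greedy next token for a given prefix. Real systems verify all proposed positions in one batched forward pass and, when sampling, use a rejection-sampling rule instead of exact token matching.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: plain autoregressive greedy decoding with a single model."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    num_speculative_tokens: int = 5,
) -> List[int]:
    """Greedy speculative decoding: the draft proposes, the target verifies."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The draft proposes up to k tokens autoregressively (cheap).
        k = min(num_speculative_tokens, goal - len(tokens))
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target checks each proposed position. A real implementation
        #    scores all positions in one forward pass; this loop is for clarity.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's token and discard the rest.
                accepted.append(expected)
                break
        else:
            # Every proposal matched, so the target adds one extra token if room remains.
            if len(tokens) + len(accepted) < goal:
                accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens
```

Because every emitted token is one the target itself would have chosen, the result matches `greedy_decode(target, ...)` exactly; the draft only affects how many target steps can be batched together, which is where the speedup comes from.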