Added vLLM run commands

Added an example of using vLLM in a basic configuration, as well as an advanced configuration with n-gram speculative decoding.
README.md
The dataset used for training is available at:
[zed-industries/zeta](https://huggingface.co/datasets/zed-industries/zeta)

## Running Zeta

### vLLM - Simple

`vllm serve zed-industries/zeta --served-model-name zeta`
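
Once the server is running, any OpenAI-compatible client can talk to it. Below is a minimal sketch assuming vLLM's default address (`http://localhost:8000`) and a placeholder prompt; in practice, Zed constructs the edit-prediction prompt and sends it to this endpoint.

```bash
# Query the OpenAI-compatible completions endpoint exposed by `vllm serve`.
# Assumes the default address (localhost:8000); the prompt here is only a
# placeholder, since Zed builds the real edit-prediction prompt when it calls the server.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zeta",
    "prompt": "def greet(name):",
    "max_tokens": 64
  }'
```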

### vLLM - Advanced

- [Quantization](https://docs.vllm.ai/en/latest/features/quantization/fp8.html#): vLLM supports FP8 (8-bit floating point) weight and activation quantization using hardware acceleration on GPUs such as the NVIDIA H100 and AMD MI300x.
- [N-gram Speculative Decoding](https://docs.vllm.ai/en/latest/features/spec_decode.html#speculating-by-matching-n-grams-in-the-prompt): configures vLLM to use speculative decoding where proposals are generated by matching n-grams in the prompt. This is a great fit for edit predictions, since many of the tokens are already present in the prompt and the model only needs to generate changes to the code file.

`vllm serve zed-industries/zeta --served-model-name zeta --enable-prefix-caching --enable-chunked-prefill --quantization="fp8" --speculative-model [ngram] --ngram-prompt-lookup-max 4 --ngram-prompt-lookup-min 2 --num-speculative-tokens 8`
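
The speculative decoding and quantization flags only change how the server generates tokens; the client-facing API is unchanged. As a quick sanity check (again assuming the default `localhost:8000` address), you can confirm the server is healthy and that the model is registered under the `zeta` name:

```bash
# Liveness probe for the vLLM OpenAI-compatible server (assumes localhost:8000).
curl http://localhost:8000/health

# List served models; the model should appear under the name "zeta"
# because of --served-model-name.
curl http://localhost:8000/v1/models
```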

## Learn More

For more insights about the model and its integration in Zed, check out the official blog post: