Update README.md
README.md
CHANGED
@@ -59,7 +59,7 @@ python -m sglang.launch_server --model Qwen/Qwen2.5-7B-Instruct \
 ```
 
 
-### Performance Evaluation
+### vLLM Performance Evaluation
 
 We run our evaluations on two NVIDIA A6000-48GB GPUs connected via PCIe 4.0 x16. We conducted an extensive hyperparameter search of `num_speculative_tokens` from 3 to 20. In each entry, we report the best speedup across different speculation lengths. The following table reports the TPT speedup over vanilla decoding.
 