ruipeterpan/Qwen2.5-7B-Instruct_EAGLE3_UltraChat

Text Generation

ruipeterpan committed on Jan 21

Commit

a4de607

1 Parent(s): 451404a

Update README.md

Files changed (1)

README.md +1 -1

@@ -59,7 +59,7 @@ python -m sglang.launch_server --model Qwen/Qwen2.5-7B-Instruct \
 ```
-### Performance Evaluation
+### vLLM Performance Evaluation
 We run our evaluations on two NVIDIA A6000-48GB GPUs connected via PCIe 4.0 x16. We conducted an extensive hyperparameter search of `num_speculative_tokens` from 3 to 20. In each entry, we report the best speedup across different speculation lengths. The following table reports the TPT speedup over vanilla decoding.