Update README.md

README.md CHANGED

@@ -210,10 +210,6 @@ and decode tokens per second will be more important than time to first token.
Note that the latency result (benchmark_latency) is in seconds, and the serving result (benchmark_serving) is in requests per second.
Int4 weight-only quantization is optimized for batch size 1 and short input and output token lengths; stay tuned for models optimized for larger batch sizes or longer token lengths.

-## Download dataset
-Download sharegpt dataset: `wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json`
-
-Other datasets can be found in: https://github.com/vllm-project/vllm/tree/main/benchmarks
## benchmark_latency

You need to install the vllm nightly build to pick up some recent changes.
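One way to read the units note above: at a fixed concurrency, a per-request latency in seconds bounds the achievable request rate. The sketch below is illustrative arithmetic only; the function name and the concurrency values are assumptions for this example, not part of the vllm benchmark scripts.

```python
# Illustrative only: relates what benchmark_latency reports (seconds per
# request) to what benchmark_serving reports (requests per second).
# `approx_requests_per_second` and the concurrency values are assumptions
# for this sketch, not part of vllm.

def approx_requests_per_second(latency_s: float, concurrency: int = 1) -> float:
    """Little's-law style estimate: throughput ~= concurrency / latency."""
    return concurrency / latency_s

# A 2 s end-to-end latency caps a single-stream server at ~0.5 req/s;
# serving 8 requests concurrently at the same per-request latency
# sustains ~4 req/s.
print(approx_requests_per_second(2.0))                  # 0.5
print(approx_requests_per_second(2.0, concurrency=8))   # 4.0
```

This is also why the note earlier says int4 weight-only is tuned for batch size 1: the serving number depends on how much concurrency the server can sustain, not just on single-request latency.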
@@ -242,8 +238,15 @@ python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model

We also benchmarked the throughput in a serving environment.

-
+Download sharegpt dataset: `wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json`
+Other datasets can be found in: https://github.com/vllm-project/vllm/tree/main/benchmarks
+
+Get the vllm source code:
+```
+git clone git@github.com:vllm-project/vllm.git
+```
+
+Run the following under the `vllm` root folder:

### baseline
Server: