Update README.md

README.md CHANGED

@@ -210,10 +210,6 @@ and decode tokens per second will be more important than time to first token.
Note that the latency result (benchmark_latency) is in seconds, and the serving result (benchmark_serving) is in requests per second.
Int4 weight-only quantization is optimized for batch size 1 and short input and output token lengths; stay tuned for models optimized for larger batch sizes or longer token lengths.

-## Download dataset
-Download sharegpt dataset: `wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json`
-
-Other datasets can be found in: https://github.com/vllm-project/vllm/tree/main/benchmarks
## benchmark_latency

You need to install the vllm nightly build to pick up some recent changes.
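One way to read the units note above: at a fixed concurrency, a per-request latency in seconds bounds the achievable request rate. The sketch below is illustrative arithmetic only; the function name and the concurrency values are assumptions for this example, not part of the vllm benchmark scripts.

```python
# Illustrative only: relates what benchmark_latency reports (seconds per
# request) to what benchmark_serving reports (requests per second).
# `approx_requests_per_second` and the concurrency values are assumptions
# for this sketch, not part of vllm.

def approx_requests_per_second(latency_s: float, concurrency: int = 1) -> float:
    """Little's-law style estimate: throughput ~= concurrency / latency."""
    return concurrency / latency_s

# A 2 s end-to-end latency caps a single-stream server at ~0.5 req/s;
# serving 8 requests concurrently at the same per-request latency
# sustains ~4 req/s.
print(approx_requests_per_second(2.0))                  # 0.5
print(approx_requests_per_second(2.0, concurrency=8))   # 4.0
```

This is also why the note earlier says int4 weight-only is tuned for batch size 1: the serving number depends on how much concurrency the server can sustain, not just on single-request latency.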
@@ -242,8 +238,15 @@ python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model

We also benchmarked the throughput in a serving environment.

-
+Download sharegpt dataset: `wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json`
+Other datasets can be found in: https://github.com/vllm-project/vllm/tree/main/benchmarks
+
+Get the vllm source code:
+```
+git clone git@github.com:vllm-project/vllm.git
+```
+
+Run the following under the `vllm` root folder:

### baseline
Server: