Update README.md
README.md
CHANGED
@@ -311,11 +311,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <th>Model</th>
 <th>Average Cost Reduction</th>
 <th>Latency (s)</th>
-<th>
+<th>Queries Per Dollar</th>
 <th>Latency (s)</th>
-<th>
+<th>Queries Per Dollar</th>
 <th>Latency (s)</th>
-<th>
+<th>Queries Per Dollar</th>
 </tr>
 </thead>
 <tbody style="text-align: center">
@@ -415,7 +415,9 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 </tbody>
 </table>
 
+**Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
 
+**QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
 
 ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
 
@@ -434,11 +436,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <th>Model</th>
 <th>Average Cost Reduction</th>
 <th>Maximum throughput (QPS)</th>
-<th>
+<th>Queries Per Dollar</th>
 <th>Maximum throughput (QPS)</th>
-<th>
+<th>Queries Per Dollar</th>
 <th>Maximum throughput (QPS)</th>
-<th>
+<th>Queries Per Dollar</th>
 </tr>
 </thead>
 <tbody style="text-align: center">
@@ -537,3 +539,9 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 </tr>
 </tbody>
 </table>
+
+**Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
+
+**QPS: Queries per second.
+
+**QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
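
The footnotes added in this change define QPD as queries per dollar at an on-demand hourly price, but the README does not spell out the arithmetic. The sketch below shows one plausible way such a figure could be derived from the latency and throughput columns, assuming QPD is simply queries served per hour divided by the hourly instance price. The function names and the example price are hypothetical and are not taken from the README or from Lambda Labs.

```python
SECONDS_PER_HOUR = 3600


def qpd_from_latency(latency_s: float, hourly_cost_usd: float) -> float:
    """Queries per dollar for single-stream serving.

    Assumes one query is processed at a time, each taking latency_s seconds,
    and that hourly_cost_usd covers the whole instance (assumption).
    """
    if latency_s <= 0 or hourly_cost_usd <= 0:
        raise ValueError("latency and hourly cost must be positive")
    queries_per_hour = SECONDS_PER_HOUR / latency_s
    return queries_per_hour / hourly_cost_usd


def qpd_from_throughput(qps: float, hourly_cost_usd: float) -> float:
    """Queries per dollar for multi-stream serving at a sustained QPS."""
    if qps < 0 or hourly_cost_usd <= 0:
        raise ValueError("qps must be non-negative and hourly cost positive")
    queries_per_hour = qps * SECONDS_PER_HOUR
    return queries_per_hour / hourly_cost_usd


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    example_price = 1.50  # USD per hour, not an actual quoted price
    print(qpd_from_latency(latency_s=2.0, hourly_cost_usd=example_price))
    print(qpd_from_throughput(qps=0.5, hourly_cost_usd=example_price))
```

Under this assumption a 2-second latency and a sustained 0.5 QPS give the same QPD, since both correspond to 1800 queries per hour. If the README's figures account for multiple GPUs or partial utilization, the formula would need adjusting.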