Update README.md
README.md
CHANGED
@@ -81,7 +81,7 @@ The two figures below compare the latency and throughput performance of the Phi-
 <img src="lat.png" width="300"/>
 <img src="thr_lat.png" width="298"/>
 </div>
-Figure 1. The first plot shows average inference latency as a function of generation length, while the second plot illustrates how inference latency varies with throughput. Both experiments were conducted using the vLLM inference framework on a single A100-80GB GPU over
+Figure 1. The first plot shows average inference latency as a function of generation length, while the second plot illustrates how inference latency varies with throughput. Both experiments were conducted using the vLLM inference framework on a single A100-80GB GPU over varying concurrency levels of user requests.
 
 ## Usage
 