Update README.md
README.md
CHANGED
@@ -160,7 +160,6 @@ The model was evaluated on OpenLLM Leaderboard [V1](https://huggingface.co/space
 <summary>Evaluation Commands</summary>
 
 ```
-guidellm --model neuralmagic/granite-3.1-8b-instruct-quantized.w4a16 --target "http://localhost:8000/v1" --data-type emulated --data "prompt_tokens=<prompt_tokens>,generated_tokens=<generated_tokens>" --max seconds 360 --backend aiohttp_server
 ```
 
 </details>
@@ -175,6 +174,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 
 <details>
 <summary>Benchmarking Command</summary>
+guidellm --model neuralmagic/pixtral-12b-quantized.w4a16 --target "http://localhost:8000/v1" --data-type emulated --data prompt_tokens=128,generated_tokens=128,images=1,width=640,height=480 --max seconds 120 --backend aiohttp_server
 
 </details>
 