fix vllm parameter's name
#6
by sebag90 - opened
README.md CHANGED

@@ -163,8 +163,8 @@ VLLM_DISABLE_COMPILE_CACHE=1 vllm serve mistralai/Voxtral-Mini-4B-Realtime-2602
 
 Additional flags:
 * You can set `--max-num-batched-tokens` to balance throughput and latency, higher means higher throughput but higher latency.
-* You can reduce the default `--max-model-
-  if you are certain that you won't have to transcribe for more than X hours. By default the model uses a `--max-model-
+* You can reduce the default `--max-model-len` to allocate less memory for the pre-computed RoPE frequencies,
+  if you are certain that you won't have to transcribe for more than X hours. By default the model uses a `--max-model-len` of 131072 (> 3h).
 
 #### Usage of the model
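For context, the two flags discussed in the corrected lines go on the same `vllm serve` command shown in the hunk header. A sketch of such an invocation — the numeric values here are purely illustrative, not recommendations from this change:

```shell
# Illustrative flag values only; tune for your hardware, latency target,
# and the maximum transcription length you actually need.
VLLM_DISABLE_COMPILE_CACHE=1 vllm serve mistralai/Voxtral-Mini-4B-Realtime-2602 \
  --max-num-batched-tokens 2048 \
  --max-model-len 32768
```

Lowering `--max-model-len` below the 131072 default shrinks the pre-computed RoPE frequency allocation, at the cost of capping how long a single transcription can run.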