Update README.md
README.md
CHANGED
@@ -45,7 +45,7 @@ This NVFP4 quantized version reduces memory requirements significantly:
 
 Size: ~22 GB (down from ~67 GB)
 
-
+Should fit comfortably on a single RTX 5090
 
 Fully compatible with vLLM (including streaming text output)
 
@@ -82,7 +82,7 @@ docker run --rm -ti --gpus all \
 --trust-remote-code
 ```
 
-This example script should allow an audio wave
+This example script should allow an audio wave file to be streamed to the model and get a response based on the prompt.
 ```py
 import requests
 import base64