This is an FP8 dynamically quantized (W8A8) version of `OpenGVLab/InternVL3_5-38B`, optimized for high-performance inference with *vLLM*.
The quantization process uses a specialized recipe that preserves the model's core visual understanding capabilities while reducing the memory footprint by nearly 40%.
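As a rough sanity check on that figure, here is a back-of-the-envelope sketch. It assumes the model has roughly 38 billion weight parameters (inferred from the model name) and that quantization converts 16-bit weights to 8-bit; the actual overall footprint reduction is smaller than the weight-only halving because activations, the KV cache, and possibly the vision tower are not stored in FP8.

```python
# Back-of-the-envelope weight-memory estimate.
# Assumption: ~38e9 parameters, inferred from the model name; not an official count.
params = 38e9

bf16_gb = params * 2 / 1e9  # 16-bit weights: 2 bytes per parameter
fp8_gb = params * 1 / 1e9   # FP8 weights: 1 byte per parameter

print(f"weight memory: {bf16_gb:.0f} GB -> {fp8_gb:.0f} GB")
```

Weights alone halve in size; once unquantized components and runtime buffers are counted, an overall reduction of "nearly 40%" is plausible.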
## Just Run It (vLLM serve)
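A minimal serving sketch for a quantized checkpoint like this one. The repo id below is a placeholder (substitute the actual model id of this repository), and the flags shown are common vLLM options rather than a recipe from this model card:

```shell
# Serve the FP8-quantized model with vLLM's OpenAI-compatible server.
# <quantized-model-id> is a placeholder -- replace with this repository's id.
vllm serve <quantized-model-id> \
    --trust-remote-code \
    --max-model-len 16384
```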