Update README.md
Browse files
README.md
CHANGED
```diff
@@ -18,8 +18,8 @@ base_model:
 - **Input:** Text, Image
 - **Output:** Text
 - **Model Optimizations:**
-  - **Weight quantization:**
-  - **Activation quantization:**
+  - **Weight quantization:** FP4
+  - **Activation quantization:** FP4
 - **Release Date:**
 - **Version:** 1.0
 - **Model Developers:** Red Hat
@@ -29,7 +29,7 @@ Quantized version of [Qwen/Qwen3-VL-32B-Instruct](https://huggingface.co/Qw
 ### Model Optimizations
 
 This model was obtained by quantizing the weights and activations of [Qwen/Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct) to the FP4 data type.
-This optimization reduces the number of bits per parameter from 16 to
+This optimization reduces the number of bits per parameter from 16 to 4, reducing the disk size and GPU memory requirements by approximately 75%.
 Only the weights and activations of the linear operators within the transformer blocks of the language model are quantized.
 
 
```
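The "approximately 75%" figure in the diff can be checked directly: going from 16-bit to 4-bit parameters cuts per-parameter storage by a factor of four. A back-of-the-envelope sketch, assuming the ~32B parameter count implied by the model name (real checkpoints also store quantization scales and leave some layers unquantized, so actual sizes are somewhat larger):

```python
# Idealized memory estimate for quantizing a ~32B-parameter model
# from 16-bit (BF16/FP16) to 4-bit (FP4) weights. Real FP4 checkpoints
# also carry per-group scales and keep some layers (embeddings, vision
# tower) unquantized, so actual sizes are a bit larger than this.

def model_size_gb(n_params: float, bits_per_param: float) -> float:
    """Idealized checkpoint size in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

n_params = 32e9                         # ~32B parameters, per the model name
bf16 = model_size_gb(n_params, 16)      # 64.0 GB
fp4 = model_size_gb(n_params, 4)        # 16.0 GB
reduction = 1 - fp4 / bf16              # 0.75 -> "approximately 75%"
print(f"BF16: {bf16:.0f} GB, FP4: {fp4:.0f} GB, saved: {reduction:.0%}")
# prints: BF16: 64 GB, FP4: 16 GB, saved: 75%
```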
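For intuition about what "quantizing ... to the FP4 data type" means: FP4 (E2M1) has only 16 representable values (the magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6, each with a sign bit), so every weight must be rounded to one of them after scaling. The toy sketch below illustrates round-to-nearest FP4 with a single per-tensor scale; it is NOT the recipe used to produce this checkpoint, which would typically use per-group scales and calibration:

```python
# Toy FP4 (E2M1) quantization: scale a tensor so its largest magnitude
# maps to 6 (the largest FP4 magnitude), then round each element to the
# nearest representable FP4 value. Illustration only, not the actual
# quantization recipe used for the released checkpoint.

FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # E2M1 magnitudes

def quantize_fp4(weights: list[float]) -> tuple[list[float], float]:
    """Return the dequantized weights and the per-tensor scale used."""
    amax = max(abs(w) for w in weights) or 1.0
    scale = amax / 6.0                   # map the largest magnitude to 6
    out = []
    for w in weights:
        mag = min(FP4_GRID, key=lambda g: abs(abs(w) / scale - g))
        out.append(mag * scale * (1.0 if w >= 0 else -1.0))
    return out, scale

# The largest-magnitude value (-1.2) survives exactly; the others are
# rounded to the nearest point on the scaled 8-value grid.
deq, scale = quantize_fp4([0.3, -1.2, 0.06, 0.9])
```

Activation quantization works the same way, except the scale must be chosen at (or calibrated ahead of) inference time, since activations are not known in advance.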