BCCard/Qwen2.5-VL-32B-Instruct-FP8-Dynamic

text-generation-inference

compressed-tensors

sh2orc committed on Jun 20, 2025

Commit a3dcd69 · verified · 1 parent: 9a0e866

Update README.md

Files changed (1)

README.md +5 -5

@@ -8,22 +8,22 @@ license_link: >-
   https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md
 language:
 - en
-base_model: Qwen/Qwen2.5-VL-72B-Instruct
+base_model: Qwen/Qwen2.5-VL-32B-Instruct
 library_name: transformers
 ---
 # Qwen2.5-VL-32B-Instruct-FP8-Dynamic
 ## Model Overview
-- **Model Architecture:** Qwen2.5-VL-72B-Instruct
+- **Model Architecture:** Qwen2.5-VL-32B-Instruct
   - **Input:** Vision-Text
   - **Output:** Text
 - **Model Optimizations:**
   - **Weight quantization:** FP8
   - **Activation quantization:** FP8
-- **Release Date:** 2/24/2025
+- **Release Date:** 5/3/2025
 - **Version:** 1.0
-- **Model Developers:** Neural Magic
+- **Model Developers:** BC Card
 Quantized version of [Qwen/Qwen2.5-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct).
@@ -43,7 +43,7 @@ from vllm import LLM, SamplingParams
 # prepare model
 llm = LLM(
-    model="BCCard/Qwen2.5-VL-72B-Instruct-FP8-Dynamic",
+    model="BCCard/Qwen2.5-VL-32B-Instruct-FP8-Dynamic",
     trust_remote_code=True,
     max_model_len=4096,
     max_num_seqs=2,
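
Three of the five changed lines in this commit replace a leftover `72B` with `32B`. A check along the following lines could flag such stale size tags before a README is published. This is only a minimal sketch and not part of the repository: the helper name `find_stale_size_tags` and the `SIZE_TAG` pattern are hypothetical, and it assumes sizes are always written as a number followed by `B`.

```python
import re
from typing import List, Tuple

# Assumption: parameter counts appear as "<number>B" tokens, e.g. "32B" or "72B".
SIZE_TAG = re.compile(r"\b(\d+(?:\.\d+)?)B\b")


def find_stale_size_tags(readme_text: str, expected: str) -> List[Tuple[int, str]]:
    """Return (line_number, stripped_line) for lines with a size tag other than `expected`."""
    stale = []
    for number, line in enumerate(readme_text.splitlines(), start=1):
        tags = [match + "B" for match in SIZE_TAG.findall(line)]
        if any(tag != expected for tag in tags):
            stale.append((number, line.strip()))
    return stale


# Lines taken from the pre-commit and post-commit sides of the diff above.
sample = "\n".join([
    "base_model: Qwen/Qwen2.5-VL-72B-Instruct",
    "# Qwen2.5-VL-32B-Instruct-FP8-Dynamic",
    "- **Model Architecture:** Qwen2.5-VL-72B-Instruct",
    '    model="BCCard/Qwen2.5-VL-32B-Instruct-FP8-Dynamic",',
])

for number, line in find_stale_size_tags(sample, expected="32B"):
    print(f"line {number}: {line}")
```

The pattern does not match the `2.5` in `Qwen2.5` or the `8` in `FP8`, since neither is immediately followed by `B`.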