BCCard/Qwen3-32B-FP8-Dynamic

sh2orc committed on Jun 20, 2025

Commit 2c48ccf · verified · 1 Parent(s): 7e85a54

Update README.md

Files changed (1)

README.md +4 -4

@@ -12,7 +12,7 @@ tags:
 - FP8
 ---
-# Qwen3-32B-FP8-dynamic
+# Qwen3-32B-FP8-Dynamic
 ## Model Overview
 - **Model Architecture:** Qwen3ForCausalLM
@@ -30,7 +30,7 @@ tags:
 - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws).
 - **Release Date:** 05/02/2025
 - **Version:** 1.0
-- **Model Developers:** RedHat (Neural Magic)
+- **Model Developers:** BC Card, Redhat
 ### Model Optimizations
@@ -51,7 +51,7 @@ This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/
 from vllm import LLM, SamplingParams
 from transformers import AutoTokenizer
-model_id = "RedHatAI/Qwen3-32B-FP8-dynamic"
+model_id = "BCCard/Qwen3-32B-FP8-dynamic"
 number_gpus = 1
 sampling_params = SamplingParams(temperature=0.6, top_p=0.95, top_k=20, min_p=0, max_tokens=256)
@@ -128,7 +128,7 @@ The model was evaluated on the OpenLLM leaderboard tasks (version 1), using [lm-
   ```
   lm_eval \
     --model vllm \
-    --model_args pretrained="RedHatAI/Qwen3-32B-FP8-dynamic",dtype=auto,gpu_memory_utilization=0.5,max_model_len=8192,enable_chunk_prefill=True,tensor_parallel_size=1 \
+    --model_args pretrained="BCCard/Qwen3-32B-FP8-dynamic",dtype=auto,gpu_memory_utilization=0.5,max_model_len=8192,enable_chunk_prefill=True,tensor_parallel_size=1 \
     --tasks openllm \
     --apply_chat_template\
     --fewshot_as_multiturn \