Update README.md
README.md
~~~diff
@@ -12,7 +12,7 @@ tags:
 - FP8
 ---
 
-# Qwen3-32B-FP8-
+# Qwen3-32B-FP8-Dynamic
 
 ## Model Overview
 - **Model Architecture:** Qwen3ForCausalLM
@@ -30,7 +30,7 @@ tags:
 - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws).
 - **Release Date:** 05/02/2025
 - **Version:** 1.0
-- **Model Developers:**
+- **Model Developers:** BC Card, Redhat
 
 ### Model Optimizations
 
@@ -51,7 +51,7 @@ This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/
 from vllm import LLM, SamplingParams
 from transformers import AutoTokenizer
 
-model_id = "
+model_id = "BCCard/Qwen3-32B-FP8-dynamic"
 number_gpus = 1
 sampling_params = SamplingParams(temperature=0.6, top_p=0.95, top_k=20, min_p=0, max_tokens=256)
 
@@ -128,7 +128,7 @@ The model was evaluated on the OpenLLM leaderboard tasks (version 1), using [lm-
 ```
 lm_eval \
 --model vllm \
---model_args pretrained="
+--model_args pretrained="BCCard/Qwen3-32B-FP8-dynamic",dtype=auto,gpu_memory_utilization=0.5,max_model_len=8192,enable_chunk_prefill=True,tensor_parallel_size=1 \
 --tasks openllm \
 --apply_chat_template\
 --fewshot_as_multiturn \
~~~
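In the updated `lm_eval` command, `--model_args` packs every backend setting into one comma-separated `key=value` string. lm-evaluation-harness then splits that string into keyword arguments for its vLLM backend. The Python sketch below builds that string from a dictionary, so the model id only has to be changed in one place. `build_model_args` is a hypothetical helper and is not part of lm-eval. The flag names and values are copied from the diff. Any flags that follow `--fewshot_as_multiturn` lie outside the hunk and are omitted. The script only formats the command. It does not run lm-eval or vLLM.

```python
import shlex


def build_model_args(args: dict) -> str:
    """Join settings into lm_eval's comma-separated key=value form.

    Hypothetical helper for illustration; not part of lm-evaluation-harness.
    """
    parts = []
    for key, value in args.items():
        text = str(value)  # True -> "True", 0.5 -> "0.5", 8192 -> "8192"
        if "," in text:
            # A comma inside a value would be read as a separator.
            raise ValueError(f"value for {key!r} must not contain a comma")
        parts.append(f"{key}={text}")
    return ",".join(parts)


# Model id introduced by this commit; other settings copied from the README diff.
model_args = build_model_args({
    "pretrained": "BCCard/Qwen3-32B-FP8-dynamic",
    "dtype": "auto",
    "gpu_memory_utilization": 0.5,
    "max_model_len": 8192,
    "enable_chunk_prefill": True,
    "tensor_parallel_size": 1,
})

# Argument list mirroring the flags visible in the diff hunk.
cmd = [
    "lm_eval",
    "--model", "vllm",
    "--model_args", model_args,
    "--tasks", "openllm",
    "--apply_chat_template",
    "--fewshot_as_multiturn",
]

print(shlex.join(cmd))
```

On a machine with lm-eval and vLLM installed, you could pass `cmd` to `subprocess.run(cmd, check=True)`. An argument list avoids shell quoting and line-continuation pitfalls that a multi-line shell command can introduce.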