# Kosmic-35B-A3B-NVFP4
Kosmic is Prosoft's industrial AI assistant, an NVFP4-quantized model based on Qwen3.5-35B-A3B.
## Model Information

| Item | Value |
|---|---|
| Base model | Qwen/Qwen3.5-35B-A3B |
| Total parameters | 35B (3B active, 256-expert MoE) |
| Quantization | NVFP4 (nvidia-modelopt, EXPERTS_ONLY) |
| Quantization format | `quant_method: modelopt` |
| Model size | ~22 GB |
| License | Apache 2.0 |
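The ~22 GB figure is consistent with experts-only 4-bit quantization. A rough back-of-envelope check (the routed-expert parameter share below is an assumption for illustration, not a number from this card):

```python
# Back-of-envelope size estimate for experts-only NVFP4 quantization.
# Assumptions (not from the model card): ~33.5B of the 35B parameters sit in
# routed expert weights; NVFP4 stores a 4-bit weight plus an FP8 scale per
# 16-value block (~0.5625 bytes/param); everything else stays BF16 (2 bytes).
TOTAL_PARAMS_B = 35.0
EXPERT_PARAMS_B = 33.5     # assumed routed-expert share (hypothetical)
NVFP4_BYTES = 0.5625       # (16 * 4 + 8) bits / 16 values / 8 bits per byte
BF16_BYTES = 2.0

expert_gb = EXPERT_PARAMS_B * NVFP4_BYTES
rest_gb = (TOTAL_PARAMS_B - EXPERT_PARAMS_B) * BF16_BYTES
total_gb = expert_gb + rest_gb
print(f"estimated size: ~{total_gb:.0f} GB")
```

Under these assumptions the estimate lands near the ~22 GB listed above; the true breakdown depends on the checkpoint's actual expert/non-expert split.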
## Usage (vLLM)

```bash
vllm serve prosoft0405/Kosmic-35B-A3B-NVFP4 \
  --trust-remote-code \
  --language-model-only \
  --gpu-memory-utilization 0.85 \
  --reasoning-parser qwen3
```
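Once the server is up, it exposes the standard OpenAI-compatible API on port 8000. A minimal request body might look like the following (the prompt and sampling parameters are illustrative):

```python
import json

# Chat-completions payload for the OpenAI-compatible endpoint that
# `vllm serve` exposes (by default at http://localhost:8000/v1/chat/completions).
payload = {
    "model": "prosoft0405/Kosmic-35B-A3B-NVFP4",
    "messages": [
        {"role": "user", "content": "Summarize the NVFP4 quantization scheme."},
    ],
    "max_tokens": 256,
    "temperature": 0.7,
}
body = json.dumps(payload).encode("utf-8")

# To actually send it (requires the server above to be running):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=body, headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```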
## Usage (Docker)

```bash
docker run -d --gpus all --ipc host -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:cu130-nightly \
  prosoft0405/Kosmic-35B-A3B-NVFP4 \
  --served-model-name kosmic-35b \
  --language-model-only \
  --gpu-memory-utilization 0.85 \
  --reasoning-parser qwen3
```
## Hardware Requirements

- NVIDIA Blackwell GPU (optimized for DGX Spark)
- vLLM 0.17.0+ (nightly recommended)
- transformers 5.2.0+
## Quantization Scheme

- nvidia-modelopt `NVFP4_EXPERTS_ONLY_CFG`
- Only MoE routed expert weights are quantized to NVFP4; all other weights remain BF16
- GDN linear_attn, self_attn, shared_expert, mlp.gate, etc. are preserved in BF16
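The experts-only rule can be pictured as a name filter over the model's linear modules. This is an illustrative sketch of the selection rule described above, not nvidia-modelopt's actual implementation, and the module names are hypothetical examples rather than the real checkpoint layout:

```python
# Illustrative sketch of the EXPERTS_ONLY selection rule: quantize routed-expert
# linear weights to NVFP4, keep everything else (attention, shared expert,
# router gate) in BF16. Not modelopt's real API; names are examples only.
KEEP_BF16 = ("linear_attn", "self_attn", "shared_expert", "mlp.gate")

def quantize_to_nvfp4(name: str) -> bool:
    """Return True if this weight would be NVFP4-quantized under the rule."""
    if any(part in name for part in KEEP_BF16):
        return False
    return ".experts." in name  # routed expert weights only

names = [
    "model.layers.0.mlp.experts.17.down_proj",   # routed expert -> NVFP4
    "model.layers.0.mlp.shared_expert.up_proj",  # shared expert -> BF16
    "model.layers.0.mlp.gate",                   # router gate   -> BF16
    "model.layers.0.self_attn.q_proj",           # attention     -> BF16
]
print([n for n in names if quantize_to_nvfp4(n)])
```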