Kosmic-35B-A3B-NVFP4

Prosoft์˜ ์‚ฐ์—…์šฉ AI ์–ด์‹œ์Šคํ„ดํŠธ Kosmic โ€” Qwen3.5-35B-A3B ๊ธฐ๋ฐ˜ NVFP4 ์–‘์žํ™” ๋ชจ๋ธ.

๋ชจ๋ธ ์ •๋ณด

  • Base model: Qwen/Qwen3.5-35B-A3B
  • Total parameters: 35B (3B active, MoE with 256 experts)
  • Quantization: NVFP4 (nvidia-modelopt, EXPERTS_ONLY)
  • Quantization format: quant_method: modelopt
  • Model size: ~22 GB
  • License: Apache 2.0

Usage (vLLM)

vllm serve prosoft0405/Kosmic-35B-A3B-NVFP4 \
  --trust-remote-code \
  --language-model-only \
  --gpu-memory-utilization 0.85 \
  --reasoning-parser qwen3
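
Once the server is up, it exposes an OpenAI-compatible API on port 8000 (the vLLM default). A minimal client-side sketch, assuming the default endpoint path and using only the standard library; the prompt and `max_tokens` value are illustrative:

```python
import json

# The vLLM server started above serves an OpenAI-compatible endpoint
# (default port 8000; adjust if you changed it).
ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_chat_request(prompt: str) -> bytes:
    """Build the JSON body for an OpenAI-style chat completion request."""
    payload = {
        "model": "prosoft0405/Kosmic-35B-A3B-NVFP4",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return json.dumps(payload).encode("utf-8")

body = build_chat_request("Summarize the NVFP4 quantization scheme.")
# POST `body` to ENDPOINT with Content-Type: application/json once the
# server is running, e.g. via urllib.request or the openai Python client.
```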

Usage (Docker)

docker run -d --gpus all --ipc host -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:cu130-nightly \
  prosoft0405/Kosmic-35B-A3B-NVFP4 \
  --served-model-name kosmic-35b \
  --language-model-only \
  --gpu-memory-utilization 0.85 \
  --reasoning-parser qwen3

Hardware Requirements

  • NVIDIA Blackwell GPU (DGX Spark ์ตœ์ ํ™”)
  • vLLM 0.17.0+ (nightly ๊ถŒ์žฅ)
  • transformers 5.2.0+

์–‘์žํ™” ๋ฐฉ์‹

  • nvidia-modelopt NVFP4_EXPERTS_ONLY_CFG
  • MoE routed expert weights๋งŒ NVFP4, ๋‚˜๋จธ์ง€ BF16 ์œ ์ง€
  • GDN linear_attn, self_attn, shared_expert, mlp.gate ๋“ฑ BF16 ๋ณด์กด