Kosmic-35B-A3B-NVFP4

Prosoft์˜ ์‚ฐ์—…์šฉ AI ์–ด์‹œ์Šคํ„ดํŠธ Kosmic โ€” Qwen3.5-35B-A3B ๊ธฐ๋ฐ˜ NVFP4 ์–‘์žํ™” ๋ชจ๋ธ.

๋ชจ๋ธ ์ •๋ณด

  • Base model: Qwen/Qwen3.5-35B-A3B
  • Total parameters: 35B (3B active, MoE with 256 experts)
  • Quantization: NVFP4 (nvidia-modelopt, EXPERTS_ONLY)
  • Quantization format: quant_method: modelopt
  • Model size: ~22 GB
  • License: Apache 2.0

Usage (vLLM)

vllm serve prosoft0405/Kosmic-35B-A3B-NVFP4 \
  --trust-remote-code \
  --language-model-only \
  --gpu-memory-utilization 0.85 \
  --reasoning-parser qwen3
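
Once the server is up, it exposes an OpenAI-compatible API on port 8000 (the vLLM default). A minimal client-side sketch, assuming the default endpoint path and using only the standard library; the prompt and `max_tokens` value are illustrative:

```python
import json

# The vLLM server started above serves an OpenAI-compatible endpoint
# (default port 8000; adjust if you changed it).
ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_chat_request(prompt: str) -> bytes:
    """Build the JSON body for an OpenAI-style chat completion request."""
    payload = {
        "model": "prosoft0405/Kosmic-35B-A3B-NVFP4",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return json.dumps(payload).encode("utf-8")

body = build_chat_request("Summarize the NVFP4 quantization scheme.")
# POST `body` to ENDPOINT with Content-Type: application/json once the
# server is running, e.g. via urllib.request or the openai Python client.
```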

Usage (Docker)

docker run -d --gpus all --ipc host -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:cu130-nightly \
  prosoft0405/Kosmic-35B-A3B-NVFP4 \
  --served-model-name kosmic-35b \
  --language-model-only \
  --gpu-memory-utilization 0.85 \
  --reasoning-parser qwen3

Hardware Requirements

  • NVIDIA Blackwell GPU (DGX Spark ์ตœ์ ํ™”)
  • vLLM 0.17.0+ (nightly ๊ถŒ์žฅ)
  • transformers 5.2.0+

์–‘์žํ™” ๋ฐฉ์‹

  • nvidia-modelopt NVFP4_EXPERTS_ONLY_CFG
  • MoE routed expert weights๋งŒ NVFP4, ๋‚˜๋จธ์ง€ BF16 ์œ ์ง€
  • GDN linear_attn, self_attn, shared_expert, mlp.gate ๋“ฑ BF16 ๋ณด์กด