Kosmic-35B-A3B-FP8

Prosoft์˜ ์‚ฐ์—…์šฉ AI ์–ด์‹œ์Šคํ„ดํŠธ Kosmic โ€” Qwen3.5-35B-A3B ๊ธฐ๋ฐ˜์œผ๋กœ ์‚ฐ์—…์šฉ์œผ๋กœ ํŒŒ์ธํŠœ๋‹ ํ›„ FP8 ์–‘์žํ™” ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.

๋ชจ๋ธ ์ •๋ณด

Item                 Value
Base model           Qwen/Qwen3.5-35B-A3B
Total parameters     35B (3B active, MoE with 256 experts)
Quantization         FP8 E4M3, block_size [128, 128]
Quantization format  Same as the official Qwen FP8 release (quant_method: fp8)
Model size           ~33 GB
License              Apache 2.0

Usage (vLLM)

vllm serve prosoft0405/Kosmic-35B-A3B-FP8 \
  --trust-remote-code \
  --language-model-only \
  --gpu-memory-utilization 0.85 \
  --reasoning-parser qwen3

Usage (Docker)

docker run -d --gpus all --ipc host -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:cu130-nightly \
  prosoft0405/Kosmic-35B-A3B-FP8 \
  --served-model-name kosmic-35b \
  --language-model-only \
  --gpu-memory-utilization 0.85 \
  --reasoning-parser qwen3

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="kosmic-35b",
    messages=[
        # System prompt: "You are Kosmic, an industrial AI assistant developed by Prosoft."
        {"role": "system", "content": "๋‹น์‹ ์€ Prosoft์—์„œ ๊ฐœ๋ฐœํ•œ ์‚ฐ์—…์šฉ AI ์–ด์‹œ์Šคํ„ดํŠธ Kosmic์ž…๋‹ˆ๋‹ค."},
        # User message: "Who are you?"
        {"role": "user", "content": "๋„ˆ ๋ˆ„๊ตฌ์•ผ?"},
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)

Hardware Requirements

  • NVIDIA GPU with 40 GB+ VRAM (e.g. A100, H100, RTX PRO 6000)
  • DGX Spark (128 GB unified memory) supported
  • vLLM 0.17.0+ (nightly recommended)
  • transformers 5.2.0+

์–‘์žํ™” ๋ฐฉ์‹

Native FP8 quantization, identical to the official Qwen/Qwen3.5-35B-A3B-FP8 model:

  • FP8 E4M3 weight quantization with block-wise scaling (128x128)
  • weight_scale_inv per block
  • GDN linear_attn, self_attn, shared_expert, and mlp.gate tensors are kept in BF16
  • Only the MoE routed expert weights are quantized to FP8
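The block-wise scheme above can be illustrated with a small NumPy sketch: each 128x128 block of a weight matrix gets one scale so that its largest magnitude maps to the FP8 E4M3 maximum (448), and `weight_scale_inv` is stored per block for dequantization. This is a simulation for illustration only — the values stay in float32 here, and a real kernel would additionally round each entry to the nearest representable E4M3 value:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3
BLOCK = 128

def quantize_blockwise(w: np.ndarray):
    """Simulate block-wise FP8 E4M3 weight quantization (128x128 blocks).

    Returns the scaled tensor (kept in float32 for illustration) and the
    per-block weight_scale_inv used to dequantize it.
    """
    rows, cols = w.shape
    n_br, n_bc = -(-rows // BLOCK), -(-cols // BLOCK)  # ceil division
    scale_inv = np.ones((n_br, n_bc), dtype=np.float32)
    q = np.zeros_like(w, dtype=np.float32)
    for i in range(n_br):
        for j in range(n_bc):
            blk = w[i*BLOCK:(i+1)*BLOCK, j*BLOCK:(j+1)*BLOCK]
            amax = float(np.abs(blk).max())
            if amax > 0:
                scale_inv[i, j] = amax / FP8_E4M3_MAX
            # scale into FP8's dynamic range; a real kernel would also
            # round to the nearest representable E4M3 value here
            q[i*BLOCK:(i+1)*BLOCK, j*BLOCK:(j+1)*BLOCK] = np.clip(
                blk / scale_inv[i, j], -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale_inv

def dequantize_blockwise(q: np.ndarray, scale_inv: np.ndarray) -> np.ndarray:
    """Recover the original scale by multiplying each block by its scale_inv."""
    out = np.empty_like(q)
    for i in range(scale_inv.shape[0]):
        for j in range(scale_inv.shape[1]):
            out[i*BLOCK:(i+1)*BLOCK, j*BLOCK:(j+1)*BLOCK] = (
                q[i*BLOCK:(i+1)*BLOCK, j*BLOCK:(j+1)*BLOCK] * scale_inv[i, j])
    return out
```

Because E4M3 has only a few bits of mantissa, choosing the scale per 128x128 block rather than per tensor keeps the quantization error local: one outlier weight only degrades the precision of its own block.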