JGOS-31B-FP8

FP8 (compressed-tensors W8A8) quantized JGOS-31B Korean reasoning LLM (Gemma4, 31B). Native step-by-step thinking.

  • Weights: ~33GB FP8 (from 59GB bf16) — fits a single eval GPU (>= ~40GB), near-lossless (ref: JGOS-398B bf16 83/74 -> FP8 82/74).
  • Quant: compressed-tensors FP8_DYNAMIC, language Linear only (vision/lm_head excluded).
  • Context: max_position_embeddings 256K; serve with --max-model-len 16384 for eval.
  • Thinking: enabled by default via chat_template.

Docker (K-AI Evaluation, model baked-in, no internet needed)

Image: vidraft/jgos-31b-fp8:01.00 (vLLM 0.22.0, port 8000, OpenAI-compatible) served-model-name JGOS-31B-FP8, max-model-len 16384, thinking + 8192 min generation forced via proxy.

License

Gemma license (inherited from base).

Downloads last month
6
Safetensors
Model size
31B params
Tensor type
BF16
·
F8_E4M3
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support