Behemoth-X-R1-123B Β· FP8

Single-GPU beast mode.


About

FP8 dynamic quantization of tacodevs/Behemoth-X-R1-123B. Near-lossless quality, half the weight bytes, fits on a single H200.

  • Method: W8A8 dynamic quantization via llm-compressor
  • Format: compressed-tensors
  • Size: ~115 GB
  • Calibration: None needed (dynamic scheme)

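To make the "dynamic" part concrete, here is a simplified pure-Python sketch of per-token FP8 (E4M3) round-tripping. This is illustrative only, not llm-compressor's implementation: it rounds to normal E4M3 values and ignores subnormals and the NaN encoding.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite E4M3 value

def round_to_e4m3(x):
    """Round x to the nearest FP8 E4M3 value (normals only; subnormals
    and the NaN encoding are ignored for simplicity)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)        # x = m * 2**e, with 0.5 <= |m| < 1
    m = round(m * 16) / 16      # keep a 3-bit mantissa (plus hidden bit)
    y = math.ldexp(m, e)
    return math.copysign(min(abs(y), FP8_E4M3_MAX), x)

def dynamic_quantize_dequantize(row):
    """Per-token dynamic scheme: pick a scale from this row's max |x|,
    map into the FP8 range, round, and map back."""
    amax = max(abs(x) for x in row)
    scale = amax / FP8_E4M3_MAX if amax else 1.0
    return [round_to_e4m3(x / scale) * scale for x in row], scale
```

Because the scale is recomputed from each row of activations at runtime, no calibration dataset is needed — which is why the bullet above lists calibration as "none".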
Usage with vLLM

python -m vllm.entrypoints.openai.api_server \
  --model tacodevs/Behemoth-X-R1-123B-FP8 \
  --max-model-len 16384 \
  --gpu-memory-utilization 0.95 \
  --trust-remote-code
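Once the server is up, any OpenAI-compatible client can talk to it. A minimal standard-library sketch (assumes the default port 8000; the model name must match the --model flag):

```python
import json
import urllib.request

payload = {
    "model": "tacodevs/Behemoth-X-R1-123B-FP8",
    "messages": [
        {"role": "user", "content": "Summarize FP8 quantization in one line."}
    ],
    "max_tokens": 128,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```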

Fits on a single H200 (141 GB). At --gpu-memory-utilization 0.95 there is headroom for roughly a 30k-token context; the command above uses a conservative 16384.
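A back-of-envelope check of that claim, assuming Mistral-Large-style geometry for a 123B model (88 layers, 8 KV heads under GQA, head dim 128, FP16 KV cache — assumed values, not read from the checkpoint):

```python
# Hypothetical model geometry (assumed, not read from config.json):
layers, kv_heads, head_dim, kv_bytes = 88, 8, 128, 2  # FP16 K/V entries

# K and V caches per token, across all layers.
kv_per_token = 2 * layers * kv_heads * head_dim * kv_bytes   # bytes

weights_gb = 115                  # FP8 checkpoint size from above
budget_gb = 141 * 0.95            # --gpu-memory-utilization 0.95 on an H200
kv_30k_gb = 30_000 * kv_per_token / 1024**3

print(f"KV cache: {kv_per_token / 2**20:.2f} MiB/token, "
      f"{kv_30k_gb:.1f} GiB at 30k tokens")
print(f"Weights + KV at 30k: ~{weights_gb + kv_30k_gb:.0f} GB "
      f"vs {budget_gb:.0f} GB budget")
```

Under these assumptions the KV cache at 30k tokens costs about 10 GB on top of the ~115 GB of weights, leaving several GB for activations and runtime overhead — which is why the practical ceiling lands near 30k rather than at the raw KV-cache maximum.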

See the main model card

Full documentation, prompt format, prefill examples, credits, and everything else are on the source repo:

πŸ‘‰ tacodevs/Behemoth-X-R1-123B

License

Inherited from base: Mistral Research License β€” non-commercial use only.
