Behemoth-X-R1-123B Β· FP8

Single-GPU beast mode.


About

FP8 dynamic quantization of tacodevs/Behemoth-X-R1-123B. Near-lossless quality, half the weight bytes, fits on a single H200.

  • Method: W8A8 dynamic quantization via llm-compressor
  • Format: compressed-tensors
  • Size: ~115 GB
  • Calibration: None needed (dynamic scheme)

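To make the "dynamic" part concrete, here is a simplified pure-Python sketch of per-token FP8 (E4M3) round-tripping. This is illustrative only, not llm-compressor's implementation: it rounds to normal E4M3 values and ignores subnormals and the NaN encoding.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite E4M3 value

def round_to_e4m3(x):
    """Round x to the nearest FP8 E4M3 value (normals only; subnormals
    and the NaN encoding are ignored for simplicity)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)        # x = m * 2**e, with 0.5 <= |m| < 1
    m = round(m * 16) / 16      # keep a 3-bit mantissa (plus hidden bit)
    y = math.ldexp(m, e)
    return math.copysign(min(abs(y), FP8_E4M3_MAX), x)

def dynamic_quantize_dequantize(row):
    """Per-token dynamic scheme: pick a scale from this row's max |x|,
    map into the FP8 range, round, and map back."""
    amax = max(abs(x) for x in row)
    scale = amax / FP8_E4M3_MAX if amax else 1.0
    return [round_to_e4m3(x / scale) * scale for x in row], scale
```

Because the scale is recomputed from each row of activations at runtime, no calibration dataset is needed — which is why the bullet above lists calibration as "none".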
Usage with vLLM

python -m vllm.entrypoints.openai.api_server \
  --model tacodevs/Behemoth-X-R1-123B-FP8 \
  --max-model-len 16384 \
  --gpu-memory-utilization 0.95 \
  --trust-remote-code
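Once the server is up, any OpenAI-compatible client can talk to it. A minimal standard-library sketch (assumes the default port 8000; the model name must match the --model flag):

```python
import json
import urllib.request

payload = {
    "model": "tacodevs/Behemoth-X-R1-123B-FP8",
    "messages": [
        {"role": "user", "content": "Summarize FP8 quantization in one line."}
    ],
    "max_tokens": 128,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```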

Fits on a single H200 (141 GB). At --gpu-memory-utilization 0.95 there is headroom for roughly a 30k-token context; the command above uses a conservative 16384.
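A back-of-envelope check of that claim, assuming Mistral-Large-style geometry for a 123B model (88 layers, 8 KV heads under GQA, head dim 128, FP16 KV cache — assumed values, not read from the checkpoint):

```python
# Hypothetical model geometry (assumed, not read from config.json):
layers, kv_heads, head_dim, kv_bytes = 88, 8, 128, 2  # FP16 K/V entries

# K and V caches per token, across all layers.
kv_per_token = 2 * layers * kv_heads * head_dim * kv_bytes   # bytes

weights_gb = 115                  # FP8 checkpoint size from above
budget_gb = 141 * 0.95            # --gpu-memory-utilization 0.95 on an H200
kv_30k_gb = 30_000 * kv_per_token / 1024**3

print(f"KV cache: {kv_per_token / 2**20:.2f} MiB/token, "
      f"{kv_30k_gb:.1f} GiB at 30k tokens")
print(f"Weights + KV at 30k: ~{weights_gb + kv_30k_gb:.0f} GB "
      f"vs {budget_gb:.0f} GB budget")
```

Under these assumptions the KV cache at 30k tokens costs about 10 GB on top of the ~115 GB of weights, leaving several GB for activations and runtime overhead — which is why the practical ceiling lands near 30k rather than at the raw KV-cache maximum.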

See the main model card

Full documentation, prompt format, prefill examples, credits, and everything else are on the source repo:

πŸ‘‰ tacodevs/Behemoth-X-R1-123B

License

Inherited from base: Mistral Research License β€” non-commercial use only.
