Behemoth-X-R1-123B · FP8
Single-GPU beast mode.
About
FP8 dynamic quantization of tacodevs/Behemoth-X-R1-123B. Near-lossless quality, half the weight bytes, fits on a single H200.
- Method: W8A8 dynamic quantization via llm-compressor
- Format: compressed-tensors
- Size: ~115 GB
- Calibration: none needed (the dynamic scheme computes activation scales at runtime)
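For reference, llm-compressor quantization runs are typically driven by a small recipe. The sketch below is a plausible shape for an FP8 dynamic (W8A8) recipe, not the one used for this repo; modifier and scheme names follow llm-compressor conventions and should be verified against its documentation:

```yaml
# Hypothetical llm-compressor recipe sketch for FP8 dynamic (W8A8) quantization.
# Field and scheme names are assumptions; check the llm-compressor docs.
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      targets: ["Linear"]   # quantize all linear layers
      ignore: ["lm_head"]   # keep the output head in higher precision
      scheme: FP8_DYNAMIC   # FP8 weights, per-token dynamic activation scales
```

Because the scheme is dynamic, no calibration dataset is passed; activation scales are computed on the fly at inference time.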
Usage with vLLM
```shell
python -m vllm.entrypoints.openai.api_server \
  --model tacodevs/Behemoth-X-R1-123B-FP8 \
  --max-model-len 16384 \
  --gpu-memory-utilization 0.95 \
  --trust-remote-code
```
Fits on 1× H200 (141 GB) with a ~30k-token context window.
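The single-GPU claim can be sanity-checked with back-of-envelope arithmetic. The architecture numbers below (88 layers, 8 grouped-query KV heads, head dimension 128) are assumptions based on the Mistral-Large-style 123B family, not taken from this repo; check them against the model's config.json:

```python
# Back-of-envelope VRAM estimate: FP8 weights plus an FP16 KV cache.
# Layer/head counts are hypothetical (Mistral-Large-style 123B); verify in config.json.
PARAMS = 123e9    # parameter count
LAYERS = 88       # assumed transformer layer count
KV_HEADS = 8      # assumed GQA key/value heads
HEAD_DIM = 128    # assumed head dimension
KV_BYTES = 2      # FP16 key/value entries

weights_gib = PARAMS * 1 / 2**30                             # 1 byte per weight in FP8
kv_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * KV_BYTES   # K and V per token
kv_30k_gib = 30_000 * kv_per_token / 2**30

print(f"weights  ~{weights_gib:.0f} GiB")        # ~115 GiB
print(f"KV/token ~{kv_per_token / 1024:.0f} KiB")  # ~352 KiB
print(f"KV@30k   ~{kv_30k_gib:.1f} GiB")         # ~10.1 GiB
```

Under these assumptions, weights plus a 30k-token cache come to roughly 125 GiB, which is about what `--gpu-memory-utilization 0.95` exposes on a 141 GB H200, consistent with the card's ~30k figure.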
See the main model card
Full documentation, prompt format, prefill examples, credits, and everything else is on the source repo:
tacodevs/Behemoth-X-R1-123B
License
Inherited from base: Mistral Research License (non-commercial use only).