---
license: other
license_name: mistral-research-license
license_link: https://mistral.ai/licenses/MRL-0.1.md
base_model: tacodevs/Behemoth-X-R1-123B
base_model_relation: quantized
tags:
- mistral
- fp8
- w8a8
- compressed-tensors
- quantized
- thinking
- roleplay
- creative-writing
language:
- en
pipeline_tag: text-generation
---
# Behemoth-X-R1-123B · FP8

*Single-GPU beast mode.*
## About
FP8 dynamic quantization of tacodevs/Behemoth-X-R1-123B. Near-lossless quality, half the weight bytes, fits on a single H200.
- Method: W8A8 dynamic quantization via llm-compressor
- Format: compressed-tensors
- Size: ~115 GB
- Calibration: none needed (the dynamic scheme computes activation scales at runtime)
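As a sanity check on the numbers above: FP8 stores one byte per weight, so 123B parameters come out to roughly the quoted ~115 GB, while the original BF16 checkpoint needs twice that. A back-of-the-envelope sketch, assuming raw weight bytes dominate (per-channel scales and norms add only a little on top):

```python
# Rough weight-footprint estimate for a 123B-parameter model.
# Assumption: parameter count is ~123e9 and per-weight storage dominates.
params = 123e9

fp8_gib = params * 1 / 2**30   # 1 byte per weight in FP8
bf16_gib = params * 2 / 2**30  # 2 bytes per weight in BF16

print(f"FP8:  {fp8_gib:.0f} GiB")   # ~115 GiB, matching the quoted size
print(f"BF16: {bf16_gib:.0f} GiB")  # ~229 GiB, too big for one H200
```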
## Usage with vLLM
```shell
python -m vllm.entrypoints.openai.api_server \
  --model tacodevs/Behemoth-X-R1-123B-FP8 \
  --max-model-len 16384 \
  --gpu-memory-utilization 0.95 \
  --trust-remote-code
```
Fits on 1× H200 (141 GB) with up to a ~30k-token context window; the command above sets a conservative 16k.
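The ~30k figure can be sanity-checked with a KV-cache estimate. A rough sketch, assuming Mistral Large 2 geometry for the 123B base (88 layers, 8 KV heads, head dim 128 — verify against the model's `config.json`) and vLLM's default FP16 KV cache:

```python
# Back-of-the-envelope KV-cache sizing for one H200 (141 GB).
# Assumed architecture (check config.json; these are Mistral Large 2 values):
n_layers, n_kv_heads, head_dim = 88, 8, 128
kv_dtype_bytes = 2  # FP16 KV cache

# K and V per token, summed across all layers:
bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * kv_dtype_bytes

ctx = 30_000
kv_gib = ctx * bytes_per_token / 2**30
print(f"{bytes_per_token} bytes/token -> {kv_gib:.1f} GiB at {ctx} tokens")
```

About 10 GiB of KV cache on top of ~115 GiB of weights stays under the 141 GB × 0.95 budget, with headroom left for activations and CUDA graphs.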
## See the main model card
Full documentation, prompt format, prefill examples, credits, and everything else is on the source repo:
👉 [tacodevs/Behemoth-X-R1-123B](https://huggingface.co/tacodevs/Behemoth-X-R1-123B)
## License
Inherited from base: Mistral Research License — non-commercial use only.