---
license: other
license_name: mistral-research-license
license_link: https://mistral.ai/licenses/MRL-0.1.md
base_model: tacodevs/Behemoth-X-R1-123B
base_model_relation: quantized
tags:
- mistral
- fp8
- w8a8
- compressed-tensors
- quantized
- thinking
- roleplay
- creative-writing
language:
- en
pipeline_tag: text-generation
---
# Behemoth-X-R1-123B · FP8

*Single-GPU beast mode.*

## About

FP8 dynamic quantization of [`tacodevs/Behemoth-X-R1-123B`](https://huggingface.co/tacodevs/Behemoth-X-R1-123B). Near-lossless quality, half the weight bytes, and it fits on a single H200.

- **Method:** W8A8 dynamic quantization via [llm-compressor](https://github.com/vllm-project/llm-compressor)
- **Format:** `compressed-tensors`
- **Size:** ~115 GB
- **Calibration:** none needed (dynamic scheme)

## Usage with vLLM

```bash
python -m vllm.entrypoints.openai.api_server \
  --model tacodevs/Behemoth-X-R1-123B-FP8 \
  --max-model-len 16384 \
  --gpu-memory-utilization 0.95 \
  --trust-remote-code
```

Fits on **1× H200 141 GB** with room for up to a ~30k context window (the command above uses a conservative 16384).

## See the main model card

Full documentation, prompt format, prefill examples, credits, and everything else is on the source repo:

### 👉 [tacodevs/Behemoth-X-R1-123B](https://huggingface.co/tacodevs/Behemoth-X-R1-123B)

## License

Inherited from the base model: **[Mistral Research License](https://mistral.ai/licenses/MRL-0.1.md)**, non-commercial use only.
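
## Appendix: why "dynamic" means no calibration

A W8A8 dynamic scheme stores weights in FP8 (E4M3) with precomputed scales and quantizes activations on the fly, picking each scale from the tensor's runtime absolute max, which is why no calibration dataset is required. The NumPy sketch below is illustrative only and is **not** the llm-compressor implementation: it models E4M3 as clipping to ±448 plus rounding to a 3-bit mantissa, and omits subnormals and NaN handling.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in E4M3


def round_to_e4m3(x: np.ndarray) -> np.ndarray:
    """Round to the nearest value with a 3-bit mantissa.

    Simplified E4M3: clip to the finite range, keep 4 significant
    binary digits (implicit leading bit + 3 stored mantissa bits).
    Subnormals and NaN are not modeled.
    """
    x = np.clip(x, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    mant, exp = np.frexp(x)          # x = mant * 2**exp, |mant| in [0.5, 1)
    mant = np.round(mant * 16) / 16  # quantize mantissa to a 1/16 grid
    return np.ldexp(mant, exp)


def fp8_dynamic_quant(x: np.ndarray):
    """Per-tensor dynamic scaling: the scale comes from the runtime absmax."""
    scale = np.abs(x).max() / FP8_E4M3_MAX
    q = round_to_e4m3(x / scale)     # values now live on the FP8 grid
    return q, scale


def dequant(q: np.ndarray, scale: float) -> np.ndarray:
    return q * scale


# Round-trip a random activation tensor through the scheme.
x = np.random.default_rng(0).normal(size=(4, 8)).astype(np.float32)
q, s = fp8_dynamic_quant(x)
x_hat = dequant(q, s)
```

Because the scale adapts to each tensor's actual range, the per-element relative rounding error of a 3-bit mantissa stays around 2⁻⁵ (~3%), which is consistent with the near-lossless quality claim above.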