---
license: other
license_name: mistral-research-license
license_link: https://mistral.ai/licenses/MRL-0.1.md
base_model: tacodevs/Behemoth-X-R1-123B
base_model_relation: quantized
tags:
- mistral
- fp8
- w8a8
- compressed-tensors
- quantized
- thinking
- roleplay
- creative-writing
language:
- en
pipeline_tag: text-generation
---
# Behemoth-X-R1-123B · FP8

Single-GPU beast mode.
## About
FP8 dynamic quantization of [`tacodevs/Behemoth-X-R1-123B`](https://huggingface.co/tacodevs/Behemoth-X-R1-123B). Near-lossless quality, half the weight bytes, fits on a single H200.
- **Method:** W8A8 dynamic quantization via [llm-compressor](https://github.com/vllm-project/llm-compressor)
- **Format:** `compressed-tensors`
- **Size:** ~115 GB
- **Calibration:** None needed; the dynamic scheme computes activation scales at runtime
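The "dynamic" part is why no calibration set is needed: scales are derived from each tensor's own values at runtime rather than from a calibration pass. A minimal, self-contained sketch of the idea, using a simplified round-to-nearest onto the FP8 E4M3 grid with a per-tensor scale (illustrative only, not the llm-compressor internals, which operate per-token on activations):

```python
import math

def round_to_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value (bias 7, max 448; simplified)."""
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    a = min(abs(x), 448.0)                  # saturate at the E4M3 maximum
    e = max(math.floor(math.log2(a)), -6)   # -6 = shared subnormal exponent
    step = 2.0 ** (e - 3)                   # 3 mantissa bits = 8 steps per binade
    return sign * min(round(a / step) * step, 448.0)

def fp8_dynamic(tensor):
    """Per-tensor dynamic quantization: the scale comes from the data itself."""
    scale = max(abs(v) for v in tensor) / 448.0  # no calibration pass needed
    return [round_to_e4m3(v / scale) * scale for v in tensor]

vals = [0.013, -1.7, 250.0, 3.14159]
print(fp8_dynamic(vals))
```

With 3 mantissa bits the relative error for normal values is bounded by roughly 2^-4 ≈ 6%, which is why the scheme is near-lossless for weights that were trained in bf16.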
## Usage with vLLM
```bash
python -m vllm.entrypoints.openai.api_server \
  --model tacodevs/Behemoth-X-R1-123B-FP8 \
  --max-model-len 16384 \
  --gpu-memory-utilization 0.95 \
  --trust-remote-code
```
Fits on **1× H200 141 GB** with up to a ~30k-token context window; the example above sets a conservative 16384.
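Back-of-the-envelope memory math behind those numbers (assumptions: 1 byte/param for FP8 weights, the H200's 141 GB nameplate capacity, and the 0.95 utilization cap from the command above; the leftover budget is what vLLM can spend on KV cache and activations):

```python
GIB = 2**30
params = 123e9                          # 123B parameters

fp8_weights = params * 1 / GIB          # 1 byte/param  -> ~114.6 GiB (the "~115 GB")
bf16_weights = params * 2 / GIB         # 2 bytes/param -> ~229.1 GiB: no single GPU
h200 = 141e9 / GIB                      # 141 GB card   -> ~131.3 GiB

kv_budget = h200 * 0.95 - fp8_weights   # headroom at --gpu-memory-utilization 0.95
print(f"{fp8_weights:.1f} GiB weights, {kv_budget:.1f} GiB for KV cache")
```

That leaves roughly 10 GiB for KV cache; how many context tokens that buys depends on the model architecture and KV-cache dtype, which is where the ~30k figure comes from.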
## See the main model card
Full documentation (prompt format, prefill examples, credits, and everything else) lives on the source repo:
### [tacodevs/Behemoth-X-R1-123B](https://huggingface.co/tacodevs/Behemoth-X-R1-123B)
## License
Inherited from the base model: **[Mistral Research License](https://mistral.ai/licenses/MRL-0.1.md)**, non-commercial use only.