---
license: other
license_name: mistral-research-license
license_link: https://mistral.ai/licenses/MRL-0.1.md
base_model: tacodevs/Behemoth-X-R1-123B
base_model_relation: quantized
tags:
- mistral
- fp8
- w8a8
- compressed-tensors
- quantized
- thinking
- roleplay
- creative-writing
language:
- en
pipeline_tag: text-generation
---
<div align="center">
<img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/hero.png" alt="Behemoth-X-R1-123B" style="width:100%; max-width:960px; border-radius:16px; box-shadow:0 0 60px rgba(236,72,153,0.35), 0 0 100px rgba(139,92,246,0.25);"/>
</div>
<div align="center" style="margin-top:24px;">
<h1 style="font-size:3em; font-weight:900; background:linear-gradient(90deg,#ec4899 0%,#a855f7 50%,#06b6d4 100%); -webkit-background-clip:text; -webkit-text-fill-color:transparent; background-clip:text; margin:0;">Behemoth-X-R1-123B · FP8</h1>
<p style="font-size:1.2em; color:#a855f7; font-style:italic;">Single-GPU beast mode.</p>
<p>
<img src="https://img.shields.io/badge/quant-FP8_Dynamic-8B5CF6?style=for-the-badge" alt="quant"/>
<img src="https://img.shields.io/badge/VRAM-~130GB-EC4899?style=for-the-badge" alt="vram"/>
<img src="https://img.shields.io/badge/runs_on-1x_H200-06B6D4?style=for-the-badge" alt="gpu"/>
</p>
</div>
## About
FP8 dynamic quantization of [`tacodevs/Behemoth-X-R1-123B`](https://huggingface.co/tacodevs/Behemoth-X-R1-123B): near-lossless quality at roughly half the BF16 weight footprint, small enough to fit on a single H200.
- **Method:** W8A8 dynamic quantization via [llm-compressor](https://github.com/vllm-project/llm-compressor)
- **Format:** `compressed-tensors`
- **Size:** ~115 GB
- **Calibration:** None needed — weight scales are fixed at quantization time and activation scales are computed at runtime (dynamic scheme)
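To make "dynamic" concrete: weights are stored in FP8 (e4m3, max magnitude 448), while each activation tensor is scaled on the fly from its own max, which is why no calibration set is needed. Below is a minimal pure-Python sketch of that round-trip; the `round_to_e4m3` helper and the test values are illustrative only, not the llm-compressor implementation:

```python
import math
import random

FP8_E4M3_MAX = 448.0  # largest finite value in the e4m3 format used here

def round_to_e4m3(v: float) -> float:
    """Round v to the nearest representable e4m3 value (1 sign, 4 exp, 3 mantissa bits)."""
    if v == 0.0:
        return 0.0
    sign = -1.0 if v < 0 else 1.0
    a = min(abs(v), FP8_E4M3_MAX)
    e = max(math.floor(math.log2(a)), -6)  # -6 = smallest normal exponent (bias 7)
    step = 2.0 ** (e - 3)                  # spacing between representable values
    return sign * min(round(a / step) * step, FP8_E4M3_MAX)

def fp8_dynamic_quant(x):
    """Per-tensor dynamic quantization: scale comes from the tensor's own max."""
    amax = max(abs(v) for v in x) or 1.0
    scale = amax / FP8_E4M3_MAX
    q = [round_to_e4m3(v / scale) for v in x]  # what would be stored in fp8
    return [v * scale for v in q], scale       # dequantized approximation

random.seed(0)
acts = [random.gauss(0, 1) for _ in range(1000)]
deq, scale = fp8_dynamic_quant(acts)
rel_err = max(abs(a - d) / abs(a) for a, d in zip(acts, deq) if abs(a) > 1e-3)
print(f"scale={scale:.4f}  max relative error={rel_err:.3%}")
```

With 3 mantissa bits the worst-case relative rounding error for normal values is 2⁻⁴ ≈ 6.25%, which is why per-tensor FP8 is close to lossless for well-behaved weight and activation distributions.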
## Usage with vLLM
```bash
python -m vllm.entrypoints.openai.api_server \
    --model tacodevs/Behemoth-X-R1-123B-FP8 \
    --max-model-len 16384 \
    --gpu-memory-utilization 0.95 \
    --trust-remote-code
```
Fits on **1× H200 141GB**. The 16k limit above is conservative; with `--gpu-memory-utilization 0.95` there is typically headroom to raise `--max-model-len` to around 32k.
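As a sanity check on the single-GPU claim, here is a back-of-envelope KV-cache estimate. The architecture numbers (88 layers, 8 KV heads, head dim 128) are assumptions based on a Mistral-Large-style config, so verify them against the repo's `config.json`:

```python
# Back-of-envelope VRAM estimate. Architecture numbers are ASSUMPTIONS
# (Mistral-Large-style config); check config.json before relying on them.
layers, kv_heads, head_dim = 88, 8, 128
bytes_per_elem = 2            # KV cache stored in fp16 by default
ctx = 32768                   # target context window

kv_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # K and V
kv_total_gb = kv_per_token * ctx / 1e9
weights_gb = 115              # FP8 checkpoint size from this card
print(f"{kv_per_token} bytes/token -> {kv_total_gb:.1f} GB KV + {weights_gb} GB weights")
```

Under these assumptions, ~32k context adds roughly 12 GB of KV cache on top of the ~115 GB of weights, which stays inside the ~134 GB that `--gpu-memory-utilization 0.95` allows on a 141 GB H200; the 16k default halves the cache to ~6 GB.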
## See the main model card
Full documentation, prompt format, prefill examples, credits, and everything else is on the source repo:
### 👉 [tacodevs/Behemoth-X-R1-123B](https://huggingface.co/tacodevs/Behemoth-X-R1-123B)
## License
Inherited from base: **[Mistral Research License](https://mistral.ai/licenses/MRL-0.1.md)** — non-commercial use only.