---
license: other
license_name: mistral-research-license
license_link: https://mistral.ai/licenses/MRL-0.1.md
base_model: tacodevs/Behemoth-X-R1-123B
base_model_relation: quantized
tags:
  - mistral
  - fp8
  - w8a8
  - compressed-tensors
  - quantized
  - thinking
  - roleplay
  - creative-writing
language:
  - en
pipeline_tag: text-generation
---

<div align="center">
<img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/hero.png" alt="Behemoth-X-R1-123B" style="width:100%; max-width:960px; border-radius:16px; box-shadow:0 0 60px rgba(236,72,153,0.35), 0 0 100px rgba(139,92,246,0.25);"/>
</div>

<div align="center" style="margin-top:24px;">

<h1 style="font-size:3em; font-weight:900; background:linear-gradient(90deg,#ec4899 0%,#a855f7 50%,#06b6d4 100%); -webkit-background-clip:text; -webkit-text-fill-color:transparent; background-clip:text; margin:0;">Behemoth-X-R1-123B · FP8</h1>

<p style="font-size:1.2em; color:#a855f7; font-style:italic;">Single-GPU beast mode.</p>

<p>
<img src="https://img.shields.io/badge/quant-FP8_Dynamic-8B5CF6?style=for-the-badge" alt="quant"/>
<img src="https://img.shields.io/badge/VRAM-~130GB-EC4899?style=for-the-badge" alt="vram"/>
<img src="https://img.shields.io/badge/runs_on-1x_H200-06B6D4?style=for-the-badge" alt="gpu"/>
</p>

</div>

## About

FP8 dynamic quantization of [`tacodevs/Behemoth-X-R1-123B`](https://huggingface.co/tacodevs/Behemoth-X-R1-123B). Near-lossless quality, half the weight bytes, fits on a single H200.

- **Method:** W8A8 dynamic quantization via [llm-compressor](https://github.com/vllm-project/llm-compressor)
- **Format:** `compressed-tensors`
- **Size:** ~115 GB
- **Calibration:** None needed (dynamic scheme)
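
The size figure above follows from simple byte arithmetic. A quick sanity check (parameter count and bytes-per-weight are the only inputs; embedding and KV-cache overhead are ignored, so treat this as a rough lower bound):

```python
# Back-of-envelope weight-memory estimate for a 123B-parameter model.
# FP8 stores one byte per weight; BF16 stores two.
PARAMS = 123e9

def weight_gib(bytes_per_param: float) -> float:
    """Weight bytes in GiB, ignoring embeddings and KV-cache overhead."""
    return PARAMS * bytes_per_param / 2**30

fp8_gib = weight_gib(1)   # ~115 GiB: matches the size above, fits a 141 GB H200
bf16_gib = weight_gib(2)  # ~229 GiB: the unquantized weights would not fit
print(f"FP8: {fp8_gib:.1f} GiB, BF16: {bf16_gib:.1f} GiB")
```

The leftover ~26 GB on an H200 is what the KV cache draws on, which is why the usable context window is bounded.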

## Usage with vLLM

```bash
python -m vllm.entrypoints.openai.api_server \
  --model tacodevs/Behemoth-X-R1-123B-FP8 \
  --max-model-len 16384 \
  --gpu-memory-utilization 0.95 \
  --trust-remote-code
```

Fits on **1× H200 141GB**; context windows up to ~30k tokens fit in memory (the launch command above conservatively caps it at 16,384).
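
Once the server is up, any OpenAI-compatible client can talk to it. A minimal stdlib-only sketch; the `localhost:8000` endpoint, `/v1/chat/completions` path, and sampling parameters are generic vLLM/OpenAI-API assumptions, not part of this card — see the main model card for the actual prompt format:

```python
import json
import os
import urllib.request

MODEL_ID = "tacodevs/Behemoth-X-R1-123B-FP8"
# vLLM's OpenAI-compatible server defaults to port 8000; override via env if needed.
API_URL = os.environ.get("VLLM_URL", "http://localhost:8000/v1/chat/completions")

def build_chat_request(prompt: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat-completion payload for this model."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.8,  # illustrative; tune for your use case
    }

def chat(prompt: str) -> str:
    """POST the payload to the server and return the assistant's reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Call `chat("...")` once the server reports it is ready.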

## See the main model card

Full documentation (prompt format, prefill examples, credits, and everything else) lives on the source repo:

### 👉 [tacodevs/Behemoth-X-R1-123B](https://huggingface.co/tacodevs/Behemoth-X-R1-123B)

## License

Inherited from base: **[Mistral Research License](https://mistral.ai/licenses/MRL-0.1.md)** — non-commercial use only.