| --- |
| license: other |
| license_name: mistral-research-license |
| license_link: https://mistral.ai/licenses/MRL-0.1.md |
| base_model: tacodevs/Behemoth-X-R1-123B |
| base_model_relation: quantized |
| tags: |
| - mistral |
| - fp8 |
| - w8a8 |
| - compressed-tensors |
| - quantized |
| - thinking |
| - roleplay |
| - creative-writing |
| language: |
| - en |
| pipeline_tag: text-generation |
| --- |
| |
| <div align="center"> |
| <img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/hero.png" alt="Behemoth-X-R1-123B" style="width:100%; max-width:960px; border-radius:16px; box-shadow:0 0 60px rgba(236,72,153,0.35), 0 0 100px rgba(139,92,246,0.25);"/> |
| </div> |
|
|
| <div align="center" style="margin-top:24px;"> |
|
|
| <h1 style="font-size:3em; font-weight:900; background:linear-gradient(90deg,#ec4899 0%,#a855f7 50%,#06b6d4 100%); -webkit-background-clip:text; -webkit-text-fill-color:transparent; background-clip:text; margin:0;">Behemoth-X-R1-123B · FP8</h1> |
|
|
| <p style="font-size:1.2em; color:#a855f7; font-style:italic;">Single-GPU beast mode.</p> |
|
|
| <p> |
| <img src="https://img.shields.io/badge/quant-FP8_Dynamic-8B5CF6?style=for-the-badge" alt="quant"/> |
| <img src="https://img.shields.io/badge/VRAM-~130GB-EC4899?style=for-the-badge" alt="vram"/> |
| <img src="https://img.shields.io/badge/runs_on-1x_H200-06B6D4?style=for-the-badge" alt="gpu"/> |
| </p> |
|
|
| </div> |
|
|
| ## About |
|
|
FP8 dynamic quantization of [`tacodevs/Behemoth-X-R1-123B`](https://huggingface.co/tacodevs/Behemoth-X-R1-123B). Near-lossless quality at half the weight bytes of the BF16 original, small enough to fit on a single H200.
|
|
| - **Method:** W8A8 dynamic quantization via [llm-compressor](https://github.com/vllm-project/llm-compressor) |
| - **Format:** `compressed-tensors` |
| - **Size:** ~115 GB |
| - **Calibration:** None needed (dynamic scheme) |
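
To illustrate why the dynamic scheme needs no calibration data, here is a rough pure-Python sketch (not the llm-compressor internals): the per-tensor scale is derived from the live tensor itself at runtime, so no representative dataset is required to pick it ahead of time.

```python
# Illustrative sketch of per-tensor dynamic FP8 scaling.
# This is NOT the actual llm-compressor implementation; real kernels
# cast to a hardware e4m3 type instead of the clamp used here.

E4M3_MAX = 448.0  # largest finite value in the FP8 e4m3 format


def dynamic_scale(values):
    """Scale mapping this tensor's range onto the e4m3 range.

    Computed from the tensor at runtime -- the reason a dynamic
    scheme needs no calibration dataset.
    """
    return max(abs(v) for v in values) / E4M3_MAX


def quant_dequant(values):
    """Scale into FP8 range, clamp, and scale back (mantissa rounding omitted)."""
    s = dynamic_scale(values)
    if s == 0:
        return list(values)
    return [max(-E4M3_MAX, min(E4M3_MAX, v / s)) * s for v in values]
```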
|
|
| ## Usage with vLLM |
|
|
| ```bash |
| python -m vllm.entrypoints.openai.api_server \ |
| --model tacodevs/Behemoth-X-R1-123B-FP8 \ |
| --max-model-len 16384 \ |
| --gpu-memory-utilization 0.95 \ |
| --trust-remote-code |
| ``` |
|
|
The command above uses a 16k context for headroom; on **1× H200 141GB** the model fits with roughly a 30k context window if you raise `--max-model-len`.
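Once the server is up, any OpenAI-compatible client can talk to it. A minimal sketch using only the Python standard library (the endpoint path and port 8000 are vLLM's defaults; the final request line is commented out so it only runs with a live server):

```python
import json
import urllib.request

# Model name must match the --model flag passed to the vLLM server.
payload = {
    "model": "tacodevs/Behemoth-X-R1-123B-FP8",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 256,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",  # vLLM's default endpoint
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# with urllib.request.urlopen(req) as resp:      # uncomment with the server running
#     print(json.load(resp)["choices"][0]["message"]["content"])
```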
|
|
| ## See the main model card |
|
|
Full documentation, prompt format, prefill examples, credits, and everything else are on the source repo:
|
|
| ### 👉 [tacodevs/Behemoth-X-R1-123B](https://huggingface.co/tacodevs/Behemoth-X-R1-123B) |
|
|
| ## License |
|
|
| Inherited from base: **[Mistral Research License](https://mistral.ai/licenses/MRL-0.1.md)** — non-commercial use only. |
|
|