| --- |
| license: other |
| license_name: mistral-research-license |
| license_link: https://mistral.ai/licenses/MRL-0.1.md |
| base_model: tacodevs/Behemoth-X-R1-123B |
| base_model_relation: quantized |
| tags: |
| - mistral |
| - fp8 |
| - w8a8 |
| - compressed-tensors |
| - quantized |
| - thinking |
| - roleplay |
| - creative-writing |
| language: |
| - en |
| pipeline_tag: text-generation |
| --- |
| |
| <div align="center"> |
| <img src="https://huggingface.co/tacodevs/Behemoth-X-R1-123B/resolve/main/assets/hero.png" alt="Behemoth-X-R1-123B" style="width:100%; max-width:960px; border-radius:16px; box-shadow:0 0 60px rgba(236,72,153,0.35), 0 0 100px rgba(139,92,246,0.25);"/> |
| </div> |
|
|
| <div align="center" style="margin-top:24px;"> |
|
|
| <h1 style="font-size:3em; font-weight:900; background:linear-gradient(90deg,#ec4899 0%,#a855f7 50%,#06b6d4 100%); -webkit-background-clip:text; -webkit-text-fill-color:transparent; background-clip:text; margin:0;">Behemoth-X-R1-123B · FP8</h1> |
|
|
| <p style="font-size:1.2em; color:#a855f7; font-style:italic;">Single-GPU beast mode.</p> |
|
|
| <p> |
| <img src="https://img.shields.io/badge/quant-FP8_Dynamic-8B5CF6?style=for-the-badge" alt="quant"/> |
| <img src="https://img.shields.io/badge/VRAM-~130GB-EC4899?style=for-the-badge" alt="vram"/> |
| <img src="https://img.shields.io/badge/runs_on-1x_H200-06B6D4?style=for-the-badge" alt="gpu"/> |
| </p> |
|
|
| </div> |
|
|
| ## About |
|
|
FP8 dynamic quantization of [`tacodevs/Behemoth-X-R1-123B`](https://huggingface.co/tacodevs/Behemoth-X-R1-123B). Near-lossless quality at half the weight bytes of the BF16 original, small enough to fit on a single H200.
|
|
| - **Method:** W8A8 dynamic quantization via [llm-compressor](https://github.com/vllm-project/llm-compressor) |
| - **Format:** `compressed-tensors` |
| - **Size:** ~115 GB |
| - **Calibration:** None needed (dynamic scheme) |
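
To illustrate why the dynamic scheme needs no calibration data, here is a rough pure-Python sketch (not the llm-compressor internals): the per-tensor scale is derived from the live tensor itself at runtime, so no representative dataset is required to pick it ahead of time.

```python
# Illustrative sketch of per-tensor dynamic FP8 scaling.
# This is NOT the actual llm-compressor implementation; real kernels
# cast to a hardware e4m3 type instead of the clamp used here.

E4M3_MAX = 448.0  # largest finite value in the FP8 e4m3 format


def dynamic_scale(values):
    """Scale mapping this tensor's range onto the e4m3 range.

    Computed from the tensor at runtime -- the reason a dynamic
    scheme needs no calibration dataset.
    """
    return max(abs(v) for v in values) / E4M3_MAX


def quant_dequant(values):
    """Scale into FP8 range, clamp, and scale back (mantissa rounding omitted)."""
    s = dynamic_scale(values)
    if s == 0:
        return list(values)
    return [max(-E4M3_MAX, min(E4M3_MAX, v / s)) * s for v in values]
```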
|
|
| ## Usage with vLLM |
|
|
| ```bash |
| python -m vllm.entrypoints.openai.api_server \ |
| --model tacodevs/Behemoth-X-R1-123B-FP8 \ |
| --max-model-len 16384 \ |
| --gpu-memory-utilization 0.95 \ |
| --trust-remote-code |
| ``` |
|
|
The command above uses a 16k context for headroom; on **1× H200 141GB** the model fits with roughly a 30k context window if you raise `--max-model-len`.
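Once the server is up, any OpenAI-compatible client can talk to it. A minimal sketch using only the Python standard library (the endpoint path and port 8000 are vLLM's defaults; the final request line is commented out so it only runs with a live server):

```python
import json
import urllib.request

# Model name must match the --model flag passed to the vLLM server.
payload = {
    "model": "tacodevs/Behemoth-X-R1-123B-FP8",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 256,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",  # vLLM's default endpoint
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# with urllib.request.urlopen(req) as resp:      # uncomment with the server running
#     print(json.load(resp)["choices"][0]["message"]["content"])
```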
|
|
| ## See the main model card |
|
|
Full documentation, prompt format, prefill examples, credits, and everything else are on the source repo:
|
|
| ### 👉 [tacodevs/Behemoth-X-R1-123B](https://huggingface.co/tacodevs/Behemoth-X-R1-123B) |
|
|
| ## License |
|
|
| Inherited from base: **[Mistral Research License](https://mistral.ai/licenses/MRL-0.1.md)** — non-commercial use only. |
|
|