---
license: other
license_name: mistral-research-license
license_link: https://mistral.ai/licenses/MRL-0.1.md
base_model: tacodevs/Behemoth-X-R1-123B
base_model_relation: quantized
tags:
  - mistral
  - fp8
  - w8a8
  - compressed-tensors
  - quantized
  - thinking
  - roleplay
  - creative-writing
language:
  - en
pipeline_tag: text-generation
---
# Behemoth-X-R1-123B · FP8

Single-GPU beast mode.


## About

FP8 dynamic quantization of tacodevs/Behemoth-X-R1-123B. Near-lossless quality, half the weight bytes, fits on a single H200.

- **Method:** W8A8 dynamic quantization via llm-compressor
- **Format:** compressed-tensors
- **Size:** ~115 GB
- **Calibration:** None needed (dynamic scheme)
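The dynamic scheme quantizes weights to FP8 offline and computes activation scales on the fly at inference time, which is why no calibration set is needed. A minimal sketch of how such a quant is typically produced with llm-compressor (exact entry points vary between releases; this is an assumption, not the author's actual script):

```python
# Sketch only: producing an FP8 W8A8 dynamic quant with llm-compressor.
# Import paths and argument names reflect recent llm-compressor releases
# and may differ in older versions.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",        # quantize the Linear layers...
    scheme="FP8_DYNAMIC",    # ...to FP8 weights + dynamic FP8 activations
    ignore=["lm_head"],      # keep the output head in higher precision
)

# Dynamic activation scales mean no calibration dataset is passed here.
oneshot(
    model="tacodevs/Behemoth-X-R1-123B",
    recipe=recipe,
    output_dir="Behemoth-X-R1-123B-FP8",
)
```

The resulting checkpoint is saved in the compressed-tensors format, which vLLM loads natively.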

## Usage with vLLM

```shell
python -m vllm.entrypoints.openai.api_server \
  --model tacodevs/Behemoth-X-R1-123B-FP8 \
  --max-model-len 16384 \
  --gpu-memory-utilization 0.95 \
  --trust-remote-code
```
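Once the server is up it exposes the standard OpenAI-compatible API. For example (host and port are vLLM's defaults; adjust to your setup):

```shell
# Query the OpenAI-compatible endpoint started above (default port 8000).
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "tacodevs/Behemoth-X-R1-123B-FP8",
        "messages": [{"role": "user", "content": "Write one vivid sentence about a storm."}],
        "max_tokens": 128
      }'
```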

Fits on a single H200 (141 GB) with roughly a 30k-token context window; the `--max-model-len 16384` in the example above is conservative and can be raised.
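The ~115 GB and ~30k figures are mutually consistent. A quick back-of-envelope check (the layer/head counts below are assumed Mistral-Large-class values, not read from this checkpoint):

```python
# Back-of-envelope memory budget for serving the FP8 quant on one H200.
# Assumed architecture: 88 layers, 8 KV heads, head_dim 128, fp16 KV cache.
PARAMS = 123e9                             # ~123B parameters
weights_gib = PARAMS * 1 / 2**30           # FP8 = 1 byte per parameter
# ≈ 114.5 GiB, consistent with the ~115 GB quoted above

hbm_gib = 141e9 / 2**30                    # H200: 141 GB of HBM3e
budget_gib = 0.95 * hbm_gib                # --gpu-memory-utilization 0.95
kv_gib = budget_gib - weights_gib          # room left for the KV cache

kv_bytes_per_token = 2 * 88 * 8 * 128 * 2  # K+V x layers x kv_heads x head_dim x fp16
max_tokens = kv_gib * 2**30 / kv_bytes_per_token
print(f"{weights_gib:.1f} GiB weights, {kv_gib:.1f} GiB for KV, "
      f"~{max_tokens / 1e3:.0f}k tokens of cache")
```

This lands at roughly 30k tokens of KV cache, ignoring activation and CUDA-graph overhead, which is where the context-window figure above comes from.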

## See the main model card

Full documentation, prompt format, prefill examples, credits, and everything else is on the source repo:

👉 tacodevs/Behemoth-X-R1-123B

## License

Inherited from base: Mistral Research License — non-commercial use only.