phaedawg committed (verified)
Commit 201315b · Parent(s): 70e4d82

Updated branches

Files changed (1): README.md (+7 −5)
README.md CHANGED
@@ -29,15 +29,17 @@ language:
 > **Revisions & Branches**
 >
 > - **main** — *placeholder* landing branch. The canonical README lives here; model files may be minimal.
-> - **W4A16** — symmetrical AWQ 4‑bit weights / 16‑bit activations builds and related assets are published under this revision. (Use this for the Marlin kernel with vLLM.)
-> - **W4A16-ASYM** — asymmetrical AWQ 4‑bit weights / 16‑bit activations builds and related assets are published under this revision.
-> - **INT8-W8A16** — 8‑bit (INT8) weights / 16‑bit activations builds published under this revision.
+> - **NVFP4** — 4‑bit weights / 4‑bit activations (in practice behaves like 16‑bit activations).
+> - **W4A16** — symmetrical AWQ 4‑bit weights / 16‑bit activations builds and related assets are published under this revision.
+> - **W8A16** — symmetrical AWQ 8‑bit weights / 16‑bit activations builds and related assets are published under this revision.
+> - **W8A8-FP8_BLOCK** — 8‑bit weights / 8‑bit activations (FP8) with block-wise scales, for CUTLASS kernels on Blackwell SM12.0 (needs the latest vLLM).
 >
 > 🔗 **Quick links:**
 > [Browse `main`](https://huggingface.co/TheHouseOfTheDude/Behemoth-R1-123B-v2_Compressed-Tensors/tree/main) ·
+> [Browse `NVFP4`](https://huggingface.co/TheHouseOfTheDude/Behemoth-R1-123B-v2_Compressed-Tensors/tree/NVFP4) ·
 > [Browse `W4A16`](https://huggingface.co/TheHouseOfTheDude/Behemoth-R1-123B-v2_Compressed-Tensors/tree/W4A16) ·
-> [Browse `W4A16-ASYM`](https://huggingface.co/TheHouseOfTheDude/Behemoth-R1-123B-v2_Compressed-Tensors/tree/W4A16-ASYM) ·
-> [Browse `INT8-W8A16`](https://huggingface.co/TheHouseOfTheDude/Behemoth-R1-123B-v2_Compressed-Tensors/tree/INT8-W8A16)
+> [Browse `W8A16`](https://huggingface.co/TheHouseOfTheDude/Behemoth-R1-123B-v2_Compressed-Tensors/tree/W8A16) ·
+> [Browse `W8A8-FP8_BLOCK`](https://huggingface.co/TheHouseOfTheDude/Behemoth-R1-123B-v2_Compressed-Tensors/tree/W8A8-FP8_BLOCK)
 >
 > *This repository hosts multiple quantizations of the finetuned parent model for vLLM using the compressed-tensors runtime format.*
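Since the branches above are published as Hugging Face revisions for vLLM, loading a specific quantization means pointing vLLM at the matching branch. A minimal launch sketch, assuming vLLM's OpenAI-compatible server and its standard `--revision` engine argument (the repo id and branch names are from the README above; `--tensor-parallel-size` is a placeholder you should set for your own GPU count, and the `W8A8-FP8_BLOCK` branch needs a recent vLLM):

```shell
# Serve the W4A16 branch of this repo with vLLM's OpenAI-compatible server.
# Swap --revision for NVFP4, W8A16, or W8A8-FP8_BLOCK to load another build.
vllm serve TheHouseOfTheDude/Behemoth-R1-123B-v2_Compressed-Tensors \
    --revision W4A16 \
    --tensor-parallel-size 4   # adjust to the number of GPUs available
```

vLLM detects the compressed-tensors format from the checkpoint's quantization config, so no extra quantization flag should be required for these builds.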