phaedawg committed (verified)
Commit 201315b · Parent(s): 70e4d82

Updated branches

Files changed (1): README.md (+7 −5)
README.md CHANGED
@@ -29,15 +29,17 @@ language:
 > **Revisions & Branches**
 >
 > - **main** — *placeholder* landing branch. The canonical README lives here; model files may be minimal.
-> - **W4A16** — symmetrical AWQ 4‑bit weights / 16‑bit activations builds and related assets are published under this revision. (Use this for the Marlin kernel with vLLM.)
-> - **W4A16-ASYM** — asymmetrical AWQ 4‑bit weights / 16‑bit activations builds and related assets are published under this revision.
-> - **INT8-W8A16** — 8‑bit (INT8) weights / 16‑bit activations builds published under this revision.
+> - **NVFP4** — 4‑bit weights / 4‑bit activations (in practice behaves like 16‑bit activations).
+> - **W4A16** — symmetrical AWQ 4‑bit weights / 16‑bit activations builds and related assets are published under this revision.
+> - **W8A16** — symmetrical AWQ 8‑bit weights / 16‑bit activations builds and related assets are published under this revision.
+> - **W8A8-FP8_BLOCK** — 8‑bit weights / 8‑bit activations (FP8) with block-wise scales, for CUTLASS kernels on Blackwell SM12.0 (needs the latest vLLM).
 >
 > 🔗 **Quick links:**
 > [Browse `main`](https://huggingface.co/TheHouseOfTheDude/Behemoth-R1-123B-v2_Compressed-Tensors/tree/main) ·
+> [Browse `NVFP4`](https://huggingface.co/TheHouseOfTheDude/Behemoth-R1-123B-v2_Compressed-Tensors/tree/NVFP4) ·
 > [Browse `W4A16`](https://huggingface.co/TheHouseOfTheDude/Behemoth-R1-123B-v2_Compressed-Tensors/tree/W4A16) ·
-> [Browse `W4A16-ASYM`](https://huggingface.co/TheHouseOfTheDude/Behemoth-R1-123B-v2_Compressed-Tensors/tree/W4A16-ASYM) ·
-> [Browse `INT8-W8A16`](https://huggingface.co/TheHouseOfTheDude/Behemoth-R1-123B-v2_Compressed-Tensors/tree/INT8-W8A16)
+> [Browse `W8A16`](https://huggingface.co/TheHouseOfTheDude/Behemoth-R1-123B-v2_Compressed-Tensors/tree/W8A16) ·
+> [Browse `W8A8-FP8_BLOCK`](https://huggingface.co/TheHouseOfTheDude/Behemoth-R1-123B-v2_Compressed-Tensors/tree/W8A8-FP8_BLOCK)
 >
 > *This repository hosts multiple quantizations of the finetuned parent model for vLLM using the compressed-tensors runtime format.*
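Since the branches above are published as Hugging Face revisions for vLLM, loading a specific quantization means pointing vLLM at the matching branch. A minimal launch sketch, assuming vLLM's OpenAI-compatible server and its standard `--revision` engine argument (the repo id and branch names are from the README above; `--tensor-parallel-size` is a placeholder you should set for your own GPU count, and the `W8A8-FP8_BLOCK` branch needs a recent vLLM):

```shell
# Serve the W4A16 branch of this repo with vLLM's OpenAI-compatible server.
# Swap --revision for NVFP4, W8A16, or W8A8-FP8_BLOCK to load another build.
vllm serve TheHouseOfTheDude/Behemoth-R1-123B-v2_Compressed-Tensors \
    --revision W4A16 \
    --tensor-parallel-size 4   # adjust to the number of GPUs available
```

vLLM detects the compressed-tensors format from the checkpoint's quantization config, so no extra quantization flag should be required for these builds.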