Quantized NVFP4 weights of the [Muse-12B](https://huggingface.co/LatitudeGames/Muse-12B) model, for use with NVIDIA Blackwell GPUs.

## Quantization details

Uses [4-over-6](https://arxiv.org/abs/2512.02010) adaptive block scaling with MSE selection for the weights, implemented by running the `memoryless_mse` observer for llm-compressor with `maxshrink` and `grid` set to negative numbers.

### A Brief Overview of 4-over-6

One of the main downsides of using FP4 is the extreme sparsity of large values. At a base level, NVFP4 works by dividing the model into sixteen-element blocks, then assigning FP8 scale factors to each block (as well as a single FP32 scale factor for the tensor as a whole) such that the largest absolute value in the block maps to ±6. For example, if a block has the values {10,-20,40,-60}, the scale factor would be set to 10 and the FP4 values would be {1,-2,4,-6}. The problem is that the FP4 format only allows for a very limited set of values. In particular, it can't represent any number between 4 and 6, so anything in the block that maps to 5 will be severely affected by rounding error. Changing the scale factors so that the maximum value for the block maps to ±4 reduces the maximum possible error introduced by this type of rounding, but it also increases the rounding error in very small values, so it's not a good idea to just quantize the entire model with the maximum value set to 4.
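To make the tradeoff concrete, here is a toy NumPy sketch (not the llm-compressor implementation; the FP4 value grid is written out by hand, and `quantize_block`/`four_over_six` are illustrative names). It fake-quantizes a block at both scale targets and keeps whichever gives the lower mean squared error, which is the essence of the per-block selection described above:

```python
import numpy as np

# Magnitudes representable in FP4 (E2M1); note the gap between 4 and 6.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block(block, max_target):
    """Fake-quantize one block so its largest |value| maps to max_target
    (6.0 for standard NVFP4, 4.0 for the shrunken alternative)."""
    scale = np.abs(block).max() / max_target
    scaled = block / scale
    # Snap each magnitude to the nearest FP4 grid point, keeping the sign.
    nearest = np.argmin(np.abs(np.abs(scaled)[:, None] - FP4_GRID), axis=1)
    return np.sign(scaled) * FP4_GRID[nearest] * scale

def four_over_six(block):
    """Return (target, dequantized) for whichever scale target (6 or 4)
    gives the lower MSE on this block."""
    candidates = {t: quantize_block(block, t) for t in (6.0, 4.0)}
    return min(candidates.items(),
               key=lambda kv: np.mean((block - kv[1]) ** 2))

# The example block from above is represented exactly at max -> 6:
# scale 10 gives FP4 values {1, -2, 4, -6}, all on the grid.
print(quantize_block(np.array([10.0, -20.0, 40.0, -60.0]), 6.0))

# A block containing a value that maps to 5 is not, so for it the
# selection switches to max -> 4, which rounds less harshly overall.
target, deq = four_over_six(np.array([10.0, -20.0, 50.0, -60.0]))
print(target, deq)
```

In a real quantizer this choice is made independently for every sixteen-element block, so well-behaved blocks keep the standard ±6 scaling while blocks with awkwardly placed values shrink to ±4.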

### My Implementation