DataSnake committed · verified
Commit c924e31 · Parent: 957b1e8

Update README.md

Files changed (1): README.md +8 -1
README.md CHANGED
@@ -20,4 +20,11 @@ tags:
  Quantized NVFP4 weights of the [Muse-12B](https://huggingface.co/LatitudeGames/Muse-12B) model, for use with nVidia Blackwell GPUs.
 
  ## Quantization details
- Uses [four-over-six](https://arxiv.org/abs/2512.02010) with MSE selection
+ Uses [4-over-6](https://arxiv.org/abs/2512.02010) adaptive block scaling with MSE selection for the weights, implemented with the `memoryless_mse` observer for llm-compressor, with `maxshrink` and `grid` set to negative numbers.
+
+ ### A Brief Overview of 4-over-6
+ One of the main downsides of FP4 is how sparse its representable values are at the top of its range. At a base level, NVFP4 works by dividing each tensor into sixteen-element blocks, then assigning an FP8 scale factor to each block (plus a single FP32 scale factor for the tensor as a whole) such that the largest absolute value in the block maps to ±6. For example, if a block contains the values {10, -20, 40, -60}, the scale factor is set to 10 and the FP4 values are {1, -2, 4, -6}. The problem is that FP4 can only represent a very limited set of values; in particular, it can't represent anything between 4 and 6, so a value in the block that maps to 5 suffers severe rounding error. Rescaling so that the block's maximum maps to ±4 instead reduces the worst-case error from this kind of rounding, but it also increases the rounding error in very small values, so it would be a bad idea to quantize the entire model with the maximum mapped to 4. 4-over-6 resolves this by deciding per block: it tries both scalings and keeps whichever one reconstructs the block with the lower error (here, mean squared error).
+
+ ![image/png](four-over-six.png)
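+
+ To make the tradeoff concrete, below is a minimal sketch of the per-block decision. It is an illustration of the idea only, not the llm-compressor code path: it assumes the E2M1 value grid described above, skips the FP8 rounding of the block scale itself, and uses a toy five-element block (real NVFP4 blocks hold sixteen elements).
+
+ ```python
+ import numpy as np
+
+ # Non-negative magnitudes representable in FP4 (E2M1).
+ FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
+ SIGNED_GRID = np.concatenate([-FP4_GRID[::-1], FP4_GRID])
+
+ def quantize_block(block, target_max):
+     """Scale the block so its largest |value| maps to target_max,
+     round every element to the nearest FP4 grid point, dequantize."""
+     scale = np.abs(block).max() / target_max
+     scaled = block / scale
+     q = SIGNED_GRID[np.abs(scaled[:, None] - SIGNED_GRID[None, :]).argmin(axis=1)]
+     return q * scale
+
+ def four_over_six(block):
+     # Try mapping the block max to 6 and to 4, then keep whichever
+     # reconstruction has the lower mean squared error (MSE selection).
+     candidates = [quantize_block(block, m) for m in (6.0, 4.0)]
+     errors = [np.mean((block - c) ** 2) for c in candidates]
+     return candidates[int(np.argmin(errors))]
+
+ # The worked example from above, plus a 50 that maps to 5 (in the 4-6
+ # gap) when the max maps to 6; here the max-to-4 scaling wins on MSE.
+ block = np.array([10.0, -20.0, 40.0, -60.0, 50.0])
+ print(four_over_six(block))
+ ```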
+
+ ### My Implementation