DataSnake committed on
Commit fa2b71b · verified · Parent: 4db321d

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
````diff
@@ -20,7 +20,7 @@ One of the main downsides of using FP4 is the extreme sparsity of large values.
 
 ![image/png](four-over-six.png)
 
-However, while scaling to ±4 reduces worst-case rounding error for large values, it increases rounding error for smaller values, so simply scaling every block to ±4 would be a bad idea. The solution is to try scaling each block both ways, then keep whichever gives the lowest quantization MSE for that block. The `memoryless_mse` observer in llm-compressor is designed to work on a similar principle, multiplying the scale factors by different values of \\(p\\) and seeing how the quantization MSE is affected. While this is primarily intended for \\(p \le 1\\), deliberately introducing clipping for large values in exchange for more precision on smaller values, when used with NVFP4 it's mathematically equivalent to mapping the most extreme values in each block to \\(±6/p\\). Obviously, this can be used to implement Four Over Six by setting \\(p \in \{1,1.5\}\\). The key to doing this is the following code from `mse.py`:
+However, while scaling to ±4 reduces worst-case rounding error for large values, it increases rounding error for smaller values, so simply scaling every block to ±4 would be a bad idea. The solution is to try scaling each block both ways, then keep whichever gives the lowest quantization MSE for that block. The `memoryless_mse` observer in llm-compressor is designed to work on a similar principle, calculating scale factors as though the block were multiplied by different values of \\(p\\) and choosing the value that minimizes quantization MSE. While this is primarily intended for \\(p\le1\\), when used with NVFP4 it's mathematically equivalent to mapping the most extreme values in each block to \\(±6/p\\). Obviously, this can be used to implement Four Over Six by setting \\(p\in\{1,1.5\}\\). The key to doing this is the following code from `mse.py`:
 ```
 for i in range(int(maxshrink * grid)):
     p = 1 - i / grid
````
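The selection rule described in the updated paragraph can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the llm-compressor observer or the author's code: it assumes the FP4 E2M1 magnitude grid {0, 0.5, 1, 1.5, 2, 3, 4, 6}; it keeps each block scale as a plain float (real NVFP4 stores block scales in FP8 E4M3 alongside a per-tensor scale, which is ignored here); and `stock_shrink_factors`, `round_to_fp4`, `quantize_block`, and `pick_p` are hypothetical helper names. The `maxshrink=0.2, grid=100` values are assumed example settings. The key identity is that with block scale \\(s = p \cdot \max|x| / 6\\), the block maximum lands at \\(\max|x|/s = 6/p\\).

```python
import numpy as np

# Magnitudes representable by FP4 E2M1, the element format NVFP4 uses.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def stock_shrink_factors(maxshrink: float, grid: float) -> list[float]:
    """Candidate p values from the loop quoted above: 1, 1 - 1/grid, ... (all <= 1)."""
    return [1 - i / grid for i in range(int(maxshrink * grid))]


def round_to_fp4(x: np.ndarray) -> np.ndarray:
    """Round to the nearest signed FP4 value; magnitudes above 6 saturate to 6.

    Ties resolve to the smaller magnitude (argmin returns the first match),
    which is a simplification, not hardware round-to-nearest-even.
    """
    idx = np.abs(np.abs(x)[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(x) * FP4_GRID[idx]


def quantize_block(block: np.ndarray, p: float) -> np.ndarray:
    """Fake-quantize one block using scale = p * max|x| / 6.

    Dividing by this scale sends the block's largest magnitude to 6 / p,
    so p = 1 targets +/-6 and p = 1.5 targets +/-4.
    """
    amax = float(np.abs(block).max())
    if amax == 0.0:
        return np.zeros_like(block)
    scale = p * amax / 6.0
    return round_to_fp4(block / scale) * scale


def pick_p(block: np.ndarray, candidates=(1.0, 1.5)) -> tuple[float, float]:
    """Return (p, mse) for the candidate with the lowest quantization MSE."""
    best_p, best_mse = None, float("inf")
    for p in candidates:
        mse = float(np.mean((block - quantize_block(block, p)) ** 2))
        if mse < best_mse:  # strict comparison: ties keep the earlier candidate
            best_p, best_mse = p, mse
    return best_p, best_mse


if __name__ == "__main__":
    # Stock loop with assumed example settings: every candidate is <= 1,
    # so no block is ever mapped to +/-4.
    print(max(stock_shrink_factors(maxshrink=0.2, grid=100)))  # 1.0
    # Four Over Six: only p = 1 (max -> 6) and p = 1.5 (max -> 4).
    print(pick_p(np.array([6.0, 5.0])))  # (1.5, 0.125)
    print(pick_p(np.array([6.0, 0.5])))  # (1.0, 0.0)
```

In the first toy block, the value 5 sits between the FP4 points 4 and 6 when the maximum maps to 6, so targeting ±4 (where 5 lands near 3.33 and rounds to 3, i.e. 4.5 after rescaling) gives lower error. In the second, the small value 0.5 is exactly representable only under the ±6 mapping, so p = 1 wins. This is the per-block trade-off the paragraph describes.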