Add metadata and link to paper

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +28 -2
README.md CHANGED
@@ -1,8 +1,34 @@
  ---
- license: apache-2.0
  base_model:
  - mistralai/Mistral-Nemo-Instruct-2407
  tags:
  - nvfp4
  ---
- Quantized NVFP4 version of [Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) with [Four Over Six](https://arxiv.org/abs/2512.02010) adaptive block scaling, created to compare against my [hybrid quant]([https://huggingface.co/DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8). Made with the same version of llm-compressor and compressed-tensors, using the same calibration data, to isolate the variables as much as possible.
 
  ---
  base_model:
  - mistralai/Mistral-Nemo-Instruct-2407
+ license: apache-2.0
  tags:
  - nvfp4
+ library_name: transformers
+ pipeline_tag: text-generation
  ---
+
+ Quantized NVFP4 version of [Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) with [Four Over Six](https://arxiv.org/abs/2512.02010) adaptive block scaling, created to compare against my [hybrid quant](https://huggingface.co/DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8). Made with the same version of llm-compressor and compressed-tensors, using the same calibration data, to isolate the variables as much as possible.
+
+ ## Resources
+
+ - **Paper:** [Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling](https://huggingface.co/papers/2512.02010)
+ - **Repository:** [GitHub - mit-han-lab/fouroversix](https://github.com/mit-han-lab/fouroversix)
+
+ ## Method
+
+ Four Over Six (4/6) is a modification to the block-scaled NVFP4 quantization algorithm that yields reduced quantization error. Unlike integer formats, floating point formats have non-uniform step sizes which create larger quantization error on larger values. 4/6 takes advantage of this by adaptively scaling some blocks to smaller FP4 values, making the distribution of representable values more uniform and reducing quantization error for near-maximal values.
+
+ ## Citation
+
+ ```bibtex
+ @misc{cook2025sixaccuratenvfp4quantization,
+ title={Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling},
+ author={Jack Cook and Junxian Guo and Guangxuan Xiao and Yujun Lin and Song Han},
+ year={2025},
+ eprint={2512.02010},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.02010},
+ }
+ ```
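
The Method paragraph added by this PR describes scaling some blocks so that their largest value maps to a smaller FP4 value. A minimal sketch of that idea follows, under several assumptions that are not taken from the PR or from the fouroversix repository: the helper names (`FP4_GRID`, `round_to_fp4`, `quantize_block`, `four_over_six`) are hypothetical, the per-block scale is kept in full precision instead of being stored in FP8, ties round toward the smaller magnitude, and the choice between the two candidate scales is made by mean squared error. It is meant only to show why mapping the block maximum to 4 instead of 6 can reduce error for near-maximal values.

```python
# Magnitudes representable by FP4 (E2M1); the sign is handled separately.
# Note the step size grows from 0.5 to 2 as the magnitude increases.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def round_to_fp4(x):
    """Round x to the nearest FP4 value (ties go to the smaller magnitude)."""
    mag = min(FP4_GRID, key=lambda g: abs(abs(x) - g))
    return -mag if x < 0 else mag


def quantize_block(block, target_max):
    """Scale the block so its largest magnitude maps to target_max,
    round every element to FP4, and return the dequantized values."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return list(block)  # an all-zero block is already exact
    scale = amax / target_max  # assumption: scale kept in full precision
    return [round_to_fp4(v / scale) * scale for v in block]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def four_over_six(block):
    """Try mapping the block maximum to 6 and to 4, and keep whichever
    candidate has the lower reconstruction error (assumed criterion: MSE)."""
    candidates = {t: quantize_block(block, t) for t in (6.0, 4.0)}
    best = min(candidates, key=lambda t: mse(block, candidates[t]))
    return best, candidates[best]


if __name__ == "__main__":
    # Toy block of 4 values for brevity; NVFP4 blocks hold 16 values.
    target, dequantized = four_over_six([6.0, 5.0, 1.0, 0.5])
    print(target, dequantized)
```

In the toy block, 5.0 sits in the wide gap between the FP4 values 4 and 6 when the maximum maps to 6, so it is off by 1.0 after rounding. Mapping the maximum to 4 instead places 5.0 near the FP4 value 3 and cuts that error to 0.5, at the cost of small errors on the low-magnitude elements, and the sketch picks that candidate.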