#2 by su400 - opened

We look forward to a high-quality AWQ or GPTQ quantized version. Given the enhanced programming and mathematical capabilities of the new R1 release, traditional quantization methods may need refinement to preserve as much of the specialized mathematical and programming knowledge as possible, rather than losing it to compression. In GPTQ quantized versions released by other groups, we have observed a noticeably higher error rate than the official model on longer programming tasks; the degradation is significant. To retain programming and mathematical capability, a somewhat larger memory footprint would be acceptable. Taking a single H20 node with 768 GB of VRAM as the baseline, sustaining a 65,535-token context length under that hardware budget would be excellent.
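To illustrate why naive low-bit quantization can disproportionately hurt specialized knowledge, here is a minimal, self-contained sketch (not the actual AWQ or GPTQ algorithms, and the weight values are made up). Salient outlier weights stretch a shared quantization scale, collapsing the many small weights onto few levels; quantizing groups with their own finer scales, which is the general idea behind group-wise and activation-aware schemes, preserves them much better:

```python
# Illustrative sketch of round-to-nearest low-bit quantization error.
# Not AWQ/GPTQ themselves; just the scale/outlier effect they address.

def quantize_dequant(weights, bits=4):
    """Symmetric round-to-nearest quantization to `bits` bits, then dequantize."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax     # one scale for the group
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return [v * scale for v in q]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# A weight group with one salient outlier (hypothetical values).
weights = [0.01, -0.02, 0.015, -0.005, 0.8]

# One shared scale: the outlier dominates, so the small weights
# all round to zero and their information is lost.
coarse = quantize_dequant(weights)

# Finer grouping: the small weights get their own scale and survive.
fine = quantize_dequant(weights[:4]) + quantize_dequant(weights[4:])

print("shared-scale MSE:", mse(weights, coarse))
print("per-group   MSE:", mse(weights, fine))
```

With the shared scale, all four small weights dequantize to exactly 0.0; with per-group scales the reconstruction error drops by orders of magnitude. This is the kind of effect that can show up as a higher error rate on long programming tasks, and why quantization quality (not just bit width) matters for this release.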