EAGLE Head and Performance
#2
by HristoTodorov - opened
I’m wondering how the performance of the pruned versions compares to, say, an AWQ quant, purely in terms of RAM optimisation to fit the hardware. I see two potential benefits:
- Preserved EAGLE head for speculative decoding
- Native FP8 on Hopper
Can you confirm whether your method preserves the EAGLE head, so that the speculative decoding speedup is retained?