EAGLE Head and Performance

#2
by HristoTodorov - opened

I’m wondering how the performance of the pruned versions compares to, say, an AWQ quant, when the goal is purely to reduce RAM usage to fit the hardware. I see two potential benefits:

  • Preserved EAGLE head for speculative decoding
  • Native FP8 on Hopper

Can you confirm if your method preserves the EAGLE head for performance benefits?
