EAGLE Head and Performance

by HristoTodorov - opened Jan 21

I'm wondering how the pruned versions perform compared to, say, an AWQ quant, looking purely at reducing RAM usage to fit the hardware. I see two potential benefits:

Preserved EAGLE head for speculative decoding
Native FP8 on Hopper

Can you confirm whether your method preserves the EAGLE head, so its speculative-decoding speedup is retained?