akhauriyash/DeepSeek-R1-Distill-Llama-8B-Butler
Feature Extraction • 8B
TokenButler: predicts token importance for every attention head across the transformer, using the first layer alone. This enables fine-grained token sparsity.
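
To make "fine-grained token sparsity" concrete, here is a minimal sketch of how per-head importance scores could be turned into a per-head keep/drop mask. It is an illustration only and not TokenButler's actual code: the function name `topk_token_mask`, the `keep` parameter, and the example scores are all hypothetical, and a simple top-k rule stands in for whatever selection rule the model really uses.

```python
def topk_token_mask(importance, keep):
    """Return a per-head boolean mask keeping the `keep` highest-scoring tokens.

    importance: list of per-head score lists, shape [heads][tokens].
    The scores are assumed to come from some importance predictor;
    here they are made-up numbers.
    """
    masks = []
    for head_scores in importance:
        k = min(keep, len(head_scores))
        # Indices of the k largest scores for this head (ties go to the lower index).
        top = sorted(range(len(head_scores)), key=lambda i: (-head_scores[i], i))[:k]
        kept = set(top)
        masks.append([i in kept for i in range(len(head_scores))])
    return masks


# Hypothetical importance scores for 2 heads over 4 tokens.
scores = [
    [0.10, 0.70, 0.05, 0.15],
    [0.40, 0.10, 0.30, 0.20],
]
mask = topk_token_mask(scores, keep=2)
for head, row in enumerate(mask):
    print(head, row)
```

Each head gets its own mask, so different heads can attend to different subsets of tokens; that per-head granularity is what distinguishes this from dropping the same tokens for every head.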