---
license: apache-2.0
datasets:
- allenai/tulu-v2-sft-mixture
- xuan-luo/FlexiPatterns-Llama-3-8B-Instruct
language:
- en
base_model:
- meta-llama/Meta-Llama-3-8B-Instruct
pipeline_tag: text-generation
library_name: transformers
---
# DiffSkip-Llama-3-8B-Instruct

The implementation of the paper *Differential Layer Skipping in Large Language Models*.

## Model Description
DiffSkip-Llama-3-8B-Instruct is an enhanced version of the Llama-3-8B-Instruct model, incorporating the Differential Layer Skipping (DiffSkip) method to enable dynamic Feed-Forward Network (FFN) skipping during text generation. This approach leverages the self-attention input-output difference as a routing signal, allowing tokens to bypass FFN blocks based on computational needs.
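The routing idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the sigmoid router (`router_w`), the fixed `threshold`, and the exact form of the routing signal are assumptions made for clarity; consult the paper for the actual design.

```python
import numpy as np

def diffskip_layer(hidden, attn_fn, ffn_fn, router_w, threshold=0.5):
    """One decoder layer with dynamic FFN skipping (illustrative sketch).

    The self-attention input-output difference serves as the routing
    signal: tokens whose router score falls below the threshold bypass
    the FFN block. `router_w` is a hypothetical learned router weight.
    """
    attn_out = attn_fn(hidden)            # self-attention sub-layer output
    hidden = hidden + attn_out            # residual connection
    # Routing signal: the per-token difference between the attention
    # sub-layer's output and its input (i.e. attn_out itself).
    score = 1.0 / (1.0 + np.exp(-(attn_out @ router_w)))  # sigmoid gate, (seq_len,)
    out = hidden.copy()                   # skipped tokens keep the residual stream
    keep = score >= threshold             # tokens that still run the FFN
    if keep.any():
        out[keep] = hidden[keep] + ffn_fn(hidden[keep])
    return out, keep
```

Tokens whose attention block barely changed them produce a weak routing signal and skip the FFN, saving the bulk of a layer's FLOPs for those positions.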
- Developed by: Xuan Luo, Weizhi Wang, Xifeng Yan
- Model type: Causal Language Model with dynamic FFN skipping
- Language(s) (NLP): English (en)
- License: Apache-2.0
- Finetuned from model: meta-llama/Meta-Llama-3-8B-Instruct
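A typical way to load and query the model with `transformers` is sketched below. The repository id is inferred from the contact author's namespace and is an assumption, as is `trust_remote_code=True` (needed only if the repo ships custom DiffSkip modeling code); verify both against the actual model repository.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; trust_remote_code assumed for custom DiffSkip layers.
model_id = "xuan-luo/DiffSkip-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

messages = [{"role": "user", "content": "Explain layer skipping briefly."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```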
## Model Card Contact
For questions or inquiries, please contact xuan_luo@ucsb.edu.