---
license: apache-2.0
datasets:
  - allenai/tulu-v2-sft-mixture
  - xuan-luo/FlexiPatterns-Llama-3-8B-Instruct
language:
  - en
base_model:
  - meta-llama/Meta-Llama-3-8B-Instruct
pipeline_tag: text-generation
library_name: transformers
---

# DiffSkip-Llama-3-8B-Instruct

This repository provides the implementation of the paper *Differential Layer Skipping in Large Language Models*.

## Model Description

DiffSkip-Llama-3-8B-Instruct is an enhanced version of Llama-3-8B-Instruct that incorporates the Differential Layer Skipping (DiffSkip) method to enable dynamic Feed-Forward Network (FFN) skipping during text generation. The approach uses the difference between each token's self-attention input and output as a routing signal, allowing tokens that attention has changed little to bypass FFN blocks and reducing per-token computation.
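
The routing signal described above can be sketched as follows. This is a simplified illustration, not the paper's actual router: the fixed `threshold` cutoff is a hypothetical stand-in for whatever (possibly learned) routing mechanism DiffSkip uses.

```python
import math

def should_skip_ffn(attn_input, attn_output, threshold):
    """Decide whether a token may bypass the FFN block.

    Conceptual sketch of the DiffSkip routing signal: the norm of the
    self-attention input-output difference serves as the router score.
    `threshold` is a hypothetical hyperparameter for illustration only.
    """
    diff = [o - i for i, o in zip(attn_input, attn_output)]
    score = math.sqrt(sum(d * d for d in diff))
    return score < threshold

# A token barely changed by attention is routed past the FFN ...
print(should_skip_ffn([1.0, 2.0], [1.01, 2.0], threshold=0.5))  # True
# ... while a strongly updated token still runs through it.
print(should_skip_ffn([1.0, 2.0], [3.0, 0.5], threshold=0.5))   # False
```

The intuition is that a small attention input-output difference suggests the token's representation is already stable, so the (expensive) FFN contributes little for that token at that layer.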

  • Developed by: Xuan Luo, Weizhi Wang, Xifeng Yan
  • Model type: Causal Language Model with dynamic FFN skipping
  • Language(s) (NLP): English (en)
  • License: Apache-2.0
  • Finetuned from model: meta-llama/Meta-Llama-3-8B-Instruct
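
A minimal sketch of loading the checkpoint with `transformers` is shown below. The repository id and the need for `trust_remote_code=True` are assumptions (DiffSkip modifies the decoder layers, so custom modeling code is likely required); check the model page for the authoritative instructions.

```python
def load_diffskip(model_id="xuan-luo/DiffSkip-Llama-3-8B-Instruct"):
    """Download and instantiate the model (network and a large GPU required).

    `model_id` is an assumed repository id; `trust_remote_code=True` is
    assumed to be needed for the custom DiffSkip decoder layers.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    return tokenizer, model
```

Once loaded, the model can be used like any other `text-generation` checkpoint; the FFN-skipping decisions happen internally during the forward pass.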

## Model Card Contact

For questions or inquiries, please contact xuan_luo@ucsb.edu.