s1ghhh's Collections

LLM-Drop

updated 4 days ago

Model weights for the paper "What Matters in Transformers? Not All Attention is Needed" (https://arxiv.org/abs/2406.15786).

  • s1ghhh/Llama-2-13b-Drop8Block

    13B • Updated Sep 8, 2024 • 24 • 2

  • s1ghhh/Llama-2-13b-Drop4Block

    13B • Updated Sep 8, 2024 • 4 • 2

  • s1ghhh/Llama-2-13b-Drop4Attn

    13B • Updated Sep 8, 2024 • 4 • 2

  • s1ghhh/Llama-2-13b-Drop8Attn

    13B • Updated Sep 8, 2024 • 4 • 2

  • s1ghhh/Llama-2-13b-Drop4MLP

    13B • Updated Sep 8, 2024 • 4 • 2

  • s1ghhh/Llama-2-13b-Drop8MLP

    13B • Updated Sep 8, 2024 • 2 • 2

  • s1ghhh/Mistral-7B-v0.1-Drop4Block

    7B • Updated Sep 8, 2024 • 4 • 2

  • s1ghhh/Mistral-7B-v0.1-Drop8Block

    7B • Updated Sep 8, 2024 • 3 • 2

  • s1ghhh/Mistral-7B-v0.1-Drop4Attn

    7B • Updated Sep 8, 2024 • 5 • 2

  • s1ghhh/Mistral-7B-v0.1-Drop8Attn

    7B • Updated Sep 8, 2024 • 4 • 2

  • s1ghhh/Mistral-7B-v0.1-Drop4MLP

    7B • Updated Sep 8, 2024 • 2 • 2

  • s1ghhh/Mistral-7B-v0.1-Drop8MLP

    7B • Updated Sep 8, 2024 • 3 • 2

  • s1ghhh/Llama-2-70b-Drop

    Text Generation • Updated Oct 23, 2024 • 3 • 2

  • s1ghhh/Llama-3-70b-Drop

    Text Generation • 71B • Updated Oct 23, 2024 • 16 • 4

  • What Matters in Transformers? Not All Attention is Needed

    Paper • 2406.15786 • Published Jun 22, 2024 • 33
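
The repository names in this collection follow a visible pattern: a base model name, then `-Drop`, then (for most entries) a count and one of `Block`, `Attn`, or `MLP`. The sketch below splits a repo ID into those parts so the variants can be grouped or compared programmatically. It is only an illustration: `DropVariant` and `parse_repo_id` are hypothetical names, and reading the suffix as "number of dropped modules of that type" is an assumption drawn from the naming, not something this page states.

```python
import re
from typing import NamedTuple, Optional


class DropVariant(NamedTuple):
    """Parts of a repo ID from this collection (hypothetical helper type)."""
    owner: str
    base_model: str
    num_dropped: Optional[int]  # None when the name ends in a bare "-Drop"
    component: Optional[str]    # "Block", "Attn", "MLP", or None


# owner/base-Drop[<count><Block|Attn|MLP>]
_PATTERN = re.compile(
    r"^(?P<owner>[^/]+)/(?P<base>.+)-Drop(?:(?P<n>\d+)(?P<kind>Block|Attn|MLP))?$"
)


def parse_repo_id(repo_id: str) -> DropVariant:
    """Split a repo ID such as 's1ghhh/Llama-2-13b-Drop8Attn' into its parts."""
    match = _PATTERN.match(repo_id)
    if match is None:
        raise ValueError(f"unrecognized repo ID: {repo_id!r}")
    count = match.group("n")
    return DropVariant(
        owner=match.group("owner"),
        base_model=match.group("base"),
        num_dropped=int(count) if count is not None else None,
        component=match.group("kind"),
    )


if __name__ == "__main__":
    for repo in (
        "s1ghhh/Llama-2-13b-Drop8Attn",
        "s1ghhh/Mistral-7B-v0.1-Drop4MLP",
        "s1ghhh/Llama-2-70b-Drop",
    ):
        print(parse_repo_id(repo))
```

Entries such as `s1ghhh/Llama-2-70b-Drop` carry no count or component in their name, so the sketch returns `None` for both fields rather than guessing.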