33% unstructured magnitude pruning on RedPajama 3B linear layers. The pruned layers are:

1. attention linear layers (query, key, and value projections)
2. attention dense (output) layer
3. MLP layers

Pruning is applied in all decoder modules.
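As a minimal sketch of what unstructured magnitude pruning at 33% sparsity looks like, the function below (a hypothetical helper, not the actual pruning code used) zeroes out the smallest-magnitude weights of a layer's weight matrix; in practice this would be applied to each of the linear layers listed above:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.33) -> np.ndarray:
    """Unstructured magnitude pruning: zero out the `sparsity` fraction
    of weights with the smallest absolute values, regardless of position
    (hence 'unstructured')."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to prune
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    # keep only weights strictly above the threshold
    # (ties at the threshold are pruned as well)
    mask = np.abs(weights) > threshold
    return weights * mask

# Example: prune a 3x3 weight matrix at 33% sparsity
w = np.arange(1.0, 10.0).reshape(3, 3)
pruned = magnitude_prune(w, sparsity=0.33)  # zeroes the 2 smallest weights
```

In a real model this is typically done with a framework utility such as `torch.nn.utils.prune.l1_unstructured`, which applies the same smallest-magnitude criterion via a binary mask on each module's weight tensor.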