33% pruning on RedPajama 3B linear layers
The pruned layers are:
- attention linear layers (query, key, value computation)
- attention dense layer
- MLP layers
Pruning is applied in every decoder module, using unstructured magnitude pruning.
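A minimal sketch of this pruning scheme using PyTorch's built-in `torch.nn.utils.prune` utilities. The small `nn.Sequential` stand-in below is an assumption for illustration; the actual model would be loaded via Hugging Face `transformers`, and the same loop over `nn.Linear` modules would cover the q/k/v projections, attention dense layer, and MLP layers in each decoder block.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)

# Hypothetical stand-in for the decoder's linear layers; the real model
# (RedPajama-INCITE 3B) would be loaded with transformers instead.
model = nn.Sequential(
    nn.Linear(16, 16),  # stands in for the q/k/v projections
    nn.Linear(16, 16),  # stands in for the attention dense layer
    nn.Linear(16, 64),  # stands in for the MLP up-projection
    nn.Linear(64, 16),  # stands in for the MLP down-projection
)

for module in model.modules():
    if isinstance(module, nn.Linear):
        # Unstructured magnitude (L1) pruning: zero out the 33% of
        # weights with the smallest absolute value, per layer.
        prune.l1_unstructured(module, name="weight", amount=0.33)
        prune.remove(module, "weight")  # bake the mask into the weights

# Check the resulting sparsity across all pruned layers.
linears = [m for m in model.modules() if isinstance(m, nn.Linear)]
total = sum(m.weight.numel() for m in linears)
zeros = sum((m.weight == 0).sum().item() for m in linears)
print(f"sparsity: {zeros / total:.2%}")
```

Since the pruning is unstructured, the zeroed weights are scattered within each matrix rather than removing whole rows or columns, so the model's shapes are unchanged.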