33% pruning on RedPajama 3B linear layers
The pruned layers are:
1. attention linear layers (query, key, value computation)
2. attention dense layer
3. MLP layers
Pruning is applied in every decoder block, using unstructured magnitude pruning (the weights with the smallest absolute values are zeroed). A sketch of the procedure is given below.
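
For reference, here is a minimal sketch of how this pruning could be reproduced with PyTorch's `torch.nn.utils.prune` utilities. The checkpoint name (`togethercomputer/RedPajama-INCITE-Base-3B-v1`) and the module paths (`query_key_value`, `dense`, `dense_h_to_4h`, `dense_4h_to_h`) are assumptions based on the GPT-NeoX architecture that RedPajama 3B uses, not code taken from this repo.

```python
# Sketch of 33% unstructured magnitude pruning on the linear layers,
# assuming a GPT-NeoX-style RedPajama 3B checkpoint (module names below
# follow that architecture and are assumptions, not this repo's code).
import torch.nn.utils.prune as prune
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/RedPajama-INCITE-Base-3B-v1"
)

for layer in model.gpt_neox.layers:  # every decoder block
    targets = [
        layer.attention.query_key_value,  # fused query/key/value projection
        layer.attention.dense,            # attention dense (output) layer
        layer.mlp.dense_h_to_4h,          # MLP up-projection
        layer.mlp.dense_4h_to_h,          # MLP down-projection
    ]
    for module in targets:
        # Unstructured magnitude pruning: zero the 33% of weights
        # with the smallest absolute values (L1 criterion).
        prune.l1_unstructured(module, name="weight", amount=0.33)
        # Fold the mask into the weight tensor permanently.
        prune.remove(module, "weight")
```

`prune.remove` folds each pruning mask into the weight tensor itself, so a checkpoint saved afterwards contains plain zeroed weights rather than separate masks.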