33% pruning on RedPajama 3B linear layers
The pruned layers are:
1. attention linear layers (query, key, value computation)
2. attention dense layer
3. MLP layers
Pruning is applied in every decoder module, using unstructured magnitude pruning (the individual weights with the smallest absolute values are zeroed).
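The procedure above can be sketched with PyTorch's built-in pruning utilities. This is a minimal illustration on a toy stand-in for one decoder block; the layer names (`query_key_value`, `dense`, `mlp_in`, `mlp_out`) are illustrative and do not match RedPajama 3B's actual module names, and the hidden size is shrunk for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

class DecoderBlock(nn.Module):
    """Toy stand-in for one decoder block's linear layers.

    Names are hypothetical; the real RedPajama 3B checkpoint uses
    its own module naming.
    """
    def __init__(self, hidden=16):
        super().__init__()
        self.query_key_value = nn.Linear(hidden, 3 * hidden)  # attention Q/K/V projection
        self.dense = nn.Linear(hidden, hidden)                # attention dense (output) layer
        self.mlp_in = nn.Linear(hidden, 4 * hidden)           # MLP up-projection
        self.mlp_out = nn.Linear(4 * hidden, hidden)          # MLP down-projection

torch.manual_seed(0)
block = DecoderBlock()

# 33% unstructured magnitude pruning on every linear layer:
# the 33% of weights with the smallest |value| are set to zero.
for module in block.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.33)
        prune.remove(module, "weight")  # bake the mask into the weight tensor

# Check the resulting sparsity of each weight matrix.
for name, module in block.named_modules():
    if isinstance(module, nn.Linear):
        sparsity = (module.weight == 0).float().mean().item()
        print(f"{name}: {sparsity:.2%} zeros")
```

`prune.l1_unstructured` zeroes individual entries by absolute magnitude, which is exactly unstructured magnitude pruning; `prune.remove` folds the pruning mask into the weight so the layer behaves like an ordinary sparse-weight `nn.Linear` afterwards. In a real run, the same loop would iterate over the linear layers of each decoder module in the loaded 3B checkpoint instead of this toy block.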