33% pruning on RedPajama 3B linear layers

The pruned layers are:
1. attention linear layers (query, key, value computation)
2. attention dense layer
3. MLP layers

Pruning is applied in every decoder module and is unstructured magnitude pruning: the weights with the smallest absolute values are set to zero, independent of their position in the weight matrix.
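As a rough illustration of what unstructured magnitude pruning at 33% sparsity does to a single linear layer's weight matrix, here is a minimal NumPy sketch (not the actual pruning code used for this model; the function name and tie-breaking at the threshold are illustrative assumptions):

```python
import numpy as np

def magnitude_prune(weight: np.ndarray, sparsity: float = 0.33) -> np.ndarray:
    """Zero out the `sparsity` fraction of entries with the smallest magnitude."""
    flat = np.abs(weight).ravel()
    k = int(sparsity * flat.size)  # number of entries to prune
    if k == 0:
        return weight.copy()
    # Threshold: the k-th smallest absolute value in the matrix.
    threshold = np.partition(flat, k - 1)[k - 1]
    # Keep only entries strictly above the threshold (ties at the
    # threshold are pruned; real implementations may break ties differently).
    mask = np.abs(weight) > threshold
    return weight * mask
```

The same operation would be repeated for each pruned weight matrix (query/key/value projections, attention dense layer, MLP layers) in every decoder block.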