Factorized the weight matrix in the GlobalAttentionPoolingHead, reducing the number of parameters in this layer by a factor of 48
a1e9f64 PeteBleackley committed on Mar 11, 2024
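
The commit message does not give the layer's dimensions or the exact factorization used. As a rough illustration of how such a saving can arise, the sketch below assumes a plain low-rank factorization, where a square weight matrix W is replaced by the product of two thinner matrices. The hidden size of 768 and rank of 8 are hypothetical values, chosen only because they reproduce a 48-fold reduction. They are not taken from the repository.

```python
# Parameter-count sketch for a low-rank factorization W ~= U @ V.
# ASSUMPTION: sizes below are hypothetical, not read from the repository.

def dense_params(d_in: int, d_out: int) -> int:
    """Number of weights in a full d_in x d_out matrix."""
    return d_in * d_out


def factorized_params(d_in: int, d_out: int, rank: int) -> int:
    """Number of weights in U (d_in x rank) plus V (rank x d_out)."""
    return d_in * rank + rank * d_out


HIDDEN = 768  # hypothetical hidden size
RANK = 8      # hypothetical inner rank

full = dense_params(HIDDEN, HIDDEN)
low = factorized_params(HIDDEN, HIDDEN, RANK)

print(f"dense: {full}, factorized: {low}, reduction: {full / low:.0f}x")
```

For a square matrix the reduction factor is d / (2r), so any pair with d = 96r gives the same ratio. The actual values in GlobalAttentionPoolingHead may differ.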