Olsatthe committed on
Commit 6f07f42 · 1 Parent(s): 69311b2

Upload 8 files

workspace/T5-3B_RUN2.txt ADDED
@@ -0,0 +1,738 @@
1
+ CUDA extension not installed.
2
+ Some weights of the model checkpoint at t5-3b were not used when initializing T5EncoderModel: ['decoder.block.10.layer.2.DenseReluDense.wo.weight', 'decoder.block.18.layer.1.EncDecAttention.q.weight', 'decoder.block.14.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.0.layer_norm.weight', 'decoder.block.7.layer.1.EncDecAttention.q.weight', 'decoder.block.10.layer.0.SelfAttention.v.weight', 'decoder.block.9.layer.2.DenseReluDense.wi.weight', 'decoder.block.19.layer.1.EncDecAttention.k.weight', 'decoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.9.layer.1.EncDecAttention.q.weight', 'decoder.block.1.layer.1.EncDecAttention.q.weight', 'decoder.block.16.layer.1.EncDecAttention.q.weight', 'decoder.block.13.layer.1.EncDecAttention.k.weight', 'decoder.block.20.layer.0.layer_norm.weight', 'decoder.block.19.layer.0.SelfAttention.k.weight', 'decoder.block.5.layer.1.EncDecAttention.k.weight', 'decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'decoder.block.11.layer.1.layer_norm.weight', 'decoder.block.18.layer.2.layer_norm.weight', 'decoder.block.12.layer.0.SelfAttention.q.weight', 'decoder.block.6.layer.1.EncDecAttention.q.weight', 'decoder.block.6.layer.2.DenseReluDense.wi.weight', 'decoder.block.0.layer.1.layer_norm.weight', 'decoder.block.17.layer.1.EncDecAttention.v.weight', 'decoder.block.8.layer.2.DenseReluDense.wi.weight', 'decoder.block.15.layer.0.layer_norm.weight', 'decoder.block.16.layer.2.layer_norm.weight', 'decoder.block.22.layer.1.EncDecAttention.o.weight', 'decoder.block.21.layer.0.layer_norm.weight', 'decoder.block.22.layer.2.DenseReluDense.wo.weight', 'decoder.block.0.layer.0.SelfAttention.q.weight', 'decoder.block.6.layer.1.EncDecAttention.o.weight', 'decoder.block.11.layer.1.EncDecAttention.q.weight', 'decoder.block.6.layer.1.layer_norm.weight', 'decoder.block.4.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.2.layer_norm.weight', 'decoder.block.5.layer.1.EncDecAttention.q.weight', 
'decoder.block.11.layer.1.EncDecAttention.o.weight', 'decoder.block.16.layer.0.SelfAttention.o.weight', 'decoder.block.22.layer.2.layer_norm.weight', 'decoder.block.6.layer.0.SelfAttention.q.weight', 'decoder.block.17.layer.0.SelfAttention.v.weight', 'decoder.block.16.layer.0.SelfAttention.k.weight', 'decoder.block.22.layer.1.EncDecAttention.v.weight', 'decoder.block.7.layer.1.layer_norm.weight', 'decoder.block.19.layer.2.layer_norm.weight', 'decoder.block.4.layer.1.EncDecAttention.v.weight', 'decoder.block.11.layer.0.SelfAttention.v.weight', 'decoder.block.15.layer.0.SelfAttention.v.weight', 'decoder.block.14.layer.0.SelfAttention.v.weight', 'decoder.block.18.layer.0.SelfAttention.q.weight', 'decoder.block.21.layer.1.EncDecAttention.v.weight', 'decoder.block.13.layer.1.EncDecAttention.o.weight', 'decoder.block.10.layer.2.layer_norm.weight', 'decoder.block.22.layer.1.layer_norm.weight', 'decoder.block.9.layer.0.SelfAttention.v.weight', 'decoder.block.20.layer.2.DenseReluDense.wi.weight', 'decoder.block.13.layer.2.layer_norm.weight', 'decoder.block.12.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.2.DenseReluDense.wo.weight', 'decoder.block.3.layer.2.layer_norm.weight', 'decoder.block.23.layer.1.EncDecAttention.v.weight', 'decoder.block.14.layer.0.SelfAttention.k.weight', 'decoder.block.8.layer.1.EncDecAttention.v.weight', 'decoder.block.2.layer.0.SelfAttention.q.weight', 'decoder.block.6.layer.1.EncDecAttention.k.weight', 'decoder.block.22.layer.0.layer_norm.weight', 'decoder.block.20.layer.1.layer_norm.weight', 'decoder.block.4.layer.0.layer_norm.weight', 'decoder.block.15.layer.2.DenseReluDense.wi.weight', 'decoder.block.2.layer.0.SelfAttention.k.weight', 'decoder.block.3.layer.1.EncDecAttention.k.weight', 'decoder.block.9.layer.0.SelfAttention.o.weight', 'decoder.block.15.layer.2.DenseReluDense.wo.weight', 'decoder.block.19.layer.0.SelfAttention.o.weight', 'decoder.block.23.layer.1.EncDecAttention.q.weight', 
'decoder.block.2.layer.1.EncDecAttention.o.weight', 'decoder.block.4.layer.2.layer_norm.weight', 'decoder.block.14.layer.2.DenseReluDense.wo.weight', 'decoder.block.3.layer.0.SelfAttention.o.weight', 'decoder.block.12.layer.1.EncDecAttention.q.weight', 'decoder.block.16.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.1.EncDecAttention.o.weight', 'decoder.block.10.layer.1.layer_norm.weight', 'decoder.block.3.layer.1.EncDecAttention.q.weight', 'decoder.block.14.layer.0.SelfAttention.o.weight', 'decoder.block.21.layer.0.SelfAttention.v.weight', 'decoder.block.13.layer.2.DenseReluDense.wi.weight', 'decoder.block.10.layer.1.EncDecAttention.o.weight', 'decoder.block.16.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.2.DenseReluDense.wo.weight', 'decoder.block.8.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.1.EncDecAttention.o.weight', 'decoder.block.8.layer.2.layer_norm.weight', 'decoder.block.15.layer.0.SelfAttention.k.weight', 'decoder.block.20.layer.1.EncDecAttention.v.weight', 'decoder.block.3.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.0.SelfAttention.v.weight', 'decoder.block.11.layer.1.EncDecAttention.v.weight', 'decoder.block.6.layer.0.layer_norm.weight', 'decoder.block.23.layer.1.layer_norm.weight', 'decoder.block.4.layer.1.EncDecAttention.k.weight', 'decoder.block.5.layer.1.layer_norm.weight', 'decoder.block.19.layer.1.EncDecAttention.q.weight', 'decoder.block.14.layer.2.DenseReluDense.wi.weight', 'decoder.block.23.layer.1.EncDecAttention.o.weight', 'decoder.block.20.layer.1.EncDecAttention.q.weight', 'decoder.block.4.layer.0.SelfAttention.v.weight', 'decoder.block.3.layer.1.EncDecAttention.o.weight', 'decoder.block.7.layer.1.EncDecAttention.v.weight', 'decoder.block.8.layer.0.layer_norm.weight', 'decoder.block.0.layer.1.EncDecAttention.k.weight', 'decoder.block.1.layer.2.layer_norm.weight', 'decoder.block.19.layer.2.DenseReluDense.wo.weight', 'decoder.block.16.layer.0.SelfAttention.v.weight', 
'decoder.block.1.layer.1.EncDecAttention.k.weight', 'decoder.block.1.layer.0.SelfAttention.q.weight', 'decoder.block.10.layer.0.SelfAttention.o.weight', 'decoder.block.20.layer.2.DenseReluDense.wo.weight', 'decoder.block.5.layer.0.SelfAttention.v.weight', 'decoder.block.18.layer.2.DenseReluDense.wo.weight', 'decoder.block.17.layer.2.layer_norm.weight', 'decoder.block.9.layer.1.EncDecAttention.v.weight', 'decoder.block.17.layer.1.layer_norm.weight', 'decoder.block.0.layer.2.layer_norm.weight', 'decoder.block.10.layer.1.EncDecAttention.q.weight', 'decoder.block.10.layer.2.DenseReluDense.wi.weight', 'decoder.block.4.layer.1.layer_norm.weight', 'decoder.block.19.layer.0.layer_norm.weight', 'decoder.block.22.layer.1.EncDecAttention.k.weight', 'decoder.block.10.layer.1.EncDecAttention.k.weight', 'decoder.block.7.layer.0.SelfAttention.o.weight', 'decoder.block.19.layer.2.DenseReluDense.wi.weight', 'decoder.block.8.layer.1.EncDecAttention.q.weight', 'decoder.block.13.layer.1.EncDecAttention.q.weight', 'decoder.block.19.layer.1.EncDecAttention.o.weight', 'decoder.block.14.layer.1.EncDecAttention.v.weight', 'decoder.block.7.layer.0.SelfAttention.q.weight', 'decoder.block.5.layer.2.DenseReluDense.wi.weight', 'decoder.block.23.layer.0.SelfAttention.q.weight', 'decoder.block.15.layer.0.SelfAttention.q.weight', 'decoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.block.18.layer.0.layer_norm.weight', 'decoder.block.10.layer.1.EncDecAttention.v.weight', 'decoder.block.12.layer.0.SelfAttention.v.weight', 'decoder.block.17.layer.0.layer_norm.weight', 'decoder.block.9.layer.1.layer_norm.weight', 'decoder.block.5.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.1.EncDecAttention.o.weight', 'decoder.block.11.layer.2.layer_norm.weight', 'decoder.block.18.layer.1.EncDecAttention.v.weight', 'decoder.block.8.layer.0.SelfAttention.q.weight', 'decoder.block.18.layer.1.EncDecAttention.o.weight', 
'decoder.block.14.layer.1.EncDecAttention.q.weight', 'decoder.block.1.layer.2.DenseReluDense.wi.weight', 'decoder.block.7.layer.0.layer_norm.weight', 'decoder.block.7.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.1.layer_norm.weight', 'decoder.block.4.layer.1.EncDecAttention.q.weight', 'decoder.block.10.layer.0.SelfAttention.k.weight', 'decoder.block.12.layer.1.layer_norm.weight', 'decoder.block.1.layer.1.EncDecAttention.v.weight', 'decoder.block.9.layer.2.DenseReluDense.wo.weight', 'decoder.block.3.layer.2.DenseReluDense.wo.weight', 'decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight', 'decoder.block.0.layer.2.DenseReluDense.wo.weight', 'decoder.block.0.layer.2.DenseReluDense.wi.weight', 'decoder.block.5.layer.0.SelfAttention.o.weight', 'decoder.block.23.layer.2.DenseReluDense.wo.weight', 'decoder.block.6.layer.0.SelfAttention.v.weight', 'decoder.block.21.layer.1.layer_norm.weight', 'decoder.block.9.layer.0.SelfAttention.k.weight', 'decoder.block.5.layer.0.SelfAttention.q.weight', 'decoder.block.2.layer.1.EncDecAttention.k.weight', 'decoder.block.15.layer.1.EncDecAttention.k.weight', 'decoder.block.1.layer.1.layer_norm.weight', 'decoder.block.21.layer.0.SelfAttention.q.weight', 'decoder.block.21.layer.2.DenseReluDense.wo.weight', 'decoder.block.15.layer.1.EncDecAttention.v.weight', 'decoder.block.23.layer.0.layer_norm.weight', 'decoder.block.6.layer.1.EncDecAttention.v.weight', 'decoder.block.7.layer.2.layer_norm.weight', 'decoder.block.16.layer.2.DenseReluDense.wi.weight', 'decoder.block.2.layer.0.SelfAttention.o.weight', 'decoder.block.14.layer.0.layer_norm.weight', 'decoder.block.21.layer.1.EncDecAttention.k.weight', 'decoder.block.17.layer.1.EncDecAttention.o.weight', 'decoder.block.18.layer.1.EncDecAttention.k.weight', 'decoder.block.23.layer.2.DenseReluDense.wi.weight', 'decoder.block.0.layer.1.EncDecAttention.v.weight', 'decoder.block.4.layer.0.SelfAttention.q.weight', 'decoder.block.11.layer.2.DenseReluDense.wi.weight', 
'decoder.block.5.layer.0.SelfAttention.k.weight', 'decoder.block.20.layer.0.SelfAttention.v.weight', 'decoder.block.3.layer.1.layer_norm.weight', 'decoder.block.20.layer.0.SelfAttention.o.weight', 'decoder.block.21.layer.1.EncDecAttention.o.weight', 'decoder.block.11.layer.0.SelfAttention.o.weight', 'decoder.block.17.layer.0.SelfAttention.o.weight', 'decoder.block.10.layer.0.layer_norm.weight', 'decoder.block.15.layer.2.layer_norm.weight', 'decoder.block.8.layer.2.DenseReluDense.wo.weight', 'decoder.block.11.layer.0.SelfAttention.k.weight', 'decoder.block.17.layer.0.SelfAttention.q.weight', 'decoder.block.9.layer.1.EncDecAttention.k.weight', 'decoder.block.12.layer.0.SelfAttention.o.weight', 'decoder.block.6.layer.0.SelfAttention.k.weight', 'decoder.block.10.layer.0.SelfAttention.q.weight', 'decoder.block.13.layer.0.layer_norm.weight', 'decoder.block.13.layer.0.SelfAttention.o.weight', 'decoder.block.19.layer.0.SelfAttention.v.weight', 'decoder.block.23.layer.0.SelfAttention.k.weight', 'decoder.block.11.layer.0.SelfAttention.q.weight', 'decoder.block.3.layer.2.DenseReluDense.wi.weight', 'decoder.block.17.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.2.layer_norm.weight', 'decoder.block.23.layer.2.layer_norm.weight', 'decoder.block.12.layer.1.EncDecAttention.o.weight', 'decoder.block.18.layer.2.DenseReluDense.wi.weight', 'decoder.block.19.layer.1.layer_norm.weight', 'decoder.block.18.layer.0.SelfAttention.v.weight', 'decoder.block.5.layer.0.layer_norm.weight', 'decoder.block.20.layer.0.SelfAttention.k.weight', 'decoder.block.13.layer.1.EncDecAttention.v.weight', 'decoder.block.8.layer.1.EncDecAttention.k.weight', 'decoder.block.8.layer.1.EncDecAttention.o.weight', 'decoder.block.12.layer.2.DenseReluDense.wi.weight', 'decoder.block.19.layer.1.EncDecAttention.v.weight', 'decoder.block.22.layer.0.SelfAttention.q.weight', 'decoder.block.16.layer.0.layer_norm.weight', 'decoder.block.5.layer.1.EncDecAttention.v.weight', 
'decoder.block.1.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.0.SelfAttention.k.weight', 'decoder.block.21.layer.0.SelfAttention.k.weight', 'decoder.block.3.layer.0.SelfAttention.q.weight', 'decoder.block.22.layer.2.DenseReluDense.wi.weight', 'decoder.block.13.layer.2.DenseReluDense.wo.weight', 'decoder.block.11.layer.2.DenseReluDense.wo.weight', 'decoder.block.20.layer.1.EncDecAttention.k.weight', 'decoder.block.12.layer.2.layer_norm.weight', 'decoder.block.19.layer.0.SelfAttention.q.weight', 'decoder.block.7.layer.1.EncDecAttention.k.weight', 'decoder.block.22.layer.0.SelfAttention.o.weight', 'decoder.block.18.layer.0.SelfAttention.k.weight', 'decoder.final_layer_norm.weight', 'decoder.block.2.layer.0.layer_norm.weight', 'decoder.block.1.layer.0.SelfAttention.o.weight', 'decoder.block.11.layer.0.layer_norm.weight', 'decoder.block.14.layer.1.layer_norm.weight', 'decoder.block.2.layer.2.DenseReluDense.wi.weight', 'decoder.block.16.layer.1.EncDecAttention.o.weight', 'decoder.block.17.layer.1.EncDecAttention.k.weight', 'decoder.block.2.layer.1.EncDecAttention.v.weight', 'decoder.block.15.layer.1.EncDecAttention.o.weight', 'decoder.block.13.layer.0.SelfAttention.k.weight', 'decoder.block.6.layer.2.DenseReluDense.wo.weight', 'decoder.block.22.layer.0.SelfAttention.v.weight', 'decoder.block.17.layer.2.DenseReluDense.wi.weight', 'decoder.block.21.layer.0.SelfAttention.o.weight', 'decoder.block.3.layer.0.layer_norm.weight', 'decoder.block.14.layer.0.SelfAttention.q.weight', 'decoder.block.22.layer.0.SelfAttention.k.weight', 'decoder.block.20.layer.1.EncDecAttention.o.weight', 'decoder.block.18.layer.0.SelfAttention.o.weight', 'decoder.block.15.layer.1.EncDecAttention.q.weight', 'decoder.block.3.layer.0.SelfAttention.k.weight', 'decoder.block.7.layer.0.SelfAttention.k.weight', 'decoder.block.12.layer.1.EncDecAttention.k.weight', 'decoder.block.5.layer.1.EncDecAttention.o.weight', 'decoder.block.1.layer.0.SelfAttention.v.weight', 
'decoder.block.8.layer.1.layer_norm.weight', 'decoder.block.13.layer.0.SelfAttention.q.weight', 'decoder.block.21.layer.2.DenseReluDense.wi.weight', 'decoder.block.13.layer.1.layer_norm.weight', 'decoder.block.17.layer.0.SelfAttention.k.weight', 'decoder.block.16.layer.1.EncDecAttention.k.weight', 'decoder.block.23.layer.1.EncDecAttention.k.weight', 'decoder.block.20.layer.0.SelfAttention.q.weight', 'decoder.block.9.layer.1.EncDecAttention.o.weight', 'decoder.block.2.layer.1.EncDecAttention.q.weight', 'decoder.block.16.layer.1.layer_norm.weight', 'decoder.block.9.layer.0.SelfAttention.q.weight', 'decoder.block.21.layer.2.layer_norm.weight', 'decoder.block.8.layer.0.SelfAttention.k.weight', 'decoder.block.5.layer.2.layer_norm.weight', 'decoder.block.1.layer.0.layer_norm.weight', 'decoder.block.16.layer.1.EncDecAttention.v.weight', 'decoder.block.4.layer.2.DenseReluDense.wi.weight', 'decoder.block.6.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.0.layer_norm.weight', 'decoder.block.20.layer.2.layer_norm.weight', 'decoder.block.15.layer.1.layer_norm.weight', 'decoder.block.14.layer.1.EncDecAttention.k.weight', 'decoder.block.23.layer.0.SelfAttention.v.weight', 'decoder.block.12.layer.1.EncDecAttention.v.weight', 'decoder.block.8.layer.0.SelfAttention.v.weight', 'decoder.block.21.layer.1.EncDecAttention.q.weight', 'decoder.block.12.layer.0.SelfAttention.k.weight', 'decoder.block.17.layer.1.EncDecAttention.q.weight', 'decoder.block.0.layer.1.EncDecAttention.o.weight', 'decoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.23.layer.0.SelfAttention.o.weight', 'decoder.block.0.layer.1.EncDecAttention.q.weight', 'decoder.block.12.layer.0.layer_norm.weight', 'decoder.block.3.layer.1.EncDecAttention.v.weight', 'decoder.block.15.layer.0.SelfAttention.o.weight', 'decoder.block.7.layer.2.DenseReluDense.wi.weight', 'decoder.block.13.layer.0.SelfAttention.v.weight', 'decoder.block.11.layer.1.EncDecAttention.k.weight', 'decoder.block.6.layer.2.layer_norm.weight', 
'decoder.block.18.layer.1.layer_norm.weight', 'decoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.14.layer.2.layer_norm.weight', 'decoder.block.22.layer.1.EncDecAttention.q.weight']
3
+ - This IS expected if you are initializing T5EncoderModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
4
+ - This IS NOT expected if you are initializing T5EncoderModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
5
+ Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
6
+ Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
7
+ /usr/local/lib/python3.10/dist-packages/transformers/models/t5/tokenization_t5.py:163: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.
8
+ For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
9
+ - Be aware that you SHOULD NOT rely on t5-3b automatically truncating your input to 512 when padding/encoding.
10
+ - If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.
11
+ - To avoid this warning, please instantiate this tokenizer with `model_max_length` set to your preferred value.
12
+ warnings.warn(
13
+ Token indices sequence length is longer than the specified maximum sequence length for this model (2837981 > 512). Running this sequence through the model will result in indexing errors
14
+ Starting ...
15
+ Ready.
16
+ 0 layer.0.SelfAttention.q
17
+ Quantizing ...
18
+ time 1.35
19
+ error 379.2095031738281
20
+ 0 layer.0.SelfAttention.k
21
+ Quantizing ...
22
+ time 0.24
23
+ error 46126.375
24
+ 0 layer.0.SelfAttention.v
25
+ Quantizing ...
26
+ time 0.24
27
+ error 24450.14453125
28
+ 0 layer.0.SelfAttention.o
29
+ Quantizing ...
30
+ time 1.00
31
+ error 44522.7578125
32
+ 0 layer.1.DenseReluDense.wi
33
+ Quantizing ...
34
+ time 0.26
35
+ error 709531.9375
36
+ 0 layer.1.DenseReluDense.wo
37
+ Quantizing ...
38
+ time 4.78
39
+ error 32526.62109375
40
+ 1 layer.0.SelfAttention.q
41
+ Quantizing ...
42
+ time 1.39
43
+ error 142.10263061523438
44
+ 1 layer.0.SelfAttention.k
45
+ Quantizing ...
46
+ time 0.24
47
+ error 14705.056640625
48
+ 1 layer.0.SelfAttention.v
49
+ Quantizing ...
50
+ time 0.25
51
+ error 7253.73046875
52
+ 1 layer.0.SelfAttention.o
53
+ Quantizing ...
54
+ time 1.00
55
+ error 4656.6787109375
56
+ 1 layer.1.DenseReluDense.wi
57
+ Quantizing ...
58
+ time 0.25
59
+ error 817805.625
60
+ 1 layer.1.DenseReluDense.wo
61
+ Quantizing ...
62
+ time 4.83
63
+ error 102973.921875
64
+ 2 layer.0.SelfAttention.q
65
+ Quantizing ...
66
+ time 1.39
67
+ error 201.76727294921875
68
+ 2 layer.0.SelfAttention.k
69
+ Quantizing ...
70
+ time 0.24
71
+ error 22729.0390625
72
+ 2 layer.0.SelfAttention.v
73
+ Quantizing ...
74
+ time 0.24
75
+ error 11655.05078125
76
+ 2 layer.0.SelfAttention.o
77
+ Quantizing ...
78
+ time 1.01
79
+ error 7398.17529296875
80
+ 2 layer.1.DenseReluDense.wi
81
+ Quantizing ...
82
+ time 0.27
83
+ error 1344222.25
84
+ 2 layer.1.DenseReluDense.wo
85
+ Quantizing ...
86
+ time 4.88
87
+ error 120313.578125
88
+ 3 layer.0.SelfAttention.q
89
+ Quantizing ...
90
+ time 1.40
91
+ error 240.1311492919922
92
+ 3 layer.0.SelfAttention.k
93
+ Quantizing ...
94
+ time 0.24
95
+ error 24427.28515625
96
+ 3 layer.0.SelfAttention.v
97
+ Quantizing ...
98
+ time 0.26
99
+ error 12910.115234375
100
+ 3 layer.0.SelfAttention.o
101
+ Quantizing ...
102
+ time 1.00
103
+ error 9308.4921875
104
+ 3 layer.1.DenseReluDense.wi
105
+ Quantizing ...
106
+ time 0.25
107
+ error 2360071.5
108
+ 3 layer.1.DenseReluDense.wo
109
+ Quantizing ...
110
+ time 4.77
111
+ error 143022.65625
112
+ 4 layer.0.SelfAttention.q
113
+ Quantizing ...
114
+ time 1.40
115
+ error 282.9701232910156
116
+ 4 layer.0.SelfAttention.k
117
+ Quantizing ...
118
+ time 0.26
119
+ error 33365.83203125
120
+ 4 layer.0.SelfAttention.v
121
+ Quantizing ...
122
+ time 0.25
123
+ error 16407.8203125
124
+ 4 layer.0.SelfAttention.o
125
+ Quantizing ...
126
+ time 1.01
127
+ error 15296.896484375
128
+ 4 layer.1.DenseReluDense.wi
129
+ Quantizing ...
130
+ time 0.27
131
+ error 3803509.5
132
+ 4 layer.1.DenseReluDense.wo
133
+ Quantizing ...
134
+ time 4.74
135
+ error 156256.96875
136
+ 5 layer.0.SelfAttention.q
137
+ Quantizing ...
138
+ time 1.40
139
+ error 304.42095947265625
140
+ 5 layer.0.SelfAttention.k
141
+ Quantizing ...
142
+ time 0.25
143
+ error 31203.0546875
144
+ 5 layer.0.SelfAttention.v
145
+ Quantizing ...
146
+ time 0.26
147
+ error 17006.921875
148
+ 5 layer.0.SelfAttention.o
149
+ Quantizing ...
150
+ time 1.02
151
+ error 14054.951171875
152
+ 5 layer.1.DenseReluDense.wi
153
+ Quantizing ...
154
+ time 0.25
155
+ error 4938547.0
156
+ 5 layer.1.DenseReluDense.wo
157
+ Quantizing ...
158
+ time 4.70
159
+ error 183307.234375
160
+ 6 layer.0.SelfAttention.q
161
+ Quantizing ...
162
+ time 1.42
163
+ error 299.11724853515625
164
+ 6 layer.0.SelfAttention.k
165
+ Quantizing ...
166
+ time 0.24
167
+ error 35865.5703125
168
+ 6 layer.0.SelfAttention.v
169
+ Quantizing ...
170
+ time 0.24
171
+ error 17129.06640625
172
+ 6 layer.0.SelfAttention.o
173
+ Quantizing ...
174
+ time 1.02
175
+ error 12793.3740234375
176
+ 6 layer.1.DenseReluDense.wi
177
+ Quantizing ...
178
+ time 0.26
179
+ error 7528978.5
180
+ 6 layer.1.DenseReluDense.wo
181
+ Quantizing ...
182
+ time 4.70
183
+ error 201923.0625
184
+ 7 layer.0.SelfAttention.q
185
+ Quantizing ...
186
+ time 1.39
187
+ error 368.124755859375
188
+ 7 layer.0.SelfAttention.k
189
+ Quantizing ...
190
+ time 0.27
191
+ error 44324.9453125
192
+ 7 layer.0.SelfAttention.v
193
+ Quantizing ...
194
+ time 0.25
195
+ error 21733.6484375
196
+ 7 layer.0.SelfAttention.o
197
+ Quantizing ...
198
+ time 0.99
199
+ error 25086.8125
200
+ 7 layer.1.DenseReluDense.wi
201
+ Quantizing ...
202
+ time 0.25
203
+ error 9442284.0
204
+ 7 layer.1.DenseReluDense.wo
205
+ Quantizing ...
206
+ time 4.66
207
+ error 231078.28125
208
+ 8 layer.0.SelfAttention.q
209
+ Quantizing ...
210
+ time 1.40
211
+ error 336.513671875
212
+ 8 layer.0.SelfAttention.k
213
+ Quantizing ...
214
+ time 0.25
215
+ error 40786.26171875
216
+ 8 layer.0.SelfAttention.v
217
+ Quantizing ...
218
+ time 0.24
219
+ error 22459.0078125
220
+ 8 layer.0.SelfAttention.o
221
+ Quantizing ...
222
+ time 0.99
223
+ error 22684.369140625
224
+ 8 layer.1.DenseReluDense.wi
225
+ Quantizing ...
226
+ time 0.25
227
+ error 11038062.0
228
+ 8 layer.1.DenseReluDense.wo
229
+ Quantizing ...
230
+ time 4.66
231
+ error 358261.84375
232
+ 9 layer.0.SelfAttention.q
233
+ Quantizing ...
234
+ time 1.39
235
+ error 356.87689208984375
236
+ 9 layer.0.SelfAttention.k
237
+ Quantizing ...
238
+ time 0.24
239
+ error 43993.4375
240
+ 9 layer.0.SelfAttention.v
241
+ Quantizing ...
242
+ time 0.24
243
+ error 26483.703125
244
+ 9 layer.0.SelfAttention.o
245
+ Quantizing ...
246
+ time 0.98
247
+ error 68000.96875
248
+ 9 layer.1.DenseReluDense.wi
249
+ Quantizing ...
250
+ time 0.25
251
+ error 12831236.0
252
+ 9 layer.1.DenseReluDense.wo
253
+ Quantizing ...
254
+ time 4.67
255
+ error 329604.78125
256
+ 10 layer.0.SelfAttention.q
257
+ Quantizing ...
258
+ time 1.39
259
+ error 360.63385009765625
260
+ 10 layer.0.SelfAttention.k
261
+ Quantizing ...
262
+ time 0.24
263
+ error 44677.66015625
264
+ 10 layer.0.SelfAttention.v
265
+ Quantizing ...
266
+ time 0.24
267
+ error 28456.794921875
268
+ 10 layer.0.SelfAttention.o
269
+ Quantizing ...
270
+ time 0.99
271
+ error 66670.6953125
272
+ 10 layer.1.DenseReluDense.wi
273
+ Quantizing ...
274
+ time 0.25
275
+ error 14097091.0
276
+ 10 layer.1.DenseReluDense.wo
277
+ Quantizing ...
278
+ time 4.68
279
+ error 396505.9375
280
+ 11 layer.0.SelfAttention.q
281
+ Quantizing ...
282
+ time 1.39
283
+ error 353.4673767089844
284
+ 11 layer.0.SelfAttention.k
285
+ Quantizing ...
286
+ time 0.24
287
+ error 42337.890625
288
+ 11 layer.0.SelfAttention.v
289
+ Quantizing ...
290
+ time 0.24
291
+ error 41291.625
292
+ 11 layer.0.SelfAttention.o
293
+ Quantizing ...
294
+ time 0.99
295
+ error 84161.796875
296
+ 11 layer.1.DenseReluDense.wi
297
+ Quantizing ...
298
+ time 0.25
299
+ error 13223532.0
300
+ 11 layer.1.DenseReluDense.wo
301
+ Quantizing ...
302
+ time 4.70
303
+ error 527305.5625
304
+ 12 layer.0.SelfAttention.q
305
+ Quantizing ...
306
+ time 1.39
307
+ error 352.1868896484375
308
+ 12 layer.0.SelfAttention.k
309
+ Quantizing ...
310
+ time 0.24
311
+ error 45228.03515625
312
+ 12 layer.0.SelfAttention.v
313
+ Quantizing ...
314
+ time 0.24
315
+ error 49482.1328125
316
+ 12 layer.0.SelfAttention.o
317
+ Quantizing ...
318
+ time 0.98
319
+ error 166233.6875
320
+ 12 layer.1.DenseReluDense.wi
321
+ Quantizing ...
322
+ time 0.25
323
+ error 12493772.0
324
+ 12 layer.1.DenseReluDense.wo
325
+ Quantizing ...
326
+ time 4.69
327
+ error 702293.9375
328
+ 13 layer.0.SelfAttention.q
329
+ Quantizing ...
330
+ time 1.39
331
+ error 334.15252685546875
332
+ 13 layer.0.SelfAttention.k
333
+ Quantizing ...
334
+ time 0.24
335
+ error 43450.84765625
336
+ 13 layer.0.SelfAttention.v
337
+ Quantizing ...
338
+ time 0.24
339
+ error 60685.6875
340
+ 13 layer.0.SelfAttention.o
341
+ Quantizing ...
342
+ time 0.98
343
+ error 237831.390625
344
+ 13 layer.1.DenseReluDense.wi
345
+ Quantizing ...
346
+ time 0.25
347
+ error 17085658.0
348
+ 13 layer.1.DenseReluDense.wo
349
+ Quantizing ...
350
+ time 4.77
351
+ error 1149340.5
352
+ 14 layer.0.SelfAttention.q
353
+ Quantizing ...
354
+ time 1.39
355
+ error 307.14837646484375
356
+ 14 layer.0.SelfAttention.k
357
+ Quantizing ...
358
+ time 0.24
359
+ error 37913.44140625
360
+ 14 layer.0.SelfAttention.v
361
+ Quantizing ...
362
+ time 0.24
363
+ error 70616.703125
364
+ 14 layer.0.SelfAttention.o
365
+ Quantizing ...
366
+ time 0.98
367
+ error 276008.25
368
+ 14 layer.1.DenseReluDense.wi
369
+ Quantizing ...
370
+ time 0.25
371
+ error 18912372.0
372
+ 14 layer.1.DenseReluDense.wo
373
+ Quantizing ...
374
+ time 4.81
375
+ error 1235969.25
376
+ 15 layer.0.SelfAttention.q
377
+ Quantizing ...
378
+ time 1.39
379
+ error 248.17747497558594
380
+ 15 layer.0.SelfAttention.k
381
+ Quantizing ...
382
+ time 0.24
383
+ error 38016.640625
384
+ 15 layer.0.SelfAttention.v
385
+ Quantizing ...
386
+ time 0.24
387
+ error 91188.5
388
+ 15 layer.0.SelfAttention.o
389
+ Quantizing ...
390
+ time 1.00
391
+ error 444728.9375
392
+ 15 layer.1.DenseReluDense.wi
393
+ Quantizing ...
394
+ time 0.25
395
+ error 25090036.0
396
+ 15 layer.1.DenseReluDense.wo
397
+ Quantizing ...
398
+ time 4.78
399
+ error 2290796.5
400
+ 16 layer.0.SelfAttention.q
401
+ Quantizing ...
402
+ time 1.41
403
+ error 292.78265380859375
404
+ 16 layer.0.SelfAttention.k
405
+ Quantizing ...
406
+ time 0.24
407
+ error 37744.9765625
408
+ 16 layer.0.SelfAttention.v
409
+ Quantizing ...
410
+ time 0.27
411
+ error 111741.5625
412
+ 16 layer.0.SelfAttention.o
413
+ Quantizing ...
414
+ time 1.02
415
+ error 623461.5625
416
+ 16 layer.1.DenseReluDense.wi
417
+ Quantizing ...
418
+ time 0.25
419
+ error 32498636.0
420
+ 16 layer.1.DenseReluDense.wo
421
+ Quantizing ...
422
+ time 4.78
423
+ error 2876735.0
424
+ 17 layer.0.SelfAttention.q
425
+ Quantizing ...
426
+ time 1.42
427
+ error 238.4019775390625
428
+ 17 layer.0.SelfAttention.k
429
+ Quantizing ...
430
+ time 0.25
431
+ error 36026.7890625
432
+ 17 layer.0.SelfAttention.v
433
+ Quantizing ...
434
+ time 0.25
435
+ error 133311.40625
436
+ 17 layer.0.SelfAttention.o
437
+ Quantizing ...
438
+ time 1.02
439
+ error 775721.0
440
+ 17 layer.1.DenseReluDense.wi
441
+ Quantizing ...
442
+ time 0.26
443
+ error 29635048.0
444
+ 17 layer.1.DenseReluDense.wo
445
+ Quantizing ...
446
+ time 4.72
447
+ error 4939297.0
448
+ 18 layer.0.SelfAttention.q
449
+ Quantizing ...
450
+ time 1.40
451
+ error 264.264892578125
452
+ 18 layer.0.SelfAttention.k
453
+ Quantizing ...
454
+ time 0.25
455
+ error 35441.94140625
456
+ 18 layer.0.SelfAttention.v
457
+ Quantizing ...
458
+ time 0.28
459
+ error 173245.75
460
+ 18 layer.0.SelfAttention.o
461
+ Quantizing ...
462
+ time 1.03
463
+ error 1960626.25
464
+ 18 layer.1.DenseReluDense.wi
465
+ Quantizing ...
466
+ time 0.25
467
+ error 35718256.0
468
+ 18 layer.1.DenseReluDense.wo
469
+ Quantizing ...
470
+ time 4.69
471
+ error 8303653.0
472
+ 19 layer.0.SelfAttention.q
473
+ Quantizing ...
474
+ time 1.41
475
+ error 208.0140380859375
476
+ 19 layer.0.SelfAttention.k
477
+ Quantizing ...
478
+ time 0.24
479
+ error 29667.7890625
480
+ 19 layer.0.SelfAttention.v
481
+ Quantizing ...
482
+ time 0.25
483
+ error 186044.875
484
+ 19 layer.0.SelfAttention.o
485
+ Quantizing ...
486
+ time 1.04
487
+ error 1691559.75
488
+ 19 layer.1.DenseReluDense.wi
489
+ Quantizing ...
490
+ time 0.25
491
+ error 35222308.0
492
+ 19 layer.1.DenseReluDense.wo
493
+ Quantizing ...
494
+ time 4.69
495
+ error 7108630.0
496
+ 20 layer.0.SelfAttention.q
497
+ Quantizing ...
498
+ time 1.41
499
+ error 153.36215209960938
500
+ 20 layer.0.SelfAttention.k
501
+ Quantizing ...
502
+ time 0.26
503
+ error 22485.923828125
504
+ 20 layer.0.SelfAttention.v
505
+ Quantizing ...
506
+ time 0.24
507
+ error 193863.65625
508
+ 20 layer.0.SelfAttention.o
509
+ Quantizing ...
510
+ time 0.98
511
+ error 2213693.75
512
+ 20 layer.1.DenseReluDense.wi
513
+ Quantizing ...
514
+ time 0.25
515
+ error 44203168.0
516
+ 20 layer.1.DenseReluDense.wo
517
+ Quantizing ...
518
+ time 4.66
519
+ error 9345712.0
520
+ 21 layer.0.SelfAttention.q
521
+ Quantizing ...
522
+ time 1.39
523
+ error 179.65872192382812
524
+ 21 layer.0.SelfAttention.k
525
+ Quantizing ...
526
+ time 0.24
527
+ error 23743.3984375
528
+ 21 layer.0.SelfAttention.v
529
+ Quantizing ...
530
+ time 0.24
531
+ error 237300.96875
532
+ 21 layer.0.SelfAttention.o
533
+ Quantizing ...
534
+ time 0.99
535
+ error 3179711.0
536
+ 21 layer.1.DenseReluDense.wi
537
+ Quantizing ...
538
+ time 0.25
539
+ error 66251440.0
540
+ 21 layer.1.DenseReluDense.wo
541
+ Quantizing ...
542
+ time 4.69
543
+ error 30768120.0
544
+ 22 layer.0.SelfAttention.q
545
+ Quantizing ...
546
+ time 1.39
547
+ error 73.71006774902344
548
+ 22 layer.0.SelfAttention.k
549
+ Quantizing ...
550
+ time 0.24
551
+ error 10168.076171875
552
+ 22 layer.0.SelfAttention.v
553
+ Quantizing ...
554
+ time 0.24
555
+ error 131254.0
556
+ 22 layer.0.SelfAttention.o
557
+ Quantizing ...
558
+ time 0.99
559
+ error 1327100.625
560
+ 22 layer.1.DenseReluDense.wi
561
+ Quantizing ...
562
+ time 0.25
563
+ error 40279020.0
564
+ 22 layer.1.DenseReluDense.wo
565
+ Quantizing ...
566
+ time 4.67
567
+ error 90908576.0
568
+ 23 layer.0.SelfAttention.q
569
+ Quantizing ...
570
+ time 1.40
571
+ error 84.36131286621094
572
+ 23 layer.0.SelfAttention.k
573
+ Quantizing ...
574
+ time 0.24
575
+ error 11834.87109375
576
+ 23 layer.0.SelfAttention.v
577
+ Quantizing ...
578
+ time 0.24
579
+ error 154102.96875
580
+ 23 layer.0.SelfAttention.o
581
+ Quantizing ...
582
+ time 1.00
583
+ error 2506505.5
584
+ 23 layer.1.DenseReluDense.wi
585
+ Quantizing ...
586
+ time 0.25
587
+ error 23018948.0
588
+ 23 layer.1.DenseReluDense.wo
589
+ Quantizing ...
590
+ time 4.65
591
+ error 38312456.0
592
+ 443.7025671005249
593
+ Packing ...
594
+ encoder.block.0.layer.0.SelfAttention.q
595
+ encoder.block.0.layer.0.SelfAttention.k
596
+ encoder.block.0.layer.0.SelfAttention.v
597
+ encoder.block.0.layer.0.SelfAttention.o
598
+ encoder.block.0.layer.1.DenseReluDense.wi
599
+ encoder.block.0.layer.1.DenseReluDense.wo
600
+ encoder.block.1.layer.0.SelfAttention.q
601
+ encoder.block.1.layer.0.SelfAttention.k
602
+ encoder.block.1.layer.0.SelfAttention.v
603
+ encoder.block.1.layer.0.SelfAttention.o
604
+ encoder.block.1.layer.1.DenseReluDense.wi
605
+ encoder.block.1.layer.1.DenseReluDense.wo
606
+ encoder.block.2.layer.0.SelfAttention.q
607
+ encoder.block.2.layer.0.SelfAttention.k
608
+ encoder.block.2.layer.0.SelfAttention.v
609
+ encoder.block.2.layer.0.SelfAttention.o
610
+ encoder.block.2.layer.1.DenseReluDense.wi
611
+ encoder.block.2.layer.1.DenseReluDense.wo
612
+ encoder.block.3.layer.0.SelfAttention.q
613
+ encoder.block.3.layer.0.SelfAttention.k
614
+ encoder.block.3.layer.0.SelfAttention.v
615
+ encoder.block.3.layer.0.SelfAttention.o
616
+ encoder.block.3.layer.1.DenseReluDense.wi
617
+ encoder.block.3.layer.1.DenseReluDense.wo
618
+ encoder.block.4.layer.0.SelfAttention.q
619
+ encoder.block.4.layer.0.SelfAttention.k
620
+ encoder.block.4.layer.0.SelfAttention.v
621
+ encoder.block.4.layer.0.SelfAttention.o
622
+ encoder.block.4.layer.1.DenseReluDense.wi
623
+ encoder.block.4.layer.1.DenseReluDense.wo
624
+ encoder.block.5.layer.0.SelfAttention.q
625
+ encoder.block.5.layer.0.SelfAttention.k
626
+ encoder.block.5.layer.0.SelfAttention.v
627
+ encoder.block.5.layer.0.SelfAttention.o
628
+ encoder.block.5.layer.1.DenseReluDense.wi
629
+ encoder.block.5.layer.1.DenseReluDense.wo
630
+ encoder.block.6.layer.0.SelfAttention.q
631
+ encoder.block.6.layer.0.SelfAttention.k
632
+ encoder.block.6.layer.0.SelfAttention.v
633
+ encoder.block.6.layer.0.SelfAttention.o
634
+ encoder.block.6.layer.1.DenseReluDense.wi
635
+ encoder.block.6.layer.1.DenseReluDense.wo
636
+ encoder.block.7.layer.0.SelfAttention.q
637
+ encoder.block.7.layer.0.SelfAttention.k
638
+ encoder.block.7.layer.0.SelfAttention.v
639
+ encoder.block.7.layer.0.SelfAttention.o
640
+ encoder.block.7.layer.1.DenseReluDense.wi
641
+ encoder.block.7.layer.1.DenseReluDense.wo
642
+ encoder.block.8.layer.0.SelfAttention.q
643
+ encoder.block.8.layer.0.SelfAttention.k
644
+ encoder.block.8.layer.0.SelfAttention.v
645
+ encoder.block.8.layer.0.SelfAttention.o
646
+ encoder.block.8.layer.1.DenseReluDense.wi
647
+ encoder.block.8.layer.1.DenseReluDense.wo
648
+ encoder.block.9.layer.0.SelfAttention.q
649
+ encoder.block.9.layer.0.SelfAttention.k
650
+ encoder.block.9.layer.0.SelfAttention.v
651
+ encoder.block.9.layer.0.SelfAttention.o
652
+ encoder.block.9.layer.1.DenseReluDense.wi
653
+ encoder.block.9.layer.1.DenseReluDense.wo
654
+ encoder.block.10.layer.0.SelfAttention.q
655
+ encoder.block.10.layer.0.SelfAttention.k
656
+ encoder.block.10.layer.0.SelfAttention.v
657
+ encoder.block.10.layer.0.SelfAttention.o
658
+ encoder.block.10.layer.1.DenseReluDense.wi
659
+ encoder.block.10.layer.1.DenseReluDense.wo
660
+ encoder.block.11.layer.0.SelfAttention.q
661
+ encoder.block.11.layer.0.SelfAttention.k
662
+ encoder.block.11.layer.0.SelfAttention.v
663
+ encoder.block.11.layer.0.SelfAttention.o
664
+ encoder.block.11.layer.1.DenseReluDense.wi
665
+ encoder.block.11.layer.1.DenseReluDense.wo
666
+ encoder.block.12.layer.0.SelfAttention.q
667
+ encoder.block.12.layer.0.SelfAttention.k
668
+ encoder.block.12.layer.0.SelfAttention.v
669
+ encoder.block.12.layer.0.SelfAttention.o
670
+ encoder.block.12.layer.1.DenseReluDense.wi
671
+ encoder.block.12.layer.1.DenseReluDense.wo
672
+ encoder.block.13.layer.0.SelfAttention.q
673
+ encoder.block.13.layer.0.SelfAttention.k
674
+ encoder.block.13.layer.0.SelfAttention.v
675
+ encoder.block.13.layer.0.SelfAttention.o
676
+ encoder.block.13.layer.1.DenseReluDense.wi
677
+ encoder.block.13.layer.1.DenseReluDense.wo
678
+ encoder.block.14.layer.0.SelfAttention.q
679
+ encoder.block.14.layer.0.SelfAttention.k
680
+ encoder.block.14.layer.0.SelfAttention.v
681
+ encoder.block.14.layer.0.SelfAttention.o
682
+ encoder.block.14.layer.1.DenseReluDense.wi
683
+ encoder.block.14.layer.1.DenseReluDense.wo
684
+ encoder.block.15.layer.0.SelfAttention.q
685
+ encoder.block.15.layer.0.SelfAttention.k
686
+ encoder.block.15.layer.0.SelfAttention.v
687
+ encoder.block.15.layer.0.SelfAttention.o
688
+ encoder.block.15.layer.1.DenseReluDense.wi
689
+ encoder.block.15.layer.1.DenseReluDense.wo
690
+ encoder.block.16.layer.0.SelfAttention.q
691
+ encoder.block.16.layer.0.SelfAttention.k
692
+ encoder.block.16.layer.0.SelfAttention.v
693
+ encoder.block.16.layer.0.SelfAttention.o
694
+ encoder.block.16.layer.1.DenseReluDense.wi
695
+ encoder.block.16.layer.1.DenseReluDense.wo
696
+ encoder.block.17.layer.0.SelfAttention.q
697
+ encoder.block.17.layer.0.SelfAttention.k
698
+ encoder.block.17.layer.0.SelfAttention.v
699
+ encoder.block.17.layer.0.SelfAttention.o
700
+ encoder.block.17.layer.1.DenseReluDense.wi
701
+ encoder.block.17.layer.1.DenseReluDense.wo
702
+ encoder.block.18.layer.0.SelfAttention.q
703
+ encoder.block.18.layer.0.SelfAttention.k
704
+ encoder.block.18.layer.0.SelfAttention.v
705
+ encoder.block.18.layer.0.SelfAttention.o
706
+ encoder.block.18.layer.1.DenseReluDense.wi
707
+ encoder.block.18.layer.1.DenseReluDense.wo
708
+ encoder.block.19.layer.0.SelfAttention.q
709
+ encoder.block.19.layer.0.SelfAttention.k
710
+ encoder.block.19.layer.0.SelfAttention.v
711
+ encoder.block.19.layer.0.SelfAttention.o
712
+ encoder.block.19.layer.1.DenseReluDense.wi
713
+ encoder.block.19.layer.1.DenseReluDense.wo
714
+ encoder.block.20.layer.0.SelfAttention.q
715
+ encoder.block.20.layer.0.SelfAttention.k
716
+ encoder.block.20.layer.0.SelfAttention.v
717
+ encoder.block.20.layer.0.SelfAttention.o
718
+ encoder.block.20.layer.1.DenseReluDense.wi
719
+ encoder.block.20.layer.1.DenseReluDense.wo
720
+ encoder.block.21.layer.0.SelfAttention.q
721
+ encoder.block.21.layer.0.SelfAttention.k
722
+ encoder.block.21.layer.0.SelfAttention.v
723
+ encoder.block.21.layer.0.SelfAttention.o
724
+ encoder.block.21.layer.1.DenseReluDense.wi
725
+ encoder.block.21.layer.1.DenseReluDense.wo
726
+ encoder.block.22.layer.0.SelfAttention.q
727
+ encoder.block.22.layer.0.SelfAttention.k
728
+ encoder.block.22.layer.0.SelfAttention.v
729
+ encoder.block.22.layer.0.SelfAttention.o
730
+ encoder.block.22.layer.1.DenseReluDense.wi
731
+ encoder.block.22.layer.1.DenseReluDense.wo
732
+ encoder.block.23.layer.0.SelfAttention.q
733
+ encoder.block.23.layer.0.SelfAttention.k
734
+ encoder.block.23.layer.0.SelfAttention.v
735
+ encoder.block.23.layer.0.SelfAttention.o
736
+ encoder.block.23.layer.1.DenseReluDense.wi
737
+ encoder.block.23.layer.1.DenseReluDense.wo
738
+ Done.
workspace/flan-ts-base.txt ADDED
@@ -0,0 +1,437 @@
1
+ CUDA extension not installed.
2
+ Downloading (…)lve/main/config.json: 100%|██| 1.40k/1.40k [00:00<00:00, 3.76MB/s]
3
+ Downloading pytorch_model.bin: 100%|██████████████████| 990M/990M [00:11<00:00, 89.8MB/s]
4
+ Some weights of the model checkpoint at google/flan-t5-base were not used when initializing T5EncoderModel: ['decoder.block.6.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.2.DenseReluDense.wo.weight', 'decoder.block.10.layer.0.SelfAttention.q.weight', 'decoder.block.10.layer.1.EncDecAttention.o.weight', 'decoder.block.4.layer.0.SelfAttention.k.weight', 'decoder.block.4.layer.1.EncDecAttention.v.weight', 'decoder.block.8.layer.1.layer_norm.weight', 'decoder.block.9.layer.1.EncDecAttention.k.weight', 'decoder.block.9.layer.0.layer_norm.weight', 'decoder.block.5.layer.2.layer_norm.weight', 'decoder.block.1.layer.1.EncDecAttention.q.weight', 'decoder.block.1.layer.1.EncDecAttention.v.weight', 'decoder.block.0.layer.1.layer_norm.weight', 'decoder.block.4.layer.1.EncDecAttention.o.weight', 'decoder.block.8.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.4.layer.0.SelfAttention.q.weight', 'decoder.block.9.layer.1.layer_norm.weight', 'decoder.block.7.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.1.layer.0.SelfAttention.o.weight', 'decoder.block.6.layer.0.SelfAttention.o.weight', 'decoder.block.7.layer.1.EncDecAttention.o.weight', 'decoder.block.7.layer.0.SelfAttention.k.weight', 'decoder.block.8.layer.1.EncDecAttention.v.weight', 'decoder.block.10.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.1.EncDecAttention.o.weight', 'decoder.block.6.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.6.layer.2.layer_norm.weight', 'decoder.block.5.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.7.layer.0.layer_norm.weight', 'decoder.block.10.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.1.layer_norm.weight', 'decoder.block.0.layer.1.EncDecAttention.v.weight', 'decoder.block.2.layer.1.EncDecAttention.q.weight', 'decoder.block.9.layer.1.EncDecAttention.q.weight', 'decoder.block.1.layer.0.SelfAttention.k.weight', 
'decoder.block.9.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.2.layer.0.SelfAttention.q.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.4.layer.0.layer_norm.weight', 'decoder.block.2.layer.0.SelfAttention.v.weight', 'decoder.block.5.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.1.layer_norm.weight', 'decoder.block.8.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.8.layer.2.DenseReluDense.wo.weight', 'decoder.block.0.layer.2.layer_norm.weight', 'decoder.block.6.layer.1.layer_norm.weight', 'decoder.block.10.layer.2.DenseReluDense.wo.weight', 'decoder.block.3.layer.1.EncDecAttention.k.weight', 'decoder.block.8.layer.1.EncDecAttention.k.weight', 'decoder.block.9.layer.1.EncDecAttention.o.weight', 'decoder.embed_tokens.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.6.layer.1.EncDecAttention.k.weight', 'decoder.block.10.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.11.layer.0.layer_norm.weight', 'decoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.block.9.layer.0.SelfAttention.v.weight', 'decoder.block.10.layer.0.SelfAttention.v.weight', 'decoder.block.2.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.5.layer.1.EncDecAttention.v.weight', 'decoder.block.6.layer.1.EncDecAttention.v.weight', 'decoder.block.6.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.3.layer.0.layer_norm.weight', 'decoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.11.layer.0.SelfAttention.o.weight', 'decoder.block.5.layer.1.EncDecAttention.q.weight', 'decoder.block.9.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.2.layer.2.layer_norm.weight', 'decoder.block.10.layer.1.EncDecAttention.k.weight', 'decoder.block.11.layer.0.SelfAttention.k.weight', 'decoder.block.9.layer.2.layer_norm.weight', 'decoder.block.7.layer.1.EncDecAttention.k.weight', 
'decoder.block.8.layer.0.SelfAttention.q.weight', 'decoder.block.0.layer.0.SelfAttention.q.weight', 'decoder.block.7.layer.1.EncDecAttention.q.weight', 'decoder.block.11.layer.1.EncDecAttention.o.weight', 'decoder.block.8.layer.2.layer_norm.weight', 'decoder.block.1.layer.2.layer_norm.weight', 'decoder.block.6.layer.2.DenseReluDense.wo.weight', 'decoder.block.11.layer.2.DenseReluDense.wi_1.weight', 'decoder.final_layer_norm.weight', 'decoder.block.2.layer.1.EncDecAttention.v.weight', 'decoder.block.2.layer.0.layer_norm.weight', 'decoder.block.3.layer.1.EncDecAttention.q.weight', 'decoder.block.10.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.8.layer.0.SelfAttention.v.weight', 'decoder.block.3.layer.1.layer_norm.weight', 'decoder.block.11.layer.1.EncDecAttention.q.weight', 'decoder.block.7.layer.2.DenseReluDense.wo.weight', 'decoder.block.7.layer.0.SelfAttention.q.weight', 'decoder.block.0.layer.1.EncDecAttention.q.weight', 'decoder.block.5.layer.1.EncDecAttention.k.weight', 'decoder.block.7.layer.1.layer_norm.weight', 'decoder.block.4.layer.2.DenseReluDense.wi_1.weight', 'lm_head.weight', 'decoder.block.0.layer.0.layer_norm.weight', 'decoder.block.5.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.3.layer.2.DenseReluDense.wo.weight', 'decoder.block.11.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.10.layer.0.layer_norm.weight', 'decoder.block.6.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.4.layer.1.EncDecAttention.k.weight', 'decoder.block.9.layer.0.SelfAttention.q.weight', 'decoder.block.3.layer.0.SelfAttention.k.weight', 'decoder.block.5.layer.0.SelfAttention.v.weight', 'decoder.block.1.layer.0.layer_norm.weight', 'decoder.block.9.layer.1.EncDecAttention.v.weight', 'decoder.block.5.layer.0.SelfAttention.q.weight', 'decoder.block.3.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'decoder.block.8.layer.0.SelfAttention.k.weight', 'decoder.block.2.layer.1.EncDecAttention.o.weight', 
'decoder.block.8.layer.0.layer_norm.weight', 'decoder.block.11.layer.0.SelfAttention.q.weight', 'decoder.block.5.layer.0.SelfAttention.o.weight', 'decoder.block.10.layer.1.EncDecAttention.v.weight', 'decoder.block.2.layer.1.layer_norm.weight', 'decoder.block.6.layer.0.SelfAttention.k.weight', 'decoder.block.2.layer.1.EncDecAttention.k.weight', 'decoder.block.10.layer.1.layer_norm.weight', 'decoder.block.3.layer.2.layer_norm.weight', 'decoder.block.0.layer.2.DenseReluDense.wo.weight', 'decoder.block.5.layer.0.layer_norm.weight', 'decoder.block.7.layer.0.SelfAttention.v.weight', 'decoder.block.10.layer.2.layer_norm.weight', 'decoder.block.4.layer.2.layer_norm.weight', 'decoder.block.5.layer.1.EncDecAttention.o.weight', 'decoder.block.11.layer.1.layer_norm.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.1.layer.2.DenseReluDense.wo.weight', 'decoder.block.1.layer.1.EncDecAttention.k.weight', 'decoder.block.3.layer.0.SelfAttention.q.weight', 'decoder.block.1.layer.0.SelfAttention.q.weight', 'decoder.block.8.layer.1.EncDecAttention.q.weight', 'decoder.block.2.layer.0.SelfAttention.o.weight', 'decoder.block.6.layer.0.SelfAttention.v.weight', 'decoder.block.11.layer.2.layer_norm.weight', 'decoder.block.7.layer.2.layer_norm.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.2.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.1.EncDecAttention.k.weight', 'decoder.block.8.layer.1.EncDecAttention.o.weight', 'decoder.block.9.layer.0.SelfAttention.k.weight', 'decoder.block.3.layer.1.EncDecAttention.v.weight', 'decoder.block.4.layer.1.EncDecAttention.q.weight', 'decoder.block.5.layer.0.SelfAttention.k.weight', 'decoder.block.11.layer.1.EncDecAttention.k.weight', 'decoder.block.3.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.2.DenseReluDense.wo.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.7.layer.2.DenseReluDense.wi_1.weight', 
'decoder.block.10.layer.1.EncDecAttention.q.weight', 'decoder.block.1.layer.0.SelfAttention.v.weight', 'decoder.block.4.layer.0.SelfAttention.o.weight', 'decoder.block.11.layer.2.DenseReluDense.wo.weight', 'decoder.block.8.layer.0.SelfAttention.o.weight', 'decoder.block.6.layer.1.EncDecAttention.q.weight', 'decoder.block.3.layer.0.SelfAttention.v.weight', 'decoder.block.4.layer.0.SelfAttention.v.weight', 'decoder.block.5.layer.1.layer_norm.weight', 'decoder.block.6.layer.0.layer_norm.weight', 'decoder.block.1.layer.1.EncDecAttention.o.weight', 'decoder.block.11.layer.0.SelfAttention.v.weight', 'decoder.block.11.layer.1.EncDecAttention.v.weight', 'decoder.block.7.layer.1.EncDecAttention.v.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.7.layer.0.SelfAttention.o.weight']
5
+ - This IS expected if you are initializing T5EncoderModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
6
+ - This IS NOT expected if you are initializing T5EncoderModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
7
+ Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
8
+ Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
9
+ Downloading (…)okenizer_config.json: 100%|██| 2.54k/2.54k [00:00<00:00, 9.34MB/s]
10
+ Downloading spiece.model: 100%|████████████████████████████| 792k/792k [00:00<00:00, 26.2MB/s]
11
+ Downloading (…)cial_tokens_map.json: 100%|██| 2.20k/2.20k [00:00<00:00, 8.32MB/s]
12
+ Token indices sequence length is longer than the specified maximum sequence length for this model (2837981 > 512). Running this sequence through the model will result in indexing errors
13
+ Starting ...
14
+ Ready.
15
+ 0 layer.0.SelfAttention.q
16
+ Quantizing ...
17
+ time 0.52
18
+ error 146.0775604248047
19
+ 0 layer.0.SelfAttention.k
20
+ Quantizing ...
21
+ time 0.26
22
+ error 10098.515625
23
+ 0 layer.0.SelfAttention.v
24
+ Quantizing ...
25
+ time 0.26
26
+ error 2831.77734375
27
+ 0 layer.0.SelfAttention.o
28
+ Quantizing ...
29
+ time 0.28
30
+ error 169348.390625
31
+ 0 layer.1.DenseReluDense.wi_0
32
+ Quantizing ...
33
+ time 0.27
34
+ error 13075.279296875
35
+ 0 layer.1.DenseReluDense.wi_1
36
+ Quantizing ...
37
+ time 0.27
38
+ error 13343.080078125
39
+ 0 layer.1.DenseReluDense.wo
40
+ Quantizing ...
41
+ time 0.71
42
+ error 223388.6875
43
+ 1 layer.0.SelfAttention.q
44
+ Quantizing ...
45
+ time 0.35
46
+ error 152.99575805664062
47
+ 1 layer.0.SelfAttention.k
48
+ Quantizing ...
49
+ time 0.26
50
+ error 9350.123046875
51
+ 1 layer.0.SelfAttention.v
52
+ Quantizing ...
53
+ time 0.26
54
+ error 2726.8740234375
55
+ 1 layer.0.SelfAttention.o
56
+ Quantizing ...
57
+ time 0.26
58
+ error 46640.0390625
59
+ 1 layer.1.DenseReluDense.wi_0
60
+ Quantizing ...
61
+ time 0.26
62
+ error 14291.783203125
63
+ 1 layer.1.DenseReluDense.wi_1
64
+ Quantizing ...
65
+ time 0.26
66
+ error 15036.92578125
67
+ 1 layer.1.DenseReluDense.wo
68
+ Quantizing ...
69
+ time 0.69
70
+ error 69473232.0
71
+ 2 layer.0.SelfAttention.q
72
+ Quantizing ...
73
+ time 0.35
74
+ error 150.92425537109375
75
+ 2 layer.0.SelfAttention.k
76
+ Quantizing ...
77
+ time 0.26
78
+ error 8416.51171875
79
+ 2 layer.0.SelfAttention.v
80
+ Quantizing ...
81
+ time 0.26
82
+ error 4465.57470703125
83
+ 2 layer.0.SelfAttention.o
84
+ Quantizing ...
85
+ time 0.26
86
+ error 21976.58984375
87
+ 2 layer.1.DenseReluDense.wi_0
88
+ Quantizing ...
89
+ time 0.26
90
+ error 10551.6982421875
91
+ 2 layer.1.DenseReluDense.wi_1
92
+ Quantizing ...
93
+ time 0.26
94
+ error 27373.30078125
95
+ 2 layer.1.DenseReluDense.wo
96
+ Quantizing ...
97
+ time 0.69
98
+ error 2248954.5
99
+ 3 layer.0.SelfAttention.q
100
+ Quantizing ...
101
+ time 0.35
102
+ error 112.25468444824219
103
+ 3 layer.0.SelfAttention.k
104
+ Quantizing ...
105
+ time 0.26
106
+ error 6374.623046875
107
+ 3 layer.0.SelfAttention.v
108
+ Quantizing ...
109
+ time 0.28
110
+ error 6320.84765625
111
+ 3 layer.0.SelfAttention.o
112
+ Quantizing ...
113
+ time 0.26
114
+ error 39145.75
115
+ 3 layer.1.DenseReluDense.wi_0
116
+ Quantizing ...
117
+ time 0.27
118
+ error 8442.021484375
119
+ 3 layer.1.DenseReluDense.wi_1
120
+ Quantizing ...
121
+ time 0.28
122
+ error 16249.869140625
123
+ 3 layer.1.DenseReluDense.wo
124
+ Quantizing ...
125
+ time 0.69
126
+ error 413714.4375
127
+ 4 layer.0.SelfAttention.q
128
+ Quantizing ...
129
+ time 0.37
130
+ error 106.27033996582031
131
+ 4 layer.0.SelfAttention.k
132
+ Quantizing ...
133
+ time 0.26
134
+ error 5648.173828125
135
+ 4 layer.0.SelfAttention.v
136
+ Quantizing ...
137
+ time 0.27
138
+ error 8460.7470703125
139
+ 4 layer.0.SelfAttention.o
140
+ Quantizing ...
141
+ time 0.28
142
+ error 31233.669921875
143
+ 4 layer.1.DenseReluDense.wi_0
144
+ Quantizing ...
145
+ time 0.26
146
+ error 7027.775390625
147
+ 4 layer.1.DenseReluDense.wi_1
148
+ Quantizing ...
149
+ time 0.26
150
+ error 17504.884765625
151
+ 4 layer.1.DenseReluDense.wo
152
+ Quantizing ...
153
+ time 0.71
154
+ error 615970.0
155
+ 5 layer.0.SelfAttention.q
156
+ Quantizing ...
157
+ time 0.35
158
+ error 82.1971206665039
159
+ 5 layer.0.SelfAttention.k
160
+ Quantizing ...
161
+ time 0.26
162
+ error 5029.64208984375
163
+ 5 layer.0.SelfAttention.v
164
+ Quantizing ...
165
+ time 0.26
166
+ error 9543.3857421875
167
+ 5 layer.0.SelfAttention.o
168
+ Quantizing ...
169
+ time 0.26
170
+ error 165621.421875
171
+ 5 layer.1.DenseReluDense.wi_0
172
+ Quantizing ...
173
+ time 0.26
174
+ error 5663.072265625
175
+ 5 layer.1.DenseReluDense.wi_1
176
+ Quantizing ...
177
+ time 0.26
178
+ error 19491.24609375
179
+ 5 layer.1.DenseReluDense.wo
180
+ Quantizing ...
181
+ time 0.69
182
+ error 758556.875
183
+ 6 layer.0.SelfAttention.q
184
+ Quantizing ...
185
+ time 0.35
186
+ error 69.80455780029297
187
+ 6 layer.0.SelfAttention.k
188
+ Quantizing ...
189
+ time 0.26
190
+ error 4538.78271484375
191
+ 6 layer.0.SelfAttention.v
192
+ Quantizing ...
193
+ time 0.26
194
+ error 12373.19921875
195
+ 6 layer.0.SelfAttention.o
196
+ Quantizing ...
197
+ time 0.26
198
+ error 206647.40625
199
+ 6 layer.1.DenseReluDense.wi_0
200
+ Quantizing ...
201
+ time 0.26
202
+ error 5316.70654296875
203
+ 6 layer.1.DenseReluDense.wi_1
204
+ Quantizing ...
205
+ time 0.26
206
+ error 22911.0390625
207
+ 6 layer.1.DenseReluDense.wo
208
+ Quantizing ...
209
+ time 0.69
210
+ error 874569.5
211
+ 7 layer.0.SelfAttention.q
212
+ Quantizing ...
213
+ time 0.35
214
+ error 61.30769348144531
215
+ 7 layer.0.SelfAttention.k
216
+ Quantizing ...
217
+ time 0.26
218
+ error 3534.55078125
219
+ 7 layer.0.SelfAttention.v
220
+ Quantizing ...
221
+ time 0.26
222
+ error 14965.638671875
223
+ 7 layer.0.SelfAttention.o
224
+ Quantizing ...
225
+ time 0.26
226
+ error 120621.015625
227
+ 7 layer.1.DenseReluDense.wi_0
228
+ Quantizing ...
229
+ time 0.26
230
+ error 4825.25634765625
231
+ 7 layer.1.DenseReluDense.wi_1
232
+ Quantizing ...
233
+ time 0.26
234
+ error 23851.55078125
235
+ 7 layer.1.DenseReluDense.wo
236
+ Quantizing ...
237
+ time 0.71
238
+ error 1010260.9375
239
+ 8 layer.0.SelfAttention.q
240
+ Quantizing ...
241
+ time 0.36
242
+ error 67.33954620361328
243
+ 8 layer.0.SelfAttention.k
244
+ Quantizing ...
245
+ time 0.27
246
+ error 3172.860595703125
247
+ 8 layer.0.SelfAttention.v
248
+ Quantizing ...
249
+ time 0.27
250
+ error 22393.306640625
251
+ 8 layer.0.SelfAttention.o
252
+ Quantizing ...
253
+ time 0.26
254
+ error 295393.03125
255
+ 8 layer.1.DenseReluDense.wi_0
256
+ Quantizing ...
257
+ time 0.26
258
+ error 4726.32470703125
259
+ 8 layer.1.DenseReluDense.wi_1
260
+ Quantizing ...
261
+ time 0.27
262
+ error 32944.5
263
+ 8 layer.1.DenseReluDense.wo
264
+ Quantizing ...
265
+ time 0.72
266
+ error 120079864.0
267
+ 9 layer.0.SelfAttention.q
268
+ Quantizing ...
269
+ time 0.39
270
+ error 64.98255920410156
271
+ 9 layer.0.SelfAttention.k
272
+ Quantizing ...
273
+ time 0.26
274
+ error 3637.16455078125
275
+ 9 layer.0.SelfAttention.v
276
+ Quantizing ...
277
+ time 0.26
278
+ error 25351.625
279
+ 9 layer.0.SelfAttention.o
280
+ Quantizing ...
281
+ time 0.26
282
+ error 810347.9375
283
+ 9 layer.1.DenseReluDense.wi_0
284
+ Quantizing ...
285
+ time 0.26
286
+ error 4957.8681640625
287
+ 9 layer.1.DenseReluDense.wi_1
288
+ Quantizing ...
289
+ time 0.26
290
+ error 38014.75390625
291
+ 9 layer.1.DenseReluDense.wo
292
+ Quantizing ...
293
+ time 0.70
294
+ error 2600309.75
295
+ 10 layer.0.SelfAttention.q
296
+ Quantizing ...
297
+ time 0.35
298
+ error 48.993934631347656
299
+ 10 layer.0.SelfAttention.k
300
+ Quantizing ...
301
+ time 0.26
302
+ error 2914.375
303
+ 10 layer.0.SelfAttention.v
304
+ Quantizing ...
305
+ time 0.26
306
+ error 26259.44140625
307
+ 10 layer.0.SelfAttention.o
308
+ Quantizing ...
309
+ time 0.26
310
+ error 1072011.75
311
+ 10 layer.1.DenseReluDense.wi_0
312
+ Quantizing ...
313
+ time 0.26
314
+ error 4582.3369140625
315
+ 10 layer.1.DenseReluDense.wi_1
316
+ Quantizing ...
317
+ time 0.26
318
+ error 42805.3125
319
+ 10 layer.1.DenseReluDense.wo
320
+ Quantizing ...
321
+ time 0.69
322
+ error 14100471.0
323
+ 11 layer.0.SelfAttention.q
324
+ Quantizing ...
325
+ time 0.35
326
+ error 56.52388000488281
327
+ 11 layer.0.SelfAttention.k
328
+ Quantizing ...
329
+ time 0.26
330
+ error 2580.15380859375
331
+ 11 layer.0.SelfAttention.v
332
+ Quantizing ...
333
+ time 0.26
334
+ error 32459.890625
335
+ 11 layer.0.SelfAttention.o
336
+ Quantizing ...
337
+ time 0.26
338
+ error 1562133.25
339
+ 11 layer.1.DenseReluDense.wi_0
340
+ Quantizing ...
341
+ time 0.26
342
+ error 5719.791015625
343
+ 11 layer.1.DenseReluDense.wi_1
344
+ Quantizing ...
345
+ time 0.26
346
+ error 70109.4296875
347
+ 11 layer.1.DenseReluDense.wo
348
+ Quantizing ...
349
+ time 0.69
350
+ error 325791776.0
351
+ 49.135838985443115
352
+ Packing ...
353
+ encoder.block.0.layer.0.SelfAttention.q
354
+ encoder.block.0.layer.0.SelfAttention.k
355
+ encoder.block.0.layer.0.SelfAttention.v
356
+ encoder.block.0.layer.0.SelfAttention.o
357
+ encoder.block.0.layer.1.DenseReluDense.wi_0
358
+ encoder.block.0.layer.1.DenseReluDense.wi_1
359
+ encoder.block.0.layer.1.DenseReluDense.wo
360
+ encoder.block.1.layer.0.SelfAttention.q
361
+ encoder.block.1.layer.0.SelfAttention.k
362
+ encoder.block.1.layer.0.SelfAttention.v
363
+ encoder.block.1.layer.0.SelfAttention.o
364
+ encoder.block.1.layer.1.DenseReluDense.wi_0
365
+ encoder.block.1.layer.1.DenseReluDense.wi_1
366
+ encoder.block.1.layer.1.DenseReluDense.wo
367
+ encoder.block.2.layer.0.SelfAttention.q
368
+ encoder.block.2.layer.0.SelfAttention.k
369
+ encoder.block.2.layer.0.SelfAttention.v
370
+ encoder.block.2.layer.0.SelfAttention.o
371
+ encoder.block.2.layer.1.DenseReluDense.wi_0
372
+ encoder.block.2.layer.1.DenseReluDense.wi_1
373
+ encoder.block.2.layer.1.DenseReluDense.wo
374
+ encoder.block.3.layer.0.SelfAttention.q
375
+ encoder.block.3.layer.0.SelfAttention.k
376
+ encoder.block.3.layer.0.SelfAttention.v
377
+ encoder.block.3.layer.0.SelfAttention.o
378
+ encoder.block.3.layer.1.DenseReluDense.wi_0
379
+ encoder.block.3.layer.1.DenseReluDense.wi_1
380
+ encoder.block.3.layer.1.DenseReluDense.wo
381
+ encoder.block.4.layer.0.SelfAttention.q
382
+ encoder.block.4.layer.0.SelfAttention.k
383
+ encoder.block.4.layer.0.SelfAttention.v
384
+ encoder.block.4.layer.0.SelfAttention.o
385
+ encoder.block.4.layer.1.DenseReluDense.wi_0
386
+ encoder.block.4.layer.1.DenseReluDense.wi_1
387
+ encoder.block.4.layer.1.DenseReluDense.wo
388
+ encoder.block.5.layer.0.SelfAttention.q
389
+ encoder.block.5.layer.0.SelfAttention.k
390
+ encoder.block.5.layer.0.SelfAttention.v
391
+ encoder.block.5.layer.0.SelfAttention.o
392
+ encoder.block.5.layer.1.DenseReluDense.wi_0
393
+ encoder.block.5.layer.1.DenseReluDense.wi_1
394
+ encoder.block.5.layer.1.DenseReluDense.wo
395
+ encoder.block.6.layer.0.SelfAttention.q
396
+ encoder.block.6.layer.0.SelfAttention.k
397
+ encoder.block.6.layer.0.SelfAttention.v
398
+ encoder.block.6.layer.0.SelfAttention.o
399
+ encoder.block.6.layer.1.DenseReluDense.wi_0
400
+ encoder.block.6.layer.1.DenseReluDense.wi_1
401
+ encoder.block.6.layer.1.DenseReluDense.wo
402
+ encoder.block.7.layer.0.SelfAttention.q
403
+ encoder.block.7.layer.0.SelfAttention.k
404
+ encoder.block.7.layer.0.SelfAttention.v
405
+ encoder.block.7.layer.0.SelfAttention.o
406
+ encoder.block.7.layer.1.DenseReluDense.wi_0
407
+ encoder.block.7.layer.1.DenseReluDense.wi_1
408
+ encoder.block.7.layer.1.DenseReluDense.wo
409
+ encoder.block.8.layer.0.SelfAttention.q
410
+ encoder.block.8.layer.0.SelfAttention.k
411
+ encoder.block.8.layer.0.SelfAttention.v
412
+ encoder.block.8.layer.0.SelfAttention.o
413
+ encoder.block.8.layer.1.DenseReluDense.wi_0
414
+ encoder.block.8.layer.1.DenseReluDense.wi_1
415
+ encoder.block.8.layer.1.DenseReluDense.wo
416
+ encoder.block.9.layer.0.SelfAttention.q
417
+ encoder.block.9.layer.0.SelfAttention.k
418
+ encoder.block.9.layer.0.SelfAttention.v
419
+ encoder.block.9.layer.0.SelfAttention.o
420
+ encoder.block.9.layer.1.DenseReluDense.wi_0
421
+ encoder.block.9.layer.1.DenseReluDense.wi_1
422
+ encoder.block.9.layer.1.DenseReluDense.wo
423
+ encoder.block.10.layer.0.SelfAttention.q
424
+ encoder.block.10.layer.0.SelfAttention.k
425
+ encoder.block.10.layer.0.SelfAttention.v
426
+ encoder.block.10.layer.0.SelfAttention.o
427
+ encoder.block.10.layer.1.DenseReluDense.wi_0
428
+ encoder.block.10.layer.1.DenseReluDense.wi_1
429
+ encoder.block.10.layer.1.DenseReluDense.wo
430
+ encoder.block.11.layer.0.SelfAttention.q
431
+ encoder.block.11.layer.0.SelfAttention.k
432
+ encoder.block.11.layer.0.SelfAttention.v
433
+ encoder.block.11.layer.0.SelfAttention.o
434
+ encoder.block.11.layer.1.DenseReluDense.wi_0
435
+ encoder.block.11.layer.1.DenseReluDense.wi_1
436
+ encoder.block.11.layer.1.DenseReluDense.wo
437
+ Done.
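The per-layer records in these logs follow a fixed pattern (a `<block> layer.<n>.<module>` header, then `Quantizing ...`, `time`, and `error` lines), so they can be summarized mechanically. The following is a minimal sketch, not part of the uploaded files: the names `LayerRecord` and `parse_quant_log` are hypothetical, and it assumes each log line may carry the leading `+ ` diff marker shown above.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class LayerRecord(NamedTuple):
    """One quantized layer as reported in the log (hypothetical helper type)."""
    block: int
    layer: str
    seconds: float
    error: float


# Patterns tolerate an optional leading "+" diff marker.
HEADER = re.compile(r"^\+?\s*(\d+) (layer\.\d+\.\S+)$")
TIME = re.compile(r"^\+?\s*time ([\d.]+)$")
ERROR = re.compile(r"^\+?\s*error ([\d.eE+-]+)$")


def parse_quant_log(lines: Iterable[str]) -> Iterator[LayerRecord]:
    """Yield one record per header/time/error triple; ignore everything else."""
    block = layer = seconds = None
    for raw in lines:
        line = raw.strip()
        if m := HEADER.match(line):
            block, layer, seconds = int(m.group(1)), m.group(2), None
        elif m := TIME.match(line):
            seconds = float(m.group(1))
        elif (m := ERROR.match(line)) and layer is not None and seconds is not None:
            yield LayerRecord(block, layer, seconds, float(m.group(1)))
            layer = None  # wait for the next header before emitting again


# Sample lines copied from the T5-3B log above.
sample = """\
+ 18 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.03
+ error 1960626.25
+ 18 layer.1.DenseReluDense.wi
+ Quantizing ...
+ time 0.25
+ error 35718256.0
""".splitlines()

records = list(parse_quant_log(sample))
worst = max(records, key=lambda r: r.error)
print(worst.layer, worst.error)
```

Lines that do not match any of the three patterns (viewer line numbers, the total elapsed time, the `Packing ...` list) are skipped, so the same function should work on the whole file as well as on an excerpt.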
workspace/flan-ts-large.txt ADDED
@@ -0,0 +1,857 @@
1
+ CUDA extension not installed.
2
+ Downloading (…)lve/main/config.json: 100%|██████████| 662/662 [00:00<00:00, 1.65MB/s]
3
+ Downloading pytorch_model.bin: 100%|██████████████| 3.13G/3.13G [00:36<00:00, 86.9MB/s]
4
+ Some weights of the model checkpoint at google/flan-t5-large were not used when initializing T5EncoderModel: ['decoder.block.4.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.20.layer.1.EncDecAttention.k.weight', 'decoder.block.2.layer.0.SelfAttention.k.weight', 'decoder.block.13.layer.0.SelfAttention.k.weight', 'decoder.block.20.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.0.SelfAttention.o.weight', 'decoder.block.7.layer.0.SelfAttention.v.weight', 'decoder.block.8.layer.2.layer_norm.weight', 'decoder.embed_tokens.weight', 'decoder.block.23.layer.0.SelfAttention.k.weight', 'decoder.block.17.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.1.layer.1.EncDecAttention.q.weight', 'decoder.block.21.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.2.layer.0.layer_norm.weight', 'decoder.block.21.layer.1.EncDecAttention.k.weight', 'decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'decoder.block.18.layer.1.EncDecAttention.k.weight', 'decoder.block.9.layer.1.EncDecAttention.k.weight', 'decoder.block.13.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.12.layer.0.layer_norm.weight', 'decoder.block.23.layer.2.DenseReluDense.wo.weight', 'decoder.block.21.layer.1.EncDecAttention.v.weight', 'decoder.block.18.layer.0.SelfAttention.k.weight', 'decoder.block.15.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.20.layer.1.EncDecAttention.v.weight', 'decoder.block.8.layer.1.layer_norm.weight', 'decoder.block.10.layer.1.layer_norm.weight', 'decoder.block.12.layer.1.EncDecAttention.q.weight', 'decoder.block.9.layer.0.layer_norm.weight', 'decoder.block.0.layer.0.layer_norm.weight', 'decoder.block.14.layer.1.EncDecAttention.o.weight', 'decoder.block.3.layer.2.layer_norm.weight', 'decoder.block.23.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.15.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.8.layer.0.SelfAttention.q.weight', 'decoder.block.21.layer.0.SelfAttention.v.weight', 'decoder.block.3.layer.1.layer_norm.weight', 
'decoder.block.9.layer.1.EncDecAttention.v.weight', 'decoder.block.12.layer.1.EncDecAttention.o.weight', 'decoder.block.23.layer.0.SelfAttention.q.weight', 'decoder.block.2.layer.0.SelfAttention.v.weight', 'decoder.block.13.layer.0.SelfAttention.o.weight', 'decoder.block.5.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.6.layer.1.EncDecAttention.q.weight', 'decoder.block.3.layer.0.SelfAttention.k.weight', 'decoder.block.2.layer.2.layer_norm.weight', 'decoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.15.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.5.layer.0.SelfAttention.k.weight', 'decoder.block.4.layer.1.layer_norm.weight', 'decoder.block.10.layer.1.EncDecAttention.q.weight', 'decoder.block.1.layer.0.layer_norm.weight', 'decoder.block.11.layer.0.SelfAttention.v.weight', 'decoder.block.23.layer.1.EncDecAttention.v.weight', 'decoder.block.8.layer.1.EncDecAttention.o.weight', 'decoder.block.3.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.2.DenseReluDense.wo.weight', 'decoder.block.16.layer.1.EncDecAttention.v.weight', 'decoder.block.18.layer.0.layer_norm.weight', 'decoder.block.11.layer.1.layer_norm.weight', 'decoder.block.22.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.20.layer.1.layer_norm.weight', 'decoder.block.11.layer.1.EncDecAttention.v.weight', 'decoder.block.9.layer.0.SelfAttention.o.weight', 'decoder.block.3.layer.0.SelfAttention.q.weight', 'decoder.block.11.layer.0.layer_norm.weight', 'decoder.block.7.layer.0.layer_norm.weight', 'decoder.block.13.layer.0.SelfAttention.v.weight', 'decoder.block.21.layer.2.layer_norm.weight', 'decoder.block.20.layer.1.EncDecAttention.o.weight', 'decoder.block.16.layer.0.SelfAttention.q.weight', 'decoder.block.16.layer.0.SelfAttention.v.weight', 'decoder.block.17.layer.2.DenseReluDense.wo.weight', 'decoder.block.6.layer.2.DenseReluDense.wi_0.weight', 
'decoder.block.22.layer.2.layer_norm.weight', 'decoder.block.19.layer.2.layer_norm.weight', 'decoder.block.8.layer.0.SelfAttention.k.weight', 'decoder.block.10.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.2.DenseReluDense.wo.weight', 'decoder.block.13.layer.0.SelfAttention.q.weight', 'decoder.block.17.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.23.layer.0.layer_norm.weight', 'decoder.block.19.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.5.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.5.layer.0.SelfAttention.v.weight', 'decoder.block.1.layer.0.SelfAttention.v.weight', 'decoder.block.15.layer.1.EncDecAttention.k.weight', 'decoder.block.23.layer.0.SelfAttention.o.weight', 'decoder.block.3.layer.2.DenseReluDense.wo.weight', 'decoder.block.17.layer.1.EncDecAttention.v.weight', 'decoder.block.7.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.9.layer.0.SelfAttention.q.weight', 'decoder.block.0.layer.2.layer_norm.weight', 'decoder.block.7.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.19.layer.1.EncDecAttention.o.weight', 'decoder.block.11.layer.1.EncDecAttention.q.weight', 'decoder.block.3.layer.0.SelfAttention.v.weight', 'decoder.block.18.layer.2.DenseReluDense.wo.weight', 'decoder.block.11.layer.0.SelfAttention.k.weight', 'decoder.block.6.layer.0.SelfAttention.q.weight', 'decoder.block.9.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.1.EncDecAttention.o.weight', 'decoder.block.7.layer.1.layer_norm.weight', 'decoder.block.22.layer.0.SelfAttention.v.weight', 'decoder.block.15.layer.1.layer_norm.weight', 'decoder.block.20.layer.2.DenseReluDense.wo.weight', 'decoder.block.14.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.18.layer.1.EncDecAttention.q.weight', 'decoder.block.23.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.8.layer.1.EncDecAttention.v.weight', 'decoder.block.6.layer.1.EncDecAttention.v.weight', 'decoder.block.18.layer.1.EncDecAttention.v.weight', 
'decoder.block.10.layer.1.EncDecAttention.o.weight', 'decoder.block.21.layer.2.DenseReluDense.wo.weight', 'decoder.block.21.layer.0.layer_norm.weight', 'decoder.block.22.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.5.layer.0.layer_norm.weight', 'decoder.block.23.layer.1.EncDecAttention.o.weight', 'decoder.block.17.layer.0.SelfAttention.q.weight', 'decoder.block.22.layer.1.EncDecAttention.k.weight', 'decoder.block.4.layer.0.layer_norm.weight', 'decoder.block.0.layer.1.layer_norm.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.5.layer.1.EncDecAttention.k.weight', 'decoder.block.17.layer.1.EncDecAttention.k.weight', 'decoder.block.13.layer.1.EncDecAttention.k.weight', 'decoder.block.19.layer.0.SelfAttention.k.weight', 'decoder.block.7.layer.1.EncDecAttention.k.weight', 'decoder.block.7.layer.0.SelfAttention.o.weight', 'decoder.block.5.layer.2.DenseReluDense.wo.weight', 'decoder.block.7.layer.0.SelfAttention.k.weight', 'decoder.block.12.layer.1.layer_norm.weight', 'decoder.block.11.layer.0.SelfAttention.q.weight', 'decoder.block.20.layer.0.SelfAttention.v.weight', 'decoder.block.12.layer.2.layer_norm.weight', 'decoder.block.17.layer.1.EncDecAttention.q.weight', 'decoder.block.8.layer.0.layer_norm.weight', 'decoder.block.11.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.13.layer.2.DenseReluDense.wo.weight', 'decoder.block.21.layer.0.SelfAttention.k.weight', 'decoder.block.23.layer.0.SelfAttention.v.weight', 'decoder.block.20.layer.0.SelfAttention.k.weight', 'decoder.block.22.layer.0.SelfAttention.k.weight', 'decoder.block.14.layer.1.EncDecAttention.q.weight', 'decoder.block.15.layer.0.SelfAttention.q.weight', 'decoder.block.21.layer.1.EncDecAttention.q.weight', 'decoder.block.21.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.16.layer.1.EncDecAttention.k.weight', 'decoder.block.18.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_1.weight', 
'decoder.block.6.layer.0.SelfAttention.k.weight', 'decoder.block.15.layer.1.EncDecAttention.v.weight', 'decoder.block.17.layer.1.EncDecAttention.o.weight', 'decoder.block.20.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.1.EncDecAttention.k.weight', 'decoder.block.17.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.1.EncDecAttention.k.weight', 'decoder.block.19.layer.1.EncDecAttention.q.weight', 'decoder.block.12.layer.1.EncDecAttention.k.weight', 'decoder.block.16.layer.1.EncDecAttention.q.weight', 'decoder.block.4.layer.1.EncDecAttention.v.weight', 'decoder.block.22.layer.1.EncDecAttention.o.weight', 'decoder.block.5.layer.1.EncDecAttention.q.weight', 'lm_head.weight', 'decoder.block.17.layer.2.layer_norm.weight', 'decoder.block.18.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.8.layer.1.EncDecAttention.q.weight', 'decoder.block.10.layer.1.EncDecAttention.k.weight', 'decoder.block.1.layer.2.layer_norm.weight', 'decoder.block.19.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.17.layer.0.SelfAttention.o.weight', 'decoder.block.14.layer.1.EncDecAttention.k.weight', 'decoder.block.21.layer.0.SelfAttention.q.weight', 'decoder.block.10.layer.0.SelfAttention.k.weight', 'decoder.block.22.layer.0.layer_norm.weight', 'decoder.block.21.layer.1.EncDecAttention.o.weight', 'decoder.block.1.layer.2.DenseReluDense.wo.weight', 'decoder.block.14.layer.0.SelfAttention.v.weight', 'decoder.block.22.layer.1.EncDecAttention.q.weight', 'decoder.block.7.layer.2.layer_norm.weight', 'decoder.block.9.layer.0.SelfAttention.k.weight', 'decoder.block.4.layer.0.SelfAttention.o.weight', 'decoder.block.5.layer.1.layer_norm.weight', 'decoder.block.23.layer.2.layer_norm.weight', 'decoder.block.17.layer.0.layer_norm.weight', 'decoder.block.14.layer.0.SelfAttention.o.weight', 'decoder.block.14.layer.2.layer_norm.weight', 'decoder.block.5.layer.2.layer_norm.weight', 'decoder.block.4.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_0.weight', 
'decoder.block.9.layer.1.layer_norm.weight', 'decoder.block.20.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.14.layer.2.DenseReluDense.wo.weight', 'decoder.block.7.layer.1.EncDecAttention.v.weight', 'decoder.block.16.layer.1.layer_norm.weight', 'decoder.block.2.layer.0.SelfAttention.q.weight', 'decoder.block.19.layer.0.SelfAttention.v.weight', 'decoder.block.6.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.1.EncDecAttention.o.weight', 'decoder.block.5.layer.0.SelfAttention.q.weight', 'decoder.block.15.layer.0.SelfAttention.k.weight', 'decoder.block.19.layer.1.EncDecAttention.k.weight', 'decoder.block.6.layer.1.layer_norm.weight', 'decoder.block.22.layer.1.EncDecAttention.v.weight', 'decoder.block.5.layer.0.SelfAttention.o.weight', 'decoder.block.14.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.16.layer.2.layer_norm.weight', 'decoder.block.4.layer.1.EncDecAttention.o.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.15.layer.0.layer_norm.weight', 'decoder.block.16.layer.0.SelfAttention.k.weight', 'decoder.block.23.layer.1.layer_norm.weight', 'decoder.block.8.layer.0.SelfAttention.v.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.12.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.2.layer.1.layer_norm.weight', 'decoder.block.13.layer.1.EncDecAttention.v.weight', 'decoder.block.9.layer.0.SelfAttention.v.weight', 'decoder.block.3.layer.1.EncDecAttention.v.weight', 'decoder.block.20.layer.0.layer_norm.weight', 'decoder.block.13.layer.2.layer_norm.weight', 'decoder.block.16.layer.2.DenseReluDense.wo.weight', 'decoder.block.14.layer.1.EncDecAttention.v.weight', 'decoder.block.6.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.6.layer.2.layer_norm.weight', 'decoder.block.21.layer.0.SelfAttention.o.weight', 'decoder.block.8.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.4.layer.2.DenseReluDense.wo.weight', 'decoder.block.12.layer.0.SelfAttention.o.weight', 
'decoder.block.6.layer.0.SelfAttention.o.weight', 'decoder.block.11.layer.2.layer_norm.weight', 'decoder.block.12.layer.1.EncDecAttention.v.weight', 'decoder.block.22.layer.0.SelfAttention.q.weight', 'decoder.block.19.layer.0.SelfAttention.q.weight', 'decoder.block.16.layer.1.EncDecAttention.o.weight', 'decoder.block.1.layer.1.layer_norm.weight', 'decoder.block.17.layer.0.SelfAttention.v.weight', 'decoder.block.6.layer.2.DenseReluDense.wo.weight', 'decoder.block.10.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.18.layer.0.SelfAttention.o.weight', 'decoder.block.19.layer.1.EncDecAttention.v.weight', 'decoder.block.14.layer.0.layer_norm.weight', 'decoder.block.12.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.1.EncDecAttention.o.weight', 'decoder.block.10.layer.0.layer_norm.weight', 'decoder.block.9.layer.1.EncDecAttention.q.weight', 'decoder.block.12.layer.0.SelfAttention.k.weight', 'decoder.block.10.layer.0.SelfAttention.v.weight', 'decoder.block.8.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.3.layer.1.EncDecAttention.o.weight', 'decoder.block.11.layer.1.EncDecAttention.k.weight', 'decoder.block.18.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.2.layer_norm.weight', 'decoder.block.19.layer.2.DenseReluDense.wo.weight', 'decoder.block.3.layer.1.EncDecAttention.q.weight', 'decoder.block.22.layer.2.DenseReluDense.wo.weight', 'decoder.block.14.layer.0.SelfAttention.q.weight', 'decoder.block.13.layer.1.layer_norm.weight', 'decoder.block.6.layer.0.layer_norm.weight', 'decoder.block.4.layer.0.SelfAttention.q.weight', 'decoder.block.19.layer.0.layer_norm.weight', 'decoder.block.3.layer.0.layer_norm.weight', 'decoder.block.2.layer.1.EncDecAttention.v.weight', 'decoder.block.23.layer.1.EncDecAttention.k.weight', 'decoder.block.20.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.2.layer.1.EncDecAttention.q.weight', 
'decoder.block.10.layer.1.EncDecAttention.v.weight', 'decoder.block.16.layer.0.layer_norm.weight', 'decoder.block.18.layer.0.SelfAttention.v.weight', 'decoder.block.12.layer.0.SelfAttention.q.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.5.layer.1.EncDecAttention.o.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.13.layer.1.EncDecAttention.o.weight', 'decoder.block.8.layer.1.EncDecAttention.k.weight', 'decoder.block.2.layer.0.SelfAttention.o.weight', 'decoder.block.2.layer.2.DenseReluDense.wo.weight', 'decoder.block.9.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.15.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.1.EncDecAttention.q.weight', 'decoder.block.7.layer.0.SelfAttention.q.weight', 'decoder.block.13.layer.1.EncDecAttention.q.weight', 'decoder.block.5.layer.1.EncDecAttention.v.weight', 'decoder.block.17.layer.1.layer_norm.weight', 'decoder.block.16.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.11.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.15.layer.1.EncDecAttention.o.weight', 'decoder.block.10.layer.2.DenseReluDense.wo.weight', 'decoder.block.13.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.0.SelfAttention.q.weight', 'decoder.block.14.layer.1.layer_norm.weight', 'decoder.block.19.layer.0.SelfAttention.o.weight', 'decoder.block.13.layer.0.layer_norm.weight', 'decoder.block.6.layer.1.EncDecAttention.o.weight', 'decoder.block.8.layer.0.SelfAttention.o.weight', 'decoder.block.22.layer.1.layer_norm.weight', 'decoder.block.8.layer.2.DenseReluDense.wo.weight', 'decoder.block.19.layer.1.layer_norm.weight', 'decoder.block.21.layer.1.layer_norm.weight', 'decoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.block.16.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.2.layer.1.EncDecAttention.k.weight', 'decoder.block.18.layer.1.layer_norm.weight', 'decoder.block.1.layer.1.EncDecAttention.k.weight', 
'decoder.block.11.layer.2.DenseReluDense.wo.weight', 'decoder.block.18.layer.2.layer_norm.weight', 'decoder.block.16.layer.0.SelfAttention.o.weight', 'decoder.block.12.layer.2.DenseReluDense.wo.weight', 'decoder.block.11.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.2.layer_norm.weight', 'decoder.block.18.layer.1.EncDecAttention.o.weight', 'decoder.block.9.layer.1.EncDecAttention.o.weight', 'decoder.block.20.layer.1.EncDecAttention.q.weight', 'decoder.block.4.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.1.EncDecAttention.q.weight', 'decoder.block.1.layer.1.EncDecAttention.v.weight', 'decoder.block.1.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.1.EncDecAttention.q.weight', 'decoder.block.15.layer.0.SelfAttention.v.weight', 'decoder.block.10.layer.0.SelfAttention.o.weight', 'decoder.block.15.layer.2.layer_norm.weight', 'decoder.block.0.layer.1.EncDecAttention.v.weight', 'decoder.block.14.layer.0.SelfAttention.k.weight', 'decoder.block.22.layer.0.SelfAttention.o.weight', 'decoder.block.20.layer.2.layer_norm.weight', 'decoder.block.10.layer.2.layer_norm.weight', 'decoder.block.6.layer.1.EncDecAttention.k.weight', 'decoder.block.10.layer.0.SelfAttention.q.weight', 'decoder.final_layer_norm.weight', 'decoder.block.15.layer.1.EncDecAttention.q.weight', 'decoder.block.3.layer.1.EncDecAttention.k.weight', 'decoder.block.12.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.23.layer.1.EncDecAttention.q.weight', 'decoder.block.11.layer.1.EncDecAttention.o.weight']
5
+ - This IS expected if you are initializing T5EncoderModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
6
+ - This IS NOT expected if you are initializing T5EncoderModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
7
+ Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
8
+ Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
9
+ Downloading (…)okenizer_config.json: 100%|██| 2.54k/2.54k [00:00<00:00, 9.09MB/s]
10
+ Downloading spiece.model: 100%|████████████████████████████| 792k/792k [00:00<00:00, 28.7MB/s]
11
+ Downloading (…)cial_tokens_map.json: 100%|██| 2.20k/2.20k [00:00<00:00, 7.83MB/s]
12
+ Token indices sequence length is longer than the specified maximum sequence length for this model (2837981 > 512). Running this sequence through the model will result in indexing errors
13
+ Starting ...
14
+ Ready.
15
+ 0 layer.0.SelfAttention.q
16
+ Quantizing ...
17
+ time 0.55
18
+ error 142.37025451660156
19
+ 0 layer.0.SelfAttention.k
20
+ Quantizing ...
21
+ time 0.25
22
+ error 9521.5029296875
23
+ 0 layer.0.SelfAttention.v
24
+ Quantizing ...
25
+ time 0.26
26
+ error 2544.900390625
27
+ 0 layer.0.SelfAttention.o
28
+ Quantizing ...
29
+ time 0.28
30
+ error 123186.2578125
31
+ 0 layer.1.DenseReluDense.wi_0
32
+ Quantizing ...
33
+ time 0.25
34
+ error 11158.978515625
35
+ 0 layer.1.DenseReluDense.wi_1
36
+ Quantizing ...
37
+ time 0.25
38
+ error 9518.11328125
39
+ 0 layer.1.DenseReluDense.wo
40
+ Quantizing ...
41
+ time 0.72
42
+ error 3637286.0
43
+ 1 layer.0.SelfAttention.q
44
+ Quantizing ...
45
+ time 0.39
46
+ error 536.7674560546875
47
+ 1 layer.0.SelfAttention.k
48
+ Quantizing ...
49
+ time 0.25
50
+ error 25588.546875
51
+ 1 layer.0.SelfAttention.v
52
+ Quantizing ...
53
+ time 0.25
54
+ error 1919.272216796875
55
+ 1 layer.0.SelfAttention.o
56
+ Quantizing ...
57
+ time 0.25
58
+ error 47080.5625
59
+ 1 layer.1.DenseReluDense.wi_0
60
+ Quantizing ...
61
+ time 0.25
62
+ error 9808.359375
63
+ 1 layer.1.DenseReluDense.wi_1
64
+ Quantizing ...
65
+ time 0.25
66
+ error 6298.18896484375
67
+ 1 layer.1.DenseReluDense.wo
68
+ Quantizing ...
69
+ time 0.71
70
+ error 137391.875
71
+ 2 layer.0.SelfAttention.q
72
+ Quantizing ...
73
+ time 0.41
74
+ error 125.06156921386719
75
+ 2 layer.0.SelfAttention.k
76
+ Quantizing ...
77
+ time 0.25
78
+ error 6493.82568359375
79
+ 2 layer.0.SelfAttention.v
80
+ Quantizing ...
81
+ time 0.25
82
+ error 1306.6259765625
83
+ 2 layer.0.SelfAttention.o
84
+ Quantizing ...
85
+ time 0.25
86
+ error 3543.05029296875
87
+ 2 layer.1.DenseReluDense.wi_0
88
+ Quantizing ...
89
+ time 0.25
90
+ error 10326.599609375
91
+ 2 layer.1.DenseReluDense.wi_1
92
+ Quantizing ...
93
+ time 0.25
94
+ error 8165.3193359375
95
+ 2 layer.1.DenseReluDense.wo
96
+ Quantizing ...
97
+ time 0.69
98
+ error 105276.7265625
99
+ 3 layer.0.SelfAttention.q
100
+ Quantizing ...
101
+ time 0.41
102
+ error 137.07083129882812
103
+ 3 layer.0.SelfAttention.k
104
+ Quantizing ...
105
+ time 0.25
106
+ error 7485.19384765625
107
+ 3 layer.0.SelfAttention.v
108
+ Quantizing ...
109
+ time 0.25
110
+ error 1563.48095703125
111
+ 3 layer.0.SelfAttention.o
112
+ Quantizing ...
113
+ time 0.27
114
+ error 3057.40673828125
115
+ 3 layer.1.DenseReluDense.wi_0
116
+ Quantizing ...
117
+ time 0.25
118
+ error 10634.482421875
119
+ 3 layer.1.DenseReluDense.wi_1
120
+ Quantizing ...
121
+ time 0.27
122
+ error 9444.2841796875
123
+ 3 layer.1.DenseReluDense.wo
124
+ Quantizing ...
125
+ time 0.73
126
+ error 105683.125
127
+ 4 layer.0.SelfAttention.q
128
+ Quantizing ...
129
+ time 0.41
130
+ error 133.7151336669922
131
+ 4 layer.0.SelfAttention.k
132
+ Quantizing ...
133
+ time 0.27
134
+ error 7297.93896484375
135
+ 4 layer.0.SelfAttention.v
136
+ Quantizing ...
137
+ time 0.25
138
+ error 1610.62939453125
139
+ 4 layer.0.SelfAttention.o
140
+ Quantizing ...
141
+ time 0.25
142
+ error 7214.41796875
143
+ 4 layer.1.DenseReluDense.wi_0
144
+ Quantizing ...
145
+ time 0.25
146
+ error 14451.642578125
147
+ 4 layer.1.DenseReluDense.wi_1
148
+ Quantizing ...
149
+ time 0.25
150
+ error 15960.328125
151
+ 4 layer.1.DenseReluDense.wo
152
+ Quantizing ...
153
+ time 0.69
154
+ error 4980679168.0
155
+ 5 layer.0.SelfAttention.q
156
+ Quantizing ...
157
+ time 0.39
158
+ error 140.4214324951172
159
+ 5 layer.0.SelfAttention.k
160
+ Quantizing ...
161
+ time 0.25
162
+ error 7479.8193359375
163
+ 5 layer.0.SelfAttention.v
164
+ Quantizing ...
165
+ time 0.25
166
+ error 2484.518310546875
167
+ 5 layer.0.SelfAttention.o
168
+ Quantizing ...
169
+ time 0.25
170
+ error 8618.46484375
171
+ 5 layer.1.DenseReluDense.wi_0
172
+ Quantizing ...
173
+ time 0.27
174
+ error 10754.0419921875
175
+ 5 layer.1.DenseReluDense.wi_1
176
+ Quantizing ...
177
+ time 0.25
178
+ error 13012.9423828125
179
+ 5 layer.1.DenseReluDense.wo
180
+ Quantizing ...
181
+ time 0.69
182
+ error 107111.1875
183
+ 6 layer.0.SelfAttention.q
184
+ Quantizing ...
185
+ time 0.40
186
+ error 112.6629867553711
187
+ 6 layer.0.SelfAttention.k
188
+ Quantizing ...
189
+ time 0.25
190
+ error 7047.806640625
191
+ 6 layer.0.SelfAttention.v
192
+ Quantizing ...
193
+ time 0.25
194
+ error 2059.9892578125
195
+ 6 layer.0.SelfAttention.o
196
+ Quantizing ...
197
+ time 0.25
198
+ error 5445.0029296875
199
+ 6 layer.1.DenseReluDense.wi_0
200
+ Quantizing ...
201
+ time 0.26
202
+ error 11107.181640625
203
+ 6 layer.1.DenseReluDense.wi_1
204
+ Quantizing ...
205
+ time 0.25
206
+ error 15983.3603515625
207
+ 6 layer.1.DenseReluDense.wo
208
+ Quantizing ...
209
+ time 0.70
210
+ error 685753216.0
211
+ 7 layer.0.SelfAttention.q
212
+ Quantizing ...
213
+ time 0.41
214
+ error 133.351806640625
215
+ 7 layer.0.SelfAttention.k
216
+ Quantizing ...
217
+ time 0.25
218
+ error 8262.615234375
219
+ 7 layer.0.SelfAttention.v
220
+ Quantizing ...
221
+ time 0.26
222
+ error 2878.16943359375
223
+ 7 layer.0.SelfAttention.o
224
+ Quantizing ...
225
+ time 0.27
226
+ error 17972.373046875
227
+ 7 layer.1.DenseReluDense.wi_0
228
+ Quantizing ...
229
+ time 0.25
230
+ error 11895.857421875
231
+ 7 layer.1.DenseReluDense.wi_1
232
+ Quantizing ...
233
+ time 0.25
234
+ error 18337.82421875
235
+ 7 layer.1.DenseReluDense.wo
236
+ Quantizing ...
237
+ time 0.72
238
+ error 25902379008.0
239
+ 8 layer.0.SelfAttention.q
240
+ Quantizing ...
241
+ time 0.39
242
+ error 120.18170928955078
243
+ 8 layer.0.SelfAttention.k
244
+ Quantizing ...
245
+ time 0.25
246
+ error 7699.7255859375
247
+ 8 layer.0.SelfAttention.v
248
+ Quantizing ...
249
+ time 0.25
250
+ error 2972.5712890625
251
+ 8 layer.0.SelfAttention.o
252
+ Quantizing ...
253
+ time 0.25
254
+ error 8750.123046875
255
+ 8 layer.1.DenseReluDense.wi_0
256
+ Quantizing ...
257
+ time 0.25
258
+ error 11126.8662109375
259
+ 8 layer.1.DenseReluDense.wi_1
260
+ Quantizing ...
261
+ time 0.25
262
+ error 18306.9609375
263
+ 8 layer.1.DenseReluDense.wo
264
+ Quantizing ...
265
+ time 0.71
266
+ error 128990.28125
267
+ 9 layer.0.SelfAttention.q
268
+ Quantizing ...
269
+ time 0.39
270
+ error 126.16083526611328
271
+ 9 layer.0.SelfAttention.k
272
+ Quantizing ...
273
+ time 0.25
274
+ error 8584.9208984375
275
+ 9 layer.0.SelfAttention.v
276
+ Quantizing ...
277
+ time 0.26
278
+ error 3245.54541015625
279
+ 9 layer.0.SelfAttention.o
280
+ Quantizing ...
281
+ time 0.25
282
+ error 15868.41015625
283
+ 9 layer.1.DenseReluDense.wi_0
284
+ Quantizing ...
285
+ time 0.25
286
+ error 9290.447265625
287
+ 9 layer.1.DenseReluDense.wi_1
288
+ Quantizing ...
289
+ time 0.25
290
+ error 17894.17578125
291
+ 9 layer.1.DenseReluDense.wo
292
+ Quantizing ...
293
+ time 0.71
294
+ error 149863.296875
295
+ 10 layer.0.SelfAttention.q
296
+ Quantizing ...
297
+ time 0.39
298
+ error 107.48172760009766
299
+ 10 layer.0.SelfAttention.k
300
+ Quantizing ...
301
+ time 0.27
302
+ error 6898.35595703125
303
+ 10 layer.0.SelfAttention.v
304
+ Quantizing ...
305
+ time 0.26
306
+ error 3770.64990234375
307
+ 10 layer.0.SelfAttention.o
308
+ Quantizing ...
309
+ time 0.25
310
+ error 17137.037109375
311
+ 10 layer.1.DenseReluDense.wi_0
312
+ Quantizing ...
313
+ time 0.27
314
+ error 8128.5166015625
315
+ 10 layer.1.DenseReluDense.wi_1
316
+ Quantizing ...
317
+ time 0.26
318
+ error 17371.587890625
319
+ 10 layer.1.DenseReluDense.wo
320
+ Quantizing ...
321
+ time 0.73
322
+ error 116027.1015625
323
+ 11 layer.0.SelfAttention.q
324
+ Quantizing ...
325
+ time 0.40
326
+ error 104.61625671386719
327
+ 11 layer.0.SelfAttention.k
328
+ Quantizing ...
329
+ time 0.25
330
+ error 7259.4208984375
331
+ 11 layer.0.SelfAttention.v
332
+ Quantizing ...
333
+ time 0.25
334
+ error 5005.52490234375
335
+ 11 layer.0.SelfAttention.o
336
+ Quantizing ...
337
+ time 0.27
338
+ error 32728.1015625
339
+ 11 layer.1.DenseReluDense.wi_0
340
+ Quantizing ...
341
+ time 0.25
342
+ error 8535.056640625
343
+ 11 layer.1.DenseReluDense.wi_1
344
+ Quantizing ...
345
+ time 0.27
346
+ error 22538.978515625
347
+ 11 layer.1.DenseReluDense.wo
348
+ Quantizing ...
349
+ time 0.71
350
+ error 170254.40625
351
+ 12 layer.0.SelfAttention.q
352
+ Quantizing ...
353
+ time 0.39
354
+ error 94.82140350341797
355
+ 12 layer.0.SelfAttention.k
356
+ Quantizing ...
357
+ time 0.25
358
+ error 6448.5205078125
359
+ 12 layer.0.SelfAttention.v
360
+ Quantizing ...
361
+ time 0.25
362
+ error 5083.41796875
363
+ 12 layer.0.SelfAttention.o
364
+ Quantizing ...
365
+ time 0.25
366
+ error 60036.953125
367
+ 12 layer.1.DenseReluDense.wi_0
368
+ Quantizing ...
369
+ time 0.26
370
+ error 7829.4384765625
371
+ 12 layer.1.DenseReluDense.wi_1
372
+ Quantizing ...
373
+ time 0.26
374
+ error 23411.65234375
375
+ 12 layer.1.DenseReluDense.wo
376
+ Quantizing ...
377
+ time 0.69
378
+ error 231657.15625
379
+ 13 layer.0.SelfAttention.q
380
+ Quantizing ...
381
+ time 0.39
382
+ error 90.77069091796875
383
+ 13 layer.0.SelfAttention.k
384
+ Quantizing ...
385
+ time 0.25
386
+ error 5828.037109375
387
+ 13 layer.0.SelfAttention.v
388
+ Quantizing ...
389
+ time 0.26
390
+ error 4888.35302734375
391
+ 13 layer.0.SelfAttention.o
392
+ Quantizing ...
393
+ time 0.25
394
+ error 41515.46484375
395
+ 13 layer.1.DenseReluDense.wi_0
396
+ Quantizing ...
397
+ time 0.25
398
+ error 7063.1728515625
399
+ 13 layer.1.DenseReluDense.wi_1
400
+ Quantizing ...
401
+ time 0.25
402
+ error 23648.7421875
403
+ 13 layer.1.DenseReluDense.wo
404
+ Quantizing ...
405
+ time 0.69
406
+ error 261193.75
407
+ 14 layer.0.SelfAttention.q
408
+ Quantizing ...
409
+ time 0.39
410
+ error 77.24964904785156
411
+ 14 layer.0.SelfAttention.k
412
+ Quantizing ...
413
+ time 0.27
414
+ error 5096.2626953125
415
+ 14 layer.0.SelfAttention.v
416
+ Quantizing ...
417
+ time 0.26
418
+ error 6915.9384765625
419
+ 14 layer.0.SelfAttention.o
420
+ Quantizing ...
421
+ time 0.26
422
+ error 56402.62890625
423
+ 14 layer.1.DenseReluDense.wi_0
424
+ Quantizing ...
425
+ time 0.28
426
+ error 6039.11328125
427
+ 14 layer.1.DenseReluDense.wi_1
428
+ Quantizing ...
429
+ time 0.25
430
+ error 24090.625
431
+ 14 layer.1.DenseReluDense.wo
432
+ Quantizing ...
433
+ time 0.71
434
+ error 355204.3125
435
+ 15 layer.0.SelfAttention.q
436
+ Quantizing ...
437
+ time 0.39
438
+ error 72.92942810058594
439
+ 15 layer.0.SelfAttention.k
440
+ Quantizing ...
441
+ time 0.25
442
+ error 5561.1201171875
443
+ 15 layer.0.SelfAttention.v
444
+ Quantizing ...
445
+ time 0.25
446
+ error 8621.376953125
447
+ 15 layer.0.SelfAttention.o
448
+ Quantizing ...
449
+ time 0.25
450
+ error 146386.5625
451
+ 15 layer.1.DenseReluDense.wi_0
452
+ Quantizing ...
453
+ time 0.25
454
+ error 5684.064453125
455
+ 15 layer.1.DenseReluDense.wi_1
456
+ Quantizing ...
457
+ time 0.25
458
+ error 26869.12109375
459
+ 15 layer.1.DenseReluDense.wo
460
+ Quantizing ...
461
+ time 0.70
462
+ error 361036.25
463
+ 16 layer.0.SelfAttention.q
464
+ Quantizing ...
465
+ time 0.39
466
+ error 75.83228302001953
467
+ 16 layer.0.SelfAttention.k
468
+ Quantizing ...
469
+ time 0.25
470
+ error 5176.50341796875
471
+ 16 layer.0.SelfAttention.v
472
+ Quantizing ...
473
+ time 0.25
474
+ error 9754.8203125
475
+ 16 layer.0.SelfAttention.o
476
+ Quantizing ...
477
+ time 0.25
478
+ error 231755.03125
479
+ 16 layer.1.DenseReluDense.wi_0
480
+ Quantizing ...
481
+ time 0.27
482
+ error 5699.75390625
483
+ 16 layer.1.DenseReluDense.wi_1
484
+ Quantizing ...
485
+ time 0.25
486
+ error 25039.771484375
487
+ 16 layer.1.DenseReluDense.wo
488
+ Quantizing ...
489
+ time 0.69
490
+ error 651520.75
491
+ 17 layer.0.SelfAttention.q
492
+ Quantizing ...
493
+ time 0.39
494
+ error 61.858299255371094
495
+ 17 layer.0.SelfAttention.k
496
+ Quantizing ...
497
+ time 0.25
498
+ error 4369.08251953125
499
+ 17 layer.0.SelfAttention.v
500
+ Quantizing ...
501
+ time 0.25
502
+ error 12425.16796875
503
+ 17 layer.0.SelfAttention.o
504
+ Quantizing ...
505
+ time 0.25
506
+ error 408129.875
507
+ 17 layer.1.DenseReluDense.wi_0
508
+ Quantizing ...
509
+ time 0.25
510
+ error 5317.8798828125
511
+ 17 layer.1.DenseReluDense.wi_1
512
+ Quantizing ...
513
+ time 0.25
514
+ error 26979.31640625
515
+ 17 layer.1.DenseReluDense.wo
516
+ Quantizing ...
517
+ time 0.73
518
+ error 689154.875
519
+ 18 layer.0.SelfAttention.q
520
+ Quantizing ...
521
+ time 0.41
522
+ error 68.12550354003906
523
+ 18 layer.0.SelfAttention.k
524
+ Quantizing ...
525
+ time 0.27
526
+ error 4010.4833984375
527
+ 18 layer.0.SelfAttention.v
528
+ Quantizing ...
529
+ time 0.26
530
+ error 14657.2314453125
531
+ 18 layer.0.SelfAttention.o
532
+ Quantizing ...
533
+ time 0.25
534
+ error 206627.5
535
+ 18 layer.1.DenseReluDense.wi_0
536
+ Quantizing ...
537
+ time 0.28
538
+ error 6068.525390625
539
+ 18 layer.1.DenseReluDense.wi_1
540
+ Quantizing ...
541
+ time 0.25
542
+ error 28093.669921875
543
+ 18 layer.1.DenseReluDense.wo
544
+ Quantizing ...
545
+ time 0.72
546
+ error 1019951.8125
547
+ 19 layer.0.SelfAttention.q
548
+ Quantizing ...
549
+ time 0.41
550
+ error 57.68662643432617
551
+ 19 layer.0.SelfAttention.k
552
+ Quantizing ...
553
+ time 0.25
554
+ error 4086.83349609375
555
+ 19 layer.0.SelfAttention.v
556
+ Quantizing ...
557
+ time 0.25
558
+ error 14453.2578125
559
+ 19 layer.0.SelfAttention.o
560
+ Quantizing ...
561
+ time 0.25
562
+ error 460674.0
563
+ 19 layer.1.DenseReluDense.wi_0
564
+ Quantizing ...
565
+ time 0.25
566
+ error 5235.9794921875
567
+ 19 layer.1.DenseReluDense.wi_1
568
+ Quantizing ...
569
+ time 0.26
570
+ error 28788.4765625
571
+ 19 layer.1.DenseReluDense.wo
572
+ Quantizing ...
573
+ time 0.70
574
+ error 1332541.0
575
+ 20 layer.0.SelfAttention.q
576
+ Quantizing ...
577
+ time 0.39
578
+ error 42.9056510925293
579
+ 20 layer.0.SelfAttention.k
580
+ Quantizing ...
581
+ time 0.25
582
+ error 2894.2177734375
583
+ 20 layer.0.SelfAttention.v
584
+ Quantizing ...
585
+ time 0.25
586
+ error 16684.044921875
587
+ 20 layer.0.SelfAttention.o
588
+ Quantizing ...
589
+ time 0.25
590
+ error 557086.6875
591
+ 20 layer.1.DenseReluDense.wi_0
592
+ Quantizing ...
593
+ time 0.25
594
+ error 6791.15625
595
+ 20 layer.1.DenseReluDense.wi_1
596
+ Quantizing ...
597
+ time 0.25
598
+ error 38994.37890625
599
+ 20 layer.1.DenseReluDense.wo
600
+ Quantizing ...
601
+ time 0.69
602
+ error 2295082.0
603
+ 21 layer.0.SelfAttention.q
604
+ Quantizing ...
605
+ time 0.41
606
+ error 58.024559020996094
607
+ 21 layer.0.SelfAttention.k
608
+ Quantizing ...
609
+ time 0.25
610
+ error 3534.38427734375
611
+ 21 layer.0.SelfAttention.v
612
+ Quantizing ...
613
+ time 0.28
614
+ error 23622.609375
615
+ 21 layer.0.SelfAttention.o
616
+ Quantizing ...
617
+ time 0.26
618
+ error 630538.75
619
+ 21 layer.1.DenseReluDense.wi_0
620
+ Quantizing ...
621
+ time 0.27
622
+ error 6944.4306640625
623
+ 21 layer.1.DenseReluDense.wi_1
624
+ Quantizing ...
625
+ time 0.25
626
+ error 41437.5546875
627
+ 21 layer.1.DenseReluDense.wo
628
+ Quantizing ...
629
+ time 0.72
630
+ error 2805766.25
631
+ 22 layer.0.SelfAttention.q
632
+ Quantizing ...
633
+ time 0.39
634
+ error 56.98418426513672
635
+ 22 layer.0.SelfAttention.k
636
+ Quantizing ...
637
+ time 0.27
638
+ error 2588.40576171875
639
+ 22 layer.0.SelfAttention.v
640
+ Quantizing ...
641
+ time 0.26
642
+ error 33727.3125
643
+ 22 layer.0.SelfAttention.o
644
+ Quantizing ...
645
+ time 0.26
646
+ error 1536184.5
647
+ 22 layer.1.DenseReluDense.wi_0
648
+ Quantizing ...
649
+ time 0.28
650
+ error 7638.18701171875
651
+ 22 layer.1.DenseReluDense.wi_1
652
+ Quantizing ...
653
+ time 0.25
654
+ error 49872.0859375
655
+ 22 layer.1.DenseReluDense.wo
656
+ Quantizing ...
657
+ time 0.69
658
+ error 4077312.5
659
+ 23 layer.0.SelfAttention.q
660
+ Quantizing ...
661
+ time 0.40
662
+ error 53.174556732177734
663
+ 23 layer.0.SelfAttention.k
664
+ Quantizing ...
665
+ time 0.26
666
+ error 2663.560302734375
667
+ 23 layer.0.SelfAttention.v
668
+ Quantizing ...
669
+ time 0.27
670
+ error 35553.75
671
+ 23 layer.0.SelfAttention.o
672
+ Quantizing ...
673
+ time 0.26
674
+ error 1983365.75
675
+ 23 layer.1.DenseReluDense.wi_0
676
+ Quantizing ...
677
+ time 0.25
678
+ error 8208.654296875
679
+ 23 layer.1.DenseReluDense.wi_1
680
+ Quantizing ...
681
+ time 0.25
682
+ error 51633.640625
683
+ 23 layer.1.DenseReluDense.wo
684
+ Quantizing ...
685
+ time 0.69
686
+ error 8843078.0
687
+ 114.8298749923706
688
+ Packing ...
689
+ encoder.block.0.layer.0.SelfAttention.q
690
+ encoder.block.0.layer.0.SelfAttention.k
691
+ encoder.block.0.layer.0.SelfAttention.v
692
+ encoder.block.0.layer.0.SelfAttention.o
693
+ encoder.block.0.layer.1.DenseReluDense.wi_0
694
+ encoder.block.0.layer.1.DenseReluDense.wi_1
695
+ encoder.block.0.layer.1.DenseReluDense.wo
696
+ encoder.block.1.layer.0.SelfAttention.q
697
+ encoder.block.1.layer.0.SelfAttention.k
698
+ encoder.block.1.layer.0.SelfAttention.v
699
+ encoder.block.1.layer.0.SelfAttention.o
700
+ encoder.block.1.layer.1.DenseReluDense.wi_0
701
+ encoder.block.1.layer.1.DenseReluDense.wi_1
702
+ encoder.block.1.layer.1.DenseReluDense.wo
703
+ encoder.block.2.layer.0.SelfAttention.q
704
+ encoder.block.2.layer.0.SelfAttention.k
705
+ encoder.block.2.layer.0.SelfAttention.v
706
+ encoder.block.2.layer.0.SelfAttention.o
707
+ encoder.block.2.layer.1.DenseReluDense.wi_0
708
+ encoder.block.2.layer.1.DenseReluDense.wi_1
709
+ encoder.block.2.layer.1.DenseReluDense.wo
710
+ encoder.block.3.layer.0.SelfAttention.q
711
+ encoder.block.3.layer.0.SelfAttention.k
712
+ encoder.block.3.layer.0.SelfAttention.v
713
+ encoder.block.3.layer.0.SelfAttention.o
714
+ encoder.block.3.layer.1.DenseReluDense.wi_0
715
+ encoder.block.3.layer.1.DenseReluDense.wi_1
716
+ encoder.block.3.layer.1.DenseReluDense.wo
717
+ encoder.block.4.layer.0.SelfAttention.q
718
+ encoder.block.4.layer.0.SelfAttention.k
719
+ encoder.block.4.layer.0.SelfAttention.v
720
+ encoder.block.4.layer.0.SelfAttention.o
721
+ encoder.block.4.layer.1.DenseReluDense.wi_0
722
+ encoder.block.4.layer.1.DenseReluDense.wi_1
723
+ encoder.block.4.layer.1.DenseReluDense.wo
724
+ encoder.block.5.layer.0.SelfAttention.q
725
+ encoder.block.5.layer.0.SelfAttention.k
726
+ encoder.block.5.layer.0.SelfAttention.v
727
+ encoder.block.5.layer.0.SelfAttention.o
728
+ encoder.block.5.layer.1.DenseReluDense.wi_0
729
+ encoder.block.5.layer.1.DenseReluDense.wi_1
730
+ encoder.block.5.layer.1.DenseReluDense.wo
731
+ encoder.block.6.layer.0.SelfAttention.q
732
+ encoder.block.6.layer.0.SelfAttention.k
733
+ encoder.block.6.layer.0.SelfAttention.v
734
+ encoder.block.6.layer.0.SelfAttention.o
735
+ encoder.block.6.layer.1.DenseReluDense.wi_0
736
+ encoder.block.6.layer.1.DenseReluDense.wi_1
737
+ encoder.block.6.layer.1.DenseReluDense.wo
738
+ encoder.block.7.layer.0.SelfAttention.q
739
+ encoder.block.7.layer.0.SelfAttention.k
740
+ encoder.block.7.layer.0.SelfAttention.v
741
+ encoder.block.7.layer.0.SelfAttention.o
742
+ encoder.block.7.layer.1.DenseReluDense.wi_0
743
+ encoder.block.7.layer.1.DenseReluDense.wi_1
744
+ encoder.block.7.layer.1.DenseReluDense.wo
745
+ encoder.block.8.layer.0.SelfAttention.q
746
+ encoder.block.8.layer.0.SelfAttention.k
747
+ encoder.block.8.layer.0.SelfAttention.v
748
+ encoder.block.8.layer.0.SelfAttention.o
749
+ encoder.block.8.layer.1.DenseReluDense.wi_0
750
+ encoder.block.8.layer.1.DenseReluDense.wi_1
751
+ encoder.block.8.layer.1.DenseReluDense.wo
752
+ encoder.block.9.layer.0.SelfAttention.q
753
+ encoder.block.9.layer.0.SelfAttention.k
754
+ encoder.block.9.layer.0.SelfAttention.v
755
+ encoder.block.9.layer.0.SelfAttention.o
756
+ encoder.block.9.layer.1.DenseReluDense.wi_0
757
+ encoder.block.9.layer.1.DenseReluDense.wi_1
758
+ encoder.block.9.layer.1.DenseReluDense.wo
759
+ encoder.block.10.layer.0.SelfAttention.q
760
+ encoder.block.10.layer.0.SelfAttention.k
761
+ encoder.block.10.layer.0.SelfAttention.v
762
+ encoder.block.10.layer.0.SelfAttention.o
763
+ encoder.block.10.layer.1.DenseReluDense.wi_0
764
+ encoder.block.10.layer.1.DenseReluDense.wi_1
765
+ encoder.block.10.layer.1.DenseReluDense.wo
766
+ encoder.block.11.layer.0.SelfAttention.q
767
+ encoder.block.11.layer.0.SelfAttention.k
768
+ encoder.block.11.layer.0.SelfAttention.v
769
+ encoder.block.11.layer.0.SelfAttention.o
770
+ encoder.block.11.layer.1.DenseReluDense.wi_0
771
+ encoder.block.11.layer.1.DenseReluDense.wi_1
772
+ encoder.block.11.layer.1.DenseReluDense.wo
773
+ encoder.block.12.layer.0.SelfAttention.q
774
+ encoder.block.12.layer.0.SelfAttention.k
775
+ encoder.block.12.layer.0.SelfAttention.v
776
+ encoder.block.12.layer.0.SelfAttention.o
777
+ encoder.block.12.layer.1.DenseReluDense.wi_0
778
+ encoder.block.12.layer.1.DenseReluDense.wi_1
779
+ encoder.block.12.layer.1.DenseReluDense.wo
780
+ encoder.block.13.layer.0.SelfAttention.q
781
+ encoder.block.13.layer.0.SelfAttention.k
782
+ encoder.block.13.layer.0.SelfAttention.v
783
+ encoder.block.13.layer.0.SelfAttention.o
784
+ encoder.block.13.layer.1.DenseReluDense.wi_0
785
+ encoder.block.13.layer.1.DenseReluDense.wi_1
786
+ encoder.block.13.layer.1.DenseReluDense.wo
787
+ encoder.block.14.layer.0.SelfAttention.q
788
+ encoder.block.14.layer.0.SelfAttention.k
789
+ encoder.block.14.layer.0.SelfAttention.v
790
+ encoder.block.14.layer.0.SelfAttention.o
791
+ encoder.block.14.layer.1.DenseReluDense.wi_0
792
+ encoder.block.14.layer.1.DenseReluDense.wi_1
793
+ encoder.block.14.layer.1.DenseReluDense.wo
794
+ encoder.block.15.layer.0.SelfAttention.q
795
+ encoder.block.15.layer.0.SelfAttention.k
796
+ encoder.block.15.layer.0.SelfAttention.v
797
+ encoder.block.15.layer.0.SelfAttention.o
798
+ encoder.block.15.layer.1.DenseReluDense.wi_0
799
+ encoder.block.15.layer.1.DenseReluDense.wi_1
800
+ encoder.block.15.layer.1.DenseReluDense.wo
801
+ encoder.block.16.layer.0.SelfAttention.q
802
+ encoder.block.16.layer.0.SelfAttention.k
803
+ encoder.block.16.layer.0.SelfAttention.v
804
+ encoder.block.16.layer.0.SelfAttention.o
805
+ encoder.block.16.layer.1.DenseReluDense.wi_0
806
+ encoder.block.16.layer.1.DenseReluDense.wi_1
807
+ encoder.block.16.layer.1.DenseReluDense.wo
808
+ encoder.block.17.layer.0.SelfAttention.q
809
+ encoder.block.17.layer.0.SelfAttention.k
810
+ encoder.block.17.layer.0.SelfAttention.v
811
+ encoder.block.17.layer.0.SelfAttention.o
812
+ encoder.block.17.layer.1.DenseReluDense.wi_0
813
+ encoder.block.17.layer.1.DenseReluDense.wi_1
814
+ encoder.block.17.layer.1.DenseReluDense.wo
815
+ encoder.block.18.layer.0.SelfAttention.q
816
+ encoder.block.18.layer.0.SelfAttention.k
817
+ encoder.block.18.layer.0.SelfAttention.v
818
+ encoder.block.18.layer.0.SelfAttention.o
819
+ encoder.block.18.layer.1.DenseReluDense.wi_0
820
+ encoder.block.18.layer.1.DenseReluDense.wi_1
821
+ encoder.block.18.layer.1.DenseReluDense.wo
822
+ encoder.block.19.layer.0.SelfAttention.q
823
+ encoder.block.19.layer.0.SelfAttention.k
824
+ encoder.block.19.layer.0.SelfAttention.v
825
+ encoder.block.19.layer.0.SelfAttention.o
826
+ encoder.block.19.layer.1.DenseReluDense.wi_0
827
+ encoder.block.19.layer.1.DenseReluDense.wi_1
828
+ encoder.block.19.layer.1.DenseReluDense.wo
829
+ encoder.block.20.layer.0.SelfAttention.q
830
+ encoder.block.20.layer.0.SelfAttention.k
831
+ encoder.block.20.layer.0.SelfAttention.v
832
+ encoder.block.20.layer.0.SelfAttention.o
833
+ encoder.block.20.layer.1.DenseReluDense.wi_0
834
+ encoder.block.20.layer.1.DenseReluDense.wi_1
835
+ encoder.block.20.layer.1.DenseReluDense.wo
836
+ encoder.block.21.layer.0.SelfAttention.q
837
+ encoder.block.21.layer.0.SelfAttention.k
838
+ encoder.block.21.layer.0.SelfAttention.v
839
+ encoder.block.21.layer.0.SelfAttention.o
840
+ encoder.block.21.layer.1.DenseReluDense.wi_0
841
+ encoder.block.21.layer.1.DenseReluDense.wi_1
842
+ encoder.block.21.layer.1.DenseReluDense.wo
843
+ encoder.block.22.layer.0.SelfAttention.q
844
+ encoder.block.22.layer.0.SelfAttention.k
845
+ encoder.block.22.layer.0.SelfAttention.v
846
+ encoder.block.22.layer.0.SelfAttention.o
847
+ encoder.block.22.layer.1.DenseReluDense.wi_0
848
+ encoder.block.22.layer.1.DenseReluDense.wi_1
849
+ encoder.block.22.layer.1.DenseReluDense.wo
850
+ encoder.block.23.layer.0.SelfAttention.q
851
+ encoder.block.23.layer.0.SelfAttention.k
852
+ encoder.block.23.layer.0.SelfAttention.v
853
+ encoder.block.23.layer.0.SelfAttention.o
854
+ encoder.block.23.layer.1.DenseReluDense.wi_0
855
+ encoder.block.23.layer.1.DenseReluDense.wi_1
856
+ encoder.block.23.layer.1.DenseReluDense.wo
857
+ Done.
workspace/flan-ts-small.txt ADDED
@@ -0,0 +1,298 @@
1
+ CUDA extension not installed.
2
+ Downloading (…)lve/main/config.json: 100%|██| 1.40k/1.40k [00:00<00:00, 3.49MB/s]
3
+ Downloading pytorch_model.bin: 100%|██████████████████| 308M/308M [00:03<00:00, 88.9MB/s]
4
+ Some weights of the model checkpoint at google/flan-t5-small were not used when initializing T5EncoderModel: ['decoder.block.7.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'decoder.block.0.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.0.SelfAttention.v.weight', 'decoder.block.2.layer.2.layer_norm.weight', 'decoder.block.6.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.3.layer.1.layer_norm.weight', 'decoder.block.6.layer.0.SelfAttention.o.weight', 'decoder.block.6.layer.0.SelfAttention.q.weight', 'decoder.block.7.layer.1.layer_norm.weight', 'decoder.block.6.layer.1.EncDecAttention.q.weight', 'decoder.block.6.layer.0.layer_norm.weight', 'decoder.block.0.layer.1.layer_norm.weight', 'decoder.block.2.layer.1.EncDecAttention.o.weight', 'decoder.block.7.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.2.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.0.SelfAttention.v.weight', 'decoder.block.0.layer.1.EncDecAttention.v.weight', 'decoder.block.7.layer.1.EncDecAttention.q.weight', 'decoder.block.5.layer.2.DenseReluDense.wo.weight', 'decoder.block.0.layer.0.SelfAttention.q.weight', 'decoder.block.5.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_1.weight', 'decoder.embed_tokens.weight', 'decoder.block.1.layer.1.EncDecAttention.o.weight', 'decoder.block.2.layer.0.SelfAttention.k.weight', 'decoder.block.4.layer.1.EncDecAttention.k.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.3.layer.0.layer_norm.weight', 'decoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.block.4.layer.1.EncDecAttention.q.weight', 'decoder.block.7.layer.1.EncDecAttention.k.weight', 'decoder.block.4.layer.2.layer_norm.weight', 'decoder.block.4.layer.0.SelfAttention.o.weight', 'decoder.block.4.layer.0.SelfAttention.q.weight', 
'decoder.block.4.layer.1.EncDecAttention.v.weight', 'decoder.block.7.layer.0.SelfAttention.o.weight', 'decoder.block.6.layer.1.EncDecAttention.k.weight', 'decoder.block.2.layer.1.EncDecAttention.v.weight', 'decoder.block.7.layer.0.SelfAttention.q.weight', 'decoder.block.6.layer.1.EncDecAttention.v.weight', 'decoder.block.1.layer.2.layer_norm.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.2.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.0.SelfAttention.q.weight', 'decoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.3.layer.0.SelfAttention.v.weight', 'decoder.block.5.layer.0.SelfAttention.v.weight', 'decoder.block.0.layer.1.EncDecAttention.k.weight', 'decoder.block.4.layer.1.layer_norm.weight', 'decoder.block.4.layer.2.DenseReluDense.wo.weight', 'decoder.block.5.layer.1.EncDecAttention.k.weight', 'decoder.block.6.layer.2.DenseReluDense.wo.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.3.layer.2.layer_norm.weight', 'decoder.block.3.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.0.layer_norm.weight', 'decoder.block.6.layer.1.layer_norm.weight', 'decoder.block.0.layer.1.EncDecAttention.o.weight', 'decoder.block.3.layer.1.EncDecAttention.q.weight', 'decoder.block.3.layer.1.EncDecAttention.o.weight', 'decoder.block.5.layer.0.SelfAttention.o.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.1.layer.1.EncDecAttention.k.weight', 'decoder.block.7.layer.1.EncDecAttention.v.weight', 'decoder.block.5.layer.1.layer_norm.weight', 'decoder.block.1.layer.1.EncDecAttention.q.weight', 'decoder.block.0.layer.1.EncDecAttention.q.weight', 'decoder.block.7.layer.2.layer_norm.weight', 'decoder.block.3.layer.0.SelfAttention.q.weight', 'decoder.block.5.layer.2.layer_norm.weight', 'decoder.block.5.layer.1.EncDecAttention.v.weight', 'decoder.block.5.layer.0.SelfAttention.q.weight', 
'decoder.block.6.layer.1.EncDecAttention.o.weight', 'decoder.block.6.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.5.layer.0.layer_norm.weight', 'decoder.block.6.layer.0.SelfAttention.k.weight', 'decoder.block.4.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.3.layer.0.SelfAttention.k.weight', 'decoder.block.7.layer.0.layer_norm.weight', 'decoder.block.0.layer.0.layer_norm.weight', 'decoder.block.3.layer.1.EncDecAttention.v.weight', 'decoder.block.2.layer.1.EncDecAttention.q.weight', 'decoder.block.1.layer.1.layer_norm.weight', 'decoder.block.5.layer.1.EncDecAttention.q.weight', 'decoder.block.1.layer.2.DenseReluDense.wo.weight', 'decoder.block.5.layer.0.SelfAttention.k.weight', 'decoder.block.2.layer.2.DenseReluDense.wo.weight', 'decoder.block.6.layer.2.layer_norm.weight', 'decoder.block.1.layer.1.EncDecAttention.v.weight', 'decoder.final_layer_norm.weight', 'decoder.block.1.layer.0.layer_norm.weight', 'decoder.block.3.layer.0.SelfAttention.o.weight', 'decoder.block.4.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.1.layer.0.SelfAttention.v.weight', 'lm_head.weight', 'decoder.block.7.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.0.layer.2.layer_norm.weight', 'decoder.block.7.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.6.layer.0.SelfAttention.v.weight', 'decoder.block.2.layer.1.layer_norm.weight', 'decoder.block.2.layer.0.layer_norm.weight', 'decoder.block.3.layer.1.EncDecAttention.k.weight', 'decoder.block.4.layer.1.EncDecAttention.o.weight', 'decoder.block.1.layer.0.SelfAttention.o.weight', 'decoder.block.2.layer.1.EncDecAttention.k.weight', 'decoder.block.4.layer.0.SelfAttention.k.weight', 'decoder.block.5.layer.1.EncDecAttention.o.weight', 'decoder.block.7.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.5.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.0.SelfAttention.v.weight']
5
+ - This IS expected if you are initializing T5EncoderModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
6
+ - This IS NOT expected if you are initializing T5EncoderModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
7
+ Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
8
+ Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
9
+ Downloading (…)okenizer_config.json: 100%|██| 2.54k/2.54k [00:00<00:00, 8.90MB/s]
10
+ Downloading spiece.model: 100%|██████████████████████████████| 792k/792k [00:00<00:00, 122MB/s]
11
+ Downloading (…)cial_tokens_map.json: 100%|██| 2.20k/2.20k [00:00<00:00, 8.32MB/s]
12
+ Token indices sequence length is longer than the specified maximum sequence length for this model (2837981 > 512). Running this sequence through the model will result in indexing errors
13
+ Starting ...
14
+ Ready.
15
+ 0 layer.0.SelfAttention.q
16
+ Quantizing ...
17
+ time 0.70
18
+ error 84.84163665771484
19
+ 0 layer.0.SelfAttention.k
20
+ Quantizing ...
21
+ time 0.13
22
+ error 4683.52685546875
23
+ 0 layer.0.SelfAttention.v
24
+ Quantizing ...
25
+ time 0.13
26
+ error 1787.6051025390625
27
+ 0 layer.0.SelfAttention.o
28
+ Quantizing ...
29
+ time 0.10
30
+ error 137924.640625
31
+ 0 layer.1.DenseReluDense.wi_0
32
+ Quantizing ...
33
+ time 0.13
34
+ error 9668.408203125
35
+ 0 layer.1.DenseReluDense.wi_1
36
+ Quantizing ...
37
+ time 0.13
38
+ error 12095.4453125
39
+ 0 layer.1.DenseReluDense.wo
40
+ Quantizing ...
41
+ time 0.26
42
+ error 592524.75
43
+ 1 layer.0.SelfAttention.q
44
+ Quantizing ...
45
+ time 0.17
46
+ error 76.55366516113281
47
+ 1 layer.0.SelfAttention.k
48
+ Quantizing ...
49
+ time 0.13
50
+ error 4797.10107421875
51
+ 1 layer.0.SelfAttention.v
52
+ Quantizing ...
53
+ time 0.13
54
+ error 3586.5419921875
55
+ 1 layer.0.SelfAttention.o
56
+ Quantizing ...
57
+ time 0.09
58
+ error 30098.28515625
59
+ 1 layer.1.DenseReluDense.wi_0
60
+ Quantizing ...
61
+ time 0.13
62
+ error 7313.28759765625
63
+ 1 layer.1.DenseReluDense.wi_1
64
+ Quantizing ...
65
+ time 0.13
66
+ error 11631.021484375
67
+ 1 layer.1.DenseReluDense.wo
68
+ Quantizing ...
69
+ time 0.25
70
+ error 3476349.0
71
+ 2 layer.0.SelfAttention.q
72
+ Quantizing ...
73
+ time 0.17
74
+ error 41.201637268066406
75
+ 2 layer.0.SelfAttention.k
76
+ Quantizing ...
77
+ time 0.13
78
+ error 2614.22265625
79
+ 2 layer.0.SelfAttention.v
80
+ Quantizing ...
81
+ time 0.13
82
+ error 4339.2080078125
83
+ 2 layer.0.SelfAttention.o
84
+ Quantizing ...
85
+ time 0.10
86
+ error 42485.1328125
87
+ 2 layer.1.DenseReluDense.wi_0
88
+ Quantizing ...
89
+ time 0.13
90
+ error 5012.45947265625
91
+ 2 layer.1.DenseReluDense.wi_1
92
+ Quantizing ...
93
+ time 0.13
94
+ error 16528.76953125
95
+ 2 layer.1.DenseReluDense.wo
96
+ Quantizing ...
97
+ time 0.26
98
+ error 192300448.0
99
+ 3 layer.0.SelfAttention.q
100
+ Quantizing ...
101
+ time 0.17
102
+ error 53.63971710205078
103
+ 3 layer.0.SelfAttention.k
104
+ Quantizing ...
105
+ time 0.13
106
+ error 3402.79736328125
107
+ 3 layer.0.SelfAttention.v
108
+ Quantizing ...
109
+ time 0.13
110
+ error 8263.0869140625
111
+ 3 layer.0.SelfAttention.o
112
+ Quantizing ...
113
+ time 0.10
114
+ error 111050.171875
115
+ 3 layer.1.DenseReluDense.wi_0
116
+ Quantizing ...
117
+ time 0.13
118
+ error 3236.92529296875
119
+ 3 layer.1.DenseReluDense.wi_1
120
+ Quantizing ...
121
+ time 0.13
122
+ error 17445.189453125
123
+ 3 layer.1.DenseReluDense.wo
124
+ Quantizing ...
125
+ time 0.25
126
+ error 700423.0
127
+ 4 layer.0.SelfAttention.q
128
+ Quantizing ...
129
+ time 0.17
130
+ error 38.29411315917969
131
+ 4 layer.0.SelfAttention.k
132
+ Quantizing ...
133
+ time 0.13
134
+ error 2450.30517578125
135
+ 4 layer.0.SelfAttention.v
136
+ Quantizing ...
137
+ time 0.13
138
+ error 11326.40625
139
+ 4 layer.0.SelfAttention.o
140
+ Quantizing ...
141
+ time 0.10
142
+ error 64683.59375
143
+ 4 layer.1.DenseReluDense.wi_0
144
+ Quantizing ...
145
+ time 0.13
146
+ error 2528.781494140625
147
+ 4 layer.1.DenseReluDense.wi_1
148
+ Quantizing ...
149
+ time 0.13
150
+ error 18873.064453125
151
+ 4 layer.1.DenseReluDense.wo
152
+ Quantizing ...
153
+ time 0.25
154
+ error 760352.25
155
+ 5 layer.0.SelfAttention.q
156
+ Quantizing ...
157
+ time 0.17
158
+ error 37.40803527832031
159
+ 5 layer.0.SelfAttention.k
160
+ Quantizing ...
161
+ time 0.13
162
+ error 2389.12841796875
163
+ 5 layer.0.SelfAttention.v
164
+ Quantizing ...
165
+ time 0.13
166
+ error 10107.05078125
167
+ 5 layer.0.SelfAttention.o
168
+ Quantizing ...
169
+ time 0.10
170
+ error 216297.78125
171
+ 5 layer.1.DenseReluDense.wi_0
172
+ Quantizing ...
173
+ time 0.13
174
+ error 2324.2021484375
175
+ 5 layer.1.DenseReluDense.wi_1
176
+ Quantizing ...
177
+ time 0.13
178
+ error 23206.798828125
179
+ 5 layer.1.DenseReluDense.wo
180
+ Quantizing ...
181
+ time 0.26
182
+ error 960373.375
183
+ 6 layer.0.SelfAttention.q
184
+ Quantizing ...
185
+ time 0.18
186
+ error 27.358470916748047
187
+ 6 layer.0.SelfAttention.k
188
+ Quantizing ...
189
+ time 0.13
190
+ error 1652.122802734375
191
+ 6 layer.0.SelfAttention.v
192
+ Quantizing ...
193
+ time 0.13
194
+ error 11492.5712890625
195
+ 6 layer.0.SelfAttention.o
196
+ Quantizing ...
197
+ time 0.10
198
+ error 327756.75
199
+ 6 layer.1.DenseReluDense.wi_0
200
+ Quantizing ...
201
+ time 0.13
202
+ error 2362.47998046875
203
+ 6 layer.1.DenseReluDense.wi_1
204
+ Quantizing ...
205
+ time 0.15
206
+ error 33793.09765625
207
+ 6 layer.1.DenseReluDense.wo
208
+ Quantizing ...
209
+ time 0.27
210
+ error 7250225.0
211
+ 7 layer.0.SelfAttention.q
212
+ Quantizing ...
213
+ time 0.18
214
+ error 31.67843246459961
215
+ 7 layer.0.SelfAttention.k
216
+ Quantizing ...
217
+ time 0.13
218
+ error 1604.3997802734375
219
+ 7 layer.0.SelfAttention.v
220
+ Quantizing ...
221
+ time 0.13
222
+ error 19231.8984375
223
+ 7 layer.0.SelfAttention.o
224
+ Quantizing ...
225
+ time 0.10
226
+ error 493063.46875
227
+ 7 layer.1.DenseReluDense.wi_0
228
+ Quantizing ...
229
+ time 0.14
230
+ error 2606.19873046875
231
+ 7 layer.1.DenseReluDense.wi_1
232
+ Quantizing ...
233
+ time 0.14
234
+ error 55759.1640625
235
+ 7 layer.1.DenseReluDense.wo
236
+ Quantizing ...
237
+ time 0.26
238
+ error 39936240.0
239
+ 16.350690126419067
240
+ Packing ...
241
+ encoder.block.0.layer.0.SelfAttention.q
242
+ encoder.block.0.layer.0.SelfAttention.k
243
+ encoder.block.0.layer.0.SelfAttention.v
244
+ encoder.block.0.layer.0.SelfAttention.o
245
+ encoder.block.0.layer.1.DenseReluDense.wi_0
246
+ encoder.block.0.layer.1.DenseReluDense.wi_1
247
+ encoder.block.0.layer.1.DenseReluDense.wo
248
+ encoder.block.1.layer.0.SelfAttention.q
249
+ encoder.block.1.layer.0.SelfAttention.k
250
+ encoder.block.1.layer.0.SelfAttention.v
251
+ encoder.block.1.layer.0.SelfAttention.o
252
+ encoder.block.1.layer.1.DenseReluDense.wi_0
253
+ encoder.block.1.layer.1.DenseReluDense.wi_1
254
+ encoder.block.1.layer.1.DenseReluDense.wo
255
+ encoder.block.2.layer.0.SelfAttention.q
256
+ encoder.block.2.layer.0.SelfAttention.k
257
+ encoder.block.2.layer.0.SelfAttention.v
258
+ encoder.block.2.layer.0.SelfAttention.o
259
+ encoder.block.2.layer.1.DenseReluDense.wi_0
260
+ encoder.block.2.layer.1.DenseReluDense.wi_1
261
+ encoder.block.2.layer.1.DenseReluDense.wo
262
+ encoder.block.3.layer.0.SelfAttention.q
263
+ encoder.block.3.layer.0.SelfAttention.k
264
+ encoder.block.3.layer.0.SelfAttention.v
265
+ encoder.block.3.layer.0.SelfAttention.o
266
+ encoder.block.3.layer.1.DenseReluDense.wi_0
267
+ encoder.block.3.layer.1.DenseReluDense.wi_1
268
+ encoder.block.3.layer.1.DenseReluDense.wo
269
+ encoder.block.4.layer.0.SelfAttention.q
270
+ encoder.block.4.layer.0.SelfAttention.k
271
+ encoder.block.4.layer.0.SelfAttention.v
272
+ encoder.block.4.layer.0.SelfAttention.o
273
+ encoder.block.4.layer.1.DenseReluDense.wi_0
274
+ encoder.block.4.layer.1.DenseReluDense.wi_1
275
+ encoder.block.4.layer.1.DenseReluDense.wo
276
+ encoder.block.5.layer.0.SelfAttention.q
277
+ encoder.block.5.layer.0.SelfAttention.k
278
+ encoder.block.5.layer.0.SelfAttention.v
279
+ encoder.block.5.layer.0.SelfAttention.o
280
+ encoder.block.5.layer.1.DenseReluDense.wi_0
281
+ encoder.block.5.layer.1.DenseReluDense.wi_1
282
+ encoder.block.5.layer.1.DenseReluDense.wo
283
+ encoder.block.6.layer.0.SelfAttention.q
284
+ encoder.block.6.layer.0.SelfAttention.k
285
+ encoder.block.6.layer.0.SelfAttention.v
286
+ encoder.block.6.layer.0.SelfAttention.o
287
+ encoder.block.6.layer.1.DenseReluDense.wi_0
288
+ encoder.block.6.layer.1.DenseReluDense.wi_1
289
+ encoder.block.6.layer.1.DenseReluDense.wo
290
+ encoder.block.7.layer.0.SelfAttention.q
291
+ encoder.block.7.layer.0.SelfAttention.k
292
+ encoder.block.7.layer.0.SelfAttention.v
293
+ encoder.block.7.layer.0.SelfAttention.o
294
+ encoder.block.7.layer.1.DenseReluDense.wi_0
295
+ encoder.block.7.layer.1.DenseReluDense.wi_1
296
+ encoder.block.7.layer.1.DenseReluDense.wo
297
+ Done.
298
+
workspace/flan-ts-xl.txt ADDED
The diff for this file is too large to render. See raw diff
 
workspace/flan-ts-xxl.txt ADDED
The diff for this file is too large to render. See raw diff
 
workspace/ts_large.txt ADDED
@@ -0,0 +1,743 @@
1
+ CUDA extension not installed.
2
+ Some weights of the model checkpoint at t5-large were not used when initializing T5EncoderModel: ['decoder.block.8.layer.1.EncDecAttention.v.weight', 'decoder.block.10.layer.2.layer_norm.weight', 'decoder.block.3.layer.0.SelfAttention.v.weight', 'decoder.block.21.layer.0.SelfAttention.k.weight', 'decoder.block.14.layer.1.EncDecAttention.v.weight', 'decoder.block.12.layer.2.layer_norm.weight', 'decoder.block.22.layer.1.EncDecAttention.v.weight', 'decoder.block.1.layer.2.layer_norm.weight', 'decoder.block.5.layer.2.DenseReluDense.wi.weight', 'decoder.block.0.layer.2.DenseReluDense.wo.weight', 'decoder.block.17.layer.2.layer_norm.weight', 'decoder.block.16.layer.0.layer_norm.weight', 'decoder.block.2.layer.1.EncDecAttention.o.weight', 'decoder.block.13.layer.2.layer_norm.weight', 'decoder.block.18.layer.1.layer_norm.weight', 'decoder.block.4.layer.1.EncDecAttention.k.weight', 'decoder.block.18.layer.1.EncDecAttention.v.weight', 'decoder.block.7.layer.1.EncDecAttention.v.weight', 'decoder.block.21.layer.0.SelfAttention.v.weight', 'decoder.block.13.layer.1.EncDecAttention.k.weight', 'decoder.block.20.layer.1.layer_norm.weight', 'decoder.block.7.layer.1.layer_norm.weight', 'decoder.block.9.layer.1.EncDecAttention.k.weight', 'decoder.block.16.layer.0.SelfAttention.q.weight', 'decoder.block.3.layer.1.EncDecAttention.v.weight', 'decoder.block.11.layer.1.layer_norm.weight', 'decoder.block.15.layer.0.SelfAttention.k.weight', 'decoder.block.2.layer.1.EncDecAttention.q.weight', 'decoder.block.7.layer.0.SelfAttention.k.weight', 'decoder.block.22.layer.1.EncDecAttention.q.weight', 'decoder.block.18.layer.1.EncDecAttention.k.weight', 'decoder.block.15.layer.1.EncDecAttention.q.weight', 'decoder.block.4.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.0.SelfAttention.v.weight', 'decoder.block.23.layer.2.DenseReluDense.wo.weight', 'decoder.block.16.layer.1.EncDecAttention.o.weight', 'decoder.block.17.layer.0.SelfAttention.o.weight', 
'decoder.block.18.layer.2.DenseReluDense.wo.weight', 'decoder.block.19.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.2.layer_norm.weight', 'decoder.block.2.layer.1.EncDecAttention.v.weight', 'decoder.block.19.layer.0.SelfAttention.q.weight', 'decoder.block.10.layer.2.DenseReluDense.wo.weight', 'decoder.block.22.layer.0.SelfAttention.k.weight', 'decoder.block.6.layer.0.SelfAttention.k.weight', 'decoder.block.17.layer.2.DenseReluDense.wo.weight', 'decoder.block.12.layer.1.EncDecAttention.o.weight', 'decoder.block.21.layer.1.EncDecAttention.v.weight', 'decoder.block.2.layer.2.DenseReluDense.wo.weight', 'decoder.block.8.layer.0.SelfAttention.k.weight', 'decoder.block.20.layer.0.SelfAttention.v.weight', 'decoder.block.6.layer.2.DenseReluDense.wo.weight', 'decoder.block.11.layer.1.EncDecAttention.o.weight', 'decoder.block.15.layer.0.SelfAttention.v.weight', 'decoder.block.10.layer.1.EncDecAttention.k.weight', 'decoder.block.2.layer.2.DenseReluDense.wi.weight', 'decoder.block.19.layer.2.layer_norm.weight', 'decoder.block.10.layer.0.layer_norm.weight', 'decoder.block.23.layer.1.EncDecAttention.q.weight', 'decoder.block.5.layer.2.layer_norm.weight', 'decoder.block.13.layer.2.DenseReluDense.wi.weight', 'decoder.block.21.layer.1.layer_norm.weight', 'decoder.block.14.layer.1.layer_norm.weight', 'decoder.block.18.layer.0.layer_norm.weight', 'decoder.block.13.layer.0.layer_norm.weight', 'decoder.block.12.layer.0.SelfAttention.o.weight', 'decoder.block.13.layer.2.DenseReluDense.wo.weight', 'decoder.block.12.layer.0.SelfAttention.v.weight', 'decoder.block.4.layer.1.EncDecAttention.v.weight', 'decoder.block.19.layer.0.SelfAttention.o.weight', 'decoder.block.6.layer.1.EncDecAttention.o.weight', 'decoder.block.9.layer.0.SelfAttention.q.weight', 'decoder.block.10.layer.1.EncDecAttention.q.weight', 'decoder.block.17.layer.2.DenseReluDense.wi.weight', 'decoder.block.12.layer.2.DenseReluDense.wi.weight', 'decoder.block.0.layer.2.DenseReluDense.wi.weight', 
'decoder.block.20.layer.1.EncDecAttention.k.weight', 'decoder.block.0.layer.1.EncDecAttention.v.weight', 'decoder.block.3.layer.2.DenseReluDense.wi.weight', 'decoder.block.14.layer.0.SelfAttention.q.weight', 'decoder.block.21.layer.0.SelfAttention.o.weight', 'decoder.block.23.layer.1.EncDecAttention.k.weight', 'decoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.block.3.layer.0.SelfAttention.k.weight', 'decoder.block.11.layer.0.SelfAttention.o.weight', 'decoder.block.23.layer.2.DenseReluDense.wi.weight', 'decoder.block.14.layer.2.DenseReluDense.wi.weight', 'decoder.block.21.layer.1.EncDecAttention.q.weight', 'decoder.block.12.layer.0.layer_norm.weight', 'decoder.block.2.layer.0.layer_norm.weight', 'decoder.block.13.layer.0.SelfAttention.v.weight', 'decoder.block.8.layer.2.DenseReluDense.wo.weight', 'decoder.block.7.layer.0.SelfAttention.o.weight', 'decoder.block.17.layer.1.layer_norm.weight', 'decoder.block.20.layer.1.EncDecAttention.o.weight', 'decoder.block.12.layer.1.EncDecAttention.k.weight', 'decoder.block.17.layer.1.EncDecAttention.k.weight', 'decoder.block.5.layer.0.SelfAttention.q.weight', 'decoder.block.5.layer.1.EncDecAttention.q.weight', 'decoder.block.9.layer.2.layer_norm.weight', 'decoder.block.7.layer.2.layer_norm.weight', 'decoder.block.19.layer.0.layer_norm.weight', 'decoder.block.6.layer.2.DenseReluDense.wi.weight', 'decoder.block.14.layer.2.DenseReluDense.wo.weight', 'decoder.block.12.layer.0.SelfAttention.q.weight', 'decoder.block.10.layer.1.layer_norm.weight', 'decoder.block.6.layer.2.layer_norm.weight', 'decoder.block.7.layer.0.layer_norm.weight', 'decoder.block.2.layer.0.SelfAttention.q.weight', 'decoder.block.20.layer.1.EncDecAttention.v.weight', 'decoder.block.23.layer.1.EncDecAttention.v.weight', 'decoder.block.23.layer.0.SelfAttention.q.weight', 'decoder.block.2.layer.1.EncDecAttention.k.weight', 'decoder.block.16.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.0.layer_norm.weight', 
'decoder.block.8.layer.1.EncDecAttention.q.weight', 'decoder.block.0.layer.1.EncDecAttention.o.weight', 'decoder.block.20.layer.2.DenseReluDense.wo.weight', 'decoder.block.11.layer.2.DenseReluDense.wo.weight', 'decoder.block.9.layer.1.layer_norm.weight', 'decoder.block.12.layer.1.EncDecAttention.q.weight', 'decoder.block.22.layer.2.layer_norm.weight', 'decoder.block.8.layer.1.layer_norm.weight', 'decoder.block.12.layer.1.EncDecAttention.v.weight', 'decoder.block.1.layer.0.layer_norm.weight', 'decoder.block.15.layer.2.layer_norm.weight', 'decoder.block.23.layer.2.layer_norm.weight', 'decoder.block.1.layer.0.SelfAttention.o.weight', 'decoder.block.18.layer.0.SelfAttention.q.weight', 'decoder.block.5.layer.0.SelfAttention.k.weight', 'decoder.block.21.layer.1.EncDecAttention.k.weight', 'decoder.block.23.layer.0.SelfAttention.k.weight', 'decoder.block.18.layer.0.SelfAttention.k.weight', 'decoder.block.21.layer.2.DenseReluDense.wo.weight', 'decoder.block.20.layer.0.SelfAttention.q.weight', 'decoder.block.8.layer.2.layer_norm.weight', 'decoder.block.19.layer.1.layer_norm.weight', 'decoder.block.23.layer.0.SelfAttention.v.weight', 'decoder.block.19.layer.1.EncDecAttention.v.weight', 'decoder.block.0.layer.1.EncDecAttention.q.weight', 'decoder.block.20.layer.0.SelfAttention.o.weight', 'decoder.block.21.layer.1.EncDecAttention.o.weight', 'decoder.block.5.layer.2.DenseReluDense.wo.weight', 'decoder.block.5.layer.1.layer_norm.weight', 'decoder.block.17.layer.1.EncDecAttention.o.weight', 'decoder.block.6.layer.1.EncDecAttention.v.weight', 'decoder.block.4.layer.1.EncDecAttention.o.weight', 'decoder.block.16.layer.2.DenseReluDense.wi.weight', 'decoder.block.21.layer.2.layer_norm.weight', 'decoder.block.0.layer.1.layer_norm.weight', 'decoder.block.22.layer.1.layer_norm.weight', 'decoder.final_layer_norm.weight', 'decoder.block.18.layer.2.layer_norm.weight', 'decoder.block.15.layer.2.DenseReluDense.wo.weight', 'decoder.block.3.layer.1.EncDecAttention.o.weight', 
'decoder.block.11.layer.1.EncDecAttention.q.weight', 'decoder.block.8.layer.0.SelfAttention.v.weight', 'decoder.block.16.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.2.DenseReluDense.wo.weight', 'decoder.block.22.layer.2.DenseReluDense.wo.weight', 'decoder.block.9.layer.1.EncDecAttention.v.weight', 'decoder.block.22.layer.0.SelfAttention.o.weight', 'decoder.block.11.layer.1.EncDecAttention.v.weight', 'decoder.block.22.layer.0.SelfAttention.q.weight', 'decoder.block.15.layer.1.EncDecAttention.k.weight', 'decoder.block.11.layer.2.layer_norm.weight', 'decoder.block.9.layer.2.DenseReluDense.wi.weight', 'decoder.block.23.layer.0.layer_norm.weight', 'decoder.block.6.layer.1.EncDecAttention.q.weight', 'decoder.block.4.layer.1.EncDecAttention.q.weight', 'decoder.block.5.layer.0.SelfAttention.o.weight', 'decoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.3.layer.0.layer_norm.weight', 'decoder.block.6.layer.0.SelfAttention.v.weight', 'decoder.block.10.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.12.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.1.layer_norm.weight', 'decoder.block.3.layer.1.layer_norm.weight', 'decoder.block.21.layer.0.SelfAttention.q.weight', 'decoder.block.0.layer.2.layer_norm.weight', 'decoder.block.1.layer.0.SelfAttention.v.weight', 'decoder.block.11.layer.0.layer_norm.weight', 'decoder.block.7.layer.1.EncDecAttention.o.weight', 'decoder.block.3.layer.2.DenseReluDense.wo.weight', 'decoder.block.7.layer.0.SelfAttention.q.weight', 'decoder.block.15.layer.0.layer_norm.weight', 'decoder.block.14.layer.0.SelfAttention.o.weight', 'decoder.block.22.layer.2.DenseReluDense.wi.weight', 'decoder.block.4.layer.1.layer_norm.weight', 'decoder.block.8.layer.1.EncDecAttention.k.weight', 'decoder.block.20.layer.2.layer_norm.weight', 'decoder.block.15.layer.1.EncDecAttention.o.weight', 'decoder.block.14.layer.0.SelfAttention.k.weight', 
'decoder.block.5.layer.1.EncDecAttention.k.weight', 'decoder.block.9.layer.2.DenseReluDense.wo.weight', 'decoder.block.1.layer.1.EncDecAttention.v.weight', 'decoder.block.1.layer.1.EncDecAttention.o.weight', 'decoder.block.10.layer.0.SelfAttention.o.weight', 'decoder.block.13.layer.1.EncDecAttention.v.weight', 'decoder.block.18.layer.1.EncDecAttention.o.weight', 'decoder.block.7.layer.0.SelfAttention.v.weight', 'decoder.block.10.layer.0.SelfAttention.q.weight', 'decoder.block.17.layer.0.SelfAttention.v.weight', 'decoder.block.8.layer.2.DenseReluDense.wi.weight', 'decoder.block.2.layer.0.SelfAttention.k.weight', 'decoder.block.13.layer.1.layer_norm.weight', 'decoder.block.20.layer.0.layer_norm.weight', 'decoder.block.2.layer.2.layer_norm.weight', 'decoder.block.21.layer.2.DenseReluDense.wi.weight', 'decoder.block.13.layer.0.SelfAttention.q.weight', 'decoder.block.1.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.0.SelfAttention.o.weight', 'decoder.block.10.layer.2.DenseReluDense.wi.weight', 'decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'decoder.block.19.layer.0.SelfAttention.k.weight', 'decoder.block.19.layer.1.EncDecAttention.k.weight', 'decoder.block.14.layer.0.SelfAttention.v.weight', 'decoder.block.4.layer.0.SelfAttention.q.weight', 'decoder.block.9.layer.1.EncDecAttention.o.weight', 'decoder.block.7.layer.1.EncDecAttention.q.weight', 'decoder.block.20.layer.2.DenseReluDense.wi.weight', 'decoder.block.15.layer.2.DenseReluDense.wi.weight', 'decoder.block.6.layer.0.SelfAttention.o.weight', 'decoder.block.20.layer.0.SelfAttention.k.weight', 'decoder.block.12.layer.0.SelfAttention.k.weight', 'decoder.block.22.layer.0.SelfAttention.v.weight', 'decoder.block.18.layer.0.SelfAttention.v.weight', 'decoder.block.9.layer.0.layer_norm.weight', 'decoder.block.16.layer.1.EncDecAttention.q.weight', 'decoder.block.21.layer.0.layer_norm.weight', 'decoder.block.16.layer.2.layer_norm.weight', 'decoder.block.8.layer.0.SelfAttention.o.weight', 
'decoder.block.14.layer.1.EncDecAttention.q.weight', 'decoder.block.22.layer.1.EncDecAttention.o.weight', 'decoder.block.6.layer.0.layer_norm.weight', 'decoder.block.17.layer.0.SelfAttention.k.weight', 'decoder.block.13.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.1.EncDecAttention.q.weight', 'decoder.block.19.layer.2.DenseReluDense.wi.weight', 'decoder.block.15.layer.0.SelfAttention.o.weight', 'decoder.block.17.layer.1.EncDecAttention.q.weight', 'decoder.block.0.layer.0.SelfAttention.q.weight', 'decoder.block.19.layer.1.EncDecAttention.o.weight', 'decoder.block.17.layer.1.EncDecAttention.v.weight', 'decoder.block.0.layer.1.EncDecAttention.k.weight', 'decoder.block.9.layer.0.SelfAttention.o.weight', 'decoder.block.4.layer.2.DenseReluDense.wi.weight', 'decoder.block.8.layer.0.layer_norm.weight', 'decoder.block.18.layer.1.EncDecAttention.q.weight', 'decoder.block.19.layer.1.EncDecAttention.q.weight', 'decoder.block.7.layer.1.EncDecAttention.k.weight', 'decoder.block.13.layer.1.EncDecAttention.q.weight', 'decoder.block.1.layer.1.EncDecAttention.k.weight', 'decoder.block.4.layer.0.layer_norm.weight', 'decoder.block.5.layer.1.EncDecAttention.o.weight', 'decoder.block.15.layer.1.EncDecAttention.v.weight', 'decoder.block.13.layer.1.EncDecAttention.o.weight', 'decoder.block.17.layer.0.SelfAttention.q.weight', 'decoder.block.5.layer.1.EncDecAttention.v.weight', 'decoder.block.14.layer.1.EncDecAttention.o.weight', 'decoder.block.16.layer.2.DenseReluDense.wo.weight', 'decoder.block.6.layer.1.layer_norm.weight', 'decoder.block.18.layer.2.DenseReluDense.wi.weight', 'decoder.block.23.layer.0.SelfAttention.o.weight', 'decoder.block.2.layer.0.SelfAttention.v.weight', 'decoder.block.16.layer.1.EncDecAttention.k.weight', 'decoder.block.2.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.0.SelfAttention.v.weight', 'decoder.block.9.layer.1.EncDecAttention.q.weight', 'decoder.block.3.layer.1.EncDecAttention.k.weight', 'decoder.block.5.layer.0.layer_norm.weight', 
'decoder.block.11.layer.1.EncDecAttention.k.weight', 'decoder.block.14.layer.1.EncDecAttention.k.weight', 'decoder.block.15.layer.1.layer_norm.weight', 'decoder.block.5.layer.0.SelfAttention.v.weight', 'decoder.block.6.layer.1.EncDecAttention.k.weight', 'decoder.block.3.layer.1.EncDecAttention.q.weight', 'decoder.block.13.layer.0.SelfAttention.k.weight', 'decoder.block.3.layer.0.SelfAttention.q.weight', 'decoder.block.11.layer.0.SelfAttention.k.weight', 'decoder.block.3.layer.2.layer_norm.weight', 'decoder.block.14.layer.2.layer_norm.weight', 'decoder.block.23.layer.1.EncDecAttention.o.weight', 'decoder.block.1.layer.2.DenseReluDense.wo.weight', 'decoder.block.18.layer.0.SelfAttention.o.weight', 'decoder.block.8.layer.0.SelfAttention.q.weight', 'decoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.20.layer.1.EncDecAttention.q.weight', 'decoder.block.23.layer.1.layer_norm.weight', 'decoder.block.3.layer.0.SelfAttention.o.weight', 'decoder.block.16.layer.1.EncDecAttention.v.weight', 'decoder.block.12.layer.1.layer_norm.weight', 'decoder.block.1.layer.2.DenseReluDense.wi.weight', 'decoder.block.9.layer.0.SelfAttention.k.weight', 'decoder.block.11.layer.0.SelfAttention.q.weight', 'decoder.block.22.layer.0.layer_norm.weight', 'decoder.block.6.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.0.SelfAttention.k.weight', 'decoder.block.22.layer.1.EncDecAttention.k.weight', 'decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight', 'decoder.block.14.layer.0.layer_norm.weight', 'decoder.block.7.layer.2.DenseReluDense.wi.weight', 'decoder.block.15.layer.0.SelfAttention.q.weight', 'decoder.block.17.layer.0.layer_norm.weight', 'decoder.block.19.layer.0.SelfAttention.v.weight', 'decoder.block.16.layer.1.layer_norm.weight', 'decoder.block.10.layer.1.EncDecAttention.v.weight', 'decoder.block.16.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.1.layer_norm.weight', 'decoder.block.8.layer.1.EncDecAttention.o.weight', 
'decoder.block.10.layer.0.SelfAttention.k.weight', 'decoder.block.10.layer.0.SelfAttention.v.weight', 'decoder.block.11.layer.0.SelfAttention.v.weight', 'decoder.block.11.layer.2.DenseReluDense.wi.weight']
3
+ - This IS expected if you are initializing T5EncoderModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
4
+ - This IS NOT expected if you are initializing T5EncoderModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
5
+ Downloading and preparing dataset wikitext/wikitext-2-raw-v1 to /root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126...
6
+ Downloading data: 100%|████████████████████| 4.72M/4.72M [00:03<00:00, 1.38MB/s]
7
+ Dataset wikitext downloaded and prepared to /root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126. Subsequent calls will reuse this data.
8
+ Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
9
+ Downloading (…)lve/main/config.json: 100%|█| 1.21k/1.21k [00:00<00:00, 4.35MB/s]
10
+ Downloading (…)ve/main/spiece.model: 100%|███| 792k/792k [00:00<00:00, 2.03MB/s]
11
+ Downloading (…)/main/tokenizer.json: 100%|█| 1.39M/1.39M [00:00<00:00, 10.3MB/s]
12
+ /usr/local/lib/python3.10/dist-packages/transformers/models/t5/tokenization_t5_fast.py:155: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.
13
+ For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
14
+ - Be aware that you SHOULD NOT rely on t5-large automatically truncating your input to 512 when padding/encoding.
15
+ - If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.
16
+ - To avoid this warning, please instantiate this tokenizer with `model_max_length` set to your preferred value.
17
+ warnings.warn(
18
+ Token indices sequence length is longer than the specified maximum sequence length for this model (2837091 > 512). Running this sequence through the model will result in indexing errors
19
+ Starting ...
20
+ Ready.
21
+ 0 layer.0.SelfAttention.q
22
+ Quantizing ...
23
+ time 1.05
24
+ error 230.94436645507812
25
+ 0 layer.0.SelfAttention.k
26
+ Quantizing ...
27
+ time 0.35
28
+ error 15016.095703125
29
+ 0 layer.0.SelfAttention.v
30
+ Quantizing ...
31
+ time 0.38
32
+ error 9677.2041015625
33
+ 0 layer.0.SelfAttention.o
34
+ Quantizing ...
35
+ time 0.35
36
+ error 106466.890625
37
+ 0 layer.1.DenseReluDense.wi
38
+ Quantizing ...
39
+ time 0.38
40
+ error 216545.046875
41
+ 0 layer.1.DenseReluDense.wo
42
+ Quantizing ...
43
+ time 1.43
44
+ error 175480.0
45
+ 1 layer.0.SelfAttention.q
46
+ Quantizing ...
47
+ time 0.54
48
+ error 212.295166015625
49
+ 1 layer.0.SelfAttention.k
50
+ Quantizing ...
51
+ time 0.37
52
+ error 11788.3134765625
53
+ 1 layer.0.SelfAttention.v
54
+ Quantizing ...
55
+ time 0.35
56
+ error 10337.71484375
57
+ 1 layer.0.SelfAttention.o
58
+ Quantizing ...
59
+ time 0.35
60
+ error 78876.84375
61
+ 1 layer.1.DenseReluDense.wi
62
+ Quantizing ...
63
+ time 0.35
64
+ error 362692.28125
65
+ 1 layer.1.DenseReluDense.wo
66
+ Quantizing ...
67
+ time 1.42
68
+ error 330811.875
69
+ 2 layer.0.SelfAttention.q
70
+ Quantizing ...
71
+ time 0.53
72
+ error 149.39337158203125
73
+ 2 layer.0.SelfAttention.k
74
+ Quantizing ...
75
+ time 0.35
76
+ error 8281.7451171875
77
+ 2 layer.0.SelfAttention.v
78
+ Quantizing ...
79
+ time 0.35
80
+ error 9236.6171875
81
+ 2 layer.0.SelfAttention.o
82
+ Quantizing ...
83
+ time 0.35
84
+ error 25642.55859375
85
+ 2 layer.1.DenseReluDense.wi
86
+ Quantizing ...
87
+ time 0.35
88
+ error 635081.875
89
+ 2 layer.1.DenseReluDense.wo
90
+ Quantizing ...
91
+ time 1.42
92
+ error 362131.0
93
+ 3 layer.0.SelfAttention.q
94
+ Quantizing ...
95
+ time 0.53
96
+ error 198.08612060546875
97
+ 3 layer.0.SelfAttention.k
98
+ Quantizing ...
99
+ time 0.37
100
+ error 10755.650390625
101
+ 3 layer.0.SelfAttention.v
102
+ Quantizing ...
103
+ time 0.35
104
+ error 9889.1962890625
105
+ 3 layer.0.SelfAttention.o
106
+ Quantizing ...
107
+ time 0.38
108
+ error 37326.6640625
109
+ 3 layer.1.DenseReluDense.wi
110
+ Quantizing ...
111
+ time 0.35
112
+ error 1070184.5
113
+ 3 layer.1.DenseReluDense.wo
114
+ Quantizing ...
115
+ time 1.47
116
+ error 399097.0625
117
+ 4 layer.0.SelfAttention.q
118
+ Quantizing ...
119
+ time 0.55
120
+ error 232.6760711669922
121
+ 4 layer.0.SelfAttention.k
122
+ Quantizing ...
123
+ time 0.38
124
+ error 12199.326171875
125
+ 4 layer.0.SelfAttention.v
126
+ Quantizing ...
127
+ time 0.36
128
+ error 11181.3046875
129
+ 4 layer.0.SelfAttention.o
130
+ Quantizing ...
131
+ time 0.35
132
+ error 55337.78125
133
+ 4 layer.1.DenseReluDense.wi
134
+ Quantizing ...
135
+ time 0.35
136
+ error 1705248.125
137
+ 4 layer.1.DenseReluDense.wo
138
+ Quantizing ...
139
+ time 1.42
140
+ error 368282.875
141
+ 5 layer.0.SelfAttention.q
142
+ Quantizing ...
143
+ time 0.53
144
+ error 218.76162719726562
145
+ 5 layer.0.SelfAttention.k
146
+ Quantizing ...
147
+ time 0.35
148
+ error 12070.9462890625
149
+ 5 layer.0.SelfAttention.v
150
+ Quantizing ...
151
+ time 0.35
152
+ error 13040.486328125
153
+ 5 layer.0.SelfAttention.o
154
+ Quantizing ...
155
+ time 0.35
156
+ error 77213.7109375
157
+ 5 layer.1.DenseReluDense.wi
158
+ Quantizing ...
159
+ time 0.35
160
+ error 2338288.5
161
+ 5 layer.1.DenseReluDense.wo
162
+ Quantizing ...
163
+ time 1.41
164
+ error 324492.5625
165
+ 6 layer.0.SelfAttention.q
166
+ Quantizing ...
167
+ time 0.53
168
+ error 212.82241821289062
169
+ 6 layer.0.SelfAttention.k
170
+ Quantizing ...
171
+ time 0.37
172
+ error 11916.390625
173
+ 6 layer.0.SelfAttention.v
174
+ Quantizing ...
175
+ time 0.35
176
+ error 13278.4794921875
177
+ 6 layer.0.SelfAttention.o
178
+ Quantizing ...
179
+ time 0.38
180
+ error 93256.609375
181
+ 6 layer.1.DenseReluDense.wi
182
+ Quantizing ...
183
+ time 0.35
184
+ error 2914808.0
185
+ 6 layer.1.DenseReluDense.wo
186
+ Quantizing ...
187
+ time 1.47
188
+ error 326483.75
189
+ 7 layer.0.SelfAttention.q
190
+ Quantizing ...
191
+ time 0.56
192
+ error 196.81045532226562
193
+ 7 layer.0.SelfAttention.k
194
+ Quantizing ...
195
+ time 0.36
196
+ error 11539.515625
197
+ 7 layer.0.SelfAttention.v
198
+ Quantizing ...
199
+ time 0.37
200
+ error 14094.767578125
201
+ 7 layer.0.SelfAttention.o
202
+ Quantizing ...
203
+ time 0.35
204
+ error 67957.1171875
205
+ 7 layer.1.DenseReluDense.wi
206
+ Quantizing ...
207
+ time 0.35
208
+ error 2997633.0
209
+ 7 layer.1.DenseReluDense.wo
210
+ Quantizing ...
211
+ time 1.42
212
+ error 415390.4375
213
+ 8 layer.0.SelfAttention.q
214
+ Quantizing ...
215
+ time 0.53
216
+ error 204.32620239257812
217
+ 8 layer.0.SelfAttention.k
218
+ Quantizing ...
219
+ time 0.35
220
+ error 12758.60546875
221
+ 8 layer.0.SelfAttention.v
222
+ Quantizing ...
223
+ time 0.35
224
+ error 20335.3203125
225
+ 8 layer.0.SelfAttention.o
226
+ Quantizing ...
227
+ time 0.35
228
+ error 242356.3125
229
+ 8 layer.1.DenseReluDense.wi
230
+ Quantizing ...
231
+ time 0.35
232
+ error 3908813.5
233
+ 8 layer.1.DenseReluDense.wo
234
+ Quantizing ...
235
+ time 1.43
236
+ error 657590.75
237
+ 9 layer.0.SelfAttention.q
238
+ Quantizing ...
239
+ time 0.53
240
+ error 165.850830078125
241
+ 9 layer.0.SelfAttention.k
242
+ Quantizing ...
243
+ time 0.36
244
+ error 11305.962890625
245
+ 9 layer.0.SelfAttention.v
246
+ Quantizing ...
247
+ time 0.36
248
+ error 20146.72265625
249
+ 9 layer.0.SelfAttention.o
250
+ Quantizing ...
251
+ time 0.36
252
+ error 155148.46875
253
+ 9 layer.1.DenseReluDense.wi
254
+ Quantizing ...
255
+ time 0.36
256
+ error 4378728.5
257
+ 9 layer.1.DenseReluDense.wo
258
+ Quantizing ...
259
+ time 1.47
260
+ error 785346.5625
261
+ 10 layer.0.SelfAttention.q
262
+ Quantizing ...
263
+ time 0.55
264
+ error 150.81277465820312
265
+ 10 layer.0.SelfAttention.k
266
+ Quantizing ...
267
+ time 0.35
268
+ error 8967.4853515625
269
+ 10 layer.0.SelfAttention.v
270
+ Quantizing ...
271
+ time 0.38
272
+ error 19551.57421875
273
+ 10 layer.0.SelfAttention.o
274
+ Quantizing ...
275
+ time 0.35
276
+ error 159628.03125
277
+ 10 layer.1.DenseReluDense.wi
278
+ Quantizing ...
279
+ time 0.35
280
+ error 5331122.5
281
+ 10 layer.1.DenseReluDense.wo
282
+ Quantizing ...
283
+ time 1.41
284
+ error 987081.75
285
+ 11 layer.0.SelfAttention.q
286
+ Quantizing ...
287
+ time 0.53
288
+ error 148.4892120361328
289
+ 11 layer.0.SelfAttention.k
290
+ Quantizing ...
291
+ time 0.35
292
+ error 10070.583984375
293
+ 11 layer.0.SelfAttention.v
294
+ Quantizing ...
295
+ time 0.35
296
+ error 22689.8046875
297
+ 11 layer.0.SelfAttention.o
298
+ Quantizing ...
299
+ time 0.35
300
+ error 158388.921875
301
+ 11 layer.1.DenseReluDense.wi
302
+ Quantizing ...
303
+ time 0.35
304
+ error 5614285.0
305
+ 11 layer.1.DenseReluDense.wo
306
+ Quantizing ...
307
+ time 1.41
308
+ error 1036498.25
309
+ 12 layer.0.SelfAttention.q
310
+ Quantizing ...
311
+ time 0.53
312
+ error 143.14183044433594
313
+ 12 layer.0.SelfAttention.k
314
+ Quantizing ...
315
+ time 0.35
316
+ error 10775.267578125
317
+ 12 layer.0.SelfAttention.v
318
+ Quantizing ...
319
+ time 0.37
320
+ error 30807.22265625
321
+ 12 layer.0.SelfAttention.o
322
+ Quantizing ...
323
+ time 0.36
324
+ error 518529.21875
325
+ 12 layer.1.DenseReluDense.wi
326
+ Quantizing ...
327
+ time 0.37
328
+ error 5196545.0
329
+ 12 layer.1.DenseReluDense.wo
330
+ Quantizing ...
331
+ time 1.44
332
+ error 1605865.0
333
+ 13 layer.0.SelfAttention.q
334
+ Quantizing ...
335
+ time 0.54
336
+ error 132.04205322265625
337
+ 13 layer.0.SelfAttention.k
338
+ Quantizing ...
339
+ time 0.36
340
+ error 9211.498046875
341
+ 13 layer.0.SelfAttention.v
342
+ Quantizing ...
343
+ time 0.37
344
+ error 32021.294921875
345
+ 13 layer.0.SelfAttention.o
346
+ Quantizing ...
347
+ time 0.36
348
+ error 389801.46875
349
+ 13 layer.1.DenseReluDense.wi
350
+ Quantizing ...
351
+ time 0.35
352
+ error 6028052.0
353
+ 13 layer.1.DenseReluDense.wo
354
+ Quantizing ...
355
+ time 1.40
356
+ error 1947110.25
357
+ 14 layer.0.SelfAttention.q
358
+ Quantizing ...
359
+ time 0.53
360
+ error 109.33882904052734
361
+ 14 layer.0.SelfAttention.k
362
+ Quantizing ...
363
+ time 0.35
364
+ error 8652.20703125
365
+ 14 layer.0.SelfAttention.v
366
+ Quantizing ...
367
+ time 0.35
368
+ error 29946.4140625
369
+ 14 layer.0.SelfAttention.o
370
+ Quantizing ...
371
+ time 0.35
372
+ error 351310.0
373
+ 14 layer.1.DenseReluDense.wi
374
+ Quantizing ...
375
+ time 0.35
376
+ error 6125760.5
377
+ 14 layer.1.DenseReluDense.wo
378
+ Quantizing ...
379
+ time 1.41
380
+ error 2735209.0
381
+ 15 layer.0.SelfAttention.q
382
+ Quantizing ...
383
+ time 0.53
384
+ error 113.90670776367188
385
+ 15 layer.0.SelfAttention.k
386
+ Quantizing ...
387
+ time 0.35
388
+ error 8382.978515625
389
+ 15 layer.0.SelfAttention.v
390
+ Quantizing ...
391
+ time 0.36
392
+ error 35500.65234375
393
+ 15 layer.0.SelfAttention.o
394
+ Quantizing ...
395
+ time 0.35
396
+ error 520358.59375
397
+ 15 layer.1.DenseReluDense.wi
398
+ Quantizing ...
399
+ time 0.38
400
+ error 6121543.5
401
+ 15 layer.1.DenseReluDense.wo
402
+ Quantizing ...
403
+ time 1.43
404
+ error 3549418.5
405
+ 16 layer.0.SelfAttention.q
406
+ Quantizing ...
407
+ time 0.53
408
+ error 106.98755645751953
409
+ 16 layer.0.SelfAttention.k
410
+ Quantizing ...
411
+ time 0.37
412
+ error 7904.42333984375
413
+ 16 layer.0.SelfAttention.v
414
+ Quantizing ...
415
+ time 0.35
416
+ error 40152.375
417
+ 16 layer.0.SelfAttention.o
418
+ Quantizing ...
419
+ time 0.38
420
+ error 1242878.0
421
+ 16 layer.1.DenseReluDense.wi
422
+ Quantizing ...
423
+ time 0.35
424
+ error 8400617.0
425
+ 16 layer.1.DenseReluDense.wo
426
+ Quantizing ...
427
+ time 1.40
428
+ error 5480047.0
429
+ 17 layer.0.SelfAttention.q
430
+ Quantizing ...
431
+ time 0.53
432
+ error 98.13764190673828
433
+ 17 layer.0.SelfAttention.k
434
+ Quantizing ...
435
+ time 0.35
436
+ error 7841.5126953125
437
+ 17 layer.0.SelfAttention.v
438
+ Quantizing ...
439
+ time 0.35
440
+ error 46148.609375
441
+ 17 layer.0.SelfAttention.o
442
+ Quantizing ...
443
+ time 0.35
444
+ error 1168839.625
445
+ 17 layer.1.DenseReluDense.wi
446
+ Quantizing ...
447
+ time 0.36
448
+ error 7634862.0
449
+ 17 layer.1.DenseReluDense.wo
450
+ Quantizing ...
451
+ time 1.41
452
+ error 4989134.0
453
+ 18 layer.0.SelfAttention.q
454
+ Quantizing ...
455
+ time 0.53
456
+ error 102.72500610351562
457
+ 18 layer.0.SelfAttention.k
458
+ Quantizing ...
459
+ time 0.35
460
+ error 7599.8544921875
461
+ 18 layer.0.SelfAttention.v
462
+ Quantizing ...
463
+ time 0.35
464
+ error 55332.08203125
465
+ 18 layer.0.SelfAttention.o
466
+ Quantizing ...
467
+ time 0.36
468
+ error 3184639.0
469
+ 18 layer.1.DenseReluDense.wi
470
+ Quantizing ...
471
+ time 0.36
472
+ error 6987084.5
473
+ 18 layer.1.DenseReluDense.wo
474
+ Quantizing ...
475
+ time 1.45
476
+ error 7245906.0
477
+ 19 layer.0.SelfAttention.q
478
+ Quantizing ...
479
+ time 0.53
480
+ error 81.86250305175781
481
+ 19 layer.0.SelfAttention.k
482
+ Quantizing ...
483
+ time 0.37
484
+ error 5452.095703125
485
+ 19 layer.0.SelfAttention.v
486
+ Quantizing ...
487
+ time 0.37
488
+ error 50052.5
489
+ 19 layer.0.SelfAttention.o
490
+ Quantizing ...
491
+ time 0.38
492
+ error 2986069.25
493
+ 19 layer.1.DenseReluDense.wi
494
+ Quantizing ...
495
+ time 0.35
496
+ error 9018568.0
497
+ 19 layer.1.DenseReluDense.wo
498
+ Quantizing ...
499
+ time 1.41
500
+ error 10263636.0
501
+ 20 layer.0.SelfAttention.q
502
+ Quantizing ...
503
+ time 0.53
504
+ error 76.51995086669922
505
+ 20 layer.0.SelfAttention.k
506
+ Quantizing ...
507
+ time 0.35
508
+ error 5472.42333984375
509
+ 20 layer.0.SelfAttention.v
510
+ Quantizing ...
511
+ time 0.35
512
+ error 41930.93359375
513
+ 20 layer.0.SelfAttention.o
514
+ Quantizing ...
515
+ time 0.35
516
+ error 2892769.5
517
+ 20 layer.1.DenseReluDense.wi
518
+ Quantizing ...
519
+ time 0.35
520
+ error 11466556.0
521
+ 20 layer.1.DenseReluDense.wo
522
+ Quantizing ...
523
+ time 1.41
524
+ error 24789348.0
525
+ 21 layer.0.SelfAttention.q
526
+ Quantizing ...
527
+ time 0.53
528
+ error 99.8782958984375
529
+ 21 layer.0.SelfAttention.k
530
+ Quantizing ...
531
+ time 0.35
532
+ error 6085.8701171875
533
+ 21 layer.0.SelfAttention.v
534
+ Quantizing ...
535
+ time 0.35
536
+ error 59590.58984375
537
+ 21 layer.0.SelfAttention.o
538
+ Quantizing ...
539
+ time 0.38
540
+ error 4403669.0
541
+ 21 layer.1.DenseReluDense.wi
542
+ Quantizing ...
543
+ time 0.37
544
+ error 18229172.0
545
+ 21 layer.1.DenseReluDense.wo
546
+ Quantizing ...
547
+ time 1.49
548
+ error 16509261.0
549
+ 22 layer.0.SelfAttention.q
550
+ Quantizing ...
551
+ time 0.53
552
+ error 92.60875701904297
553
+ 22 layer.0.SelfAttention.k
554
+ Quantizing ...
555
+ time 0.37
556
+ error 7184.3828125
557
+ 22 layer.0.SelfAttention.v
558
+ Quantizing ...
559
+ time 0.36
560
+ error 63427.015625
561
+ 22 layer.0.SelfAttention.o
562
+ Quantizing ...
563
+ time 0.37
564
+ error 5621765.0
565
+ 22 layer.1.DenseReluDense.wi
566
+ Quantizing ...
567
+ time 0.36
568
+ error 19273876.0
569
+ 22 layer.1.DenseReluDense.wo
570
+ Quantizing ...
571
+ time 1.41
572
+ error 26262050.0
573
+ 23 layer.0.SelfAttention.q
574
+ Quantizing ...
575
+ time 0.53
576
+ error 94.61616516113281
577
+ 23 layer.0.SelfAttention.k
578
+ Quantizing ...
579
+ time 0.35
580
+ error 6629.54931640625
581
+ 23 layer.0.SelfAttention.v
582
+ Quantizing ...
583
+ time 0.35
584
+ error 79510.0
585
+ 23 layer.0.SelfAttention.o
586
+ Quantizing ...
587
+ time 0.35
588
+ error 9020421.0
589
+ 23 layer.1.DenseReluDense.wi
590
+ Quantizing ...
591
+ time 0.35
592
+ error 11331573.0
593
+ 23 layer.1.DenseReluDense.wo
594
+ Quantizing ...
595
+ time 1.40
596
+ error 37987768.0
597
+ 138.1449637413025
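The per-module records above repeat a fixed pattern: a `<block> <module>` line, `Quantizing ...`, a `time` line and an `error` line. A minimal sketch for collecting them into tuples follows, assuming the log is available as text and that diff-view line numbers and leading `+` markers may still be present. The names `parse_quant_log` and `RECORD_HEAD` are hypothetical helpers and not part of the run that produced this log. The sample reuses the first two records of the log.

```python
import re

# Hypothetical helper: matches record headers such as "0 layer.0.SelfAttention.q".
# Bare diff-view line numbers ("22") do not match because no module name follows.
RECORD_HEAD = re.compile(r"^(\d+) (layer\.\S+)$")


def parse_quant_log(text):
    """Collect (block, module, seconds, error) tuples from the log text."""
    records = []
    block = module = seconds = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("+"):
            # Drop the diff "added line" marker if it is still present.
            line = line[1:].strip()
        head = RECORD_HEAD.match(line)
        if head:
            block, module, seconds = int(head.group(1)), head.group(2), None
        elif line.startswith("time ") and module is not None:
            seconds = float(line.split()[1])
        elif line.startswith("error ") and seconds is not None:
            records.append((block, module, seconds, float(line.split()[1])))
            module = seconds = None
    return records


# First two records, copied from the log above.
sample = """\
+ 0 layer.0.SelfAttention.q
+ Quantizing ...
+ time 1.05
+ error 230.94436645507812
+ 0 layer.0.SelfAttention.k
+ Quantizing ...
+ time 0.35
+ error 15016.095703125
"""

records = parse_quant_log(sample)
# Module with the largest reported error value in the sample.
worst = max(records, key=lambda r: r[3])
print(worst[1], worst[3])
```

Sorting or grouping the resulting tuples by module name is one way to compare the reported error values across blocks without reading the log line by line.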
598
+ Packing ...
+ encoder.block.0.layer.0.SelfAttention.q
+ encoder.block.0.layer.0.SelfAttention.k
+ encoder.block.0.layer.0.SelfAttention.v
+ encoder.block.0.layer.0.SelfAttention.o
+ encoder.block.0.layer.1.DenseReluDense.wi
+ encoder.block.0.layer.1.DenseReluDense.wo
+ encoder.block.1.layer.0.SelfAttention.q
+ encoder.block.1.layer.0.SelfAttention.k
+ encoder.block.1.layer.0.SelfAttention.v
+ encoder.block.1.layer.0.SelfAttention.o
+ encoder.block.1.layer.1.DenseReluDense.wi
+ encoder.block.1.layer.1.DenseReluDense.wo
+ encoder.block.2.layer.0.SelfAttention.q
+ encoder.block.2.layer.0.SelfAttention.k
+ encoder.block.2.layer.0.SelfAttention.v
+ encoder.block.2.layer.0.SelfAttention.o
+ encoder.block.2.layer.1.DenseReluDense.wi
+ encoder.block.2.layer.1.DenseReluDense.wo
+ encoder.block.3.layer.0.SelfAttention.q
+ encoder.block.3.layer.0.SelfAttention.k
+ encoder.block.3.layer.0.SelfAttention.v
+ encoder.block.3.layer.0.SelfAttention.o
+ encoder.block.3.layer.1.DenseReluDense.wi
+ encoder.block.3.layer.1.DenseReluDense.wo
+ encoder.block.4.layer.0.SelfAttention.q
+ encoder.block.4.layer.0.SelfAttention.k
+ encoder.block.4.layer.0.SelfAttention.v
+ encoder.block.4.layer.0.SelfAttention.o
+ encoder.block.4.layer.1.DenseReluDense.wi
+ encoder.block.4.layer.1.DenseReluDense.wo
+ encoder.block.5.layer.0.SelfAttention.q
+ encoder.block.5.layer.0.SelfAttention.k
+ encoder.block.5.layer.0.SelfAttention.v
+ encoder.block.5.layer.0.SelfAttention.o
+ encoder.block.5.layer.1.DenseReluDense.wi
+ encoder.block.5.layer.1.DenseReluDense.wo
+ encoder.block.6.layer.0.SelfAttention.q
+ encoder.block.6.layer.0.SelfAttention.k
+ encoder.block.6.layer.0.SelfAttention.v
+ encoder.block.6.layer.0.SelfAttention.o
+ encoder.block.6.layer.1.DenseReluDense.wi
+ encoder.block.6.layer.1.DenseReluDense.wo
+ encoder.block.7.layer.0.SelfAttention.q
+ encoder.block.7.layer.0.SelfAttention.k
+ encoder.block.7.layer.0.SelfAttention.v
+ encoder.block.7.layer.0.SelfAttention.o
+ encoder.block.7.layer.1.DenseReluDense.wi
+ encoder.block.7.layer.1.DenseReluDense.wo
+ encoder.block.8.layer.0.SelfAttention.q
+ encoder.block.8.layer.0.SelfAttention.k
+ encoder.block.8.layer.0.SelfAttention.v
+ encoder.block.8.layer.0.SelfAttention.o
+ encoder.block.8.layer.1.DenseReluDense.wi
+ encoder.block.8.layer.1.DenseReluDense.wo
+ encoder.block.9.layer.0.SelfAttention.q
+ encoder.block.9.layer.0.SelfAttention.k
+ encoder.block.9.layer.0.SelfAttention.v
+ encoder.block.9.layer.0.SelfAttention.o
+ encoder.block.9.layer.1.DenseReluDense.wi
+ encoder.block.9.layer.1.DenseReluDense.wo
+ encoder.block.10.layer.0.SelfAttention.q
+ encoder.block.10.layer.0.SelfAttention.k
+ encoder.block.10.layer.0.SelfAttention.v
+ encoder.block.10.layer.0.SelfAttention.o
+ encoder.block.10.layer.1.DenseReluDense.wi
+ encoder.block.10.layer.1.DenseReluDense.wo
+ encoder.block.11.layer.0.SelfAttention.q
+ encoder.block.11.layer.0.SelfAttention.k
+ encoder.block.11.layer.0.SelfAttention.v
+ encoder.block.11.layer.0.SelfAttention.o
+ encoder.block.11.layer.1.DenseReluDense.wi
+ encoder.block.11.layer.1.DenseReluDense.wo
+ encoder.block.12.layer.0.SelfAttention.q
+ encoder.block.12.layer.0.SelfAttention.k
+ encoder.block.12.layer.0.SelfAttention.v
+ encoder.block.12.layer.0.SelfAttention.o
+ encoder.block.12.layer.1.DenseReluDense.wi
+ encoder.block.12.layer.1.DenseReluDense.wo
+ encoder.block.13.layer.0.SelfAttention.q
+ encoder.block.13.layer.0.SelfAttention.k
+ encoder.block.13.layer.0.SelfAttention.v
+ encoder.block.13.layer.0.SelfAttention.o
+ encoder.block.13.layer.1.DenseReluDense.wi
+ encoder.block.13.layer.1.DenseReluDense.wo
+ encoder.block.14.layer.0.SelfAttention.q
+ encoder.block.14.layer.0.SelfAttention.k
+ encoder.block.14.layer.0.SelfAttention.v
+ encoder.block.14.layer.0.SelfAttention.o
+ encoder.block.14.layer.1.DenseReluDense.wi
+ encoder.block.14.layer.1.DenseReluDense.wo
+ encoder.block.15.layer.0.SelfAttention.q
+ encoder.block.15.layer.0.SelfAttention.k
+ encoder.block.15.layer.0.SelfAttention.v
+ encoder.block.15.layer.0.SelfAttention.o
+ encoder.block.15.layer.1.DenseReluDense.wi
+ encoder.block.15.layer.1.DenseReluDense.wo
+ encoder.block.16.layer.0.SelfAttention.q
+ encoder.block.16.layer.0.SelfAttention.k
+ encoder.block.16.layer.0.SelfAttention.v
+ encoder.block.16.layer.0.SelfAttention.o
+ encoder.block.16.layer.1.DenseReluDense.wi
+ encoder.block.16.layer.1.DenseReluDense.wo
+ encoder.block.17.layer.0.SelfAttention.q
+ encoder.block.17.layer.0.SelfAttention.k
+ encoder.block.17.layer.0.SelfAttention.v
+ encoder.block.17.layer.0.SelfAttention.o
+ encoder.block.17.layer.1.DenseReluDense.wi
+ encoder.block.17.layer.1.DenseReluDense.wo
+ encoder.block.18.layer.0.SelfAttention.q
+ encoder.block.18.layer.0.SelfAttention.k
+ encoder.block.18.layer.0.SelfAttention.v
+ encoder.block.18.layer.0.SelfAttention.o
+ encoder.block.18.layer.1.DenseReluDense.wi
+ encoder.block.18.layer.1.DenseReluDense.wo
+ encoder.block.19.layer.0.SelfAttention.q
+ encoder.block.19.layer.0.SelfAttention.k
+ encoder.block.19.layer.0.SelfAttention.v
+ encoder.block.19.layer.0.SelfAttention.o
+ encoder.block.19.layer.1.DenseReluDense.wi
+ encoder.block.19.layer.1.DenseReluDense.wo
+ encoder.block.20.layer.0.SelfAttention.q
+ encoder.block.20.layer.0.SelfAttention.k
+ encoder.block.20.layer.0.SelfAttention.v
+ encoder.block.20.layer.0.SelfAttention.o
+ encoder.block.20.layer.1.DenseReluDense.wi
+ encoder.block.20.layer.1.DenseReluDense.wo
+ encoder.block.21.layer.0.SelfAttention.q
+ encoder.block.21.layer.0.SelfAttention.k
+ encoder.block.21.layer.0.SelfAttention.v
+ encoder.block.21.layer.0.SelfAttention.o
+ encoder.block.21.layer.1.DenseReluDense.wi
+ encoder.block.21.layer.1.DenseReluDense.wo
+ encoder.block.22.layer.0.SelfAttention.q
+ encoder.block.22.layer.0.SelfAttention.k
+ encoder.block.22.layer.0.SelfAttention.v
+ encoder.block.22.layer.0.SelfAttention.o
+ encoder.block.22.layer.1.DenseReluDense.wi
+ encoder.block.22.layer.1.DenseReluDense.wo
+ encoder.block.23.layer.0.SelfAttention.q
+ encoder.block.23.layer.0.SelfAttention.k
+ encoder.block.23.layer.0.SelfAttention.v
+ encoder.block.23.layer.0.SelfAttention.o
+ encoder.block.23.layer.1.DenseReluDense.wi
+ encoder.block.23.layer.1.DenseReluDense.wo
+ Done.
workspace/ts_xxl_record.txt ADDED
@@ -0,0 +1,853 @@
+ CUDA extension not installed.
+ Some weights of the model checkpoint at google/t5-v1_1-xxl were not used when initializing T5EncoderModel: ['decoder.block.6.layer.1.layer_norm.weight', 'decoder.block.10.layer.1.layer_norm.weight', 'decoder.block.17.layer.2.layer_norm.weight', 'decoder.block.13.layer.0.SelfAttention.v.weight', 'decoder.block.19.layer.0.SelfAttention.o.weight', 'decoder.block.22.layer.2.layer_norm.weight', 'decoder.block.23.layer.0.SelfAttention.q.weight', 'decoder.block.20.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.17.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.7.layer.1.EncDecAttention.k.weight', 'decoder.block.20.layer.0.SelfAttention.k.weight', 'decoder.block.18.layer.2.DenseReluDense.wo.weight', 'decoder.block.16.layer.1.EncDecAttention.v.weight', 'decoder.block.3.layer.0.SelfAttention.k.weight', 'decoder.block.1.layer.0.SelfAttention.v.weight', 'decoder.block.21.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.1.layer_norm.weight', 'decoder.block.17.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.23.layer.1.EncDecAttention.o.weight', 'decoder.block.7.layer.0.SelfAttention.q.weight', 'decoder.block.10.layer.0.SelfAttention.o.weight', 'decoder.block.11.layer.0.SelfAttention.v.weight', 'decoder.block.1.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.9.layer.1.layer_norm.weight', 'decoder.block.11.layer.2.DenseReluDense.wo.weight', 'decoder.block.14.layer.1.EncDecAttention.k.weight', 'decoder.block.22.layer.0.SelfAttention.o.weight', 'decoder.block.19.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.14.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.0.SelfAttention.k.weight', 'decoder.embed_tokens.weight', 'decoder.block.6.layer.0.layer_norm.weight', 'decoder.block.3.layer.2.layer_norm.weight', 'decoder.block.13.layer.1.EncDecAttention.k.weight', 
'decoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.17.layer.2.DenseReluDense.wo.weight', 'decoder.block.18.layer.0.layer_norm.weight', 'decoder.block.9.layer.1.EncDecAttention.k.weight', 'decoder.block.11.layer.0.SelfAttention.q.weight', 'decoder.block.15.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.6.layer.1.EncDecAttention.q.weight', 'decoder.block.10.layer.1.EncDecAttention.q.weight', 'decoder.block.10.layer.0.SelfAttention.v.weight', 'decoder.block.17.layer.0.SelfAttention.o.weight', 'decoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.18.layer.1.layer_norm.weight', 'decoder.block.18.layer.2.layer_norm.weight', 'decoder.block.12.layer.2.layer_norm.weight', 'decoder.block.2.layer.1.EncDecAttention.o.weight', 'decoder.block.6.layer.1.EncDecAttention.o.weight', 'decoder.block.17.layer.1.EncDecAttention.o.weight', 'decoder.block.3.layer.1.EncDecAttention.o.weight', 'decoder.block.18.layer.1.EncDecAttention.v.weight', 'decoder.block.15.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.0.SelfAttention.q.weight', 'decoder.block.13.layer.2.DenseReluDense.wo.weight', 'decoder.block.1.layer.0.layer_norm.weight', 'decoder.block.15.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.2.layer.0.SelfAttention.o.weight', 'decoder.block.17.layer.1.EncDecAttention.k.weight', 'decoder.block.14.layer.2.layer_norm.weight', 'decoder.block.17.layer.0.SelfAttention.k.weight', 'decoder.block.3.layer.0.SelfAttention.q.weight', 'decoder.block.14.layer.0.SelfAttention.v.weight', 'decoder.block.6.layer.2.DenseReluDense.wo.weight', 'decoder.block.20.layer.1.EncDecAttention.o.weight', 'decoder.block.15.layer.0.SelfAttention.o.weight', 'decoder.block.18.layer.0.SelfAttention.v.weight', 'decoder.block.1.layer.1.EncDecAttention.q.weight', 'decoder.block.10.layer.1.EncDecAttention.v.weight', 'decoder.block.1.layer.0.SelfAttention.q.weight', 'decoder.block.8.layer.0.layer_norm.weight', 'decoder.block.16.layer.2.layer_norm.weight', 
'decoder.block.7.layer.1.EncDecAttention.v.weight', 'decoder.block.12.layer.1.EncDecAttention.k.weight', 'decoder.block.17.layer.1.EncDecAttention.v.weight', 'decoder.block.23.layer.2.DenseReluDense.wo.weight', 'decoder.block.14.layer.0.SelfAttention.k.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.17.layer.1.layer_norm.weight', 'decoder.block.2.layer.1.EncDecAttention.k.weight', 'decoder.block.10.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.12.layer.1.layer_norm.weight', 'decoder.block.0.layer.1.EncDecAttention.o.weight', 'decoder.block.9.layer.2.layer_norm.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.13.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.23.layer.0.SelfAttention.v.weight', 'decoder.block.2.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.1.EncDecAttention.v.weight', 'decoder.block.19.layer.0.SelfAttention.q.weight', 'decoder.block.12.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.2.layer_norm.weight', 'decoder.block.9.layer.1.EncDecAttention.v.weight', 'decoder.block.13.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.0.layer_norm.weight', 'decoder.block.12.layer.0.SelfAttention.q.weight', 'decoder.block.16.layer.1.EncDecAttention.o.weight', 'decoder.block.6.layer.0.SelfAttention.o.weight', 'decoder.block.22.layer.0.SelfAttention.k.weight', 'decoder.block.8.layer.1.EncDecAttention.q.weight', 'decoder.block.17.layer.0.SelfAttention.q.weight', 'decoder.block.5.layer.1.EncDecAttention.k.weight', 'decoder.block.11.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.22.layer.0.SelfAttention.v.weight', 'decoder.block.14.layer.1.layer_norm.weight', 'decoder.block.15.layer.1.EncDecAttention.k.weight', 'decoder.block.21.layer.2.layer_norm.weight', 'decoder.block.21.layer.1.layer_norm.weight', 'decoder.block.10.layer.1.EncDecAttention.o.weight', 'decoder.block.11.layer.1.EncDecAttention.q.weight', 'decoder.block.16.layer.0.layer_norm.weight', 
'decoder.block.11.layer.0.SelfAttention.o.weight', 'decoder.block.5.layer.0.SelfAttention.v.weight', 'decoder.block.20.layer.1.EncDecAttention.v.weight', 'decoder.block.2.layer.2.layer_norm.weight', 'decoder.block.15.layer.1.EncDecAttention.q.weight', 'decoder.block.13.layer.0.SelfAttention.o.weight', 'decoder.block.5.layer.0.layer_norm.weight', 'decoder.block.6.layer.1.EncDecAttention.v.weight', 'decoder.block.23.layer.1.EncDecAttention.q.weight', 'decoder.block.18.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.5.layer.2.DenseReluDense.wo.weight', 'decoder.block.19.layer.0.SelfAttention.v.weight', 'decoder.block.8.layer.0.SelfAttention.o.weight', 'decoder.block.23.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.23.layer.1.layer_norm.weight', 'decoder.block.22.layer.1.EncDecAttention.q.weight', 'decoder.block.20.layer.2.DenseReluDense.wo.weight', 'decoder.block.20.layer.1.EncDecAttention.q.weight', 'decoder.block.15.layer.0.layer_norm.weight', 'decoder.block.8.layer.1.EncDecAttention.k.weight', 'decoder.block.21.layer.0.SelfAttention.o.weight', 'decoder.block.4.layer.1.EncDecAttention.o.weight', 'decoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.19.layer.1.layer_norm.weight', 'decoder.block.12.layer.0.SelfAttention.k.weight', 'decoder.block.4.layer.1.EncDecAttention.k.weight', 'decoder.block.20.layer.0.SelfAttention.v.weight', 'decoder.block.18.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.1.EncDecAttention.o.weight', 'decoder.block.18.layer.1.EncDecAttention.k.weight', 'lm_head.weight', 'decoder.block.2.layer.0.layer_norm.weight', 'decoder.block.14.layer.1.EncDecAttention.v.weight', 'decoder.block.10.layer.0.layer_norm.weight', 'decoder.block.11.layer.0.SelfAttention.k.weight', 'decoder.block.18.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.1.EncDecAttention.v.weight', 'decoder.block.0.layer.2.layer_norm.weight', 'decoder.block.23.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.20.layer.1.layer_norm.weight', 
'decoder.block.20.layer.1.EncDecAttention.k.weight', 'decoder.block.15.layer.2.DenseReluDense.wo.weight', 'decoder.block.19.layer.1.EncDecAttention.o.weight', 'decoder.block.13.layer.1.layer_norm.weight', 'decoder.block.7.layer.2.DenseReluDense.wo.weight', 'decoder.block.10.layer.2.layer_norm.weight', 'decoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.block.19.layer.0.SelfAttention.k.weight', 'decoder.block.8.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.16.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.5.layer.0.SelfAttention.q.weight', 'decoder.block.18.layer.0.SelfAttention.k.weight', 'decoder.block.4.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.20.layer.0.SelfAttention.o.weight', 'decoder.block.6.layer.0.SelfAttention.v.weight', 'decoder.block.14.layer.0.SelfAttention.q.weight', 'decoder.block.13.layer.1.EncDecAttention.o.weight', 'decoder.block.19.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.21.layer.1.EncDecAttention.o.weight', 'decoder.block.7.layer.0.SelfAttention.o.weight', 'decoder.block.15.layer.2.layer_norm.weight', 'decoder.block.18.layer.0.SelfAttention.q.weight', 'decoder.block.7.layer.1.layer_norm.weight', 'decoder.block.4.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.4.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.0.layer_norm.weight', 'decoder.block.7.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.22.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.12.layer.1.EncDecAttention.v.weight', 'decoder.block.11.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.4.layer.1.EncDecAttention.q.weight', 'decoder.block.5.layer.1.EncDecAttention.o.weight', 'decoder.block.8.layer.1.layer_norm.weight', 'decoder.block.13.layer.1.EncDecAttention.v.weight', 'decoder.block.19.layer.1.EncDecAttention.k.weight', 'decoder.block.16.layer.1.layer_norm.weight', 'decoder.block.20.layer.0.layer_norm.weight', 'decoder.block.22.layer.1.EncDecAttention.k.weight', 
'decoder.block.11.layer.2.layer_norm.weight', 'decoder.block.11.layer.1.layer_norm.weight', 'decoder.block.7.layer.0.SelfAttention.v.weight', 'decoder.block.3.layer.0.SelfAttention.o.weight', 'decoder.block.0.layer.2.DenseReluDense.wo.weight', 'decoder.block.6.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.16.layer.0.SelfAttention.q.weight', 'decoder.block.21.layer.1.EncDecAttention.k.weight', 'decoder.block.3.layer.1.EncDecAttention.k.weight', 'decoder.block.9.layer.1.EncDecAttention.q.weight', 'decoder.block.6.layer.0.SelfAttention.k.weight', 'decoder.block.4.layer.0.SelfAttention.v.weight', 'decoder.block.11.layer.0.layer_norm.weight', 'decoder.block.22.layer.1.EncDecAttention.v.weight', 'decoder.block.19.layer.2.DenseReluDense.wo.weight', 'decoder.block.0.layer.1.EncDecAttention.q.weight', 'decoder.block.15.layer.1.layer_norm.weight', 'decoder.block.4.layer.2.DenseReluDense.wo.weight', 'decoder.block.8.layer.0.SelfAttention.v.weight', 'decoder.block.18.layer.1.EncDecAttention.o.weight', 'decoder.block.4.layer.0.SelfAttention.k.weight', 'decoder.block.15.layer.1.EncDecAttention.v.weight', 'decoder.block.5.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.1.EncDecAttention.v.weight', 'decoder.block.2.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.2.layer.1.EncDecAttention.v.weight', 'decoder.block.14.layer.0.layer_norm.weight', 'decoder.block.15.layer.0.SelfAttention.k.weight', 'decoder.block.22.layer.1.EncDecAttention.o.weight', 'decoder.block.21.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.14.layer.1.EncDecAttention.q.weight', 'decoder.block.7.layer.1.EncDecAttention.o.weight', 'decoder.block.8.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.0.SelfAttention.q.weight', 'decoder.block.3.layer.0.SelfAttention.v.weight', 'decoder.block.13.layer.0.layer_norm.weight', 'decoder.block.21.layer.0.SelfAttention.v.weight', 'decoder.block.16.layer.0.SelfAttention.k.weight', 
'decoder.block.3.layer.0.layer_norm.weight', 'decoder.block.10.layer.1.EncDecAttention.k.weight', 'decoder.block.9.layer.2.DenseReluDense.wo.weight', 'decoder.block.21.layer.0.SelfAttention.k.weight', 'decoder.block.16.layer.1.EncDecAttention.k.weight', 'decoder.block.7.layer.0.SelfAttention.k.weight', 'decoder.block.7.layer.1.EncDecAttention.q.weight', 'decoder.block.11.layer.1.EncDecAttention.k.weight', 'decoder.block.23.layer.0.SelfAttention.k.weight', 'decoder.block.20.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.5.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.6.layer.0.SelfAttention.q.weight', 'decoder.block.22.layer.0.SelfAttention.q.weight', 'decoder.block.23.layer.2.layer_norm.weight', 'decoder.block.11.layer.1.EncDecAttention.o.weight', 'decoder.block.19.layer.1.EncDecAttention.v.weight', 'decoder.block.13.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.3.layer.1.EncDecAttention.v.weight', 'decoder.block.13.layer.2.layer_norm.weight', 'decoder.block.16.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'decoder.block.19.layer.0.layer_norm.weight', 'decoder.block.17.layer.1.EncDecAttention.q.weight', 'decoder.block.21.layer.1.EncDecAttention.v.weight', 'decoder.block.17.layer.0.layer_norm.weight', 'decoder.block.5.layer.2.layer_norm.weight', 'decoder.block.20.layer.0.SelfAttention.q.weight', 'decoder.block.23.layer.0.SelfAttention.o.weight', 'decoder.block.22.layer.0.layer_norm.weight', 'decoder.block.16.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.0.SelfAttention.v.weight', 'decoder.block.9.layer.1.EncDecAttention.o.weight', 'decoder.block.4.layer.1.layer_norm.weight', 'decoder.block.5.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.12.layer.0.layer_norm.weight', 'decoder.block.5.layer.1.EncDecAttention.v.weight', 'decoder.block.12.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.8.layer.0.SelfAttention.k.weight', 'decoder.block.8.layer.1.EncDecAttention.o.weight', 
'decoder.block.0.layer.1.EncDecAttention.k.weight', 'decoder.block.16.layer.0.SelfAttention.v.weight', 'decoder.block.12.layer.1.EncDecAttention.o.weight', 'decoder.block.8.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.1.layer.1.EncDecAttention.k.weight', 'decoder.block.2.layer.0.SelfAttention.q.weight', 'decoder.block.5.layer.0.SelfAttention.k.weight', 'decoder.block.22.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.7.layer.0.layer_norm.weight', 'decoder.block.9.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.17.layer.0.SelfAttention.v.weight', 'decoder.block.8.layer.2.DenseReluDense.wo.weight', 'decoder.block.18.layer.1.EncDecAttention.q.weight', 'decoder.block.6.layer.1.EncDecAttention.k.weight', 'decoder.block.22.layer.2.DenseReluDense.wo.weight', 'decoder.block.9.layer.0.SelfAttention.k.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.7.layer.2.layer_norm.weight', 'decoder.block.16.layer.1.EncDecAttention.q.weight', 'decoder.block.15.layer.0.SelfAttention.v.weight', 'decoder.final_layer_norm.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.3.layer.1.EncDecAttention.q.weight', 'decoder.block.3.layer.1.layer_norm.weight', 'decoder.block.9.layer.0.SelfAttention.q.weight', 'decoder.block.1.layer.1.layer_norm.weight', 'decoder.block.14.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.10.layer.0.SelfAttention.k.weight', 'decoder.block.14.layer.0.SelfAttention.o.weight', 'decoder.block.0.layer.1.layer_norm.weight', 'decoder.block.9.layer.0.SelfAttention.o.weight', 'decoder.block.19.layer.2.layer_norm.weight', 'decoder.block.1.layer.2.layer_norm.weight', 'decoder.block.13.layer.1.EncDecAttention.q.weight', 'decoder.block.10.layer.2.DenseReluDense.wo.weight', 'decoder.block.14.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.6.layer.2.layer_norm.weight', 'decoder.block.11.layer.1.EncDecAttention.v.weight', 'decoder.block.5.layer.1.layer_norm.weight', 
'decoder.block.12.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.23.layer.1.EncDecAttention.k.weight', 'decoder.block.23.layer.0.layer_norm.weight', 'decoder.block.12.layer.0.SelfAttention.v.weight', 'decoder.block.13.layer.0.SelfAttention.k.weight', 'decoder.block.20.layer.2.layer_norm.weight', 'decoder.block.21.layer.1.EncDecAttention.q.weight', 'decoder.block.3.layer.2.DenseReluDense.wo.weight', 'decoder.block.1.layer.2.DenseReluDense.wo.weight', 'decoder.block.21.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.22.layer.1.layer_norm.weight', 'decoder.block.10.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.23.layer.1.EncDecAttention.v.weight', 'decoder.block.15.layer.0.SelfAttention.q.weight', 'decoder.block.2.layer.1.EncDecAttention.q.weight', 'decoder.block.10.layer.0.SelfAttention.q.weight', 'decoder.block.21.layer.0.layer_norm.weight', 'decoder.block.14.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.5.layer.1.EncDecAttention.q.weight', 'decoder.block.8.layer.2.layer_norm.weight', 'decoder.block.16.layer.2.DenseReluDense.wo.weight', 'decoder.block.19.layer.1.EncDecAttention.q.weight', 'decoder.block.12.layer.0.SelfAttention.o.weight', 'decoder.block.12.layer.1.EncDecAttention.q.weight', 'decoder.block.21.layer.0.SelfAttention.q.weight', 'decoder.block.0.layer.0.layer_norm.weight', 'decoder.block.8.layer.1.EncDecAttention.v.weight', 'decoder.block.6.layer.2.DenseReluDense.wi_1.weight']
+ - This IS expected if you are initializing T5EncoderModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
+ - This IS NOT expected if you are initializing T5EncoderModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
+ Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
+ Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
+ Token indices sequence length is longer than the specified maximum sequence length for this model (2837981 > 512). Running this sequence through the model will result in indexing errors
+ Starting ...
+ Ready.
+ 0 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.80
+ error 137.22543334960938
+ 0 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.03
+ error 11656.236328125
+ 0 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.04
+ error 10592.220703125
+ 0 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.03
+ error 120966.59375
+ 0 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.05
+ error 38126.375
+ 0 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.04
+ error 32506.427734375
+ 0 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.81
+ error 214925.140625
+ 1 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.27
+ error 253.24050903320312
+ 1 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.01
+ error 15095.802734375
+ 1 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.03
+ error 4179.1083984375
+ 1 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.03
+ error 20773.45703125
+ 1 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.03
+ error 28934.0859375
+ 1 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.05
+ error 24144.3125
+ 1 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.75
+ error 97274.90625
+ 2 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.34
+ error 205.71896362304688
+ 2 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.05
+ error 10929.7021484375
+ 2 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.06
+ error 3825.074462890625
+ 2 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.02
+ error 2498.05859375
+ 2 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.03
+ error 42947.859375
+ 2 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.03
+ error 36752.1171875
+ 2 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.71
+ error 135178.4375
+ 3 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.31
+ error 263.6244201660156
+ 3 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.06
+ error 13956.330078125
+ 3 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.06
+ error 5999.3544921875
+ 3 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.05
+ error 5389.494140625
+ 3 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.10
+ error 43406.984375
+ 3 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.07
+ error 40294.578125
+ 3 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.80
+ error 136006.0
+ 4 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.30
+ error 300.17022705078125
+ 4 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.03
+ error 16043.65234375
+ 4 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.03
+ error 6112.3857421875
+ 4 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.03
+ error 4162.61474609375
+ 4 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.06
+ error 44532.5625
+ 4 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.07
+ error 42825.140625
+ 4 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.88
+ error 165037.09375
+ 5 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.28
+ error 352.9566650390625
+ 5 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.03
+ error 19099.544921875
+ 5 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.02
+ error 6900.2197265625
+ 5 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.03
+ error 14074.9541015625
+ 5 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.05
+ error 38257.37109375
+ 5 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.04
+ error 36839.3046875
+ 5 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.76
+ error 132062.96875
+ 6 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.33
+ error 385.77520751953125
+ 6 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.06
+ error 22221.486328125
+ 6 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.02
+ error 7855.71533203125
+ 6 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.04
+ error 20587.6171875
+ 6 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.05
+ error 34824.55078125
+ 6 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.05
+ error 36079.15625
+ 6 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.74
+ error 166183.125
+ 7 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.32
+ error 304.88519287109375
+ 7 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.05
+ error 21111.80859375
+ 7 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.05
+ error 5978.3095703125
+ 7 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.08
+ error 10927.888671875
+ 7 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.07
+ error 29760.138671875
+ 7 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.08
+ error 33814.875
+ 7 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.73
+ error 175563.4375
+ 8 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.30
+ error 333.85931396484375
+ 8 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.03
+ error 24634.984375
+ 8 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.03
+ error 7116.8212890625
+ 8 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.07
+ error 15384.3369140625
+ 8 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.07
+ error 28838.537109375
+ 8 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.09
+ error 29991.21875
+ 8 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.85
+ error 170053.9375
262
+ 9 layer.0.SelfAttention.q
263
+ Quantizing ...
264
+ time 2.27
265
+ error 354.49725341796875
266
+ 9 layer.0.SelfAttention.k
267
+ Quantizing ...
268
+ time 1.02
269
+ error 26472.80078125
270
+ 9 layer.0.SelfAttention.v
271
+ Quantizing ...
272
+ time 1.02
273
+ error 9778.65234375
274
+ 9 layer.0.SelfAttention.o
275
+ Quantizing ...
276
+ time 1.03
277
+ error 46135.9140625
278
+ 9 layer.1.DenseReluDense.wi_0
279
+ Quantizing ...
280
+ time 1.05
281
+ error 30183.34765625
282
+ 9 layer.1.DenseReluDense.wi_1
283
+ Quantizing ...
284
+ time 1.05
285
+ error 35315.9375
286
+ 9 layer.1.DenseReluDense.wo
287
+ Quantizing ...
288
+ time 2.80
289
+ error 294261.34375
290
+ 10 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.36
+ error 330.4294128417969
+ 10 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.04
+ error 21810.806640625
+ 10 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.03
+ error 7377.060546875
+ 10 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.03
+ error 31458.453125
+ 10 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.05
+ error 30981.423828125
+ 10 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.05
+ error 45770.9140625
+ 10 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.73
+ error 338105.5625
+ 11 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.35
+ error 332.6951904296875
+ 11 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.06
+ error 23045.384765625
+ 11 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.07
+ error 9068.484375
+ 11 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.09
+ error 39716.03125
+ 11 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.05
+ error 29951.611328125
+ 11 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.06
+ error 46667.8828125
+ 11 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.76
+ error 458927.0
+ 12 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.29
+ error 364.91387939453125
+ 12 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.03
+ error 26386.5546875
+ 12 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.08
+ error 10412.025390625
+ 12 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.07
+ error 69506.734375
+ 12 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.08
+ error 32437.169921875
+ 12 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.13
+ error 54537.1328125
+ 12 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.81
+ error 555848.125
+ 13 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.28
+ error 334.4095153808594
+ 13 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.04
+ error 24624.59375
+ 13 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.04
+ error 11093.2373046875
+ 13 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.02
+ error 73139.5859375
+ 13 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.06
+ error 31185.44921875
+ 13 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.08
+ error 63193.28125
+ 13 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.84
+ error 484003.5
+ 14 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.33
+ error 315.36883544921875
+ 14 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.02
+ error 22693.66015625
+ 14 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.04
+ error 11054.283203125
+ 14 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.04
+ error 55301.96875
+ 14 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.06
+ error 35040.09765625
+ 14 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.04
+ error 69227.671875
+ 14 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.76
+ error 538346.875
+ 15 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.31
+ error 305.54083251953125
+ 15 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.05
+ error 22575.48046875
+ 15 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.10
+ error 14035.61328125
+ 15 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.03
+ error 100519.5234375
+ 15 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.04
+ error 34874.54296875
+ 15 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.04
+ error 76981.28125
+ 15 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.75
+ error 590792.75
+ 16 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.30
+ error 292.1910095214844
+ 16 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.10
+ error 24363.197265625
+ 16 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.08
+ error 17756.51953125
+ 16 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.09
+ error 189057.78125
+ 16 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.07
+ error 35124.7109375
+ 16 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.09
+ error 87091.78125
+ 16 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.81
+ error 1044289.5625
+ 17 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.28
+ error 261.1668701171875
+ 17 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.02
+ error 18598.86328125
+ 17 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.03
+ error 18718.98046875
+ 17 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.04
+ error 254419.0625
+ 17 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.07
+ error 35458.671875
+ 17 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.10
+ error 88659.0390625
+ 17 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.87
+ error 1568064.75
+ 18 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.31
+ error 282.4662780761719
+ 18 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.03
+ error 19631.552734375
+ 18 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.06
+ error 21855.74609375
+ 18 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.05
+ error 451241.28125
+ 18 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.04
+ error 35819.91015625
+ 18 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.04
+ error 96373.1015625
+ 18 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.75
+ error 4121681.25
+ 19 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.33
+ error 222.93960571289062
+ 19 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.08
+ error 15299.37890625
+ 19 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.04
+ error 25438.86328125
+ 19 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.05
+ error 1097173.0
+ 19 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.06
+ error 34149.09375
+ 19 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.04
+ error 90188.0078125
+ 19 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.74
+ error 6266101.0
+ 20 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.35
+ error 211.04458618164062
+ 20 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.04
+ error 13809.572265625
+ 20 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.06
+ error 29788.564453125
+ 20 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.05
+ error 1334543.125
+ 20 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.09
+ error 31375.771484375
+ 20 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.08
+ error 78350.203125
+ 20 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.74
+ error 7183110.0
+ 21 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.30
+ error 194.26229858398438
+ 21 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.04
+ error 14619.9853515625
+ 21 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.04
+ error 38181.265625
+ 21 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.05
+ error 1776184.0
+ 21 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.12
+ error 30981.5625
+ 21 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.09
+ error 77552.046875
+ 21 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.83
+ error 9851391.0
+ 22 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.29
+ error 196.11984252929688
+ 22 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.03
+ error 12573.25
+ 22 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.04
+ error 43983.0703125
+ 22 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.03
+ error 1969925.5
+ 22 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.05
+ error 42481.56640625
+ 22 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.04
+ error 106760.0078125
+ 22 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.84
+ error 15271906.0
+ 23 layer.0.SelfAttention.q
+ Quantizing ...
+ time 2.39
+ error 213.98135375976562
+ 23 layer.0.SelfAttention.k
+ Quantizing ...
+ time 1.03
+ error 14789.1396484375
+ 23 layer.0.SelfAttention.v
+ Quantizing ...
+ time 1.04
+ error 57604.91015625
+ 23 layer.0.SelfAttention.o
+ Quantizing ...
+ time 1.02
+ error 2114846.25
+ 23 layer.1.DenseReluDense.wi_0
+ Quantizing ...
+ time 1.05
+ error 41047.03125
+ 23 layer.1.DenseReluDense.wi_1
+ Quantizing ...
+ time 1.04
+ error 83152.765625
+ 23 layer.1.DenseReluDense.wo
+ Quantizing ...
+ time 2.75
+ error 13002426.0
+ 728.4299275875092
+ Packing ...
+ encoder.block.0.layer.0.SelfAttention.q
+ encoder.block.0.layer.0.SelfAttention.k
+ encoder.block.0.layer.0.SelfAttention.v
+ encoder.block.0.layer.0.SelfAttention.o
+ encoder.block.0.layer.1.DenseReluDense.wi_0
+ encoder.block.0.layer.1.DenseReluDense.wi_1
+ encoder.block.0.layer.1.DenseReluDense.wo
+ encoder.block.1.layer.0.SelfAttention.q
+ encoder.block.1.layer.0.SelfAttention.k
+ encoder.block.1.layer.0.SelfAttention.v
+ encoder.block.1.layer.0.SelfAttention.o
+ encoder.block.1.layer.1.DenseReluDense.wi_0
+ encoder.block.1.layer.1.DenseReluDense.wi_1
+ encoder.block.1.layer.1.DenseReluDense.wo
+ encoder.block.2.layer.0.SelfAttention.q
+ encoder.block.2.layer.0.SelfAttention.k
+ encoder.block.2.layer.0.SelfAttention.v
+ encoder.block.2.layer.0.SelfAttention.o
+ encoder.block.2.layer.1.DenseReluDense.wi_0
+ encoder.block.2.layer.1.DenseReluDense.wi_1
+ encoder.block.2.layer.1.DenseReluDense.wo
+ encoder.block.3.layer.0.SelfAttention.q
+ encoder.block.3.layer.0.SelfAttention.k
+ encoder.block.3.layer.0.SelfAttention.v
+ encoder.block.3.layer.0.SelfAttention.o
+ encoder.block.3.layer.1.DenseReluDense.wi_0
+ encoder.block.3.layer.1.DenseReluDense.wi_1
+ encoder.block.3.layer.1.DenseReluDense.wo
+ encoder.block.4.layer.0.SelfAttention.q
+ encoder.block.4.layer.0.SelfAttention.k
+ encoder.block.4.layer.0.SelfAttention.v
+ encoder.block.4.layer.0.SelfAttention.o
+ encoder.block.4.layer.1.DenseReluDense.wi_0
+ encoder.block.4.layer.1.DenseReluDense.wi_1
+ encoder.block.4.layer.1.DenseReluDense.wo
+ encoder.block.5.layer.0.SelfAttention.q
+ encoder.block.5.layer.0.SelfAttention.k
+ encoder.block.5.layer.0.SelfAttention.v
+ encoder.block.5.layer.0.SelfAttention.o
+ encoder.block.5.layer.1.DenseReluDense.wi_0
+ encoder.block.5.layer.1.DenseReluDense.wi_1
+ encoder.block.5.layer.1.DenseReluDense.wo
+ encoder.block.6.layer.0.SelfAttention.q
+ encoder.block.6.layer.0.SelfAttention.k
+ encoder.block.6.layer.0.SelfAttention.v
+ encoder.block.6.layer.0.SelfAttention.o
+ encoder.block.6.layer.1.DenseReluDense.wi_0
+ encoder.block.6.layer.1.DenseReluDense.wi_1
+ encoder.block.6.layer.1.DenseReluDense.wo
+ encoder.block.7.layer.0.SelfAttention.q
+ encoder.block.7.layer.0.SelfAttention.k
+ encoder.block.7.layer.0.SelfAttention.v
+ encoder.block.7.layer.0.SelfAttention.o
+ encoder.block.7.layer.1.DenseReluDense.wi_0
+ encoder.block.7.layer.1.DenseReluDense.wi_1
+ encoder.block.7.layer.1.DenseReluDense.wo
+ encoder.block.8.layer.0.SelfAttention.q
+ encoder.block.8.layer.0.SelfAttention.k
+ encoder.block.8.layer.0.SelfAttention.v
+ encoder.block.8.layer.0.SelfAttention.o
+ encoder.block.8.layer.1.DenseReluDense.wi_0
+ encoder.block.8.layer.1.DenseReluDense.wi_1
+ encoder.block.8.layer.1.DenseReluDense.wo
+ encoder.block.9.layer.0.SelfAttention.q
+ encoder.block.9.layer.0.SelfAttention.k
+ encoder.block.9.layer.0.SelfAttention.v
+ encoder.block.9.layer.0.SelfAttention.o
+ encoder.block.9.layer.1.DenseReluDense.wi_0
+ encoder.block.9.layer.1.DenseReluDense.wi_1
+ encoder.block.9.layer.1.DenseReluDense.wo
+ encoder.block.10.layer.0.SelfAttention.q
+ encoder.block.10.layer.0.SelfAttention.k
+ encoder.block.10.layer.0.SelfAttention.v
+ encoder.block.10.layer.0.SelfAttention.o
+ encoder.block.10.layer.1.DenseReluDense.wi_0
+ encoder.block.10.layer.1.DenseReluDense.wi_1
+ encoder.block.10.layer.1.DenseReluDense.wo
+ encoder.block.11.layer.0.SelfAttention.q
+ encoder.block.11.layer.0.SelfAttention.k
+ encoder.block.11.layer.0.SelfAttention.v
+ encoder.block.11.layer.0.SelfAttention.o
+ encoder.block.11.layer.1.DenseReluDense.wi_0
+ encoder.block.11.layer.1.DenseReluDense.wi_1
+ encoder.block.11.layer.1.DenseReluDense.wo
+ encoder.block.12.layer.0.SelfAttention.q
+ encoder.block.12.layer.0.SelfAttention.k
+ encoder.block.12.layer.0.SelfAttention.v
+ encoder.block.12.layer.0.SelfAttention.o
+ encoder.block.12.layer.1.DenseReluDense.wi_0
+ encoder.block.12.layer.1.DenseReluDense.wi_1
+ encoder.block.12.layer.1.DenseReluDense.wo
+ encoder.block.13.layer.0.SelfAttention.q
+ encoder.block.13.layer.0.SelfAttention.k
+ encoder.block.13.layer.0.SelfAttention.v
+ encoder.block.13.layer.0.SelfAttention.o
+ encoder.block.13.layer.1.DenseReluDense.wi_0
+ encoder.block.13.layer.1.DenseReluDense.wi_1
+ encoder.block.13.layer.1.DenseReluDense.wo
+ encoder.block.14.layer.0.SelfAttention.q
+ encoder.block.14.layer.0.SelfAttention.k
+ encoder.block.14.layer.0.SelfAttention.v
+ encoder.block.14.layer.0.SelfAttention.o
+ encoder.block.14.layer.1.DenseReluDense.wi_0
+ encoder.block.14.layer.1.DenseReluDense.wi_1
+ encoder.block.14.layer.1.DenseReluDense.wo
+ encoder.block.15.layer.0.SelfAttention.q
+ encoder.block.15.layer.0.SelfAttention.k
+ encoder.block.15.layer.0.SelfAttention.v
+ encoder.block.15.layer.0.SelfAttention.o
+ encoder.block.15.layer.1.DenseReluDense.wi_0
+ encoder.block.15.layer.1.DenseReluDense.wi_1
+ encoder.block.15.layer.1.DenseReluDense.wo
+ encoder.block.16.layer.0.SelfAttention.q
+ encoder.block.16.layer.0.SelfAttention.k
+ encoder.block.16.layer.0.SelfAttention.v
+ encoder.block.16.layer.0.SelfAttention.o
+ encoder.block.16.layer.1.DenseReluDense.wi_0
+ encoder.block.16.layer.1.DenseReluDense.wi_1
+ encoder.block.16.layer.1.DenseReluDense.wo
+ encoder.block.17.layer.0.SelfAttention.q
+ encoder.block.17.layer.0.SelfAttention.k
+ encoder.block.17.layer.0.SelfAttention.v
+ encoder.block.17.layer.0.SelfAttention.o
+ encoder.block.17.layer.1.DenseReluDense.wi_0
+ encoder.block.17.layer.1.DenseReluDense.wi_1
+ encoder.block.17.layer.1.DenseReluDense.wo
+ encoder.block.18.layer.0.SelfAttention.q
+ encoder.block.18.layer.0.SelfAttention.k
+ encoder.block.18.layer.0.SelfAttention.v
+ encoder.block.18.layer.0.SelfAttention.o
+ encoder.block.18.layer.1.DenseReluDense.wi_0
+ encoder.block.18.layer.1.DenseReluDense.wi_1
+ encoder.block.18.layer.1.DenseReluDense.wo
+ encoder.block.19.layer.0.SelfAttention.q
+ encoder.block.19.layer.0.SelfAttention.k
+ encoder.block.19.layer.0.SelfAttention.v
+ encoder.block.19.layer.0.SelfAttention.o
+ encoder.block.19.layer.1.DenseReluDense.wi_0
+ encoder.block.19.layer.1.DenseReluDense.wi_1
+ encoder.block.19.layer.1.DenseReluDense.wo
+ encoder.block.20.layer.0.SelfAttention.q
+ encoder.block.20.layer.0.SelfAttention.k
+ encoder.block.20.layer.0.SelfAttention.v
+ encoder.block.20.layer.0.SelfAttention.o
+ encoder.block.20.layer.1.DenseReluDense.wi_0
+ encoder.block.20.layer.1.DenseReluDense.wi_1
+ encoder.block.20.layer.1.DenseReluDense.wo
+ encoder.block.21.layer.0.SelfAttention.q
+ encoder.block.21.layer.0.SelfAttention.k
+ encoder.block.21.layer.0.SelfAttention.v
+ encoder.block.21.layer.0.SelfAttention.o
+ encoder.block.21.layer.1.DenseReluDense.wi_0
+ encoder.block.21.layer.1.DenseReluDense.wi_1
+ encoder.block.21.layer.1.DenseReluDense.wo
+ encoder.block.22.layer.0.SelfAttention.q
+ encoder.block.22.layer.0.SelfAttention.k
+ encoder.block.22.layer.0.SelfAttention.v
+ encoder.block.22.layer.0.SelfAttention.o
+ encoder.block.22.layer.1.DenseReluDense.wi_0
+ encoder.block.22.layer.1.DenseReluDense.wi_1
+ encoder.block.22.layer.1.DenseReluDense.wo
+ encoder.block.23.layer.0.SelfAttention.q
+ encoder.block.23.layer.0.SelfAttention.k
+ encoder.block.23.layer.0.SelfAttention.v
+ encoder.block.23.layer.0.SelfAttention.o
+ encoder.block.23.layer.1.DenseReluDense.wi_0
+ encoder.block.23.layer.1.DenseReluDense.wi_1
+ encoder.block.23.layer.1.DenseReluDense.wo
+ Done.
+