Upload 8 files
- workspace/T5-3B_RUN2.txt +738 -0
- workspace/flan-ts-base.txt +437 -0
- workspace/flan-ts-large.txt +857 -0
- workspace/flan-ts-small.txt +298 -0
- workspace/flan-ts-xl.txt +0 -0
- workspace/flan-ts-xxl.txt +0 -0
- workspace/ts_large.txt +743 -0
- workspace/ts_xxl_record.txt +853 -0
workspace/T5-3B_RUN2.txt
ADDED
@@ -0,0 +1,738 @@
CUDA extension not installed.
Some weights of the model checkpoint at t5-3b were not used when initializing T5EncoderModel: ['decoder.block.10.layer.2.DenseReluDense.wo.weight', 'decoder.block.18.layer.1.EncDecAttention.q.weight', 'decoder.block.14.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.0.layer_norm.weight', 'decoder.block.7.layer.1.EncDecAttention.q.weight', 'decoder.block.10.layer.0.SelfAttention.v.weight', 'decoder.block.9.layer.2.DenseReluDense.wi.weight', 'decoder.block.19.layer.1.EncDecAttention.k.weight', 'decoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.9.layer.1.EncDecAttention.q.weight', 'decoder.block.1.layer.1.EncDecAttention.q.weight', 'decoder.block.16.layer.1.EncDecAttention.q.weight', 'decoder.block.13.layer.1.EncDecAttention.k.weight', 'decoder.block.20.layer.0.layer_norm.weight', 'decoder.block.19.layer.0.SelfAttention.k.weight', 'decoder.block.5.layer.1.EncDecAttention.k.weight', 'decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'decoder.block.11.layer.1.layer_norm.weight', 'decoder.block.18.layer.2.layer_norm.weight', 'decoder.block.12.layer.0.SelfAttention.q.weight', 'decoder.block.6.layer.1.EncDecAttention.q.weight', 'decoder.block.6.layer.2.DenseReluDense.wi.weight', 'decoder.block.0.layer.1.layer_norm.weight', 'decoder.block.17.layer.1.EncDecAttention.v.weight', 'decoder.block.8.layer.2.DenseReluDense.wi.weight', 'decoder.block.15.layer.0.layer_norm.weight', 'decoder.block.16.layer.2.layer_norm.weight', 'decoder.block.22.layer.1.EncDecAttention.o.weight', 'decoder.block.21.layer.0.layer_norm.weight', 'decoder.block.22.layer.2.DenseReluDense.wo.weight', 'decoder.block.0.layer.0.SelfAttention.q.weight', 'decoder.block.6.layer.1.EncDecAttention.o.weight', 'decoder.block.11.layer.1.EncDecAttention.q.weight', 'decoder.block.6.layer.1.layer_norm.weight', 'decoder.block.4.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.2.layer_norm.weight', 'decoder.block.5.layer.1.EncDecAttention.q.weight', 
'decoder.block.11.layer.1.EncDecAttention.o.weight', 'decoder.block.16.layer.0.SelfAttention.o.weight', 'decoder.block.22.layer.2.layer_norm.weight', 'decoder.block.6.layer.0.SelfAttention.q.weight', 'decoder.block.17.layer.0.SelfAttention.v.weight', 'decoder.block.16.layer.0.SelfAttention.k.weight', 'decoder.block.22.layer.1.EncDecAttention.v.weight', 'decoder.block.7.layer.1.layer_norm.weight', 'decoder.block.19.layer.2.layer_norm.weight', 'decoder.block.4.layer.1.EncDecAttention.v.weight', 'decoder.block.11.layer.0.SelfAttention.v.weight', 'decoder.block.15.layer.0.SelfAttention.v.weight', 'decoder.block.14.layer.0.SelfAttention.v.weight', 'decoder.block.18.layer.0.SelfAttention.q.weight', 'decoder.block.21.layer.1.EncDecAttention.v.weight', 'decoder.block.13.layer.1.EncDecAttention.o.weight', 'decoder.block.10.layer.2.layer_norm.weight', 'decoder.block.22.layer.1.layer_norm.weight', 'decoder.block.9.layer.0.SelfAttention.v.weight', 'decoder.block.20.layer.2.DenseReluDense.wi.weight', 'decoder.block.13.layer.2.layer_norm.weight', 'decoder.block.12.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.2.DenseReluDense.wo.weight', 'decoder.block.3.layer.2.layer_norm.weight', 'decoder.block.23.layer.1.EncDecAttention.v.weight', 'decoder.block.14.layer.0.SelfAttention.k.weight', 'decoder.block.8.layer.1.EncDecAttention.v.weight', 'decoder.block.2.layer.0.SelfAttention.q.weight', 'decoder.block.6.layer.1.EncDecAttention.k.weight', 'decoder.block.22.layer.0.layer_norm.weight', 'decoder.block.20.layer.1.layer_norm.weight', 'decoder.block.4.layer.0.layer_norm.weight', 'decoder.block.15.layer.2.DenseReluDense.wi.weight', 'decoder.block.2.layer.0.SelfAttention.k.weight', 'decoder.block.3.layer.1.EncDecAttention.k.weight', 'decoder.block.9.layer.0.SelfAttention.o.weight', 'decoder.block.15.layer.2.DenseReluDense.wo.weight', 'decoder.block.19.layer.0.SelfAttention.o.weight', 'decoder.block.23.layer.1.EncDecAttention.q.weight', 
'decoder.block.2.layer.1.EncDecAttention.o.weight', 'decoder.block.4.layer.2.layer_norm.weight', 'decoder.block.14.layer.2.DenseReluDense.wo.weight', 'decoder.block.3.layer.0.SelfAttention.o.weight', 'decoder.block.12.layer.1.EncDecAttention.q.weight', 'decoder.block.16.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.1.EncDecAttention.o.weight', 'decoder.block.10.layer.1.layer_norm.weight', 'decoder.block.3.layer.1.EncDecAttention.q.weight', 'decoder.block.14.layer.0.SelfAttention.o.weight', 'decoder.block.21.layer.0.SelfAttention.v.weight', 'decoder.block.13.layer.2.DenseReluDense.wi.weight', 'decoder.block.10.layer.1.EncDecAttention.o.weight', 'decoder.block.16.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.2.DenseReluDense.wo.weight', 'decoder.block.8.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.1.EncDecAttention.o.weight', 'decoder.block.8.layer.2.layer_norm.weight', 'decoder.block.15.layer.0.SelfAttention.k.weight', 'decoder.block.20.layer.1.EncDecAttention.v.weight', 'decoder.block.3.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.0.SelfAttention.v.weight', 'decoder.block.11.layer.1.EncDecAttention.v.weight', 'decoder.block.6.layer.0.layer_norm.weight', 'decoder.block.23.layer.1.layer_norm.weight', 'decoder.block.4.layer.1.EncDecAttention.k.weight', 'decoder.block.5.layer.1.layer_norm.weight', 'decoder.block.19.layer.1.EncDecAttention.q.weight', 'decoder.block.14.layer.2.DenseReluDense.wi.weight', 'decoder.block.23.layer.1.EncDecAttention.o.weight', 'decoder.block.20.layer.1.EncDecAttention.q.weight', 'decoder.block.4.layer.0.SelfAttention.v.weight', 'decoder.block.3.layer.1.EncDecAttention.o.weight', 'decoder.block.7.layer.1.EncDecAttention.v.weight', 'decoder.block.8.layer.0.layer_norm.weight', 'decoder.block.0.layer.1.EncDecAttention.k.weight', 'decoder.block.1.layer.2.layer_norm.weight', 'decoder.block.19.layer.2.DenseReluDense.wo.weight', 'decoder.block.16.layer.0.SelfAttention.v.weight', 
'decoder.block.1.layer.1.EncDecAttention.k.weight', 'decoder.block.1.layer.0.SelfAttention.q.weight', 'decoder.block.10.layer.0.SelfAttention.o.weight', 'decoder.block.20.layer.2.DenseReluDense.wo.weight', 'decoder.block.5.layer.0.SelfAttention.v.weight', 'decoder.block.18.layer.2.DenseReluDense.wo.weight', 'decoder.block.17.layer.2.layer_norm.weight', 'decoder.block.9.layer.1.EncDecAttention.v.weight', 'decoder.block.17.layer.1.layer_norm.weight', 'decoder.block.0.layer.2.layer_norm.weight', 'decoder.block.10.layer.1.EncDecAttention.q.weight', 'decoder.block.10.layer.2.DenseReluDense.wi.weight', 'decoder.block.4.layer.1.layer_norm.weight', 'decoder.block.19.layer.0.layer_norm.weight', 'decoder.block.22.layer.1.EncDecAttention.k.weight', 'decoder.block.10.layer.1.EncDecAttention.k.weight', 'decoder.block.7.layer.0.SelfAttention.o.weight', 'decoder.block.19.layer.2.DenseReluDense.wi.weight', 'decoder.block.8.layer.1.EncDecAttention.q.weight', 'decoder.block.13.layer.1.EncDecAttention.q.weight', 'decoder.block.19.layer.1.EncDecAttention.o.weight', 'decoder.block.14.layer.1.EncDecAttention.v.weight', 'decoder.block.7.layer.0.SelfAttention.q.weight', 'decoder.block.5.layer.2.DenseReluDense.wi.weight', 'decoder.block.23.layer.0.SelfAttention.q.weight', 'decoder.block.15.layer.0.SelfAttention.q.weight', 'decoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.block.18.layer.0.layer_norm.weight', 'decoder.block.10.layer.1.EncDecAttention.v.weight', 'decoder.block.12.layer.0.SelfAttention.v.weight', 'decoder.block.17.layer.0.layer_norm.weight', 'decoder.block.9.layer.1.layer_norm.weight', 'decoder.block.5.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.1.EncDecAttention.o.weight', 'decoder.block.11.layer.2.layer_norm.weight', 'decoder.block.18.layer.1.EncDecAttention.v.weight', 'decoder.block.8.layer.0.SelfAttention.q.weight', 'decoder.block.18.layer.1.EncDecAttention.o.weight', 
'decoder.block.14.layer.1.EncDecAttention.q.weight', 'decoder.block.1.layer.2.DenseReluDense.wi.weight', 'decoder.block.7.layer.0.layer_norm.weight', 'decoder.block.7.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.1.layer_norm.weight', 'decoder.block.4.layer.1.EncDecAttention.q.weight', 'decoder.block.10.layer.0.SelfAttention.k.weight', 'decoder.block.12.layer.1.layer_norm.weight', 'decoder.block.1.layer.1.EncDecAttention.v.weight', 'decoder.block.9.layer.2.DenseReluDense.wo.weight', 'decoder.block.3.layer.2.DenseReluDense.wo.weight', 'decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight', 'decoder.block.0.layer.2.DenseReluDense.wo.weight', 'decoder.block.0.layer.2.DenseReluDense.wi.weight', 'decoder.block.5.layer.0.SelfAttention.o.weight', 'decoder.block.23.layer.2.DenseReluDense.wo.weight', 'decoder.block.6.layer.0.SelfAttention.v.weight', 'decoder.block.21.layer.1.layer_norm.weight', 'decoder.block.9.layer.0.SelfAttention.k.weight', 'decoder.block.5.layer.0.SelfAttention.q.weight', 'decoder.block.2.layer.1.EncDecAttention.k.weight', 'decoder.block.15.layer.1.EncDecAttention.k.weight', 'decoder.block.1.layer.1.layer_norm.weight', 'decoder.block.21.layer.0.SelfAttention.q.weight', 'decoder.block.21.layer.2.DenseReluDense.wo.weight', 'decoder.block.15.layer.1.EncDecAttention.v.weight', 'decoder.block.23.layer.0.layer_norm.weight', 'decoder.block.6.layer.1.EncDecAttention.v.weight', 'decoder.block.7.layer.2.layer_norm.weight', 'decoder.block.16.layer.2.DenseReluDense.wi.weight', 'decoder.block.2.layer.0.SelfAttention.o.weight', 'decoder.block.14.layer.0.layer_norm.weight', 'decoder.block.21.layer.1.EncDecAttention.k.weight', 'decoder.block.17.layer.1.EncDecAttention.o.weight', 'decoder.block.18.layer.1.EncDecAttention.k.weight', 'decoder.block.23.layer.2.DenseReluDense.wi.weight', 'decoder.block.0.layer.1.EncDecAttention.v.weight', 'decoder.block.4.layer.0.SelfAttention.q.weight', 'decoder.block.11.layer.2.DenseReluDense.wi.weight', 
'decoder.block.5.layer.0.SelfAttention.k.weight', 'decoder.block.20.layer.0.SelfAttention.v.weight', 'decoder.block.3.layer.1.layer_norm.weight', 'decoder.block.20.layer.0.SelfAttention.o.weight', 'decoder.block.21.layer.1.EncDecAttention.o.weight', 'decoder.block.11.layer.0.SelfAttention.o.weight', 'decoder.block.17.layer.0.SelfAttention.o.weight', 'decoder.block.10.layer.0.layer_norm.weight', 'decoder.block.15.layer.2.layer_norm.weight', 'decoder.block.8.layer.2.DenseReluDense.wo.weight', 'decoder.block.11.layer.0.SelfAttention.k.weight', 'decoder.block.17.layer.0.SelfAttention.q.weight', 'decoder.block.9.layer.1.EncDecAttention.k.weight', 'decoder.block.12.layer.0.SelfAttention.o.weight', 'decoder.block.6.layer.0.SelfAttention.k.weight', 'decoder.block.10.layer.0.SelfAttention.q.weight', 'decoder.block.13.layer.0.layer_norm.weight', 'decoder.block.13.layer.0.SelfAttention.o.weight', 'decoder.block.19.layer.0.SelfAttention.v.weight', 'decoder.block.23.layer.0.SelfAttention.k.weight', 'decoder.block.11.layer.0.SelfAttention.q.weight', 'decoder.block.3.layer.2.DenseReluDense.wi.weight', 'decoder.block.17.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.2.layer_norm.weight', 'decoder.block.23.layer.2.layer_norm.weight', 'decoder.block.12.layer.1.EncDecAttention.o.weight', 'decoder.block.18.layer.2.DenseReluDense.wi.weight', 'decoder.block.19.layer.1.layer_norm.weight', 'decoder.block.18.layer.0.SelfAttention.v.weight', 'decoder.block.5.layer.0.layer_norm.weight', 'decoder.block.20.layer.0.SelfAttention.k.weight', 'decoder.block.13.layer.1.EncDecAttention.v.weight', 'decoder.block.8.layer.1.EncDecAttention.k.weight', 'decoder.block.8.layer.1.EncDecAttention.o.weight', 'decoder.block.12.layer.2.DenseReluDense.wi.weight', 'decoder.block.19.layer.1.EncDecAttention.v.weight', 'decoder.block.22.layer.0.SelfAttention.q.weight', 'decoder.block.16.layer.0.layer_norm.weight', 'decoder.block.5.layer.1.EncDecAttention.v.weight', 
'decoder.block.1.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.0.SelfAttention.k.weight', 'decoder.block.21.layer.0.SelfAttention.k.weight', 'decoder.block.3.layer.0.SelfAttention.q.weight', 'decoder.block.22.layer.2.DenseReluDense.wi.weight', 'decoder.block.13.layer.2.DenseReluDense.wo.weight', 'decoder.block.11.layer.2.DenseReluDense.wo.weight', 'decoder.block.20.layer.1.EncDecAttention.k.weight', 'decoder.block.12.layer.2.layer_norm.weight', 'decoder.block.19.layer.0.SelfAttention.q.weight', 'decoder.block.7.layer.1.EncDecAttention.k.weight', 'decoder.block.22.layer.0.SelfAttention.o.weight', 'decoder.block.18.layer.0.SelfAttention.k.weight', 'decoder.final_layer_norm.weight', 'decoder.block.2.layer.0.layer_norm.weight', 'decoder.block.1.layer.0.SelfAttention.o.weight', 'decoder.block.11.layer.0.layer_norm.weight', 'decoder.block.14.layer.1.layer_norm.weight', 'decoder.block.2.layer.2.DenseReluDense.wi.weight', 'decoder.block.16.layer.1.EncDecAttention.o.weight', 'decoder.block.17.layer.1.EncDecAttention.k.weight', 'decoder.block.2.layer.1.EncDecAttention.v.weight', 'decoder.block.15.layer.1.EncDecAttention.o.weight', 'decoder.block.13.layer.0.SelfAttention.k.weight', 'decoder.block.6.layer.2.DenseReluDense.wo.weight', 'decoder.block.22.layer.0.SelfAttention.v.weight', 'decoder.block.17.layer.2.DenseReluDense.wi.weight', 'decoder.block.21.layer.0.SelfAttention.o.weight', 'decoder.block.3.layer.0.layer_norm.weight', 'decoder.block.14.layer.0.SelfAttention.q.weight', 'decoder.block.22.layer.0.SelfAttention.k.weight', 'decoder.block.20.layer.1.EncDecAttention.o.weight', 'decoder.block.18.layer.0.SelfAttention.o.weight', 'decoder.block.15.layer.1.EncDecAttention.q.weight', 'decoder.block.3.layer.0.SelfAttention.k.weight', 'decoder.block.7.layer.0.SelfAttention.k.weight', 'decoder.block.12.layer.1.EncDecAttention.k.weight', 'decoder.block.5.layer.1.EncDecAttention.o.weight', 'decoder.block.1.layer.0.SelfAttention.v.weight', 
'decoder.block.8.layer.1.layer_norm.weight', 'decoder.block.13.layer.0.SelfAttention.q.weight', 'decoder.block.21.layer.2.DenseReluDense.wi.weight', 'decoder.block.13.layer.1.layer_norm.weight', 'decoder.block.17.layer.0.SelfAttention.k.weight', 'decoder.block.16.layer.1.EncDecAttention.k.weight', 'decoder.block.23.layer.1.EncDecAttention.k.weight', 'decoder.block.20.layer.0.SelfAttention.q.weight', 'decoder.block.9.layer.1.EncDecAttention.o.weight', 'decoder.block.2.layer.1.EncDecAttention.q.weight', 'decoder.block.16.layer.1.layer_norm.weight', 'decoder.block.9.layer.0.SelfAttention.q.weight', 'decoder.block.21.layer.2.layer_norm.weight', 'decoder.block.8.layer.0.SelfAttention.k.weight', 'decoder.block.5.layer.2.layer_norm.weight', 'decoder.block.1.layer.0.layer_norm.weight', 'decoder.block.16.layer.1.EncDecAttention.v.weight', 'decoder.block.4.layer.2.DenseReluDense.wi.weight', 'decoder.block.6.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.0.layer_norm.weight', 'decoder.block.20.layer.2.layer_norm.weight', 'decoder.block.15.layer.1.layer_norm.weight', 'decoder.block.14.layer.1.EncDecAttention.k.weight', 'decoder.block.23.layer.0.SelfAttention.v.weight', 'decoder.block.12.layer.1.EncDecAttention.v.weight', 'decoder.block.8.layer.0.SelfAttention.v.weight', 'decoder.block.21.layer.1.EncDecAttention.q.weight', 'decoder.block.12.layer.0.SelfAttention.k.weight', 'decoder.block.17.layer.1.EncDecAttention.q.weight', 'decoder.block.0.layer.1.EncDecAttention.o.weight', 'decoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.23.layer.0.SelfAttention.o.weight', 'decoder.block.0.layer.1.EncDecAttention.q.weight', 'decoder.block.12.layer.0.layer_norm.weight', 'decoder.block.3.layer.1.EncDecAttention.v.weight', 'decoder.block.15.layer.0.SelfAttention.o.weight', 'decoder.block.7.layer.2.DenseReluDense.wi.weight', 'decoder.block.13.layer.0.SelfAttention.v.weight', 'decoder.block.11.layer.1.EncDecAttention.k.weight', 'decoder.block.6.layer.2.layer_norm.weight', 
'decoder.block.18.layer.1.layer_norm.weight', 'decoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.14.layer.2.layer_norm.weight', 'decoder.block.22.layer.1.EncDecAttention.q.weight']
|
| 3 |
+
- This IS expected if you are initializing T5EncoderModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
|
| 4 |
+
- This IS NOT expected if you are initializing T5EncoderModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
|
| 5 |
+
Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
|
| 6 |
+
Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
|
| 7 |
+
/usr/local/lib/python3.10/dist-packages/transformers/models/t5/tokenization_t5.py:163: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.
|
| 8 |
+
For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
|
| 9 |
+
- Be aware that you SHOULD NOT rely on t5-3b automatically truncating your input to 512 when padding/encoding.
|
| 10 |
+
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.
|
| 11 |
+
- To avoid this warning, please instantiate this tokenizer with `model_max_length` set to your preferred value.
|
| 12 |
+
warnings.warn(
|
| 13 |
+
Token indices sequence length is longer than the specified maximum sequence length for this model (2837981 > 512). Running this sequence through the model will result in indexing errors
|
| 14 |
+
Starting ...
|
| 15 |
+
Ready.
|
| 16 |
+
0 layer.0.SelfAttention.q
|
| 17 |
+
Quantizing ...
|
| 18 |
+
time 1.35
|
| 19 |
+
error 379.2095031738281
|
| 20 |
+
0 layer.0.SelfAttention.k
|
| 21 |
+
Quantizing ...
|
| 22 |
+
time 0.24
|
| 23 |
+
error 46126.375
|
| 24 |
+
0 layer.0.SelfAttention.v
|
| 25 |
+
Quantizing ...
|
| 26 |
+
time 0.24
|
| 27 |
+
error 24450.14453125
|
| 28 |
+
0 layer.0.SelfAttention.o
|
| 29 |
+
Quantizing ...
|
| 30 |
+
time 1.00
|
| 31 |
+
error 44522.7578125
|
| 32 |
+
0 layer.1.DenseReluDense.wi
|
| 33 |
+
Quantizing ...
|
| 34 |
+
time 0.26
|
| 35 |
+
error 709531.9375
|
| 36 |
+
0 layer.1.DenseReluDense.wo
|
| 37 |
+
Quantizing ...
|
| 38 |
+
time 4.78
|
| 39 |
+
error 32526.62109375
|
| 40 |
+
1 layer.0.SelfAttention.q
|
| 41 |
+
Quantizing ...
|
| 42 |
+
time 1.39
|
| 43 |
+
error 142.10263061523438
|
| 44 |
+
1 layer.0.SelfAttention.k
|
| 45 |
+
Quantizing ...
|
| 46 |
+
time 0.24
|
| 47 |
+
error 14705.056640625
|
| 48 |
+
1 layer.0.SelfAttention.v
|
| 49 |
+
Quantizing ...
|
| 50 |
+
time 0.25
|
| 51 |
+
error 7253.73046875
|
| 52 |
+
1 layer.0.SelfAttention.o
|
| 53 |
+
Quantizing ...
|
| 54 |
+
time 1.00
|
| 55 |
+
error 4656.6787109375
|
| 56 |
+
1 layer.1.DenseReluDense.wi
|
| 57 |
+
Quantizing ...
|
| 58 |
+
time 0.25
|
| 59 |
+
error 817805.625
|
| 60 |
+
1 layer.1.DenseReluDense.wo
|
| 61 |
+
Quantizing ...
|
| 62 |
+
time 4.83
|
| 63 |
+
error 102973.921875
|
| 64 |
+
2 layer.0.SelfAttention.q
|
| 65 |
+
Quantizing ...
|
| 66 |
+
time 1.39
|
| 67 |
+
error 201.76727294921875
|
| 68 |
+
2 layer.0.SelfAttention.k
|
| 69 |
+
Quantizing ...
|
| 70 |
+
time 0.24
|
| 71 |
+
error 22729.0390625
|
| 72 |
+
2 layer.0.SelfAttention.v
|
| 73 |
+
Quantizing ...
|
| 74 |
+
time 0.24
|
| 75 |
+
error 11655.05078125
|
| 76 |
+
2 layer.0.SelfAttention.o
|
| 77 |
+
Quantizing ...
|
| 78 |
+
time 1.01
|
| 79 |
+
error 7398.17529296875
|
| 80 |
+
2 layer.1.DenseReluDense.wi
|
| 81 |
+
Quantizing ...
|
| 82 |
+
time 0.27
|
| 83 |
+
error 1344222.25
|
| 84 |
+
2 layer.1.DenseReluDense.wo
|
| 85 |
+
Quantizing ...
|
| 86 |
+
time 4.88
|
| 87 |
+
error 120313.578125
|
| 88 |
+
3 layer.0.SelfAttention.q
|
| 89 |
+
Quantizing ...
|
| 90 |
+
time 1.40
|
| 91 |
+
error 240.1311492919922
|
| 92 |
+
3 layer.0.SelfAttention.k
|
| 93 |
+
Quantizing ...
|
| 94 |
+
time 0.24
|
| 95 |
+
error 24427.28515625
|
| 96 |
+
3 layer.0.SelfAttention.v
|
| 97 |
+
Quantizing ...
|
| 98 |
+
time 0.26
|
| 99 |
+
error 12910.115234375
|
| 100 |
+
3 layer.0.SelfAttention.o
|
| 101 |
+
Quantizing ...
|
| 102 |
+
time 1.00
|
| 103 |
+
error 9308.4921875
|
| 104 |
+
3 layer.1.DenseReluDense.wi
|
| 105 |
+
Quantizing ...
|
| 106 |
+
time 0.25
|
| 107 |
+
error 2360071.5
|
| 108 |
+
3 layer.1.DenseReluDense.wo
|
| 109 |
+
Quantizing ...
|
| 110 |
+
time 4.77
|
| 111 |
+
error 143022.65625
|
| 112 |
+
4 layer.0.SelfAttention.q
|
| 113 |
+
Quantizing ...
|
| 114 |
+
time 1.40
|
| 115 |
+
error 282.9701232910156
|
| 116 |
+
4 layer.0.SelfAttention.k
|
| 117 |
+
Quantizing ...
|
| 118 |
+
time 0.26
|
| 119 |
+
error 33365.83203125
|
| 120 |
+
4 layer.0.SelfAttention.v
|
| 121 |
+
Quantizing ...
|
| 122 |
+
time 0.25
|
| 123 |
+
error 16407.8203125
|
| 124 |
+
4 layer.0.SelfAttention.o
|
| 125 |
+
Quantizing ...
|
| 126 |
+
time 1.01
|
| 127 |
+
error 15296.896484375
|
| 128 |
+
4 layer.1.DenseReluDense.wi
|
| 129 |
+
Quantizing ...
|
| 130 |
+
time 0.27
|
| 131 |
+
error 3803509.5
|
| 132 |
+
4 layer.1.DenseReluDense.wo
|
| 133 |
+
Quantizing ...
|
| 134 |
+
time 4.74
|
| 135 |
+
error 156256.96875
|
| 136 |
+
5 layer.0.SelfAttention.q
|
| 137 |
+
Quantizing ...
|
| 138 |
+
time 1.40
|
| 139 |
+
error 304.42095947265625
|
| 140 |
+
5 layer.0.SelfAttention.k
|
| 141 |
+
Quantizing ...
|
| 142 |
+
time 0.25
|
| 143 |
+
error 31203.0546875
|
| 144 |
+
5 layer.0.SelfAttention.v
|
| 145 |
+
Quantizing ...
|
| 146 |
+
time 0.26
|
| 147 |
+
error 17006.921875
|
| 148 |
+
5 layer.0.SelfAttention.o
|
| 149 |
+
Quantizing ...
|
| 150 |
+
time 1.02
|
| 151 |
+
error 14054.951171875
|
| 152 |
+
5 layer.1.DenseReluDense.wi
|
| 153 |
+
Quantizing ...
|
| 154 |
+
time 0.25
|
| 155 |
+
error 4938547.0
|
| 156 |
+
5 layer.1.DenseReluDense.wo
|
| 157 |
+
Quantizing ...
|
| 158 |
+
time 4.70
|
| 159 |
+
error 183307.234375
|
| 160 |
+
6 layer.0.SelfAttention.q
|
| 161 |
+
Quantizing ...
|
| 162 |
+
time 1.42
|
| 163 |
+
error 299.11724853515625
|
| 164 |
+
6 layer.0.SelfAttention.k
|
| 165 |
+
Quantizing ...
|
| 166 |
+
time 0.24
|
| 167 |
+
error 35865.5703125
|
| 168 |
+
6 layer.0.SelfAttention.v
|
| 169 |
+
Quantizing ...
|
| 170 |
+
time 0.24
|
| 171 |
+
error 17129.06640625
|
| 172 |
+
6 layer.0.SelfAttention.o
|
| 173 |
+
Quantizing ...
|
| 174 |
+
time 1.02
|
| 175 |
+
error 12793.3740234375
|
| 176 |
+
6 layer.1.DenseReluDense.wi
|
| 177 |
+
Quantizing ...
|
| 178 |
+
time 0.26
|
| 179 |
+
error 7528978.5
|
| 180 |
+
6 layer.1.DenseReluDense.wo
|
| 181 |
+
Quantizing ...
|
| 182 |
+
time 4.70
|
| 183 |
+
error 201923.0625
|
| 184 |
+
7 layer.0.SelfAttention.q
|
| 185 |
+
Quantizing ...
|
| 186 |
+
time 1.39
|
| 187 |
+
error 368.124755859375
|
| 188 |
+
7 layer.0.SelfAttention.k
|
| 189 |
+
Quantizing ...
|
| 190 |
+
time 0.27
|
| 191 |
+
error 44324.9453125
|
| 192 |
+
7 layer.0.SelfAttention.v
|
| 193 |
+
Quantizing ...
|
| 194 |
+
time 0.25
|
| 195 |
+
error 21733.6484375
|
| 196 |
+
7 layer.0.SelfAttention.o
|
| 197 |
+
Quantizing ...
|
| 198 |
+
time 0.99
|
| 199 |
+
error 25086.8125
|
| 200 |
+
7 layer.1.DenseReluDense.wi
|
| 201 |
+
Quantizing ...
|
| 202 |
+
time 0.25
|
| 203 |
+
error 9442284.0
|
| 204 |
+
7 layer.1.DenseReluDense.wo
|
| 205 |
+
Quantizing ...
|
| 206 |
+
time 4.66
|
| 207 |
+
error 231078.28125
|
| 208 |
+
8 layer.0.SelfAttention.q
|
| 209 |
+
Quantizing ...
|
| 210 |
+
time 1.40
|
| 211 |
+
error 336.513671875
|
| 212 |
+
8 layer.0.SelfAttention.k
|
| 213 |
+
Quantizing ...
|
| 214 |
+
time 0.25
|
| 215 |
+
error 40786.26171875
|
| 216 |
+
8 layer.0.SelfAttention.v
|
| 217 |
+
Quantizing ...
|
| 218 |
+
time 0.24
|
| 219 |
+
error 22459.0078125
|
| 220 |
+
8 layer.0.SelfAttention.o
|
| 221 |
+
Quantizing ...
|
| 222 |
+
time 0.99
|
| 223 |
+
error 22684.369140625
|
| 224 |
+
8 layer.1.DenseReluDense.wi
|
| 225 |
+
Quantizing ...
|
| 226 |
+
time 0.25
|
| 227 |
+
error 11038062.0
|
| 228 |
+
8 layer.1.DenseReluDense.wo
|
| 229 |
+
Quantizing ...
|
| 230 |
+
time 4.66
|
| 231 |
+
error 358261.84375
|
| 232 |
+
9 layer.0.SelfAttention.q
|
| 233 |
+
Quantizing ...
|
| 234 |
+
time 1.39
|
| 235 |
+
error 356.87689208984375
|
| 236 |
+
9 layer.0.SelfAttention.k
|
| 237 |
+
Quantizing ...
|
| 238 |
+
time 0.24
|
| 239 |
+
error 43993.4375
|
| 240 |
+
9 layer.0.SelfAttention.v
|
| 241 |
+
Quantizing ...
|
| 242 |
+
time 0.24
|
| 243 |
+
error 26483.703125
|
| 244 |
+
9 layer.0.SelfAttention.o
|
| 245 |
+
Quantizing ...
|
| 246 |
+
time 0.98
|
| 247 |
+
error 68000.96875
|
| 248 |
+
9 layer.1.DenseReluDense.wi
|
| 249 |
+
Quantizing ...
|
| 250 |
+
time 0.25
|
| 251 |
+
error 12831236.0
|
| 252 |
+
9 layer.1.DenseReluDense.wo
|
| 253 |
+
Quantizing ...
|
| 254 |
+
time 4.67
|
| 255 |
+
error 329604.78125
|
| 256 |
+
10 layer.0.SelfAttention.q
|
| 257 |
+
Quantizing ...
|
| 258 |
+
time 1.39
|
| 259 |
+
error 360.63385009765625
|
| 260 |
+
10 layer.0.SelfAttention.k
|
| 261 |
+
Quantizing ...
|
| 262 |
+
time 0.24
|
| 263 |
+
error 44677.66015625
|
| 264 |
+
10 layer.0.SelfAttention.v
|
| 265 |
+
Quantizing ...
|
| 266 |
+
time 0.24
|
| 267 |
+
error 28456.794921875
|
| 268 |
+
10 layer.0.SelfAttention.o
|
| 269 |
+
Quantizing ...
|
| 270 |
+
time 0.99
|
| 271 |
+
error 66670.6953125
|
| 272 |
+
10 layer.1.DenseReluDense.wi
|
| 273 |
+
Quantizing ...
|
| 274 |
+
time 0.25
|
| 275 |
+
error 14097091.0
|
| 276 |
+
10 layer.1.DenseReluDense.wo
|
| 277 |
+
Quantizing ...
|
| 278 |
+
time 4.68
|
| 279 |
+
error 396505.9375
|
| 280 |
+
11 layer.0.SelfAttention.q
|
| 281 |
+
Quantizing ...
|
| 282 |
+
time 1.39
|
| 283 |
+
error 353.4673767089844
|
| 284 |
+
11 layer.0.SelfAttention.k
|
| 285 |
+
Quantizing ...
|
| 286 |
+
time 0.24
|
| 287 |
+
error 42337.890625
|
| 288 |
+
11 layer.0.SelfAttention.v
|
| 289 |
+
Quantizing ...
|
| 290 |
+
time 0.24
|
| 291 |
+
error 41291.625
|
| 292 |
+
11 layer.0.SelfAttention.o
|
| 293 |
+
Quantizing ...
|
| 294 |
+
time 0.99
|
| 295 |
+
error 84161.796875
|
| 296 |
+
11 layer.1.DenseReluDense.wi
|
| 297 |
+
Quantizing ...
|
| 298 |
+
time 0.25
|
| 299 |
+
error 13223532.0
|
| 300 |
+
11 layer.1.DenseReluDense.wo
|
| 301 |
+
Quantizing ...
|
| 302 |
+
time 4.70
|
| 303 |
+
error 527305.5625
|
| 304 |
+
12 layer.0.SelfAttention.q
Quantizing ...
time 1.39
error 352.1868896484375
12 layer.0.SelfAttention.k
Quantizing ...
time 0.24
error 45228.03515625
12 layer.0.SelfAttention.v
Quantizing ...
time 0.24
error 49482.1328125
12 layer.0.SelfAttention.o
Quantizing ...
time 0.98
error 166233.6875
12 layer.1.DenseReluDense.wi
Quantizing ...
time 0.25
error 12493772.0
12 layer.1.DenseReluDense.wo
Quantizing ...
time 4.69
error 702293.9375
13 layer.0.SelfAttention.q
Quantizing ...
time 1.39
error 334.15252685546875
13 layer.0.SelfAttention.k
Quantizing ...
time 0.24
error 43450.84765625
13 layer.0.SelfAttention.v
Quantizing ...
time 0.24
error 60685.6875
13 layer.0.SelfAttention.o
Quantizing ...
time 0.98
error 237831.390625
13 layer.1.DenseReluDense.wi
Quantizing ...
time 0.25
error 17085658.0
13 layer.1.DenseReluDense.wo
Quantizing ...
time 4.77
error 1149340.5
14 layer.0.SelfAttention.q
Quantizing ...
time 1.39
error 307.14837646484375
14 layer.0.SelfAttention.k
Quantizing ...
time 0.24
error 37913.44140625
14 layer.0.SelfAttention.v
Quantizing ...
time 0.24
error 70616.703125
14 layer.0.SelfAttention.o
Quantizing ...
time 0.98
error 276008.25
14 layer.1.DenseReluDense.wi
Quantizing ...
time 0.25
error 18912372.0
14 layer.1.DenseReluDense.wo
Quantizing ...
time 4.81
error 1235969.25
15 layer.0.SelfAttention.q
Quantizing ...
time 1.39
error 248.17747497558594
15 layer.0.SelfAttention.k
Quantizing ...
time 0.24
error 38016.640625
15 layer.0.SelfAttention.v
Quantizing ...
time 0.24
error 91188.5
15 layer.0.SelfAttention.o
Quantizing ...
time 1.00
error 444728.9375
15 layer.1.DenseReluDense.wi
Quantizing ...
time 0.25
error 25090036.0
15 layer.1.DenseReluDense.wo
Quantizing ...
time 4.78
error 2290796.5
16 layer.0.SelfAttention.q
Quantizing ...
time 1.41
error 292.78265380859375
16 layer.0.SelfAttention.k
Quantizing ...
time 0.24
error 37744.9765625
16 layer.0.SelfAttention.v
Quantizing ...
time 0.27
error 111741.5625
16 layer.0.SelfAttention.o
Quantizing ...
time 1.02
error 623461.5625
16 layer.1.DenseReluDense.wi
Quantizing ...
time 0.25
error 32498636.0
16 layer.1.DenseReluDense.wo
Quantizing ...
time 4.78
error 2876735.0
17 layer.0.SelfAttention.q
Quantizing ...
time 1.42
error 238.4019775390625
17 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 36026.7890625
17 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 133311.40625
17 layer.0.SelfAttention.o
Quantizing ...
time 1.02
error 775721.0
17 layer.1.DenseReluDense.wi
Quantizing ...
time 0.26
error 29635048.0
17 layer.1.DenseReluDense.wo
Quantizing ...
time 4.72
error 4939297.0
18 layer.0.SelfAttention.q
Quantizing ...
time 1.40
error 264.264892578125
18 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 35441.94140625
18 layer.0.SelfAttention.v
Quantizing ...
time 0.28
error 173245.75
18 layer.0.SelfAttention.o
Quantizing ...
time 1.03
error 1960626.25
18 layer.1.DenseReluDense.wi
Quantizing ...
time 0.25
error 35718256.0
18 layer.1.DenseReluDense.wo
Quantizing ...
time 4.69
error 8303653.0
19 layer.0.SelfAttention.q
Quantizing ...
time 1.41
error 208.0140380859375
19 layer.0.SelfAttention.k
Quantizing ...
time 0.24
error 29667.7890625
19 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 186044.875
19 layer.0.SelfAttention.o
Quantizing ...
time 1.04
error 1691559.75
19 layer.1.DenseReluDense.wi
Quantizing ...
time 0.25
error 35222308.0
19 layer.1.DenseReluDense.wo
Quantizing ...
time 4.69
error 7108630.0
20 layer.0.SelfAttention.q
Quantizing ...
time 1.41
error 153.36215209960938
20 layer.0.SelfAttention.k
Quantizing ...
time 0.26
error 22485.923828125
20 layer.0.SelfAttention.v
Quantizing ...
time 0.24
error 193863.65625
20 layer.0.SelfAttention.o
Quantizing ...
time 0.98
error 2213693.75
20 layer.1.DenseReluDense.wi
Quantizing ...
time 0.25
error 44203168.0
20 layer.1.DenseReluDense.wo
Quantizing ...
time 4.66
error 9345712.0
21 layer.0.SelfAttention.q
Quantizing ...
time 1.39
error 179.65872192382812
21 layer.0.SelfAttention.k
Quantizing ...
time 0.24
error 23743.3984375
21 layer.0.SelfAttention.v
Quantizing ...
time 0.24
error 237300.96875
21 layer.0.SelfAttention.o
Quantizing ...
time 0.99
error 3179711.0
21 layer.1.DenseReluDense.wi
Quantizing ...
time 0.25
error 66251440.0
21 layer.1.DenseReluDense.wo
Quantizing ...
time 4.69
error 30768120.0
22 layer.0.SelfAttention.q
Quantizing ...
time 1.39
error 73.71006774902344
22 layer.0.SelfAttention.k
Quantizing ...
time 0.24
error 10168.076171875
22 layer.0.SelfAttention.v
Quantizing ...
time 0.24
error 131254.0
22 layer.0.SelfAttention.o
Quantizing ...
time 0.99
error 1327100.625
22 layer.1.DenseReluDense.wi
Quantizing ...
time 0.25
error 40279020.0
22 layer.1.DenseReluDense.wo
Quantizing ...
time 4.67
error 90908576.0
23 layer.0.SelfAttention.q
Quantizing ...
time 1.40
error 84.36131286621094
23 layer.0.SelfAttention.k
Quantizing ...
time 0.24
error 11834.87109375
23 layer.0.SelfAttention.v
Quantizing ...
time 0.24
error 154102.96875
23 layer.0.SelfAttention.o
Quantizing ...
time 1.00
error 2506505.5
23 layer.1.DenseReluDense.wi
Quantizing ...
time 0.25
error 23018948.0
23 layer.1.DenseReluDense.wo
Quantizing ...
time 4.65
error 38312456.0
443.7025671005249
Packing ...
encoder.block.0.layer.0.SelfAttention.q
encoder.block.0.layer.0.SelfAttention.k
encoder.block.0.layer.0.SelfAttention.v
encoder.block.0.layer.0.SelfAttention.o
encoder.block.0.layer.1.DenseReluDense.wi
encoder.block.0.layer.1.DenseReluDense.wo
encoder.block.1.layer.0.SelfAttention.q
encoder.block.1.layer.0.SelfAttention.k
encoder.block.1.layer.0.SelfAttention.v
encoder.block.1.layer.0.SelfAttention.o
encoder.block.1.layer.1.DenseReluDense.wi
encoder.block.1.layer.1.DenseReluDense.wo
encoder.block.2.layer.0.SelfAttention.q
encoder.block.2.layer.0.SelfAttention.k
encoder.block.2.layer.0.SelfAttention.v
encoder.block.2.layer.0.SelfAttention.o
encoder.block.2.layer.1.DenseReluDense.wi
encoder.block.2.layer.1.DenseReluDense.wo
encoder.block.3.layer.0.SelfAttention.q
encoder.block.3.layer.0.SelfAttention.k
encoder.block.3.layer.0.SelfAttention.v
encoder.block.3.layer.0.SelfAttention.o
encoder.block.3.layer.1.DenseReluDense.wi
encoder.block.3.layer.1.DenseReluDense.wo
encoder.block.4.layer.0.SelfAttention.q
encoder.block.4.layer.0.SelfAttention.k
encoder.block.4.layer.0.SelfAttention.v
encoder.block.4.layer.0.SelfAttention.o
encoder.block.4.layer.1.DenseReluDense.wi
encoder.block.4.layer.1.DenseReluDense.wo
encoder.block.5.layer.0.SelfAttention.q
encoder.block.5.layer.0.SelfAttention.k
encoder.block.5.layer.0.SelfAttention.v
encoder.block.5.layer.0.SelfAttention.o
encoder.block.5.layer.1.DenseReluDense.wi
encoder.block.5.layer.1.DenseReluDense.wo
encoder.block.6.layer.0.SelfAttention.q
encoder.block.6.layer.0.SelfAttention.k
encoder.block.6.layer.0.SelfAttention.v
encoder.block.6.layer.0.SelfAttention.o
encoder.block.6.layer.1.DenseReluDense.wi
encoder.block.6.layer.1.DenseReluDense.wo
encoder.block.7.layer.0.SelfAttention.q
encoder.block.7.layer.0.SelfAttention.k
encoder.block.7.layer.0.SelfAttention.v
encoder.block.7.layer.0.SelfAttention.o
encoder.block.7.layer.1.DenseReluDense.wi
encoder.block.7.layer.1.DenseReluDense.wo
encoder.block.8.layer.0.SelfAttention.q
encoder.block.8.layer.0.SelfAttention.k
encoder.block.8.layer.0.SelfAttention.v
encoder.block.8.layer.0.SelfAttention.o
encoder.block.8.layer.1.DenseReluDense.wi
encoder.block.8.layer.1.DenseReluDense.wo
encoder.block.9.layer.0.SelfAttention.q
encoder.block.9.layer.0.SelfAttention.k
encoder.block.9.layer.0.SelfAttention.v
encoder.block.9.layer.0.SelfAttention.o
encoder.block.9.layer.1.DenseReluDense.wi
encoder.block.9.layer.1.DenseReluDense.wo
encoder.block.10.layer.0.SelfAttention.q
encoder.block.10.layer.0.SelfAttention.k
encoder.block.10.layer.0.SelfAttention.v
encoder.block.10.layer.0.SelfAttention.o
encoder.block.10.layer.1.DenseReluDense.wi
encoder.block.10.layer.1.DenseReluDense.wo
encoder.block.11.layer.0.SelfAttention.q
encoder.block.11.layer.0.SelfAttention.k
encoder.block.11.layer.0.SelfAttention.v
encoder.block.11.layer.0.SelfAttention.o
encoder.block.11.layer.1.DenseReluDense.wi
encoder.block.11.layer.1.DenseReluDense.wo
encoder.block.12.layer.0.SelfAttention.q
encoder.block.12.layer.0.SelfAttention.k
encoder.block.12.layer.0.SelfAttention.v
encoder.block.12.layer.0.SelfAttention.o
encoder.block.12.layer.1.DenseReluDense.wi
encoder.block.12.layer.1.DenseReluDense.wo
encoder.block.13.layer.0.SelfAttention.q
encoder.block.13.layer.0.SelfAttention.k
encoder.block.13.layer.0.SelfAttention.v
encoder.block.13.layer.0.SelfAttention.o
encoder.block.13.layer.1.DenseReluDense.wi
encoder.block.13.layer.1.DenseReluDense.wo
encoder.block.14.layer.0.SelfAttention.q
encoder.block.14.layer.0.SelfAttention.k
encoder.block.14.layer.0.SelfAttention.v
encoder.block.14.layer.0.SelfAttention.o
encoder.block.14.layer.1.DenseReluDense.wi
encoder.block.14.layer.1.DenseReluDense.wo
encoder.block.15.layer.0.SelfAttention.q
encoder.block.15.layer.0.SelfAttention.k
encoder.block.15.layer.0.SelfAttention.v
encoder.block.15.layer.0.SelfAttention.o
encoder.block.15.layer.1.DenseReluDense.wi
encoder.block.15.layer.1.DenseReluDense.wo
encoder.block.16.layer.0.SelfAttention.q
encoder.block.16.layer.0.SelfAttention.k
encoder.block.16.layer.0.SelfAttention.v
encoder.block.16.layer.0.SelfAttention.o
encoder.block.16.layer.1.DenseReluDense.wi
encoder.block.16.layer.1.DenseReluDense.wo
encoder.block.17.layer.0.SelfAttention.q
encoder.block.17.layer.0.SelfAttention.k
encoder.block.17.layer.0.SelfAttention.v
encoder.block.17.layer.0.SelfAttention.o
encoder.block.17.layer.1.DenseReluDense.wi
encoder.block.17.layer.1.DenseReluDense.wo
encoder.block.18.layer.0.SelfAttention.q
encoder.block.18.layer.0.SelfAttention.k
encoder.block.18.layer.0.SelfAttention.v
encoder.block.18.layer.0.SelfAttention.o
encoder.block.18.layer.1.DenseReluDense.wi
encoder.block.18.layer.1.DenseReluDense.wo
encoder.block.19.layer.0.SelfAttention.q
encoder.block.19.layer.0.SelfAttention.k
encoder.block.19.layer.0.SelfAttention.v
encoder.block.19.layer.0.SelfAttention.o
encoder.block.19.layer.1.DenseReluDense.wi
encoder.block.19.layer.1.DenseReluDense.wo
encoder.block.20.layer.0.SelfAttention.q
encoder.block.20.layer.0.SelfAttention.k
encoder.block.20.layer.0.SelfAttention.v
encoder.block.20.layer.0.SelfAttention.o
encoder.block.20.layer.1.DenseReluDense.wi
encoder.block.20.layer.1.DenseReluDense.wo
encoder.block.21.layer.0.SelfAttention.q
encoder.block.21.layer.0.SelfAttention.k
encoder.block.21.layer.0.SelfAttention.v
encoder.block.21.layer.0.SelfAttention.o
encoder.block.21.layer.1.DenseReluDense.wi
encoder.block.21.layer.1.DenseReluDense.wo
encoder.block.22.layer.0.SelfAttention.q
encoder.block.22.layer.0.SelfAttention.k
encoder.block.22.layer.0.SelfAttention.v
encoder.block.22.layer.0.SelfAttention.o
encoder.block.22.layer.1.DenseReluDense.wi
encoder.block.22.layer.1.DenseReluDense.wo
encoder.block.23.layer.0.SelfAttention.q
encoder.block.23.layer.0.SelfAttention.k
encoder.block.23.layer.0.SelfAttention.v
encoder.block.23.layer.0.SelfAttention.o
encoder.block.23.layer.1.DenseReluDense.wi
encoder.block.23.layer.1.DenseReluDense.wo
Done.
workspace/flan-ts-base.txt
ADDED
@@ -0,0 +1,437 @@
CUDA extension not installed.
Downloading (…)lve/main/config.json: 100% 1.40k/1.40k [00:00<00:00, 3.76MB/s]
Downloading pytorch_model.bin: 100% 990M/990M [00:11<00:00, 89.8MB/s]
Some weights of the model checkpoint at google/flan-t5-base were not used when initializing T5EncoderModel: ['decoder.block.6.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.2.DenseReluDense.wo.weight', 'decoder.block.10.layer.0.SelfAttention.q.weight', 'decoder.block.10.layer.1.EncDecAttention.o.weight', 'decoder.block.4.layer.0.SelfAttention.k.weight', 'decoder.block.4.layer.1.EncDecAttention.v.weight', 'decoder.block.8.layer.1.layer_norm.weight', 'decoder.block.9.layer.1.EncDecAttention.k.weight', 'decoder.block.9.layer.0.layer_norm.weight', 'decoder.block.5.layer.2.layer_norm.weight', 'decoder.block.1.layer.1.EncDecAttention.q.weight', 'decoder.block.1.layer.1.EncDecAttention.v.weight', 'decoder.block.0.layer.1.layer_norm.weight', 'decoder.block.4.layer.1.EncDecAttention.o.weight', 'decoder.block.8.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.4.layer.0.SelfAttention.q.weight', 'decoder.block.9.layer.1.layer_norm.weight', 'decoder.block.7.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.1.layer.0.SelfAttention.o.weight', 'decoder.block.6.layer.0.SelfAttention.o.weight', 'decoder.block.7.layer.1.EncDecAttention.o.weight', 'decoder.block.7.layer.0.SelfAttention.k.weight', 'decoder.block.8.layer.1.EncDecAttention.v.weight', 'decoder.block.10.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.1.EncDecAttention.o.weight', 'decoder.block.6.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.6.layer.2.layer_norm.weight', 'decoder.block.5.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.7.layer.0.layer_norm.weight', 'decoder.block.10.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.1.layer_norm.weight', 'decoder.block.0.layer.1.EncDecAttention.v.weight', 'decoder.block.2.layer.1.EncDecAttention.q.weight', 'decoder.block.9.layer.1.EncDecAttention.q.weight', 'decoder.block.1.layer.0.SelfAttention.k.weight', 
'decoder.block.9.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.2.layer.0.SelfAttention.q.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.4.layer.0.layer_norm.weight', 'decoder.block.2.layer.0.SelfAttention.v.weight', 'decoder.block.5.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.1.layer_norm.weight', 'decoder.block.8.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.8.layer.2.DenseReluDense.wo.weight', 'decoder.block.0.layer.2.layer_norm.weight', 'decoder.block.6.layer.1.layer_norm.weight', 'decoder.block.10.layer.2.DenseReluDense.wo.weight', 'decoder.block.3.layer.1.EncDecAttention.k.weight', 'decoder.block.8.layer.1.EncDecAttention.k.weight', 'decoder.block.9.layer.1.EncDecAttention.o.weight', 'decoder.embed_tokens.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.6.layer.1.EncDecAttention.k.weight', 'decoder.block.10.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.11.layer.0.layer_norm.weight', 'decoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.block.9.layer.0.SelfAttention.v.weight', 'decoder.block.10.layer.0.SelfAttention.v.weight', 'decoder.block.2.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.5.layer.1.EncDecAttention.v.weight', 'decoder.block.6.layer.1.EncDecAttention.v.weight', 'decoder.block.6.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.3.layer.0.layer_norm.weight', 'decoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.11.layer.0.SelfAttention.o.weight', 'decoder.block.5.layer.1.EncDecAttention.q.weight', 'decoder.block.9.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.2.layer.2.layer_norm.weight', 'decoder.block.10.layer.1.EncDecAttention.k.weight', 'decoder.block.11.layer.0.SelfAttention.k.weight', 'decoder.block.9.layer.2.layer_norm.weight', 'decoder.block.7.layer.1.EncDecAttention.k.weight', 
'decoder.block.8.layer.0.SelfAttention.q.weight', 'decoder.block.0.layer.0.SelfAttention.q.weight', 'decoder.block.7.layer.1.EncDecAttention.q.weight', 'decoder.block.11.layer.1.EncDecAttention.o.weight', 'decoder.block.8.layer.2.layer_norm.weight', 'decoder.block.1.layer.2.layer_norm.weight', 'decoder.block.6.layer.2.DenseReluDense.wo.weight', 'decoder.block.11.layer.2.DenseReluDense.wi_1.weight', 'decoder.final_layer_norm.weight', 'decoder.block.2.layer.1.EncDecAttention.v.weight', 'decoder.block.2.layer.0.layer_norm.weight', 'decoder.block.3.layer.1.EncDecAttention.q.weight', 'decoder.block.10.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.8.layer.0.SelfAttention.v.weight', 'decoder.block.3.layer.1.layer_norm.weight', 'decoder.block.11.layer.1.EncDecAttention.q.weight', 'decoder.block.7.layer.2.DenseReluDense.wo.weight', 'decoder.block.7.layer.0.SelfAttention.q.weight', 'decoder.block.0.layer.1.EncDecAttention.q.weight', 'decoder.block.5.layer.1.EncDecAttention.k.weight', 'decoder.block.7.layer.1.layer_norm.weight', 'decoder.block.4.layer.2.DenseReluDense.wi_1.weight', 'lm_head.weight', 'decoder.block.0.layer.0.layer_norm.weight', 'decoder.block.5.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.3.layer.2.DenseReluDense.wo.weight', 'decoder.block.11.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.10.layer.0.layer_norm.weight', 'decoder.block.6.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.4.layer.1.EncDecAttention.k.weight', 'decoder.block.9.layer.0.SelfAttention.q.weight', 'decoder.block.3.layer.0.SelfAttention.k.weight', 'decoder.block.5.layer.0.SelfAttention.v.weight', 'decoder.block.1.layer.0.layer_norm.weight', 'decoder.block.9.layer.1.EncDecAttention.v.weight', 'decoder.block.5.layer.0.SelfAttention.q.weight', 'decoder.block.3.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'decoder.block.8.layer.0.SelfAttention.k.weight', 'decoder.block.2.layer.1.EncDecAttention.o.weight', 
'decoder.block.8.layer.0.layer_norm.weight', 'decoder.block.11.layer.0.SelfAttention.q.weight', 'decoder.block.5.layer.0.SelfAttention.o.weight', 'decoder.block.10.layer.1.EncDecAttention.v.weight', 'decoder.block.2.layer.1.layer_norm.weight', 'decoder.block.6.layer.0.SelfAttention.k.weight', 'decoder.block.2.layer.1.EncDecAttention.k.weight', 'decoder.block.10.layer.1.layer_norm.weight', 'decoder.block.3.layer.2.layer_norm.weight', 'decoder.block.0.layer.2.DenseReluDense.wo.weight', 'decoder.block.5.layer.0.layer_norm.weight', 'decoder.block.7.layer.0.SelfAttention.v.weight', 'decoder.block.10.layer.2.layer_norm.weight', 'decoder.block.4.layer.2.layer_norm.weight', 'decoder.block.5.layer.1.EncDecAttention.o.weight', 'decoder.block.11.layer.1.layer_norm.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.1.layer.2.DenseReluDense.wo.weight', 'decoder.block.1.layer.1.EncDecAttention.k.weight', 'decoder.block.3.layer.0.SelfAttention.q.weight', 'decoder.block.1.layer.0.SelfAttention.q.weight', 'decoder.block.8.layer.1.EncDecAttention.q.weight', 'decoder.block.2.layer.0.SelfAttention.o.weight', 'decoder.block.6.layer.0.SelfAttention.v.weight', 'decoder.block.11.layer.2.layer_norm.weight', 'decoder.block.7.layer.2.layer_norm.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.2.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.1.EncDecAttention.k.weight', 'decoder.block.8.layer.1.EncDecAttention.o.weight', 'decoder.block.9.layer.0.SelfAttention.k.weight', 'decoder.block.3.layer.1.EncDecAttention.v.weight', 'decoder.block.4.layer.1.EncDecAttention.q.weight', 'decoder.block.5.layer.0.SelfAttention.k.weight', 'decoder.block.11.layer.1.EncDecAttention.k.weight', 'decoder.block.3.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.2.DenseReluDense.wo.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.7.layer.2.DenseReluDense.wi_1.weight', 
'decoder.block.10.layer.1.EncDecAttention.q.weight', 'decoder.block.1.layer.0.SelfAttention.v.weight', 'decoder.block.4.layer.0.SelfAttention.o.weight', 'decoder.block.11.layer.2.DenseReluDense.wo.weight', 'decoder.block.8.layer.0.SelfAttention.o.weight', 'decoder.block.6.layer.1.EncDecAttention.q.weight', 'decoder.block.3.layer.0.SelfAttention.v.weight', 'decoder.block.4.layer.0.SelfAttention.v.weight', 'decoder.block.5.layer.1.layer_norm.weight', 'decoder.block.6.layer.0.layer_norm.weight', 'decoder.block.1.layer.1.EncDecAttention.o.weight', 'decoder.block.11.layer.0.SelfAttention.v.weight', 'decoder.block.11.layer.1.EncDecAttention.v.weight', 'decoder.block.7.layer.1.EncDecAttention.v.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.7.layer.0.SelfAttention.o.weight']
- This IS expected if you are initializing T5EncoderModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5EncoderModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
Downloading (…)okenizer_config.json: 100% 2.54k/2.54k [00:00<00:00, 9.34MB/s]
Downloading spiece.model: 100% 792k/792k [00:00<00:00, 26.2MB/s]
Downloading (…)cial_tokens_map.json: 100% 2.20k/2.20k [00:00<00:00, 8.32MB/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (2837981 > 512). Running this sequence through the model will result in indexing errors
Starting ...
Ready.
0 layer.0.SelfAttention.q
Quantizing ...
time 0.52
error 146.0775604248047
0 layer.0.SelfAttention.k
Quantizing ...
time 0.26
error 10098.515625
0 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 2831.77734375
0 layer.0.SelfAttention.o
Quantizing ...
time 0.28
error 169348.390625
0 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.27
error 13075.279296875
0 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.27
error 13343.080078125
0 layer.1.DenseReluDense.wo
Quantizing ...
time 0.71
error 223388.6875
1 layer.0.SelfAttention.q
Quantizing ...
time 0.35
error 152.99575805664062
1 layer.0.SelfAttention.k
Quantizing ...
time 0.26
error 9350.123046875
1 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 2726.8740234375
1 layer.0.SelfAttention.o
Quantizing ...
time 0.26
error 46640.0390625
1 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.26
error 14291.783203125
1 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.26
error 15036.92578125
1 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 69473232.0
2 layer.0.SelfAttention.q
Quantizing ...
time 0.35
error 150.92425537109375
2 layer.0.SelfAttention.k
Quantizing ...
time 0.26
error 8416.51171875
2 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 4465.57470703125
2 layer.0.SelfAttention.o
Quantizing ...
time 0.26
error 21976.58984375
2 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.26
error 10551.6982421875
2 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.26
error 27373.30078125
2 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 2248954.5
3 layer.0.SelfAttention.q
Quantizing ...
time 0.35
error 112.25468444824219
3 layer.0.SelfAttention.k
Quantizing ...
time 0.26
error 6374.623046875
3 layer.0.SelfAttention.v
Quantizing ...
time 0.28
error 6320.84765625
3 layer.0.SelfAttention.o
Quantizing ...
time 0.26
error 39145.75
3 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.27
error 8442.021484375
3 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.28
error 16249.869140625
3 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 413714.4375
4 layer.0.SelfAttention.q
|
| 128 |
+
Quantizing ...
|
| 129 |
+
time 0.37
|
| 130 |
+
error 106.27033996582031
|
| 131 |
+
4 layer.0.SelfAttention.k
|
| 132 |
+
Quantizing ...
|
| 133 |
+
time 0.26
|
| 134 |
+
error 5648.173828125
|
| 135 |
+
4 layer.0.SelfAttention.v
|
| 136 |
+
Quantizing ...
|
| 137 |
+
time 0.27
|
| 138 |
+
error 8460.7470703125
|
| 139 |
+
4 layer.0.SelfAttention.o
|
| 140 |
+
Quantizing ...
|
| 141 |
+
time 0.28
|
| 142 |
+
error 31233.669921875
|
| 143 |
+
4 layer.1.DenseReluDense.wi_0
|
| 144 |
+
Quantizing ...
|
| 145 |
+
time 0.26
|
| 146 |
+
error 7027.775390625
|
| 147 |
+
4 layer.1.DenseReluDense.wi_1
|
| 148 |
+
Quantizing ...
|
| 149 |
+
time 0.26
|
| 150 |
+
error 17504.884765625
|
| 151 |
+
4 layer.1.DenseReluDense.wo
|
| 152 |
+
Quantizing ...
|
| 153 |
+
time 0.71
|
| 154 |
+
error 615970.0
|
5 layer.0.SelfAttention.q
Quantizing ...
time 0.35
error 82.1971206665039
5 layer.0.SelfAttention.k
Quantizing ...
time 0.26
error 5029.64208984375
5 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 9543.3857421875
5 layer.0.SelfAttention.o
Quantizing ...
time 0.26
error 165621.421875
5 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.26
error 5663.072265625
5 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.26
error 19491.24609375
5 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 758556.875
6 layer.0.SelfAttention.q
Quantizing ...
time 0.35
error 69.80455780029297
6 layer.0.SelfAttention.k
Quantizing ...
time 0.26
error 4538.78271484375
6 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 12373.19921875
6 layer.0.SelfAttention.o
Quantizing ...
time 0.26
error 206647.40625
6 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.26
error 5316.70654296875
6 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.26
error 22911.0390625
6 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 874569.5
7 layer.0.SelfAttention.q
Quantizing ...
time 0.35
error 61.30769348144531
7 layer.0.SelfAttention.k
Quantizing ...
time 0.26
error 3534.55078125
7 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 14965.638671875
7 layer.0.SelfAttention.o
Quantizing ...
time 0.26
error 120621.015625
7 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.26
error 4825.25634765625
7 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.26
error 23851.55078125
7 layer.1.DenseReluDense.wo
Quantizing ...
time 0.71
error 1010260.9375
8 layer.0.SelfAttention.q
Quantizing ...
time 0.36
error 67.33954620361328
8 layer.0.SelfAttention.k
Quantizing ...
time 0.27
error 3172.860595703125
8 layer.0.SelfAttention.v
Quantizing ...
time 0.27
error 22393.306640625
8 layer.0.SelfAttention.o
Quantizing ...
time 0.26
error 295393.03125
8 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.26
error 4726.32470703125
8 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.27
error 32944.5
8 layer.1.DenseReluDense.wo
Quantizing ...
time 0.72
error 120079864.0
9 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 64.98255920410156
9 layer.0.SelfAttention.k
Quantizing ...
time 0.26
error 3637.16455078125
9 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 25351.625
9 layer.0.SelfAttention.o
Quantizing ...
time 0.26
error 810347.9375
9 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.26
error 4957.8681640625
9 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.26
error 38014.75390625
9 layer.1.DenseReluDense.wo
Quantizing ...
time 0.70
error 2600309.75
10 layer.0.SelfAttention.q
Quantizing ...
time 0.35
error 48.993934631347656
10 layer.0.SelfAttention.k
Quantizing ...
time 0.26
error 2914.375
10 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 26259.44140625
10 layer.0.SelfAttention.o
Quantizing ...
time 0.26
error 1072011.75
10 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.26
error 4582.3369140625
10 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.26
error 42805.3125
10 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 14100471.0
11 layer.0.SelfAttention.q
Quantizing ...
time 0.35
error 56.52388000488281
11 layer.0.SelfAttention.k
Quantizing ...
time 0.26
error 2580.15380859375
11 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 32459.890625
11 layer.0.SelfAttention.o
Quantizing ...
time 0.26
error 1562133.25
11 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.26
error 5719.791015625
11 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.26
error 70109.4296875
11 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 325791776.0
49.135838985443115
Packing ...
encoder.block.0.layer.0.SelfAttention.q
encoder.block.0.layer.0.SelfAttention.k
encoder.block.0.layer.0.SelfAttention.v
encoder.block.0.layer.0.SelfAttention.o
encoder.block.0.layer.1.DenseReluDense.wi_0
encoder.block.0.layer.1.DenseReluDense.wi_1
encoder.block.0.layer.1.DenseReluDense.wo
encoder.block.1.layer.0.SelfAttention.q
encoder.block.1.layer.0.SelfAttention.k
encoder.block.1.layer.0.SelfAttention.v
encoder.block.1.layer.0.SelfAttention.o
encoder.block.1.layer.1.DenseReluDense.wi_0
encoder.block.1.layer.1.DenseReluDense.wi_1
encoder.block.1.layer.1.DenseReluDense.wo
encoder.block.2.layer.0.SelfAttention.q
encoder.block.2.layer.0.SelfAttention.k
encoder.block.2.layer.0.SelfAttention.v
encoder.block.2.layer.0.SelfAttention.o
encoder.block.2.layer.1.DenseReluDense.wi_0
encoder.block.2.layer.1.DenseReluDense.wi_1
encoder.block.2.layer.1.DenseReluDense.wo
encoder.block.3.layer.0.SelfAttention.q
encoder.block.3.layer.0.SelfAttention.k
encoder.block.3.layer.0.SelfAttention.v
encoder.block.3.layer.0.SelfAttention.o
encoder.block.3.layer.1.DenseReluDense.wi_0
encoder.block.3.layer.1.DenseReluDense.wi_1
encoder.block.3.layer.1.DenseReluDense.wo
encoder.block.4.layer.0.SelfAttention.q
encoder.block.4.layer.0.SelfAttention.k
encoder.block.4.layer.0.SelfAttention.v
encoder.block.4.layer.0.SelfAttention.o
encoder.block.4.layer.1.DenseReluDense.wi_0
encoder.block.4.layer.1.DenseReluDense.wi_1
encoder.block.4.layer.1.DenseReluDense.wo
encoder.block.5.layer.0.SelfAttention.q
encoder.block.5.layer.0.SelfAttention.k
encoder.block.5.layer.0.SelfAttention.v
encoder.block.5.layer.0.SelfAttention.o
encoder.block.5.layer.1.DenseReluDense.wi_0
encoder.block.5.layer.1.DenseReluDense.wi_1
encoder.block.5.layer.1.DenseReluDense.wo
encoder.block.6.layer.0.SelfAttention.q
encoder.block.6.layer.0.SelfAttention.k
encoder.block.6.layer.0.SelfAttention.v
encoder.block.6.layer.0.SelfAttention.o
encoder.block.6.layer.1.DenseReluDense.wi_0
encoder.block.6.layer.1.DenseReluDense.wi_1
encoder.block.6.layer.1.DenseReluDense.wo
encoder.block.7.layer.0.SelfAttention.q
encoder.block.7.layer.0.SelfAttention.k
encoder.block.7.layer.0.SelfAttention.v
encoder.block.7.layer.0.SelfAttention.o
encoder.block.7.layer.1.DenseReluDense.wi_0
encoder.block.7.layer.1.DenseReluDense.wi_1
encoder.block.7.layer.1.DenseReluDense.wo
encoder.block.8.layer.0.SelfAttention.q
encoder.block.8.layer.0.SelfAttention.k
encoder.block.8.layer.0.SelfAttention.v
encoder.block.8.layer.0.SelfAttention.o
encoder.block.8.layer.1.DenseReluDense.wi_0
encoder.block.8.layer.1.DenseReluDense.wi_1
encoder.block.8.layer.1.DenseReluDense.wo
encoder.block.9.layer.0.SelfAttention.q
encoder.block.9.layer.0.SelfAttention.k
encoder.block.9.layer.0.SelfAttention.v
encoder.block.9.layer.0.SelfAttention.o
encoder.block.9.layer.1.DenseReluDense.wi_0
encoder.block.9.layer.1.DenseReluDense.wi_1
encoder.block.9.layer.1.DenseReluDense.wo
encoder.block.10.layer.0.SelfAttention.q
encoder.block.10.layer.0.SelfAttention.k
encoder.block.10.layer.0.SelfAttention.v
encoder.block.10.layer.0.SelfAttention.o
encoder.block.10.layer.1.DenseReluDense.wi_0
encoder.block.10.layer.1.DenseReluDense.wi_1
encoder.block.10.layer.1.DenseReluDense.wo
encoder.block.11.layer.0.SelfAttention.q
encoder.block.11.layer.0.SelfAttention.k
encoder.block.11.layer.0.SelfAttention.v
encoder.block.11.layer.0.SelfAttention.o
encoder.block.11.layer.1.DenseReluDense.wi_0
encoder.block.11.layer.1.DenseReluDense.wi_1
encoder.block.11.layer.1.DenseReluDense.wo
Done.
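Each quantization record above follows a fixed four-line pattern: `<block> <module>`, `Quantizing ...`, `time <seconds>`, `error <value>`. Below is a minimal sketch that collects these records into tuples and ranks the largest errors. It assumes that exact layout. The helper names are hypothetical and are not part of the script that produced this log. The sample values are copied from blocks 10 and 11 above.

```python
import re

# Hypothetical parser for the per-module records printed in this log:
#   "<block> <module>" / "Quantizing ..." / "time <s>" / "error <value>"
RECORD_RE = re.compile(
    r"^(\d+) (\S+)\s*\nQuantizing \.\.\.\s*\ntime ([\d.]+)\s*\nerror ([\d.eE+-]+)",
    re.MULTILINE,
)


def parse_records(log_text):
    """Return (block, module, seconds, error) tuples in log order."""
    return [
        (int(block), module, float(seconds), float(error))
        for block, module, seconds, error in RECORD_RE.findall(log_text)
    ]


def worst_modules(records, k=3):
    """Return the k records with the largest reported error."""
    return sorted(records, key=lambda r: r[3], reverse=True)[:k]


# Three records copied verbatim from the log above.
sample = """10 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 14100471.0
11 layer.0.SelfAttention.q
Quantizing ...
time 0.35
error 56.52388000488281
11 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 325791776.0
"""

records = parse_records(sample)
for block, module, seconds, error in worst_modules(records, k=2):
    print(f"block {block:2d} {module:32s} time {seconds:.2f}s error {error:.4g}")
```

Lines that do not match the pattern are skipped. These include the total-time line, `Packing ...`, and the `encoder.block.*` packing list. In every block shown above, the `DenseReluDense.wo` record has the largest error.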
workspace/flan-ts-large.txt
ADDED
@@ -0,0 +1,857 @@
CUDA extension not installed.
Downloading (…)lve/main/config.json: 100%|██████████| 662/662 [00:00<00:00, 1.65MB/s]
Downloading pytorch_model.bin: 100%|██████████████| 3.13G/3.13G [00:36<00:00, 86.9MB/s]
Some weights of the model checkpoint at google/flan-t5-large were not used when initializing T5EncoderModel: ['decoder.block.4.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.20.layer.1.EncDecAttention.k.weight', 'decoder.block.2.layer.0.SelfAttention.k.weight', 'decoder.block.13.layer.0.SelfAttention.k.weight', 'decoder.block.20.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.0.SelfAttention.o.weight', 'decoder.block.7.layer.0.SelfAttention.v.weight', 'decoder.block.8.layer.2.layer_norm.weight', 'decoder.embed_tokens.weight', 'decoder.block.23.layer.0.SelfAttention.k.weight', 'decoder.block.17.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.1.layer.1.EncDecAttention.q.weight', 'decoder.block.21.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.2.layer.0.layer_norm.weight', 'decoder.block.21.layer.1.EncDecAttention.k.weight', 'decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'decoder.block.18.layer.1.EncDecAttention.k.weight', 'decoder.block.9.layer.1.EncDecAttention.k.weight', 'decoder.block.13.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.12.layer.0.layer_norm.weight', 'decoder.block.23.layer.2.DenseReluDense.wo.weight', 'decoder.block.21.layer.1.EncDecAttention.v.weight', 'decoder.block.18.layer.0.SelfAttention.k.weight', 'decoder.block.15.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.20.layer.1.EncDecAttention.v.weight', 'decoder.block.8.layer.1.layer_norm.weight', 'decoder.block.10.layer.1.layer_norm.weight', 'decoder.block.12.layer.1.EncDecAttention.q.weight', 'decoder.block.9.layer.0.layer_norm.weight', 'decoder.block.0.layer.0.layer_norm.weight', 'decoder.block.14.layer.1.EncDecAttention.o.weight', 'decoder.block.3.layer.2.layer_norm.weight', 'decoder.block.23.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.15.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.8.layer.0.SelfAttention.q.weight', 'decoder.block.21.layer.0.SelfAttention.v.weight', 'decoder.block.3.layer.1.layer_norm.weight', 
'decoder.block.9.layer.1.EncDecAttention.v.weight', 'decoder.block.12.layer.1.EncDecAttention.o.weight', 'decoder.block.23.layer.0.SelfAttention.q.weight', 'decoder.block.2.layer.0.SelfAttention.v.weight', 'decoder.block.13.layer.0.SelfAttention.o.weight', 'decoder.block.5.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.6.layer.1.EncDecAttention.q.weight', 'decoder.block.3.layer.0.SelfAttention.k.weight', 'decoder.block.2.layer.2.layer_norm.weight', 'decoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.15.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.5.layer.0.SelfAttention.k.weight', 'decoder.block.4.layer.1.layer_norm.weight', 'decoder.block.10.layer.1.EncDecAttention.q.weight', 'decoder.block.1.layer.0.layer_norm.weight', 'decoder.block.11.layer.0.SelfAttention.v.weight', 'decoder.block.23.layer.1.EncDecAttention.v.weight', 'decoder.block.8.layer.1.EncDecAttention.o.weight', 'decoder.block.3.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.2.DenseReluDense.wo.weight', 'decoder.block.16.layer.1.EncDecAttention.v.weight', 'decoder.block.18.layer.0.layer_norm.weight', 'decoder.block.11.layer.1.layer_norm.weight', 'decoder.block.22.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.20.layer.1.layer_norm.weight', 'decoder.block.11.layer.1.EncDecAttention.v.weight', 'decoder.block.9.layer.0.SelfAttention.o.weight', 'decoder.block.3.layer.0.SelfAttention.q.weight', 'decoder.block.11.layer.0.layer_norm.weight', 'decoder.block.7.layer.0.layer_norm.weight', 'decoder.block.13.layer.0.SelfAttention.v.weight', 'decoder.block.21.layer.2.layer_norm.weight', 'decoder.block.20.layer.1.EncDecAttention.o.weight', 'decoder.block.16.layer.0.SelfAttention.q.weight', 'decoder.block.16.layer.0.SelfAttention.v.weight', 'decoder.block.17.layer.2.DenseReluDense.wo.weight', 'decoder.block.6.layer.2.DenseReluDense.wi_0.weight', 
'decoder.block.22.layer.2.layer_norm.weight', 'decoder.block.19.layer.2.layer_norm.weight', 'decoder.block.8.layer.0.SelfAttention.k.weight', 'decoder.block.10.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.2.DenseReluDense.wo.weight', 'decoder.block.13.layer.0.SelfAttention.q.weight', 'decoder.block.17.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.23.layer.0.layer_norm.weight', 'decoder.block.19.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.5.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.5.layer.0.SelfAttention.v.weight', 'decoder.block.1.layer.0.SelfAttention.v.weight', 'decoder.block.15.layer.1.EncDecAttention.k.weight', 'decoder.block.23.layer.0.SelfAttention.o.weight', 'decoder.block.3.layer.2.DenseReluDense.wo.weight', 'decoder.block.17.layer.1.EncDecAttention.v.weight', 'decoder.block.7.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.9.layer.0.SelfAttention.q.weight', 'decoder.block.0.layer.2.layer_norm.weight', 'decoder.block.7.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.19.layer.1.EncDecAttention.o.weight', 'decoder.block.11.layer.1.EncDecAttention.q.weight', 'decoder.block.3.layer.0.SelfAttention.v.weight', 'decoder.block.18.layer.2.DenseReluDense.wo.weight', 'decoder.block.11.layer.0.SelfAttention.k.weight', 'decoder.block.6.layer.0.SelfAttention.q.weight', 'decoder.block.9.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.1.EncDecAttention.o.weight', 'decoder.block.7.layer.1.layer_norm.weight', 'decoder.block.22.layer.0.SelfAttention.v.weight', 'decoder.block.15.layer.1.layer_norm.weight', 'decoder.block.20.layer.2.DenseReluDense.wo.weight', 'decoder.block.14.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.18.layer.1.EncDecAttention.q.weight', 'decoder.block.23.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.8.layer.1.EncDecAttention.v.weight', 'decoder.block.6.layer.1.EncDecAttention.v.weight', 'decoder.block.18.layer.1.EncDecAttention.v.weight', 
'decoder.block.10.layer.1.EncDecAttention.o.weight', 'decoder.block.21.layer.2.DenseReluDense.wo.weight', 'decoder.block.21.layer.0.layer_norm.weight', 'decoder.block.22.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.5.layer.0.layer_norm.weight', 'decoder.block.23.layer.1.EncDecAttention.o.weight', 'decoder.block.17.layer.0.SelfAttention.q.weight', 'decoder.block.22.layer.1.EncDecAttention.k.weight', 'decoder.block.4.layer.0.layer_norm.weight', 'decoder.block.0.layer.1.layer_norm.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.5.layer.1.EncDecAttention.k.weight', 'decoder.block.17.layer.1.EncDecAttention.k.weight', 'decoder.block.13.layer.1.EncDecAttention.k.weight', 'decoder.block.19.layer.0.SelfAttention.k.weight', 'decoder.block.7.layer.1.EncDecAttention.k.weight', 'decoder.block.7.layer.0.SelfAttention.o.weight', 'decoder.block.5.layer.2.DenseReluDense.wo.weight', 'decoder.block.7.layer.0.SelfAttention.k.weight', 'decoder.block.12.layer.1.layer_norm.weight', 'decoder.block.11.layer.0.SelfAttention.q.weight', 'decoder.block.20.layer.0.SelfAttention.v.weight', 'decoder.block.12.layer.2.layer_norm.weight', 'decoder.block.17.layer.1.EncDecAttention.q.weight', 'decoder.block.8.layer.0.layer_norm.weight', 'decoder.block.11.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.13.layer.2.DenseReluDense.wo.weight', 'decoder.block.21.layer.0.SelfAttention.k.weight', 'decoder.block.23.layer.0.SelfAttention.v.weight', 'decoder.block.20.layer.0.SelfAttention.k.weight', 'decoder.block.22.layer.0.SelfAttention.k.weight', 'decoder.block.14.layer.1.EncDecAttention.q.weight', 'decoder.block.15.layer.0.SelfAttention.q.weight', 'decoder.block.21.layer.1.EncDecAttention.q.weight', 'decoder.block.21.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.16.layer.1.EncDecAttention.k.weight', 'decoder.block.18.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_1.weight', 
'decoder.block.6.layer.0.SelfAttention.k.weight', 'decoder.block.15.layer.1.EncDecAttention.v.weight', 'decoder.block.17.layer.1.EncDecAttention.o.weight', 'decoder.block.20.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.1.EncDecAttention.k.weight', 'decoder.block.17.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.1.EncDecAttention.k.weight', 'decoder.block.19.layer.1.EncDecAttention.q.weight', 'decoder.block.12.layer.1.EncDecAttention.k.weight', 'decoder.block.16.layer.1.EncDecAttention.q.weight', 'decoder.block.4.layer.1.EncDecAttention.v.weight', 'decoder.block.22.layer.1.EncDecAttention.o.weight', 'decoder.block.5.layer.1.EncDecAttention.q.weight', 'lm_head.weight', 'decoder.block.17.layer.2.layer_norm.weight', 'decoder.block.18.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.8.layer.1.EncDecAttention.q.weight', 'decoder.block.10.layer.1.EncDecAttention.k.weight', 'decoder.block.1.layer.2.layer_norm.weight', 'decoder.block.19.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.17.layer.0.SelfAttention.o.weight', 'decoder.block.14.layer.1.EncDecAttention.k.weight', 'decoder.block.21.layer.0.SelfAttention.q.weight', 'decoder.block.10.layer.0.SelfAttention.k.weight', 'decoder.block.22.layer.0.layer_norm.weight', 'decoder.block.21.layer.1.EncDecAttention.o.weight', 'decoder.block.1.layer.2.DenseReluDense.wo.weight', 'decoder.block.14.layer.0.SelfAttention.v.weight', 'decoder.block.22.layer.1.EncDecAttention.q.weight', 'decoder.block.7.layer.2.layer_norm.weight', 'decoder.block.9.layer.0.SelfAttention.k.weight', 'decoder.block.4.layer.0.SelfAttention.o.weight', 'decoder.block.5.layer.1.layer_norm.weight', 'decoder.block.23.layer.2.layer_norm.weight', 'decoder.block.17.layer.0.layer_norm.weight', 'decoder.block.14.layer.0.SelfAttention.o.weight', 'decoder.block.14.layer.2.layer_norm.weight', 'decoder.block.5.layer.2.layer_norm.weight', 'decoder.block.4.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_0.weight', 
'decoder.block.9.layer.1.layer_norm.weight', 'decoder.block.20.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.14.layer.2.DenseReluDense.wo.weight', 'decoder.block.7.layer.1.EncDecAttention.v.weight', 'decoder.block.16.layer.1.layer_norm.weight', 'decoder.block.2.layer.0.SelfAttention.q.weight', 'decoder.block.19.layer.0.SelfAttention.v.weight', 'decoder.block.6.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.1.EncDecAttention.o.weight', 'decoder.block.5.layer.0.SelfAttention.q.weight', 'decoder.block.15.layer.0.SelfAttention.k.weight', 'decoder.block.19.layer.1.EncDecAttention.k.weight', 'decoder.block.6.layer.1.layer_norm.weight', 'decoder.block.22.layer.1.EncDecAttention.v.weight', 'decoder.block.5.layer.0.SelfAttention.o.weight', 'decoder.block.14.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.16.layer.2.layer_norm.weight', 'decoder.block.4.layer.1.EncDecAttention.o.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.15.layer.0.layer_norm.weight', 'decoder.block.16.layer.0.SelfAttention.k.weight', 'decoder.block.23.layer.1.layer_norm.weight', 'decoder.block.8.layer.0.SelfAttention.v.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.12.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.2.layer.1.layer_norm.weight', 'decoder.block.13.layer.1.EncDecAttention.v.weight', 'decoder.block.9.layer.0.SelfAttention.v.weight', 'decoder.block.3.layer.1.EncDecAttention.v.weight', 'decoder.block.20.layer.0.layer_norm.weight', 'decoder.block.13.layer.2.layer_norm.weight', 'decoder.block.16.layer.2.DenseReluDense.wo.weight', 'decoder.block.14.layer.1.EncDecAttention.v.weight', 'decoder.block.6.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.6.layer.2.layer_norm.weight', 'decoder.block.21.layer.0.SelfAttention.o.weight', 'decoder.block.8.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.4.layer.2.DenseReluDense.wo.weight', 'decoder.block.12.layer.0.SelfAttention.o.weight', 
'decoder.block.6.layer.0.SelfAttention.o.weight', 'decoder.block.11.layer.2.layer_norm.weight', 'decoder.block.12.layer.1.EncDecAttention.v.weight', 'decoder.block.22.layer.0.SelfAttention.q.weight', 'decoder.block.19.layer.0.SelfAttention.q.weight', 'decoder.block.16.layer.1.EncDecAttention.o.weight', 'decoder.block.1.layer.1.layer_norm.weight', 'decoder.block.17.layer.0.SelfAttention.v.weight', 'decoder.block.6.layer.2.DenseReluDense.wo.weight', 'decoder.block.10.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.18.layer.0.SelfAttention.o.weight', 'decoder.block.19.layer.1.EncDecAttention.v.weight', 'decoder.block.14.layer.0.layer_norm.weight', 'decoder.block.12.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.1.EncDecAttention.o.weight', 'decoder.block.10.layer.0.layer_norm.weight', 'decoder.block.9.layer.1.EncDecAttention.q.weight', 'decoder.block.12.layer.0.SelfAttention.k.weight', 'decoder.block.10.layer.0.SelfAttention.v.weight', 'decoder.block.8.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.3.layer.1.EncDecAttention.o.weight', 'decoder.block.11.layer.1.EncDecAttention.k.weight', 'decoder.block.18.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.2.layer_norm.weight', 'decoder.block.19.layer.2.DenseReluDense.wo.weight', 'decoder.block.3.layer.1.EncDecAttention.q.weight', 'decoder.block.22.layer.2.DenseReluDense.wo.weight', 'decoder.block.14.layer.0.SelfAttention.q.weight', 'decoder.block.13.layer.1.layer_norm.weight', 'decoder.block.6.layer.0.layer_norm.weight', 'decoder.block.4.layer.0.SelfAttention.q.weight', 'decoder.block.19.layer.0.layer_norm.weight', 'decoder.block.3.layer.0.layer_norm.weight', 'decoder.block.2.layer.1.EncDecAttention.v.weight', 'decoder.block.23.layer.1.EncDecAttention.k.weight', 'decoder.block.20.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.2.layer.1.EncDecAttention.q.weight', 
'decoder.block.10.layer.1.EncDecAttention.v.weight', 'decoder.block.16.layer.0.layer_norm.weight', 'decoder.block.18.layer.0.SelfAttention.v.weight', 'decoder.block.12.layer.0.SelfAttention.q.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.5.layer.1.EncDecAttention.o.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.13.layer.1.EncDecAttention.o.weight', 'decoder.block.8.layer.1.EncDecAttention.k.weight', 'decoder.block.2.layer.0.SelfAttention.o.weight', 'decoder.block.2.layer.2.DenseReluDense.wo.weight', 'decoder.block.9.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.15.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.1.EncDecAttention.q.weight', 'decoder.block.7.layer.0.SelfAttention.q.weight', 'decoder.block.13.layer.1.EncDecAttention.q.weight', 'decoder.block.5.layer.1.EncDecAttention.v.weight', 'decoder.block.17.layer.1.layer_norm.weight', 'decoder.block.16.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.11.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.15.layer.1.EncDecAttention.o.weight', 'decoder.block.10.layer.2.DenseReluDense.wo.weight', 'decoder.block.13.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.0.SelfAttention.q.weight', 'decoder.block.14.layer.1.layer_norm.weight', 'decoder.block.19.layer.0.SelfAttention.o.weight', 'decoder.block.13.layer.0.layer_norm.weight', 'decoder.block.6.layer.1.EncDecAttention.o.weight', 'decoder.block.8.layer.0.SelfAttention.o.weight', 'decoder.block.22.layer.1.layer_norm.weight', 'decoder.block.8.layer.2.DenseReluDense.wo.weight', 'decoder.block.19.layer.1.layer_norm.weight', 'decoder.block.21.layer.1.layer_norm.weight', 'decoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.block.16.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.2.layer.1.EncDecAttention.k.weight', 'decoder.block.18.layer.1.layer_norm.weight', 'decoder.block.1.layer.1.EncDecAttention.k.weight', 
'decoder.block.11.layer.2.DenseReluDense.wo.weight', 'decoder.block.18.layer.2.layer_norm.weight', 'decoder.block.16.layer.0.SelfAttention.o.weight', 'decoder.block.12.layer.2.DenseReluDense.wo.weight', 'decoder.block.11.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.2.layer_norm.weight', 'decoder.block.18.layer.1.EncDecAttention.o.weight', 'decoder.block.9.layer.1.EncDecAttention.o.weight', 'decoder.block.20.layer.1.EncDecAttention.q.weight', 'decoder.block.4.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.1.EncDecAttention.q.weight', 'decoder.block.1.layer.1.EncDecAttention.v.weight', 'decoder.block.1.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.1.EncDecAttention.q.weight', 'decoder.block.15.layer.0.SelfAttention.v.weight', 'decoder.block.10.layer.0.SelfAttention.o.weight', 'decoder.block.15.layer.2.layer_norm.weight', 'decoder.block.0.layer.1.EncDecAttention.v.weight', 'decoder.block.14.layer.0.SelfAttention.k.weight', 'decoder.block.22.layer.0.SelfAttention.o.weight', 'decoder.block.20.layer.2.layer_norm.weight', 'decoder.block.10.layer.2.layer_norm.weight', 'decoder.block.6.layer.1.EncDecAttention.k.weight', 'decoder.block.10.layer.0.SelfAttention.q.weight', 'decoder.final_layer_norm.weight', 'decoder.block.15.layer.1.EncDecAttention.q.weight', 'decoder.block.3.layer.1.EncDecAttention.k.weight', 'decoder.block.12.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.23.layer.1.EncDecAttention.q.weight', 'decoder.block.11.layer.1.EncDecAttention.o.weight']
- This IS expected if you are initializing T5EncoderModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5EncoderModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
Downloading (…)okenizer_config.json: 100%|██| 2.54k/2.54k [00:00<00:00, 9.09MB/s]
Downloading spiece.model: 100%|████████████████████████████| 792k/792k [00:00<00:00, 28.7MB/s]
Downloading (…)cial_tokens_map.json: 100%|██| 2.20k/2.20k [00:00<00:00, 7.83MB/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (2837981 > 512). Running this sequence through the model will result in indexing errors
Starting ...
Ready.
0 layer.0.SelfAttention.q
Quantizing ...
time 0.55
error 142.37025451660156
0 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 9521.5029296875
0 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 2544.900390625
0 layer.0.SelfAttention.o
Quantizing ...
time 0.28
error 123186.2578125
0 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 11158.978515625
0 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 9518.11328125
0 layer.1.DenseReluDense.wo
Quantizing ...
time 0.72
error 3637286.0
1 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 536.7674560546875
1 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 25588.546875
1 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 1919.272216796875
1 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 47080.5625
1 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 9808.359375
1 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 6298.18896484375
1 layer.1.DenseReluDense.wo
Quantizing ...
time 0.71
error 137391.875
2 layer.0.SelfAttention.q
Quantizing ...
time 0.41
error 125.06156921386719
2 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 6493.82568359375
2 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 1306.6259765625
2 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 3543.05029296875
2 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 10326.599609375
2 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 8165.3193359375
2 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 105276.7265625
3 layer.0.SelfAttention.q
Quantizing ...
time 0.41
error 137.07083129882812
3 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 7485.19384765625
3 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 1563.48095703125
3 layer.0.SelfAttention.o
Quantizing ...
time 0.27
error 3057.40673828125
3 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 10634.482421875
3 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.27
error 9444.2841796875
3 layer.1.DenseReluDense.wo
Quantizing ...
time 0.73
error 105683.125
4 layer.0.SelfAttention.q
Quantizing ...
time 0.41
error 133.7151336669922
4 layer.0.SelfAttention.k
Quantizing ...
time 0.27
error 7297.93896484375
4 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 1610.62939453125
4 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 7214.41796875
4 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 14451.642578125
4 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 15960.328125
4 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 4980679168.0
5 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 140.4214324951172
5 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 7479.8193359375
5 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 2484.518310546875
5 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 8618.46484375
5 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.27
error 10754.0419921875
5 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 13012.9423828125
5 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 107111.1875
6 layer.0.SelfAttention.q
Quantizing ...
time 0.40
error 112.6629867553711
6 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 7047.806640625
6 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 2059.9892578125
6 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 5445.0029296875
6 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.26
error 11107.181640625
6 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 15983.3603515625
6 layer.1.DenseReluDense.wo
Quantizing ...
time 0.70
error 685753216.0
7 layer.0.SelfAttention.q
Quantizing ...
time 0.41
error 133.351806640625
7 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 8262.615234375
7 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 2878.16943359375
7 layer.0.SelfAttention.o
Quantizing ...
time 0.27
error 17972.373046875
7 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 11895.857421875
7 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 18337.82421875
7 layer.1.DenseReluDense.wo
Quantizing ...
time 0.72
error 25902379008.0
8 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 120.18170928955078
8 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 7699.7255859375
8 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 2972.5712890625
8 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 8750.123046875
8 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 11126.8662109375
8 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 18306.9609375
8 layer.1.DenseReluDense.wo
Quantizing ...
time 0.71
error 128990.28125
9 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 126.16083526611328
9 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 8584.9208984375
9 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 3245.54541015625
9 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 15868.41015625
9 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 9290.447265625
9 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 17894.17578125
9 layer.1.DenseReluDense.wo
Quantizing ...
time 0.71
error 149863.296875
10 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 107.48172760009766
10 layer.0.SelfAttention.k
Quantizing ...
time 0.27
error 6898.35595703125
10 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 3770.64990234375
10 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 17137.037109375
10 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.27
error 8128.5166015625
10 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.26
error 17371.587890625
10 layer.1.DenseReluDense.wo
Quantizing ...
time 0.73
error 116027.1015625
11 layer.0.SelfAttention.q
Quantizing ...
time 0.40
error 104.61625671386719
11 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 7259.4208984375
11 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 5005.52490234375
11 layer.0.SelfAttention.o
Quantizing ...
time 0.27
error 32728.1015625
11 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 8535.056640625
11 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.27
error 22538.978515625
11 layer.1.DenseReluDense.wo
Quantizing ...
time 0.71
error 170254.40625
12 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 94.82140350341797
12 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 6448.5205078125
12 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 5083.41796875
12 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 60036.953125
12 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.26
error 7829.4384765625
12 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.26
error 23411.65234375
12 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 231657.15625
13 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 90.77069091796875
13 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 5828.037109375
13 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 4888.35302734375
13 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 41515.46484375
13 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 7063.1728515625
13 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 23648.7421875
13 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 261193.75
14 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 77.24964904785156
14 layer.0.SelfAttention.k
Quantizing ...
time 0.27
error 5096.2626953125
14 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 6915.9384765625
14 layer.0.SelfAttention.o
Quantizing ...
time 0.26
error 56402.62890625
14 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.28
error 6039.11328125
14 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 24090.625
14 layer.1.DenseReluDense.wo
Quantizing ...
time 0.71
error 355204.3125
15 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 72.92942810058594
15 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 5561.1201171875
15 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 8621.376953125
15 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 146386.5625
15 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 5684.064453125
15 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 26869.12109375
15 layer.1.DenseReluDense.wo
Quantizing ...
time 0.70
error 361036.25
16 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 75.83228302001953
16 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 5176.50341796875
16 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 9754.8203125
16 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 231755.03125
16 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.27
error 5699.75390625
16 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 25039.771484375
16 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 651520.75
17 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 61.858299255371094
17 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 4369.08251953125
17 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 12425.16796875
17 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 408129.875
17 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 5317.8798828125
17 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 26979.31640625
17 layer.1.DenseReluDense.wo
Quantizing ...
time 0.73
error 689154.875
18 layer.0.SelfAttention.q
Quantizing ...
time 0.41
error 68.12550354003906
18 layer.0.SelfAttention.k
Quantizing ...
time 0.27
error 4010.4833984375
18 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 14657.2314453125
18 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 206627.5
18 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.28
error 6068.525390625
18 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 28093.669921875
18 layer.1.DenseReluDense.wo
Quantizing ...
time 0.72
error 1019951.8125
19 layer.0.SelfAttention.q
Quantizing ...
time 0.41
error 57.68662643432617
19 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 4086.83349609375
19 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 14453.2578125
19 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 460674.0
19 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 5235.9794921875
19 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.26
error 28788.4765625
19 layer.1.DenseReluDense.wo
Quantizing ...
time 0.70
error 1332541.0
20 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 42.9056510925293
20 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 2894.2177734375
20 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 16684.044921875
20 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 557086.6875
20 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 6791.15625
20 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 38994.37890625
20 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 2295082.0
21 layer.0.SelfAttention.q
Quantizing ...
time 0.41
error 58.024559020996094
21 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 3534.38427734375
21 layer.0.SelfAttention.v
Quantizing ...
time 0.28
error 23622.609375
21 layer.0.SelfAttention.o
Quantizing ...
time 0.26
error 630538.75
21 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.27
error 6944.4306640625
21 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 41437.5546875
21 layer.1.DenseReluDense.wo
Quantizing ...
time 0.72
error 2805766.25
22 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 56.98418426513672
22 layer.0.SelfAttention.k
Quantizing ...
time 0.27
error 2588.40576171875
22 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 33727.3125
22 layer.0.SelfAttention.o
Quantizing ...
time 0.26
error 1536184.5
22 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.28
error 7638.18701171875
22 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 49872.0859375
22 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 4077312.5
23 layer.0.SelfAttention.q
Quantizing ...
time 0.40
error 53.174556732177734
23 layer.0.SelfAttention.k
Quantizing ...
time 0.26
error 2663.560302734375
23 layer.0.SelfAttention.v
Quantizing ...
time 0.27
error 35553.75
23 layer.0.SelfAttention.o
Quantizing ...
time 0.26
error 1983365.75
23 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 8208.654296875
23 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 51633.640625
23 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 8843078.0
114.8298749923706
Packing ...
encoder.block.0.layer.0.SelfAttention.q
encoder.block.0.layer.0.SelfAttention.k
encoder.block.0.layer.0.SelfAttention.v
encoder.block.0.layer.0.SelfAttention.o
encoder.block.0.layer.1.DenseReluDense.wi_0
encoder.block.0.layer.1.DenseReluDense.wi_1
encoder.block.0.layer.1.DenseReluDense.wo
encoder.block.1.layer.0.SelfAttention.q
encoder.block.1.layer.0.SelfAttention.k
encoder.block.1.layer.0.SelfAttention.v
encoder.block.1.layer.0.SelfAttention.o
encoder.block.1.layer.1.DenseReluDense.wi_0
encoder.block.1.layer.1.DenseReluDense.wi_1
encoder.block.1.layer.1.DenseReluDense.wo
encoder.block.2.layer.0.SelfAttention.q
encoder.block.2.layer.0.SelfAttention.k
encoder.block.2.layer.0.SelfAttention.v
encoder.block.2.layer.0.SelfAttention.o
encoder.block.2.layer.1.DenseReluDense.wi_0
encoder.block.2.layer.1.DenseReluDense.wi_1
encoder.block.2.layer.1.DenseReluDense.wo
encoder.block.3.layer.0.SelfAttention.q
encoder.block.3.layer.0.SelfAttention.k
encoder.block.3.layer.0.SelfAttention.v
encoder.block.3.layer.0.SelfAttention.o
encoder.block.3.layer.1.DenseReluDense.wi_0
encoder.block.3.layer.1.DenseReluDense.wi_1
encoder.block.3.layer.1.DenseReluDense.wo
encoder.block.4.layer.0.SelfAttention.q
encoder.block.4.layer.0.SelfAttention.k
encoder.block.4.layer.0.SelfAttention.v
encoder.block.4.layer.0.SelfAttention.o
encoder.block.4.layer.1.DenseReluDense.wi_0
encoder.block.4.layer.1.DenseReluDense.wi_1
encoder.block.4.layer.1.DenseReluDense.wo
encoder.block.5.layer.0.SelfAttention.q
encoder.block.5.layer.0.SelfAttention.k
encoder.block.5.layer.0.SelfAttention.v
encoder.block.5.layer.0.SelfAttention.o
encoder.block.5.layer.1.DenseReluDense.wi_0
encoder.block.5.layer.1.DenseReluDense.wi_1
encoder.block.5.layer.1.DenseReluDense.wo
encoder.block.6.layer.0.SelfAttention.q
encoder.block.6.layer.0.SelfAttention.k
encoder.block.6.layer.0.SelfAttention.v
encoder.block.6.layer.0.SelfAttention.o
encoder.block.6.layer.1.DenseReluDense.wi_0
encoder.block.6.layer.1.DenseReluDense.wi_1
encoder.block.6.layer.1.DenseReluDense.wo
encoder.block.7.layer.0.SelfAttention.q
encoder.block.7.layer.0.SelfAttention.k
encoder.block.7.layer.0.SelfAttention.v
encoder.block.7.layer.0.SelfAttention.o
encoder.block.7.layer.1.DenseReluDense.wi_0
encoder.block.7.layer.1.DenseReluDense.wi_1
encoder.block.7.layer.1.DenseReluDense.wo
encoder.block.8.layer.0.SelfAttention.q
encoder.block.8.layer.0.SelfAttention.k
encoder.block.8.layer.0.SelfAttention.v
encoder.block.8.layer.0.SelfAttention.o
encoder.block.8.layer.1.DenseReluDense.wi_0
encoder.block.8.layer.1.DenseReluDense.wi_1
encoder.block.8.layer.1.DenseReluDense.wo
encoder.block.9.layer.0.SelfAttention.q
encoder.block.9.layer.0.SelfAttention.k
encoder.block.9.layer.0.SelfAttention.v
encoder.block.9.layer.0.SelfAttention.o
encoder.block.9.layer.1.DenseReluDense.wi_0
encoder.block.9.layer.1.DenseReluDense.wi_1
encoder.block.9.layer.1.DenseReluDense.wo
encoder.block.10.layer.0.SelfAttention.q
encoder.block.10.layer.0.SelfAttention.k
encoder.block.10.layer.0.SelfAttention.v
encoder.block.10.layer.0.SelfAttention.o
encoder.block.10.layer.1.DenseReluDense.wi_0
encoder.block.10.layer.1.DenseReluDense.wi_1
encoder.block.10.layer.1.DenseReluDense.wo
encoder.block.11.layer.0.SelfAttention.q
encoder.block.11.layer.0.SelfAttention.k
encoder.block.11.layer.0.SelfAttention.v
encoder.block.11.layer.0.SelfAttention.o
encoder.block.11.layer.1.DenseReluDense.wi_0
encoder.block.11.layer.1.DenseReluDense.wi_1
encoder.block.11.layer.1.DenseReluDense.wo
encoder.block.12.layer.0.SelfAttention.q
encoder.block.12.layer.0.SelfAttention.k
encoder.block.12.layer.0.SelfAttention.v
encoder.block.12.layer.0.SelfAttention.o
encoder.block.12.layer.1.DenseReluDense.wi_0
encoder.block.12.layer.1.DenseReluDense.wi_1
encoder.block.12.layer.1.DenseReluDense.wo
encoder.block.13.layer.0.SelfAttention.q
encoder.block.13.layer.0.SelfAttention.k
encoder.block.13.layer.0.SelfAttention.v
encoder.block.13.layer.0.SelfAttention.o
encoder.block.13.layer.1.DenseReluDense.wi_0
encoder.block.13.layer.1.DenseReluDense.wi_1
encoder.block.13.layer.1.DenseReluDense.wo
encoder.block.14.layer.0.SelfAttention.q
encoder.block.14.layer.0.SelfAttention.k
encoder.block.14.layer.0.SelfAttention.v
encoder.block.14.layer.0.SelfAttention.o
encoder.block.14.layer.1.DenseReluDense.wi_0
encoder.block.14.layer.1.DenseReluDense.wi_1
encoder.block.14.layer.1.DenseReluDense.wo
encoder.block.15.layer.0.SelfAttention.q
encoder.block.15.layer.0.SelfAttention.k
encoder.block.15.layer.0.SelfAttention.v
encoder.block.15.layer.0.SelfAttention.o
encoder.block.15.layer.1.DenseReluDense.wi_0
encoder.block.15.layer.1.DenseReluDense.wi_1
encoder.block.15.layer.1.DenseReluDense.wo
encoder.block.16.layer.0.SelfAttention.q
encoder.block.16.layer.0.SelfAttention.k
encoder.block.16.layer.0.SelfAttention.v
encoder.block.16.layer.0.SelfAttention.o
encoder.block.16.layer.1.DenseReluDense.wi_0
encoder.block.16.layer.1.DenseReluDense.wi_1
encoder.block.16.layer.1.DenseReluDense.wo
encoder.block.17.layer.0.SelfAttention.q
encoder.block.17.layer.0.SelfAttention.k
encoder.block.17.layer.0.SelfAttention.v
encoder.block.17.layer.0.SelfAttention.o
encoder.block.17.layer.1.DenseReluDense.wi_0
encoder.block.17.layer.1.DenseReluDense.wi_1
encoder.block.17.layer.1.DenseReluDense.wo
encoder.block.18.layer.0.SelfAttention.q
encoder.block.18.layer.0.SelfAttention.k
encoder.block.18.layer.0.SelfAttention.v
encoder.block.18.layer.0.SelfAttention.o
encoder.block.18.layer.1.DenseReluDense.wi_0
encoder.block.18.layer.1.DenseReluDense.wi_1
encoder.block.18.layer.1.DenseReluDense.wo
encoder.block.19.layer.0.SelfAttention.q
encoder.block.19.layer.0.SelfAttention.k
encoder.block.19.layer.0.SelfAttention.v
encoder.block.19.layer.0.SelfAttention.o
encoder.block.19.layer.1.DenseReluDense.wi_0
encoder.block.19.layer.1.DenseReluDense.wi_1
encoder.block.19.layer.1.DenseReluDense.wo
encoder.block.20.layer.0.SelfAttention.q
encoder.block.20.layer.0.SelfAttention.k
encoder.block.20.layer.0.SelfAttention.v
encoder.block.20.layer.0.SelfAttention.o
encoder.block.20.layer.1.DenseReluDense.wi_0
encoder.block.20.layer.1.DenseReluDense.wi_1
encoder.block.20.layer.1.DenseReluDense.wo
encoder.block.21.layer.0.SelfAttention.q
encoder.block.21.layer.0.SelfAttention.k
encoder.block.21.layer.0.SelfAttention.v
encoder.block.21.layer.0.SelfAttention.o
encoder.block.21.layer.1.DenseReluDense.wi_0
encoder.block.21.layer.1.DenseReluDense.wi_1
encoder.block.21.layer.1.DenseReluDense.wo
encoder.block.22.layer.0.SelfAttention.q
encoder.block.22.layer.0.SelfAttention.k
encoder.block.22.layer.0.SelfAttention.v
encoder.block.22.layer.0.SelfAttention.o
encoder.block.22.layer.1.DenseReluDense.wi_0
encoder.block.22.layer.1.DenseReluDense.wi_1
encoder.block.22.layer.1.DenseReluDense.wo
encoder.block.23.layer.0.SelfAttention.q
encoder.block.23.layer.0.SelfAttention.k
encoder.block.23.layer.0.SelfAttention.v
encoder.block.23.layer.0.SelfAttention.o
encoder.block.23.layer.1.DenseReluDense.wi_0
encoder.block.23.layer.1.DenseReluDense.wi_1
encoder.block.23.layer.1.DenseReluDense.wo
Done.

workspace/flan-ts-small.txt
ADDED
@@ -0,0 +1,298 @@
CUDA extension not installed.
Downloading (…)lve/main/config.json: 100%|██| 1.40k/1.40k [00:00<00:00, 3.49MB/s]
Downloading pytorch_model.bin: 100%|██████████████████| 308M/308M [00:03<00:00, 88.9MB/s]
Some weights of the model checkpoint at google/flan-t5-small were not used when initializing T5EncoderModel: ['decoder.block.7.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'decoder.block.0.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.0.SelfAttention.v.weight', 'decoder.block.2.layer.2.layer_norm.weight', 'decoder.block.6.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.3.layer.1.layer_norm.weight', 'decoder.block.6.layer.0.SelfAttention.o.weight', 'decoder.block.6.layer.0.SelfAttention.q.weight', 'decoder.block.7.layer.1.layer_norm.weight', 'decoder.block.6.layer.1.EncDecAttention.q.weight', 'decoder.block.6.layer.0.layer_norm.weight', 'decoder.block.0.layer.1.layer_norm.weight', 'decoder.block.2.layer.1.EncDecAttention.o.weight', 'decoder.block.7.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.2.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.0.SelfAttention.v.weight', 'decoder.block.0.layer.1.EncDecAttention.v.weight', 'decoder.block.7.layer.1.EncDecAttention.q.weight', 'decoder.block.5.layer.2.DenseReluDense.wo.weight', 'decoder.block.0.layer.0.SelfAttention.q.weight', 'decoder.block.5.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_1.weight', 'decoder.embed_tokens.weight', 'decoder.block.1.layer.1.EncDecAttention.o.weight', 'decoder.block.2.layer.0.SelfAttention.k.weight', 'decoder.block.4.layer.1.EncDecAttention.k.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.3.layer.0.layer_norm.weight', 'decoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.block.4.layer.1.EncDecAttention.q.weight', 'decoder.block.7.layer.1.EncDecAttention.k.weight', 'decoder.block.4.layer.2.layer_norm.weight', 'decoder.block.4.layer.0.SelfAttention.o.weight', 'decoder.block.4.layer.0.SelfAttention.q.weight', 
'decoder.block.4.layer.1.EncDecAttention.v.weight', 'decoder.block.7.layer.0.SelfAttention.o.weight', 'decoder.block.6.layer.1.EncDecAttention.k.weight', 'decoder.block.2.layer.1.EncDecAttention.v.weight', 'decoder.block.7.layer.0.SelfAttention.q.weight', 'decoder.block.6.layer.1.EncDecAttention.v.weight', 'decoder.block.1.layer.2.layer_norm.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.2.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.0.SelfAttention.q.weight', 'decoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.3.layer.0.SelfAttention.v.weight', 'decoder.block.5.layer.0.SelfAttention.v.weight', 'decoder.block.0.layer.1.EncDecAttention.k.weight', 'decoder.block.4.layer.1.layer_norm.weight', 'decoder.block.4.layer.2.DenseReluDense.wo.weight', 'decoder.block.5.layer.1.EncDecAttention.k.weight', 'decoder.block.6.layer.2.DenseReluDense.wo.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.3.layer.2.layer_norm.weight', 'decoder.block.3.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.0.layer_norm.weight', 'decoder.block.6.layer.1.layer_norm.weight', 'decoder.block.0.layer.1.EncDecAttention.o.weight', 'decoder.block.3.layer.1.EncDecAttention.q.weight', 'decoder.block.3.layer.1.EncDecAttention.o.weight', 'decoder.block.5.layer.0.SelfAttention.o.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.1.layer.1.EncDecAttention.k.weight', 'decoder.block.7.layer.1.EncDecAttention.v.weight', 'decoder.block.5.layer.1.layer_norm.weight', 'decoder.block.1.layer.1.EncDecAttention.q.weight', 'decoder.block.0.layer.1.EncDecAttention.q.weight', 'decoder.block.7.layer.2.layer_norm.weight', 'decoder.block.3.layer.0.SelfAttention.q.weight', 'decoder.block.5.layer.2.layer_norm.weight', 'decoder.block.5.layer.1.EncDecAttention.v.weight', 'decoder.block.5.layer.0.SelfAttention.q.weight', 
'decoder.block.6.layer.1.EncDecAttention.o.weight', 'decoder.block.6.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.5.layer.0.layer_norm.weight', 'decoder.block.6.layer.0.SelfAttention.k.weight', 'decoder.block.4.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.3.layer.0.SelfAttention.k.weight', 'decoder.block.7.layer.0.layer_norm.weight', 'decoder.block.0.layer.0.layer_norm.weight', 'decoder.block.3.layer.1.EncDecAttention.v.weight', 'decoder.block.2.layer.1.EncDecAttention.q.weight', 'decoder.block.1.layer.1.layer_norm.weight', 'decoder.block.5.layer.1.EncDecAttention.q.weight', 'decoder.block.1.layer.2.DenseReluDense.wo.weight', 'decoder.block.5.layer.0.SelfAttention.k.weight', 'decoder.block.2.layer.2.DenseReluDense.wo.weight', 'decoder.block.6.layer.2.layer_norm.weight', 'decoder.block.1.layer.1.EncDecAttention.v.weight', 'decoder.final_layer_norm.weight', 'decoder.block.1.layer.0.layer_norm.weight', 'decoder.block.3.layer.0.SelfAttention.o.weight', 'decoder.block.4.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.1.layer.0.SelfAttention.v.weight', 'lm_head.weight', 'decoder.block.7.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.0.layer.2.layer_norm.weight', 'decoder.block.7.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.6.layer.0.SelfAttention.v.weight', 'decoder.block.2.layer.1.layer_norm.weight', 'decoder.block.2.layer.0.layer_norm.weight', 'decoder.block.3.layer.1.EncDecAttention.k.weight', 'decoder.block.4.layer.1.EncDecAttention.o.weight', 'decoder.block.1.layer.0.SelfAttention.o.weight', 'decoder.block.2.layer.1.EncDecAttention.k.weight', 'decoder.block.4.layer.0.SelfAttention.k.weight', 'decoder.block.5.layer.1.EncDecAttention.o.weight', 'decoder.block.7.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.5.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.0.SelfAttention.v.weight']
- This IS expected if you are initializing T5EncoderModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5EncoderModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
Downloading (…)okenizer_config.json: 100%|██| 2.54k/2.54k [00:00<00:00, 8.90MB/s]
Downloading spiece.model: 100%|██████████████████████████████| 792k/792k [00:00<00:00, 122MB/s]
Downloading (…)cial_tokens_map.json: 100%|██| 2.20k/2.20k [00:00<00:00, 8.32MB/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (2837981 > 512). Running this sequence through the model will result in indexing errors
Starting ...
Ready.
0 layer.0.SelfAttention.q
Quantizing ...
time 0.70
error 84.84163665771484
0 layer.0.SelfAttention.k
Quantizing ...
time 0.13
error 4683.52685546875
0 layer.0.SelfAttention.v
Quantizing ...
time 0.13
error 1787.6051025390625
0 layer.0.SelfAttention.o
Quantizing ...
time 0.10
error 137924.640625
0 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.13
error 9668.408203125
0 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.13
error 12095.4453125
0 layer.1.DenseReluDense.wo
Quantizing ...
time 0.26
error 592524.75
1 layer.0.SelfAttention.q
Quantizing ...
time 0.17
error 76.55366516113281
1 layer.0.SelfAttention.k
Quantizing ...
time 0.13
error 4797.10107421875
1 layer.0.SelfAttention.v
Quantizing ...
time 0.13
error 3586.5419921875
1 layer.0.SelfAttention.o
Quantizing ...
time 0.09
error 30098.28515625
1 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.13
error 7313.28759765625
1 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.13
error 11631.021484375
1 layer.1.DenseReluDense.wo
Quantizing ...
time 0.25
error 3476349.0
2 layer.0.SelfAttention.q
Quantizing ...
time 0.17
error 41.201637268066406
2 layer.0.SelfAttention.k
Quantizing ...
time 0.13
error 2614.22265625
2 layer.0.SelfAttention.v
Quantizing ...
time 0.13
error 4339.2080078125
2 layer.0.SelfAttention.o
Quantizing ...
time 0.10
error 42485.1328125
2 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.13
error 5012.45947265625
2 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.13
error 16528.76953125
2 layer.1.DenseReluDense.wo
Quantizing ...
time 0.26
error 192300448.0
3 layer.0.SelfAttention.q
Quantizing ...
time 0.17
error 53.63971710205078
3 layer.0.SelfAttention.k
Quantizing ...
time 0.13
error 3402.79736328125
3 layer.0.SelfAttention.v
Quantizing ...
time 0.13
error 8263.0869140625
3 layer.0.SelfAttention.o
Quantizing ...
time 0.10
error 111050.171875
3 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.13
error 3236.92529296875
3 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.13
error 17445.189453125
3 layer.1.DenseReluDense.wo
Quantizing ...
time 0.25
error 700423.0
4 layer.0.SelfAttention.q
Quantizing ...
time 0.17
error 38.29411315917969
4 layer.0.SelfAttention.k
Quantizing ...
time 0.13
error 2450.30517578125
4 layer.0.SelfAttention.v
Quantizing ...
time 0.13
error 11326.40625
4 layer.0.SelfAttention.o
Quantizing ...
time 0.10
error 64683.59375
4 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.13
error 2528.781494140625
4 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.13
error 18873.064453125
4 layer.1.DenseReluDense.wo
Quantizing ...
time 0.25
error 760352.25
5 layer.0.SelfAttention.q
Quantizing ...
time 0.17
error 37.40803527832031
5 layer.0.SelfAttention.k
Quantizing ...
time 0.13
error 2389.12841796875
5 layer.0.SelfAttention.v
Quantizing ...
time 0.13
error 10107.05078125
5 layer.0.SelfAttention.o
Quantizing ...
time 0.10
error 216297.78125
5 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.13
error 2324.2021484375
5 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.13
error 23206.798828125
5 layer.1.DenseReluDense.wo
Quantizing ...
time 0.26
error 960373.375
6 layer.0.SelfAttention.q
Quantizing ...
time 0.18
error 27.358470916748047
6 layer.0.SelfAttention.k
Quantizing ...
time 0.13
error 1652.122802734375
6 layer.0.SelfAttention.v
Quantizing ...
time 0.13
error 11492.5712890625
6 layer.0.SelfAttention.o
Quantizing ...
time 0.10
error 327756.75
6 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.13
error 2362.47998046875
6 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.15
error 33793.09765625
6 layer.1.DenseReluDense.wo
Quantizing ...
time 0.27
error 7250225.0
7 layer.0.SelfAttention.q
Quantizing ...
time 0.18
error 31.67843246459961
7 layer.0.SelfAttention.k
Quantizing ...
time 0.13
error 1604.3997802734375
7 layer.0.SelfAttention.v
Quantizing ...
time 0.13
error 19231.8984375
7 layer.0.SelfAttention.o
Quantizing ...
time 0.10
error 493063.46875
7 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.14
error 2606.19873046875
7 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.14
error 55759.1640625
7 layer.1.DenseReluDense.wo
Quantizing ...
time 0.26
error 39936240.0
16.350690126419067
Packing ...
encoder.block.0.layer.0.SelfAttention.q
encoder.block.0.layer.0.SelfAttention.k
encoder.block.0.layer.0.SelfAttention.v
encoder.block.0.layer.0.SelfAttention.o
encoder.block.0.layer.1.DenseReluDense.wi_0
encoder.block.0.layer.1.DenseReluDense.wi_1
encoder.block.0.layer.1.DenseReluDense.wo
encoder.block.1.layer.0.SelfAttention.q
encoder.block.1.layer.0.SelfAttention.k
encoder.block.1.layer.0.SelfAttention.v
encoder.block.1.layer.0.SelfAttention.o
encoder.block.1.layer.1.DenseReluDense.wi_0
encoder.block.1.layer.1.DenseReluDense.wi_1
encoder.block.1.layer.1.DenseReluDense.wo
encoder.block.2.layer.0.SelfAttention.q
encoder.block.2.layer.0.SelfAttention.k
encoder.block.2.layer.0.SelfAttention.v
encoder.block.2.layer.0.SelfAttention.o
encoder.block.2.layer.1.DenseReluDense.wi_0
encoder.block.2.layer.1.DenseReluDense.wi_1
encoder.block.2.layer.1.DenseReluDense.wo
encoder.block.3.layer.0.SelfAttention.q
encoder.block.3.layer.0.SelfAttention.k
encoder.block.3.layer.0.SelfAttention.v
encoder.block.3.layer.0.SelfAttention.o
encoder.block.3.layer.1.DenseReluDense.wi_0
encoder.block.3.layer.1.DenseReluDense.wi_1
encoder.block.3.layer.1.DenseReluDense.wo
encoder.block.4.layer.0.SelfAttention.q
encoder.block.4.layer.0.SelfAttention.k
encoder.block.4.layer.0.SelfAttention.v
encoder.block.4.layer.0.SelfAttention.o
encoder.block.4.layer.1.DenseReluDense.wi_0
encoder.block.4.layer.1.DenseReluDense.wi_1
encoder.block.4.layer.1.DenseReluDense.wo
encoder.block.5.layer.0.SelfAttention.q
encoder.block.5.layer.0.SelfAttention.k
encoder.block.5.layer.0.SelfAttention.v
encoder.block.5.layer.0.SelfAttention.o
encoder.block.5.layer.1.DenseReluDense.wi_0
encoder.block.5.layer.1.DenseReluDense.wi_1
encoder.block.5.layer.1.DenseReluDense.wo
encoder.block.6.layer.0.SelfAttention.q
encoder.block.6.layer.0.SelfAttention.k
encoder.block.6.layer.0.SelfAttention.v
encoder.block.6.layer.0.SelfAttention.o
encoder.block.6.layer.1.DenseReluDense.wi_0
encoder.block.6.layer.1.DenseReluDense.wi_1
encoder.block.6.layer.1.DenseReluDense.wo
encoder.block.7.layer.0.SelfAttention.q
encoder.block.7.layer.0.SelfAttention.k
encoder.block.7.layer.0.SelfAttention.v
encoder.block.7.layer.0.SelfAttention.o
encoder.block.7.layer.1.DenseReluDense.wi_0
encoder.block.7.layer.1.DenseReluDense.wi_1
encoder.block.7.layer.1.DenseReluDense.wo
Done.
workspace/flan-ts-xl.txt
ADDED
The diff for this file is too large to render.
workspace/flan-ts-xxl.txt
ADDED
The diff for this file is too large to render.
workspace/ts_large.txt
ADDED
@@ -0,0 +1,743 @@
CUDA extension not installed.
Some weights of the model checkpoint at t5-large were not used when initializing T5EncoderModel: ['decoder.block.8.layer.1.EncDecAttention.v.weight', 'decoder.block.10.layer.2.layer_norm.weight', 'decoder.block.3.layer.0.SelfAttention.v.weight', 'decoder.block.21.layer.0.SelfAttention.k.weight', 'decoder.block.14.layer.1.EncDecAttention.v.weight', 'decoder.block.12.layer.2.layer_norm.weight', 'decoder.block.22.layer.1.EncDecAttention.v.weight', 'decoder.block.1.layer.2.layer_norm.weight', 'decoder.block.5.layer.2.DenseReluDense.wi.weight', 'decoder.block.0.layer.2.DenseReluDense.wo.weight', 'decoder.block.17.layer.2.layer_norm.weight', 'decoder.block.16.layer.0.layer_norm.weight', 'decoder.block.2.layer.1.EncDecAttention.o.weight', 'decoder.block.13.layer.2.layer_norm.weight', 'decoder.block.18.layer.1.layer_norm.weight', 'decoder.block.4.layer.1.EncDecAttention.k.weight', 'decoder.block.18.layer.1.EncDecAttention.v.weight', 'decoder.block.7.layer.1.EncDecAttention.v.weight', 'decoder.block.21.layer.0.SelfAttention.v.weight', 'decoder.block.13.layer.1.EncDecAttention.k.weight', 'decoder.block.20.layer.1.layer_norm.weight', 'decoder.block.7.layer.1.layer_norm.weight', 'decoder.block.9.layer.1.EncDecAttention.k.weight', 'decoder.block.16.layer.0.SelfAttention.q.weight', 'decoder.block.3.layer.1.EncDecAttention.v.weight', 'decoder.block.11.layer.1.layer_norm.weight', 'decoder.block.15.layer.0.SelfAttention.k.weight', 'decoder.block.2.layer.1.EncDecAttention.q.weight', 'decoder.block.7.layer.0.SelfAttention.k.weight', 'decoder.block.22.layer.1.EncDecAttention.q.weight', 'decoder.block.18.layer.1.EncDecAttention.k.weight', 'decoder.block.15.layer.1.EncDecAttention.q.weight', 'decoder.block.4.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.0.SelfAttention.v.weight', 'decoder.block.23.layer.2.DenseReluDense.wo.weight', 'decoder.block.16.layer.1.EncDecAttention.o.weight', 'decoder.block.17.layer.0.SelfAttention.o.weight', 
'decoder.block.18.layer.2.DenseReluDense.wo.weight', 'decoder.block.19.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.2.layer_norm.weight', 'decoder.block.2.layer.1.EncDecAttention.v.weight', 'decoder.block.19.layer.0.SelfAttention.q.weight', 'decoder.block.10.layer.2.DenseReluDense.wo.weight', 'decoder.block.22.layer.0.SelfAttention.k.weight', 'decoder.block.6.layer.0.SelfAttention.k.weight', 'decoder.block.17.layer.2.DenseReluDense.wo.weight', 'decoder.block.12.layer.1.EncDecAttention.o.weight', 'decoder.block.21.layer.1.EncDecAttention.v.weight', 'decoder.block.2.layer.2.DenseReluDense.wo.weight', 'decoder.block.8.layer.0.SelfAttention.k.weight', 'decoder.block.20.layer.0.SelfAttention.v.weight', 'decoder.block.6.layer.2.DenseReluDense.wo.weight', 'decoder.block.11.layer.1.EncDecAttention.o.weight', 'decoder.block.15.layer.0.SelfAttention.v.weight', 'decoder.block.10.layer.1.EncDecAttention.k.weight', 'decoder.block.2.layer.2.DenseReluDense.wi.weight', 'decoder.block.19.layer.2.layer_norm.weight', 'decoder.block.10.layer.0.layer_norm.weight', 'decoder.block.23.layer.1.EncDecAttention.q.weight', 'decoder.block.5.layer.2.layer_norm.weight', 'decoder.block.13.layer.2.DenseReluDense.wi.weight', 'decoder.block.21.layer.1.layer_norm.weight', 'decoder.block.14.layer.1.layer_norm.weight', 'decoder.block.18.layer.0.layer_norm.weight', 'decoder.block.13.layer.0.layer_norm.weight', 'decoder.block.12.layer.0.SelfAttention.o.weight', 'decoder.block.13.layer.2.DenseReluDense.wo.weight', 'decoder.block.12.layer.0.SelfAttention.v.weight', 'decoder.block.4.layer.1.EncDecAttention.v.weight', 'decoder.block.19.layer.0.SelfAttention.o.weight', 'decoder.block.6.layer.1.EncDecAttention.o.weight', 'decoder.block.9.layer.0.SelfAttention.q.weight', 'decoder.block.10.layer.1.EncDecAttention.q.weight', 'decoder.block.17.layer.2.DenseReluDense.wi.weight', 'decoder.block.12.layer.2.DenseReluDense.wi.weight', 'decoder.block.0.layer.2.DenseReluDense.wi.weight', 
'decoder.block.20.layer.1.EncDecAttention.k.weight', 'decoder.block.0.layer.1.EncDecAttention.v.weight', 'decoder.block.3.layer.2.DenseReluDense.wi.weight', 'decoder.block.14.layer.0.SelfAttention.q.weight', 'decoder.block.21.layer.0.SelfAttention.o.weight', 'decoder.block.23.layer.1.EncDecAttention.k.weight', 'decoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.block.3.layer.0.SelfAttention.k.weight', 'decoder.block.11.layer.0.SelfAttention.o.weight', 'decoder.block.23.layer.2.DenseReluDense.wi.weight', 'decoder.block.14.layer.2.DenseReluDense.wi.weight', 'decoder.block.21.layer.1.EncDecAttention.q.weight', 'decoder.block.12.layer.0.layer_norm.weight', 'decoder.block.2.layer.0.layer_norm.weight', 'decoder.block.13.layer.0.SelfAttention.v.weight', 'decoder.block.8.layer.2.DenseReluDense.wo.weight', 'decoder.block.7.layer.0.SelfAttention.o.weight', 'decoder.block.17.layer.1.layer_norm.weight', 'decoder.block.20.layer.1.EncDecAttention.o.weight', 'decoder.block.12.layer.1.EncDecAttention.k.weight', 'decoder.block.17.layer.1.EncDecAttention.k.weight', 'decoder.block.5.layer.0.SelfAttention.q.weight', 'decoder.block.5.layer.1.EncDecAttention.q.weight', 'decoder.block.9.layer.2.layer_norm.weight', 'decoder.block.7.layer.2.layer_norm.weight', 'decoder.block.19.layer.0.layer_norm.weight', 'decoder.block.6.layer.2.DenseReluDense.wi.weight', 'decoder.block.14.layer.2.DenseReluDense.wo.weight', 'decoder.block.12.layer.0.SelfAttention.q.weight', 'decoder.block.10.layer.1.layer_norm.weight', 'decoder.block.6.layer.2.layer_norm.weight', 'decoder.block.7.layer.0.layer_norm.weight', 'decoder.block.2.layer.0.SelfAttention.q.weight', 'decoder.block.20.layer.1.EncDecAttention.v.weight', 'decoder.block.23.layer.1.EncDecAttention.v.weight', 'decoder.block.23.layer.0.SelfAttention.q.weight', 'decoder.block.2.layer.1.EncDecAttention.k.weight', 'decoder.block.16.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.0.layer_norm.weight', 
'decoder.block.8.layer.1.EncDecAttention.q.weight', 'decoder.block.0.layer.1.EncDecAttention.o.weight', 'decoder.block.20.layer.2.DenseReluDense.wo.weight', 'decoder.block.11.layer.2.DenseReluDense.wo.weight', 'decoder.block.9.layer.1.layer_norm.weight', 'decoder.block.12.layer.1.EncDecAttention.q.weight', 'decoder.block.22.layer.2.layer_norm.weight', 'decoder.block.8.layer.1.layer_norm.weight', 'decoder.block.12.layer.1.EncDecAttention.v.weight', 'decoder.block.1.layer.0.layer_norm.weight', 'decoder.block.15.layer.2.layer_norm.weight', 'decoder.block.23.layer.2.layer_norm.weight', 'decoder.block.1.layer.0.SelfAttention.o.weight', 'decoder.block.18.layer.0.SelfAttention.q.weight', 'decoder.block.5.layer.0.SelfAttention.k.weight', 'decoder.block.21.layer.1.EncDecAttention.k.weight', 'decoder.block.23.layer.0.SelfAttention.k.weight', 'decoder.block.18.layer.0.SelfAttention.k.weight', 'decoder.block.21.layer.2.DenseReluDense.wo.weight', 'decoder.block.20.layer.0.SelfAttention.q.weight', 'decoder.block.8.layer.2.layer_norm.weight', 'decoder.block.19.layer.1.layer_norm.weight', 'decoder.block.23.layer.0.SelfAttention.v.weight', 'decoder.block.19.layer.1.EncDecAttention.v.weight', 'decoder.block.0.layer.1.EncDecAttention.q.weight', 'decoder.block.20.layer.0.SelfAttention.o.weight', 'decoder.block.21.layer.1.EncDecAttention.o.weight', 'decoder.block.5.layer.2.DenseReluDense.wo.weight', 'decoder.block.5.layer.1.layer_norm.weight', 'decoder.block.17.layer.1.EncDecAttention.o.weight', 'decoder.block.6.layer.1.EncDecAttention.v.weight', 'decoder.block.4.layer.1.EncDecAttention.o.weight', 'decoder.block.16.layer.2.DenseReluDense.wi.weight', 'decoder.block.21.layer.2.layer_norm.weight', 'decoder.block.0.layer.1.layer_norm.weight', 'decoder.block.22.layer.1.layer_norm.weight', 'decoder.final_layer_norm.weight', 'decoder.block.18.layer.2.layer_norm.weight', 'decoder.block.15.layer.2.DenseReluDense.wo.weight', 'decoder.block.3.layer.1.EncDecAttention.o.weight', 
'decoder.block.11.layer.1.EncDecAttention.q.weight', 'decoder.block.8.layer.0.SelfAttention.v.weight', 'decoder.block.16.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.2.DenseReluDense.wo.weight', 'decoder.block.22.layer.2.DenseReluDense.wo.weight', 'decoder.block.9.layer.1.EncDecAttention.v.weight', 'decoder.block.22.layer.0.SelfAttention.o.weight', 'decoder.block.11.layer.1.EncDecAttention.v.weight', 'decoder.block.22.layer.0.SelfAttention.q.weight', 'decoder.block.15.layer.1.EncDecAttention.k.weight', 'decoder.block.11.layer.2.layer_norm.weight', 'decoder.block.9.layer.2.DenseReluDense.wi.weight', 'decoder.block.23.layer.0.layer_norm.weight', 'decoder.block.6.layer.1.EncDecAttention.q.weight', 'decoder.block.4.layer.1.EncDecAttention.q.weight', 'decoder.block.5.layer.0.SelfAttention.o.weight', 'decoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.3.layer.0.layer_norm.weight', 'decoder.block.6.layer.0.SelfAttention.v.weight', 'decoder.block.10.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.12.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.1.layer_norm.weight', 'decoder.block.3.layer.1.layer_norm.weight', 'decoder.block.21.layer.0.SelfAttention.q.weight', 'decoder.block.0.layer.2.layer_norm.weight', 'decoder.block.1.layer.0.SelfAttention.v.weight', 'decoder.block.11.layer.0.layer_norm.weight', 'decoder.block.7.layer.1.EncDecAttention.o.weight', 'decoder.block.3.layer.2.DenseReluDense.wo.weight', 'decoder.block.7.layer.0.SelfAttention.q.weight', 'decoder.block.15.layer.0.layer_norm.weight', 'decoder.block.14.layer.0.SelfAttention.o.weight', 'decoder.block.22.layer.2.DenseReluDense.wi.weight', 'decoder.block.4.layer.1.layer_norm.weight', 'decoder.block.8.layer.1.EncDecAttention.k.weight', 'decoder.block.20.layer.2.layer_norm.weight', 'decoder.block.15.layer.1.EncDecAttention.o.weight', 'decoder.block.14.layer.0.SelfAttention.k.weight', 
'decoder.block.5.layer.1.EncDecAttention.k.weight', 'decoder.block.9.layer.2.DenseReluDense.wo.weight', 'decoder.block.1.layer.1.EncDecAttention.v.weight', 'decoder.block.1.layer.1.EncDecAttention.o.weight', 'decoder.block.10.layer.0.SelfAttention.o.weight', 'decoder.block.13.layer.1.EncDecAttention.v.weight', 'decoder.block.18.layer.1.EncDecAttention.o.weight', 'decoder.block.7.layer.0.SelfAttention.v.weight', 'decoder.block.10.layer.0.SelfAttention.q.weight', 'decoder.block.17.layer.0.SelfAttention.v.weight', 'decoder.block.8.layer.2.DenseReluDense.wi.weight', 'decoder.block.2.layer.0.SelfAttention.k.weight', 'decoder.block.13.layer.1.layer_norm.weight', 'decoder.block.20.layer.0.layer_norm.weight', 'decoder.block.2.layer.2.layer_norm.weight', 'decoder.block.21.layer.2.DenseReluDense.wi.weight', 'decoder.block.13.layer.0.SelfAttention.q.weight', 'decoder.block.1.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.0.SelfAttention.o.weight', 'decoder.block.10.layer.2.DenseReluDense.wi.weight', 'decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'decoder.block.19.layer.0.SelfAttention.k.weight', 'decoder.block.19.layer.1.EncDecAttention.k.weight', 'decoder.block.14.layer.0.SelfAttention.v.weight', 'decoder.block.4.layer.0.SelfAttention.q.weight', 'decoder.block.9.layer.1.EncDecAttention.o.weight', 'decoder.block.7.layer.1.EncDecAttention.q.weight', 'decoder.block.20.layer.2.DenseReluDense.wi.weight', 'decoder.block.15.layer.2.DenseReluDense.wi.weight', 'decoder.block.6.layer.0.SelfAttention.o.weight', 'decoder.block.20.layer.0.SelfAttention.k.weight', 'decoder.block.12.layer.0.SelfAttention.k.weight', 'decoder.block.22.layer.0.SelfAttention.v.weight', 'decoder.block.18.layer.0.SelfAttention.v.weight', 'decoder.block.9.layer.0.layer_norm.weight', 'decoder.block.16.layer.1.EncDecAttention.q.weight', 'decoder.block.21.layer.0.layer_norm.weight', 'decoder.block.16.layer.2.layer_norm.weight', 'decoder.block.8.layer.0.SelfAttention.o.weight', 
'decoder.block.14.layer.1.EncDecAttention.q.weight', 'decoder.block.22.layer.1.EncDecAttention.o.weight', 'decoder.block.6.layer.0.layer_norm.weight', 'decoder.block.17.layer.0.SelfAttention.k.weight', 'decoder.block.13.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.1.EncDecAttention.q.weight', 'decoder.block.19.layer.2.DenseReluDense.wi.weight', 'decoder.block.15.layer.0.SelfAttention.o.weight', 'decoder.block.17.layer.1.EncDecAttention.q.weight', 'decoder.block.0.layer.0.SelfAttention.q.weight', 'decoder.block.19.layer.1.EncDecAttention.o.weight', 'decoder.block.17.layer.1.EncDecAttention.v.weight', 'decoder.block.0.layer.1.EncDecAttention.k.weight', 'decoder.block.9.layer.0.SelfAttention.o.weight', 'decoder.block.4.layer.2.DenseReluDense.wi.weight', 'decoder.block.8.layer.0.layer_norm.weight', 'decoder.block.18.layer.1.EncDecAttention.q.weight', 'decoder.block.19.layer.1.EncDecAttention.q.weight', 'decoder.block.7.layer.1.EncDecAttention.k.weight', 'decoder.block.13.layer.1.EncDecAttention.q.weight', 'decoder.block.1.layer.1.EncDecAttention.k.weight', 'decoder.block.4.layer.0.layer_norm.weight', 'decoder.block.5.layer.1.EncDecAttention.o.weight', 'decoder.block.15.layer.1.EncDecAttention.v.weight', 'decoder.block.13.layer.1.EncDecAttention.o.weight', 'decoder.block.17.layer.0.SelfAttention.q.weight', 'decoder.block.5.layer.1.EncDecAttention.v.weight', 'decoder.block.14.layer.1.EncDecAttention.o.weight', 'decoder.block.16.layer.2.DenseReluDense.wo.weight', 'decoder.block.6.layer.1.layer_norm.weight', 'decoder.block.18.layer.2.DenseReluDense.wi.weight', 'decoder.block.23.layer.0.SelfAttention.o.weight', 'decoder.block.2.layer.0.SelfAttention.v.weight', 'decoder.block.16.layer.1.EncDecAttention.k.weight', 'decoder.block.2.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.0.SelfAttention.v.weight', 'decoder.block.9.layer.1.EncDecAttention.q.weight', 'decoder.block.3.layer.1.EncDecAttention.k.weight', 'decoder.block.5.layer.0.layer_norm.weight', 
'decoder.block.11.layer.1.EncDecAttention.k.weight', 'decoder.block.14.layer.1.EncDecAttention.k.weight', 'decoder.block.15.layer.1.layer_norm.weight', 'decoder.block.5.layer.0.SelfAttention.v.weight', 'decoder.block.6.layer.1.EncDecAttention.k.weight', 'decoder.block.3.layer.1.EncDecAttention.q.weight', 'decoder.block.13.layer.0.SelfAttention.k.weight', 'decoder.block.3.layer.0.SelfAttention.q.weight', 'decoder.block.11.layer.0.SelfAttention.k.weight', 'decoder.block.3.layer.2.layer_norm.weight', 'decoder.block.14.layer.2.layer_norm.weight', 'decoder.block.23.layer.1.EncDecAttention.o.weight', 'decoder.block.1.layer.2.DenseReluDense.wo.weight', 'decoder.block.18.layer.0.SelfAttention.o.weight', 'decoder.block.8.layer.0.SelfAttention.q.weight', 'decoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.20.layer.1.EncDecAttention.q.weight', 'decoder.block.23.layer.1.layer_norm.weight', 'decoder.block.3.layer.0.SelfAttention.o.weight', 'decoder.block.16.layer.1.EncDecAttention.v.weight', 'decoder.block.12.layer.1.layer_norm.weight', 'decoder.block.1.layer.2.DenseReluDense.wi.weight', 'decoder.block.9.layer.0.SelfAttention.k.weight', 'decoder.block.11.layer.0.SelfAttention.q.weight', 'decoder.block.22.layer.0.layer_norm.weight', 'decoder.block.6.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.0.SelfAttention.k.weight', 'decoder.block.22.layer.1.EncDecAttention.k.weight', 'decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight', 'decoder.block.14.layer.0.layer_norm.weight', 'decoder.block.7.layer.2.DenseReluDense.wi.weight', 'decoder.block.15.layer.0.SelfAttention.q.weight', 'decoder.block.17.layer.0.layer_norm.weight', 'decoder.block.19.layer.0.SelfAttention.v.weight', 'decoder.block.16.layer.1.layer_norm.weight', 'decoder.block.10.layer.1.EncDecAttention.v.weight', 'decoder.block.16.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.1.layer_norm.weight', 'decoder.block.8.layer.1.EncDecAttention.o.weight', 
'decoder.block.10.layer.0.SelfAttention.k.weight', 'decoder.block.10.layer.0.SelfAttention.v.weight', 'decoder.block.11.layer.0.SelfAttention.v.weight', 'decoder.block.11.layer.2.DenseReluDense.wi.weight']
- This IS expected if you are initializing T5EncoderModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5EncoderModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Downloading and preparing dataset wikitext/wikitext-2-raw-v1 to /root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126...
Downloading data: 100%|████████████████████| 4.72M/4.72M [00:03<00:00, 1.38MB/s]
Dataset wikitext downloaded and prepared to /root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126. Subsequent calls will reuse this data.
Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
Downloading (…)lve/main/config.json: 100%|█| 1.21k/1.21k [00:00<00:00, 4.35MB/s]
Downloading (…)ve/main/spiece.model: 100%|███| 792k/792k [00:00<00:00, 2.03MB/s]
Downloading (…)/main/tokenizer.json: 100%|█| 1.39M/1.39M [00:00<00:00, 10.3MB/s]
/usr/local/lib/python3.10/dist-packages/transformers/models/t5/tokenization_t5_fast.py:155: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.
For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-large automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.
- To avoid this warning, please instantiate this tokenizer with `model_max_length` set to your preferred value.
warnings.warn(
Token indices sequence length is longer than the specified maximum sequence length for this model (2837091 > 512). Running this sequence through the model will result in indexing errors
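The indexing warning only matters if a sequence that long is actually fed to the model in one piece. GPTQ-style calibration loaders commonly tokenize the whole split once and then cut fixed-length windows out of it, so every forward pass stays within 512 tokens. The following is a minimal sketch of that windowing, assuming a flat list of token ids. `sample_windows`, `seqlen`, and `nsamples` are hypothetical names and are not taken from the script that produced this log.

```python
import random
from typing import List, Sequence


def sample_windows(token_ids: Sequence[int], seqlen: int, nsamples: int,
                   seed: int = 0) -> List[List[int]]:
    """Cut `nsamples` random contiguous windows of exactly `seqlen` tokens
    out of one long tokenized corpus (hypothetical helper)."""
    if seqlen <= 0 or nsamples < 0:
        raise ValueError("seqlen must be positive and nsamples non-negative")
    if len(token_ids) < seqlen:
        raise ValueError("corpus is shorter than a single window")
    rng = random.Random(seed)  # fixed seed so the calibration set is reproducible
    windows = []
    for _ in range(nsamples):
        # randint is inclusive on both ends, so the last valid start is len - seqlen
        start = rng.randint(0, len(token_ids) - seqlen)
        windows.append(list(token_ids[start:start + seqlen]))
    return windows


# Stand-in for the 2837091-token corpus reported above: a short fake id sequence.
corpus = list(range(5000))
calib = sample_windows(corpus, seqlen=512, nsamples=4)
print(len(calib), {len(w) for w in calib})  # 4 {512}
```

Each window is then short enough to pass through the encoder without hitting the 512-position limit. The full-length tokenization that triggered the warning is never used directly.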
Starting ...
Ready.
0 layer.0.SelfAttention.q
Quantizing ...
time 1.05
error 230.94436645507812
0 layer.0.SelfAttention.k
Quantizing ...
time 0.35
error 15016.095703125
0 layer.0.SelfAttention.v
Quantizing ...
time 0.38
error 9677.2041015625
0 layer.0.SelfAttention.o
Quantizing ...
time 0.35
error 106466.890625
0 layer.1.DenseReluDense.wi
Quantizing ...
time 0.38
error 216545.046875
0 layer.1.DenseReluDense.wo
Quantizing ...
time 1.43
error 175480.0
1 layer.0.SelfAttention.q
Quantizing ...
time 0.54
error 212.295166015625
1 layer.0.SelfAttention.k
Quantizing ...
time 0.37
error 11788.3134765625
1 layer.0.SelfAttention.v
Quantizing ...
time 0.35
error 10337.71484375
1 layer.0.SelfAttention.o
Quantizing ...
time 0.35
error 78876.84375
1 layer.1.DenseReluDense.wi
Quantizing ...
time 0.35
error 362692.28125
1 layer.1.DenseReluDense.wo
Quantizing ...
time 1.42
error 330811.875
2 layer.0.SelfAttention.q
Quantizing ...
time 0.53
error 149.39337158203125
2 layer.0.SelfAttention.k
Quantizing ...
time 0.35
error 8281.7451171875
2 layer.0.SelfAttention.v
Quantizing ...
time 0.35
error 9236.6171875
2 layer.0.SelfAttention.o
Quantizing ...
time 0.35
error 25642.55859375
2 layer.1.DenseReluDense.wi
Quantizing ...
time 0.35
error 635081.875
2 layer.1.DenseReluDense.wo
Quantizing ...
time 1.42
error 362131.0
3 layer.0.SelfAttention.q
Quantizing ...
time 0.53
error 198.08612060546875
3 layer.0.SelfAttention.k
Quantizing ...
time 0.37
error 10755.650390625
3 layer.0.SelfAttention.v
Quantizing ...
time 0.35
error 9889.1962890625
3 layer.0.SelfAttention.o
Quantizing ...
time 0.38
error 37326.6640625
3 layer.1.DenseReluDense.wi
Quantizing ...
time 0.35
error 1070184.5
3 layer.1.DenseReluDense.wo
Quantizing ...
time 1.47
error 399097.0625
4 layer.0.SelfAttention.q
Quantizing ...
time 0.55
error 232.6760711669922
4 layer.0.SelfAttention.k
Quantizing ...
time 0.38
error 12199.326171875
4 layer.0.SelfAttention.v
Quantizing ...
time 0.36
error 11181.3046875
4 layer.0.SelfAttention.o
Quantizing ...
time 0.35
error 55337.78125
4 layer.1.DenseReluDense.wi
Quantizing ...
time 0.35
error 1705248.125
4 layer.1.DenseReluDense.wo
Quantizing ...
time 1.42
error 368282.875
5 layer.0.SelfAttention.q
Quantizing ...
time 0.53
error 218.76162719726562
5 layer.0.SelfAttention.k
Quantizing ...
time 0.35
error 12070.9462890625
5 layer.0.SelfAttention.v
Quantizing ...
time 0.35
error 13040.486328125
5 layer.0.SelfAttention.o
Quantizing ...
time 0.35
error 77213.7109375
5 layer.1.DenseReluDense.wi
Quantizing ...
time 0.35
error 2338288.5
5 layer.1.DenseReluDense.wo
Quantizing ...
time 1.41
error 324492.5625
6 layer.0.SelfAttention.q
Quantizing ...
time 0.53
error 212.82241821289062
6 layer.0.SelfAttention.k
Quantizing ...
time 0.37
error 11916.390625
6 layer.0.SelfAttention.v
Quantizing ...
time 0.35
error 13278.4794921875
6 layer.0.SelfAttention.o
Quantizing ...
time 0.38
error 93256.609375
6 layer.1.DenseReluDense.wi
Quantizing ...
time 0.35
error 2914808.0
6 layer.1.DenseReluDense.wo
Quantizing ...
time 1.47
error 326483.75
7 layer.0.SelfAttention.q
Quantizing ...
time 0.56
error 196.81045532226562
7 layer.0.SelfAttention.k
Quantizing ...
time 0.36
error 11539.515625
7 layer.0.SelfAttention.v
Quantizing ...
time 0.37
error 14094.767578125
7 layer.0.SelfAttention.o
Quantizing ...
time 0.35
error 67957.1171875
7 layer.1.DenseReluDense.wi
Quantizing ...
time 0.35
error 2997633.0
7 layer.1.DenseReluDense.wo
Quantizing ...
time 1.42
error 415390.4375
8 layer.0.SelfAttention.q
Quantizing ...
time 0.53
error 204.32620239257812
8 layer.0.SelfAttention.k
Quantizing ...
time 0.35
error 12758.60546875
8 layer.0.SelfAttention.v
Quantizing ...
time 0.35
error 20335.3203125
8 layer.0.SelfAttention.o
Quantizing ...
time 0.35
error 242356.3125
8 layer.1.DenseReluDense.wi
Quantizing ...
time 0.35
error 3908813.5
8 layer.1.DenseReluDense.wo
Quantizing ...
time 1.43
error 657590.75
9 layer.0.SelfAttention.q
Quantizing ...
time 0.53
error 165.850830078125
9 layer.0.SelfAttention.k
Quantizing ...
time 0.36
error 11305.962890625
9 layer.0.SelfAttention.v
Quantizing ...
time 0.36
error 20146.72265625
9 layer.0.SelfAttention.o
Quantizing ...
time 0.36
error 155148.46875
9 layer.1.DenseReluDense.wi
Quantizing ...
time 0.36
error 4378728.5
9 layer.1.DenseReluDense.wo
Quantizing ...
time 1.47
error 785346.5625
10 layer.0.SelfAttention.q
Quantizing ...
time 0.55
error 150.81277465820312
10 layer.0.SelfAttention.k
Quantizing ...
time 0.35
error 8967.4853515625
10 layer.0.SelfAttention.v
Quantizing ...
time 0.38
error 19551.57421875
10 layer.0.SelfAttention.o
Quantizing ...
time 0.35
error 159628.03125
10 layer.1.DenseReluDense.wi
Quantizing ...
time 0.35
error 5331122.5
10 layer.1.DenseReluDense.wo
Quantizing ...
time 1.41
error 987081.75
11 layer.0.SelfAttention.q
Quantizing ...
time 0.53
error 148.4892120361328
11 layer.0.SelfAttention.k
Quantizing ...
time 0.35
error 10070.583984375
11 layer.0.SelfAttention.v
Quantizing ...
time 0.35
error 22689.8046875
11 layer.0.SelfAttention.o
Quantizing ...
time 0.35
error 158388.921875
11 layer.1.DenseReluDense.wi
Quantizing ...
time 0.35
error 5614285.0
11 layer.1.DenseReluDense.wo
Quantizing ...
time 1.41
error 1036498.25
12 layer.0.SelfAttention.q
Quantizing ...
time 0.53
error 143.14183044433594
12 layer.0.SelfAttention.k
Quantizing ...
time 0.35
error 10775.267578125
12 layer.0.SelfAttention.v
Quantizing ...
time 0.37
error 30807.22265625
12 layer.0.SelfAttention.o
Quantizing ...
time 0.36
error 518529.21875
12 layer.1.DenseReluDense.wi
Quantizing ...
time 0.37
error 5196545.0
12 layer.1.DenseReluDense.wo
Quantizing ...
time 1.44
error 1605865.0
13 layer.0.SelfAttention.q
Quantizing ...
time 0.54
error 132.04205322265625
13 layer.0.SelfAttention.k
Quantizing ...
time 0.36
error 9211.498046875
13 layer.0.SelfAttention.v
Quantizing ...
time 0.37
error 32021.294921875
13 layer.0.SelfAttention.o
Quantizing ...
time 0.36
error 389801.46875
13 layer.1.DenseReluDense.wi
Quantizing ...
time 0.35
error 6028052.0
13 layer.1.DenseReluDense.wo
Quantizing ...
time 1.40
error 1947110.25
14 layer.0.SelfAttention.q
Quantizing ...
time 0.53
error 109.33882904052734
14 layer.0.SelfAttention.k
Quantizing ...
time 0.35
error 8652.20703125
14 layer.0.SelfAttention.v
Quantizing ...
time 0.35
error 29946.4140625
14 layer.0.SelfAttention.o
Quantizing ...
time 0.35
error 351310.0
14 layer.1.DenseReluDense.wi
Quantizing ...
time 0.35
error 6125760.5
14 layer.1.DenseReluDense.wo
Quantizing ...
time 1.41
error 2735209.0
15 layer.0.SelfAttention.q
Quantizing ...
time 0.53
error 113.90670776367188
15 layer.0.SelfAttention.k
Quantizing ...
time 0.35
error 8382.978515625
15 layer.0.SelfAttention.v
Quantizing ...
time 0.36
error 35500.65234375
15 layer.0.SelfAttention.o
Quantizing ...
time 0.35
error 520358.59375
15 layer.1.DenseReluDense.wi
Quantizing ...
time 0.38
error 6121543.5
15 layer.1.DenseReluDense.wo
Quantizing ...
time 1.43
error 3549418.5
16 layer.0.SelfAttention.q
Quantizing ...
time 0.53
error 106.98755645751953
16 layer.0.SelfAttention.k
Quantizing ...
time 0.37
error 7904.42333984375
16 layer.0.SelfAttention.v
Quantizing ...
time 0.35
error 40152.375
16 layer.0.SelfAttention.o
Quantizing ...
time 0.38
error 1242878.0
16 layer.1.DenseReluDense.wi
Quantizing ...
time 0.35
error 8400617.0
16 layer.1.DenseReluDense.wo
Quantizing ...
time 1.40
error 5480047.0
17 layer.0.SelfAttention.q
Quantizing ...
time 0.53
error 98.13764190673828
17 layer.0.SelfAttention.k
Quantizing ...
time 0.35
error 7841.5126953125
17 layer.0.SelfAttention.v
Quantizing ...
time 0.35
error 46148.609375
17 layer.0.SelfAttention.o
Quantizing ...
time 0.35
error 1168839.625
17 layer.1.DenseReluDense.wi
Quantizing ...
time 0.36
error 7634862.0
17 layer.1.DenseReluDense.wo
Quantizing ...
time 1.41
error 4989134.0
18 layer.0.SelfAttention.q
Quantizing ...
time 0.53
error 102.72500610351562
18 layer.0.SelfAttention.k
Quantizing ...
time 0.35
error 7599.8544921875
18 layer.0.SelfAttention.v
Quantizing ...
time 0.35
error 55332.08203125
18 layer.0.SelfAttention.o
Quantizing ...
time 0.36
error 3184639.0
18 layer.1.DenseReluDense.wi
Quantizing ...
time 0.36
error 6987084.5
18 layer.1.DenseReluDense.wo
Quantizing ...
time 1.45
error 7245906.0
19 layer.0.SelfAttention.q
Quantizing ...
time 0.53
error 81.86250305175781
19 layer.0.SelfAttention.k
Quantizing ...
time 0.37
error 5452.095703125
19 layer.0.SelfAttention.v
Quantizing ...
time 0.37
error 50052.5
19 layer.0.SelfAttention.o
Quantizing ...
time 0.38
error 2986069.25
19 layer.1.DenseReluDense.wi
Quantizing ...
time 0.35
error 9018568.0
19 layer.1.DenseReluDense.wo
Quantizing ...
time 1.41
error 10263636.0
20 layer.0.SelfAttention.q
Quantizing ...
time 0.53
error 76.51995086669922
20 layer.0.SelfAttention.k
Quantizing ...
time 0.35
error 5472.42333984375
20 layer.0.SelfAttention.v
Quantizing ...
time 0.35
error 41930.93359375
20 layer.0.SelfAttention.o
Quantizing ...
time 0.35
error 2892769.5
20 layer.1.DenseReluDense.wi
Quantizing ...
time 0.35
error 11466556.0
20 layer.1.DenseReluDense.wo
Quantizing ...
time 1.41
error 24789348.0
21 layer.0.SelfAttention.q
Quantizing ...
time 0.53
error 99.8782958984375
21 layer.0.SelfAttention.k
Quantizing ...
time 0.35
error 6085.8701171875
21 layer.0.SelfAttention.v
Quantizing ...
time 0.35
error 59590.58984375
21 layer.0.SelfAttention.o
Quantizing ...
time 0.38
error 4403669.0
21 layer.1.DenseReluDense.wi
Quantizing ...
time 0.37
error 18229172.0
21 layer.1.DenseReluDense.wo
Quantizing ...
time 1.49
error 16509261.0
22 layer.0.SelfAttention.q
Quantizing ...
time 0.53
error 92.60875701904297
22 layer.0.SelfAttention.k
Quantizing ...
time 0.37
error 7184.3828125
22 layer.0.SelfAttention.v
Quantizing ...
time 0.36
error 63427.015625
22 layer.0.SelfAttention.o
Quantizing ...
time 0.37
error 5621765.0
22 layer.1.DenseReluDense.wi
Quantizing ...
time 0.36
error 19273876.0
22 layer.1.DenseReluDense.wo
Quantizing ...
time 1.41
error 26262050.0
23 layer.0.SelfAttention.q
Quantizing ...
time 0.53
error 94.61616516113281
23 layer.0.SelfAttention.k
Quantizing ...
time 0.35
error 6629.54931640625
23 layer.0.SelfAttention.v
Quantizing ...
time 0.35
error 79510.0
23 layer.0.SelfAttention.o
Quantizing ...
time 0.35
error 9020421.0
23 layer.1.DenseReluDense.wi
Quantizing ...
time 0.35
error 11331573.0
23 layer.1.DenseReluDense.wo
Quantizing ...
time 1.40
error 37987768.0
138.1449637413025
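Each quantized sublayer above produces the same four lines: the block index and sublayer name, `Quantizing ...`, a `time` value, and an `error` value. In this run the q and k errors tend to shrink with depth, while the v, o, and DenseReluDense errors grow. The largest, 37987768.0, is block 23's `wo`. The following is a minimal sketch for turning the log into per-sublayer records so these trends can be compared. It assumes exactly the line format shown here. `parse_quant_log` and `worst_block` are hypothetical helpers and not part of the quantization script.

```python
import re
from typing import Dict, List, Optional

# "<block> <sublayer>" header line, e.g. "12 layer.1.DenseReluDense.wi"
HEADER = re.compile(r"^(\d+) (layer\.\d+\.\S+)$")


def parse_quant_log(text: str) -> List[Dict]:
    """Collect one record per quantized sublayer (hypothetical helper;
    assumes the four-line format used in this log)."""
    records: List[Dict] = []
    current: Optional[Dict] = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = {"block": int(header.group(1)), "sublayer": header.group(2)}
        elif current is not None and line.startswith("time "):
            current["time"] = float(line.split()[1])
        elif current is not None and line.startswith("error "):
            current["error"] = float(line.split()[1])
            records.append(current)  # "error" closes each four-line group
            current = None
    return records


def worst_block(records: List[Dict], sublayer: str) -> Dict:
    """Return the record with the largest reported error for one sublayer."""
    return max((r for r in records if r["sublayer"] == sublayer),
               key=lambda r: r["error"])


# Two groups copied from the log above, followed by lines the parser should skip.
sample = """0 layer.1.DenseReluDense.wo
Quantizing ...
time 1.43
error 175480.0
23 layer.1.DenseReluDense.wo
Quantizing ...
time 1.40
error 37987768.0
Packing ...
encoder.block.0.layer.0.SelfAttention.q"""

recs = parse_quant_log(sample)
print(len(recs), worst_block(recs, "layer.1.DenseReluDense.wo")["block"])  # 2 23
```

The `Packing ...` lines and the bare total-time line never match the header pattern, so they are skipped without special handling.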
Packing ...
encoder.block.0.layer.0.SelfAttention.q
encoder.block.0.layer.0.SelfAttention.k
encoder.block.0.layer.0.SelfAttention.v
encoder.block.0.layer.0.SelfAttention.o
encoder.block.0.layer.1.DenseReluDense.wi
encoder.block.0.layer.1.DenseReluDense.wo
encoder.block.1.layer.0.SelfAttention.q
encoder.block.1.layer.0.SelfAttention.k
encoder.block.1.layer.0.SelfAttention.v
encoder.block.1.layer.0.SelfAttention.o
encoder.block.1.layer.1.DenseReluDense.wi
encoder.block.1.layer.1.DenseReluDense.wo
encoder.block.2.layer.0.SelfAttention.q
encoder.block.2.layer.0.SelfAttention.k
encoder.block.2.layer.0.SelfAttention.v
encoder.block.2.layer.0.SelfAttention.o
encoder.block.2.layer.1.DenseReluDense.wi
encoder.block.2.layer.1.DenseReluDense.wo
encoder.block.3.layer.0.SelfAttention.q
encoder.block.3.layer.0.SelfAttention.k
encoder.block.3.layer.0.SelfAttention.v
encoder.block.3.layer.0.SelfAttention.o
encoder.block.3.layer.1.DenseReluDense.wi
encoder.block.3.layer.1.DenseReluDense.wo
encoder.block.4.layer.0.SelfAttention.q
encoder.block.4.layer.0.SelfAttention.k
encoder.block.4.layer.0.SelfAttention.v
encoder.block.4.layer.0.SelfAttention.o
encoder.block.4.layer.1.DenseReluDense.wi
encoder.block.4.layer.1.DenseReluDense.wo
encoder.block.5.layer.0.SelfAttention.q
encoder.block.5.layer.0.SelfAttention.k
encoder.block.5.layer.0.SelfAttention.v
encoder.block.5.layer.0.SelfAttention.o
encoder.block.5.layer.1.DenseReluDense.wi
encoder.block.5.layer.1.DenseReluDense.wo
encoder.block.6.layer.0.SelfAttention.q
encoder.block.6.layer.0.SelfAttention.k
encoder.block.6.layer.0.SelfAttention.v
encoder.block.6.layer.0.SelfAttention.o
encoder.block.6.layer.1.DenseReluDense.wi
encoder.block.6.layer.1.DenseReluDense.wo
encoder.block.7.layer.0.SelfAttention.q
encoder.block.7.layer.0.SelfAttention.k
encoder.block.7.layer.0.SelfAttention.v
encoder.block.7.layer.0.SelfAttention.o
encoder.block.7.layer.1.DenseReluDense.wi
encoder.block.7.layer.1.DenseReluDense.wo
encoder.block.8.layer.0.SelfAttention.q
encoder.block.8.layer.0.SelfAttention.k
encoder.block.8.layer.0.SelfAttention.v
encoder.block.8.layer.0.SelfAttention.o
encoder.block.8.layer.1.DenseReluDense.wi
encoder.block.8.layer.1.DenseReluDense.wo
encoder.block.9.layer.0.SelfAttention.q
encoder.block.9.layer.0.SelfAttention.k
encoder.block.9.layer.0.SelfAttention.v
encoder.block.9.layer.0.SelfAttention.o
encoder.block.9.layer.1.DenseReluDense.wi
encoder.block.9.layer.1.DenseReluDense.wo
encoder.block.10.layer.0.SelfAttention.q
encoder.block.10.layer.0.SelfAttention.k
encoder.block.10.layer.0.SelfAttention.v
encoder.block.10.layer.0.SelfAttention.o
encoder.block.10.layer.1.DenseReluDense.wi
encoder.block.10.layer.1.DenseReluDense.wo
encoder.block.11.layer.0.SelfAttention.q
encoder.block.11.layer.0.SelfAttention.k
encoder.block.11.layer.0.SelfAttention.v
encoder.block.11.layer.0.SelfAttention.o
encoder.block.11.layer.1.DenseReluDense.wi
encoder.block.11.layer.1.DenseReluDense.wo
encoder.block.12.layer.0.SelfAttention.q
encoder.block.12.layer.0.SelfAttention.k
encoder.block.12.layer.0.SelfAttention.v
encoder.block.12.layer.0.SelfAttention.o
encoder.block.12.layer.1.DenseReluDense.wi
encoder.block.12.layer.1.DenseReluDense.wo
encoder.block.13.layer.0.SelfAttention.q
encoder.block.13.layer.0.SelfAttention.k
encoder.block.13.layer.0.SelfAttention.v
encoder.block.13.layer.0.SelfAttention.o
encoder.block.13.layer.1.DenseReluDense.wi
encoder.block.13.layer.1.DenseReluDense.wo
encoder.block.14.layer.0.SelfAttention.q
encoder.block.14.layer.0.SelfAttention.k
encoder.block.14.layer.0.SelfAttention.v
encoder.block.14.layer.0.SelfAttention.o
encoder.block.14.layer.1.DenseReluDense.wi
encoder.block.14.layer.1.DenseReluDense.wo
encoder.block.15.layer.0.SelfAttention.q
encoder.block.15.layer.0.SelfAttention.k
encoder.block.15.layer.0.SelfAttention.v
encoder.block.15.layer.0.SelfAttention.o
encoder.block.15.layer.1.DenseReluDense.wi
encoder.block.15.layer.1.DenseReluDense.wo
encoder.block.16.layer.0.SelfAttention.q
encoder.block.16.layer.0.SelfAttention.k
encoder.block.16.layer.0.SelfAttention.v
encoder.block.16.layer.0.SelfAttention.o
encoder.block.16.layer.1.DenseReluDense.wi
encoder.block.16.layer.1.DenseReluDense.wo
encoder.block.17.layer.0.SelfAttention.q
encoder.block.17.layer.0.SelfAttention.k
encoder.block.17.layer.0.SelfAttention.v
encoder.block.17.layer.0.SelfAttention.o
encoder.block.17.layer.1.DenseReluDense.wi
encoder.block.17.layer.1.DenseReluDense.wo
encoder.block.18.layer.0.SelfAttention.q
encoder.block.18.layer.0.SelfAttention.k
encoder.block.18.layer.0.SelfAttention.v
encoder.block.18.layer.0.SelfAttention.o
encoder.block.18.layer.1.DenseReluDense.wi
encoder.block.18.layer.1.DenseReluDense.wo
encoder.block.19.layer.0.SelfAttention.q
encoder.block.19.layer.0.SelfAttention.k
|
| 715 |
+
encoder.block.19.layer.0.SelfAttention.v
|
| 716 |
+
encoder.block.19.layer.0.SelfAttention.o
|
| 717 |
+
encoder.block.19.layer.1.DenseReluDense.wi
|
| 718 |
+
encoder.block.19.layer.1.DenseReluDense.wo
|
| 719 |
+
encoder.block.20.layer.0.SelfAttention.q
|
| 720 |
+
encoder.block.20.layer.0.SelfAttention.k
|
| 721 |
+
encoder.block.20.layer.0.SelfAttention.v
|
| 722 |
+
encoder.block.20.layer.0.SelfAttention.o
|
| 723 |
+
encoder.block.20.layer.1.DenseReluDense.wi
|
| 724 |
+
encoder.block.20.layer.1.DenseReluDense.wo
|
| 725 |
+
encoder.block.21.layer.0.SelfAttention.q
|
| 726 |
+
encoder.block.21.layer.0.SelfAttention.k
|
| 727 |
+
encoder.block.21.layer.0.SelfAttention.v
|
| 728 |
+
encoder.block.21.layer.0.SelfAttention.o
|
| 729 |
+
encoder.block.21.layer.1.DenseReluDense.wi
|
| 730 |
+
encoder.block.21.layer.1.DenseReluDense.wo
|
| 731 |
+
encoder.block.22.layer.0.SelfAttention.q
|
| 732 |
+
encoder.block.22.layer.0.SelfAttention.k
|
| 733 |
+
encoder.block.22.layer.0.SelfAttention.v
|
| 734 |
+
encoder.block.22.layer.0.SelfAttention.o
|
| 735 |
+
encoder.block.22.layer.1.DenseReluDense.wi
|
| 736 |
+
encoder.block.22.layer.1.DenseReluDense.wo
|
| 737 |
+
encoder.block.23.layer.0.SelfAttention.q
|
| 738 |
+
encoder.block.23.layer.0.SelfAttention.k
|
| 739 |
+
encoder.block.23.layer.0.SelfAttention.v
|
| 740 |
+
encoder.block.23.layer.0.SelfAttention.o
|
| 741 |
+
encoder.block.23.layer.1.DenseReluDense.wi
|
| 742 |
+
encoder.block.23.layer.1.DenseReluDense.wo
|
| 743 |
+
Done.
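Every name in the listing above follows one pattern. For each encoder block, the four self-attention projections of `layer.0` come first, followed by the feed-forward projections of `layer.1`. The sketch below regenerates that order, which makes it possible to check a log for missing layers. It assumes 24 encoder blocks, since the listing ends at `encoder.block.23`. The `gated` switch swaps `wi` for the `wi_0`/`wi_1` pair that appears in the t5-v1_1-xxl log. The helper name `t5_encoder_linear_names` is hypothetical.

```python
def t5_encoder_linear_names(num_blocks, gated=False):
    """Return encoder linear-layer names in the order the log lists them.

    Assumption: layer.0 holds the four self-attention projections and
    layer.1 the feed-forward projections; gated=True replaces the single
    `wi` with the `wi_0`/`wi_1` pair seen in the v1.1 (xxl) log.
    """
    ffn = ["wi_0", "wi_1", "wo"] if gated else ["wi", "wo"]
    names = []
    for b in range(num_blocks):
        prefix = f"encoder.block.{b}"
        for proj in ["q", "k", "v", "o"]:
            names.append(f"{prefix}.layer.0.SelfAttention.{proj}")
        for proj in ffn:
            names.append(f"{prefix}.layer.1.DenseReluDense.{proj}")
    return names


names = t5_encoder_linear_names(24)
print(len(names))  # 144
print(names[-1])   # encoder.block.23.layer.1.DenseReluDense.wo
```

With `gated=True`, each block contributes seven names instead of six, so 24 blocks give 168.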

workspace/ts_xxl_record.txt
ADDED
@@ -0,0 +1,853 @@
CUDA extension not installed.
Some weights of the model checkpoint at google/t5-v1_1-xxl were not used when initializing T5EncoderModel: ['decoder.block.6.layer.1.layer_norm.weight', 'decoder.block.10.layer.1.layer_norm.weight', 'decoder.block.17.layer.2.layer_norm.weight', 'decoder.block.13.layer.0.SelfAttention.v.weight', 'decoder.block.19.layer.0.SelfAttention.o.weight', 'decoder.block.22.layer.2.layer_norm.weight', 'decoder.block.23.layer.0.SelfAttention.q.weight', 'decoder.block.20.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.17.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.7.layer.1.EncDecAttention.k.weight', 'decoder.block.20.layer.0.SelfAttention.k.weight', 'decoder.block.18.layer.2.DenseReluDense.wo.weight', 'decoder.block.16.layer.1.EncDecAttention.v.weight', 'decoder.block.3.layer.0.SelfAttention.k.weight', 'decoder.block.1.layer.0.SelfAttention.v.weight', 'decoder.block.21.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.1.layer_norm.weight', 'decoder.block.17.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.23.layer.1.EncDecAttention.o.weight', 'decoder.block.7.layer.0.SelfAttention.q.weight', 'decoder.block.10.layer.0.SelfAttention.o.weight', 'decoder.block.11.layer.0.SelfAttention.v.weight', 'decoder.block.1.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.9.layer.1.layer_norm.weight', 'decoder.block.11.layer.2.DenseReluDense.wo.weight', 'decoder.block.14.layer.1.EncDecAttention.k.weight', 'decoder.block.22.layer.0.SelfAttention.o.weight', 'decoder.block.19.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.14.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.0.SelfAttention.k.weight', 'decoder.embed_tokens.weight', 'decoder.block.6.layer.0.layer_norm.weight', 'decoder.block.3.layer.2.layer_norm.weight', 'decoder.block.13.layer.1.EncDecAttention.k.weight', 
'decoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.17.layer.2.DenseReluDense.wo.weight', 'decoder.block.18.layer.0.layer_norm.weight', 'decoder.block.9.layer.1.EncDecAttention.k.weight', 'decoder.block.11.layer.0.SelfAttention.q.weight', 'decoder.block.15.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.6.layer.1.EncDecAttention.q.weight', 'decoder.block.10.layer.1.EncDecAttention.q.weight', 'decoder.block.10.layer.0.SelfAttention.v.weight', 'decoder.block.17.layer.0.SelfAttention.o.weight', 'decoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.18.layer.1.layer_norm.weight', 'decoder.block.18.layer.2.layer_norm.weight', 'decoder.block.12.layer.2.layer_norm.weight', 'decoder.block.2.layer.1.EncDecAttention.o.weight', 'decoder.block.6.layer.1.EncDecAttention.o.weight', 'decoder.block.17.layer.1.EncDecAttention.o.weight', 'decoder.block.3.layer.1.EncDecAttention.o.weight', 'decoder.block.18.layer.1.EncDecAttention.v.weight', 'decoder.block.15.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.0.SelfAttention.q.weight', 'decoder.block.13.layer.2.DenseReluDense.wo.weight', 'decoder.block.1.layer.0.layer_norm.weight', 'decoder.block.15.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.2.layer.0.SelfAttention.o.weight', 'decoder.block.17.layer.1.EncDecAttention.k.weight', 'decoder.block.14.layer.2.layer_norm.weight', 'decoder.block.17.layer.0.SelfAttention.k.weight', 'decoder.block.3.layer.0.SelfAttention.q.weight', 'decoder.block.14.layer.0.SelfAttention.v.weight', 'decoder.block.6.layer.2.DenseReluDense.wo.weight', 'decoder.block.20.layer.1.EncDecAttention.o.weight', 'decoder.block.15.layer.0.SelfAttention.o.weight', 'decoder.block.18.layer.0.SelfAttention.v.weight', 'decoder.block.1.layer.1.EncDecAttention.q.weight', 'decoder.block.10.layer.1.EncDecAttention.v.weight', 'decoder.block.1.layer.0.SelfAttention.q.weight', 'decoder.block.8.layer.0.layer_norm.weight', 'decoder.block.16.layer.2.layer_norm.weight', 
'decoder.block.7.layer.1.EncDecAttention.v.weight', 'decoder.block.12.layer.1.EncDecAttention.k.weight', 'decoder.block.17.layer.1.EncDecAttention.v.weight', 'decoder.block.23.layer.2.DenseReluDense.wo.weight', 'decoder.block.14.layer.0.SelfAttention.k.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.17.layer.1.layer_norm.weight', 'decoder.block.2.layer.1.EncDecAttention.k.weight', 'decoder.block.10.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.12.layer.1.layer_norm.weight', 'decoder.block.0.layer.1.EncDecAttention.o.weight', 'decoder.block.9.layer.2.layer_norm.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.13.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.23.layer.0.SelfAttention.v.weight', 'decoder.block.2.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.1.EncDecAttention.v.weight', 'decoder.block.19.layer.0.SelfAttention.q.weight', 'decoder.block.12.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.2.layer_norm.weight', 'decoder.block.9.layer.1.EncDecAttention.v.weight', 'decoder.block.13.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.0.layer_norm.weight', 'decoder.block.12.layer.0.SelfAttention.q.weight', 'decoder.block.16.layer.1.EncDecAttention.o.weight', 'decoder.block.6.layer.0.SelfAttention.o.weight', 'decoder.block.22.layer.0.SelfAttention.k.weight', 'decoder.block.8.layer.1.EncDecAttention.q.weight', 'decoder.block.17.layer.0.SelfAttention.q.weight', 'decoder.block.5.layer.1.EncDecAttention.k.weight', 'decoder.block.11.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.22.layer.0.SelfAttention.v.weight', 'decoder.block.14.layer.1.layer_norm.weight', 'decoder.block.15.layer.1.EncDecAttention.k.weight', 'decoder.block.21.layer.2.layer_norm.weight', 'decoder.block.21.layer.1.layer_norm.weight', 'decoder.block.10.layer.1.EncDecAttention.o.weight', 'decoder.block.11.layer.1.EncDecAttention.q.weight', 'decoder.block.16.layer.0.layer_norm.weight', 
'decoder.block.11.layer.0.SelfAttention.o.weight', 'decoder.block.5.layer.0.SelfAttention.v.weight', 'decoder.block.20.layer.1.EncDecAttention.v.weight', 'decoder.block.2.layer.2.layer_norm.weight', 'decoder.block.15.layer.1.EncDecAttention.q.weight', 'decoder.block.13.layer.0.SelfAttention.o.weight', 'decoder.block.5.layer.0.layer_norm.weight', 'decoder.block.6.layer.1.EncDecAttention.v.weight', 'decoder.block.23.layer.1.EncDecAttention.q.weight', 'decoder.block.18.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.5.layer.2.DenseReluDense.wo.weight', 'decoder.block.19.layer.0.SelfAttention.v.weight', 'decoder.block.8.layer.0.SelfAttention.o.weight', 'decoder.block.23.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.23.layer.1.layer_norm.weight', 'decoder.block.22.layer.1.EncDecAttention.q.weight', 'decoder.block.20.layer.2.DenseReluDense.wo.weight', 'decoder.block.20.layer.1.EncDecAttention.q.weight', 'decoder.block.15.layer.0.layer_norm.weight', 'decoder.block.8.layer.1.EncDecAttention.k.weight', 'decoder.block.21.layer.0.SelfAttention.o.weight', 'decoder.block.4.layer.1.EncDecAttention.o.weight', 'decoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.19.layer.1.layer_norm.weight', 'decoder.block.12.layer.0.SelfAttention.k.weight', 'decoder.block.4.layer.1.EncDecAttention.k.weight', 'decoder.block.20.layer.0.SelfAttention.v.weight', 'decoder.block.18.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.1.EncDecAttention.o.weight', 'decoder.block.18.layer.1.EncDecAttention.k.weight', 'lm_head.weight', 'decoder.block.2.layer.0.layer_norm.weight', 'decoder.block.14.layer.1.EncDecAttention.v.weight', 'decoder.block.10.layer.0.layer_norm.weight', 'decoder.block.11.layer.0.SelfAttention.k.weight', 'decoder.block.18.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.1.EncDecAttention.v.weight', 'decoder.block.0.layer.2.layer_norm.weight', 'decoder.block.23.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.20.layer.1.layer_norm.weight', 
'decoder.block.20.layer.1.EncDecAttention.k.weight', 'decoder.block.15.layer.2.DenseReluDense.wo.weight', 'decoder.block.19.layer.1.EncDecAttention.o.weight', 'decoder.block.13.layer.1.layer_norm.weight', 'decoder.block.7.layer.2.DenseReluDense.wo.weight', 'decoder.block.10.layer.2.layer_norm.weight', 'decoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.block.19.layer.0.SelfAttention.k.weight', 'decoder.block.8.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.16.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.5.layer.0.SelfAttention.q.weight', 'decoder.block.18.layer.0.SelfAttention.k.weight', 'decoder.block.4.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.20.layer.0.SelfAttention.o.weight', 'decoder.block.6.layer.0.SelfAttention.v.weight', 'decoder.block.14.layer.0.SelfAttention.q.weight', 'decoder.block.13.layer.1.EncDecAttention.o.weight', 'decoder.block.19.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.21.layer.1.EncDecAttention.o.weight', 'decoder.block.7.layer.0.SelfAttention.o.weight', 'decoder.block.15.layer.2.layer_norm.weight', 'decoder.block.18.layer.0.SelfAttention.q.weight', 'decoder.block.7.layer.1.layer_norm.weight', 'decoder.block.4.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.4.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.0.layer_norm.weight', 'decoder.block.7.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.22.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.12.layer.1.EncDecAttention.v.weight', 'decoder.block.11.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.4.layer.1.EncDecAttention.q.weight', 'decoder.block.5.layer.1.EncDecAttention.o.weight', 'decoder.block.8.layer.1.layer_norm.weight', 'decoder.block.13.layer.1.EncDecAttention.v.weight', 'decoder.block.19.layer.1.EncDecAttention.k.weight', 'decoder.block.16.layer.1.layer_norm.weight', 'decoder.block.20.layer.0.layer_norm.weight', 'decoder.block.22.layer.1.EncDecAttention.k.weight', 
'decoder.block.11.layer.2.layer_norm.weight', 'decoder.block.11.layer.1.layer_norm.weight', 'decoder.block.7.layer.0.SelfAttention.v.weight', 'decoder.block.3.layer.0.SelfAttention.o.weight', 'decoder.block.0.layer.2.DenseReluDense.wo.weight', 'decoder.block.6.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.16.layer.0.SelfAttention.q.weight', 'decoder.block.21.layer.1.EncDecAttention.k.weight', 'decoder.block.3.layer.1.EncDecAttention.k.weight', 'decoder.block.9.layer.1.EncDecAttention.q.weight', 'decoder.block.6.layer.0.SelfAttention.k.weight', 'decoder.block.4.layer.0.SelfAttention.v.weight', 'decoder.block.11.layer.0.layer_norm.weight', 'decoder.block.22.layer.1.EncDecAttention.v.weight', 'decoder.block.19.layer.2.DenseReluDense.wo.weight', 'decoder.block.0.layer.1.EncDecAttention.q.weight', 'decoder.block.15.layer.1.layer_norm.weight', 'decoder.block.4.layer.2.DenseReluDense.wo.weight', 'decoder.block.8.layer.0.SelfAttention.v.weight', 'decoder.block.18.layer.1.EncDecAttention.o.weight', 'decoder.block.4.layer.0.SelfAttention.k.weight', 'decoder.block.15.layer.1.EncDecAttention.v.weight', 'decoder.block.5.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.1.EncDecAttention.v.weight', 'decoder.block.2.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.2.layer.1.EncDecAttention.v.weight', 'decoder.block.14.layer.0.layer_norm.weight', 'decoder.block.15.layer.0.SelfAttention.k.weight', 'decoder.block.22.layer.1.EncDecAttention.o.weight', 'decoder.block.21.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.14.layer.1.EncDecAttention.q.weight', 'decoder.block.7.layer.1.EncDecAttention.o.weight', 'decoder.block.8.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.0.SelfAttention.q.weight', 'decoder.block.3.layer.0.SelfAttention.v.weight', 'decoder.block.13.layer.0.layer_norm.weight', 'decoder.block.21.layer.0.SelfAttention.v.weight', 'decoder.block.16.layer.0.SelfAttention.k.weight', 
'decoder.block.3.layer.0.layer_norm.weight', 'decoder.block.10.layer.1.EncDecAttention.k.weight', 'decoder.block.9.layer.2.DenseReluDense.wo.weight', 'decoder.block.21.layer.0.SelfAttention.k.weight', 'decoder.block.16.layer.1.EncDecAttention.k.weight', 'decoder.block.7.layer.0.SelfAttention.k.weight', 'decoder.block.7.layer.1.EncDecAttention.q.weight', 'decoder.block.11.layer.1.EncDecAttention.k.weight', 'decoder.block.23.layer.0.SelfAttention.k.weight', 'decoder.block.20.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.5.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.6.layer.0.SelfAttention.q.weight', 'decoder.block.22.layer.0.SelfAttention.q.weight', 'decoder.block.23.layer.2.layer_norm.weight', 'decoder.block.11.layer.1.EncDecAttention.o.weight', 'decoder.block.19.layer.1.EncDecAttention.v.weight', 'decoder.block.13.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.3.layer.1.EncDecAttention.v.weight', 'decoder.block.13.layer.2.layer_norm.weight', 'decoder.block.16.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'decoder.block.19.layer.0.layer_norm.weight', 'decoder.block.17.layer.1.EncDecAttention.q.weight', 'decoder.block.21.layer.1.EncDecAttention.v.weight', 'decoder.block.17.layer.0.layer_norm.weight', 'decoder.block.5.layer.2.layer_norm.weight', 'decoder.block.20.layer.0.SelfAttention.q.weight', 'decoder.block.23.layer.0.SelfAttention.o.weight', 'decoder.block.22.layer.0.layer_norm.weight', 'decoder.block.16.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.0.SelfAttention.v.weight', 'decoder.block.9.layer.1.EncDecAttention.o.weight', 'decoder.block.4.layer.1.layer_norm.weight', 'decoder.block.5.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.12.layer.0.layer_norm.weight', 'decoder.block.5.layer.1.EncDecAttention.v.weight', 'decoder.block.12.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.8.layer.0.SelfAttention.k.weight', 'decoder.block.8.layer.1.EncDecAttention.o.weight', 
'decoder.block.0.layer.1.EncDecAttention.k.weight', 'decoder.block.16.layer.0.SelfAttention.v.weight', 'decoder.block.12.layer.1.EncDecAttention.o.weight', 'decoder.block.8.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.1.layer.1.EncDecAttention.k.weight', 'decoder.block.2.layer.0.SelfAttention.q.weight', 'decoder.block.5.layer.0.SelfAttention.k.weight', 'decoder.block.22.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.7.layer.0.layer_norm.weight', 'decoder.block.9.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.17.layer.0.SelfAttention.v.weight', 'decoder.block.8.layer.2.DenseReluDense.wo.weight', 'decoder.block.18.layer.1.EncDecAttention.q.weight', 'decoder.block.6.layer.1.EncDecAttention.k.weight', 'decoder.block.22.layer.2.DenseReluDense.wo.weight', 'decoder.block.9.layer.0.SelfAttention.k.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.7.layer.2.layer_norm.weight', 'decoder.block.16.layer.1.EncDecAttention.q.weight', 'decoder.block.15.layer.0.SelfAttention.v.weight', 'decoder.final_layer_norm.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.3.layer.1.EncDecAttention.q.weight', 'decoder.block.3.layer.1.layer_norm.weight', 'decoder.block.9.layer.0.SelfAttention.q.weight', 'decoder.block.1.layer.1.layer_norm.weight', 'decoder.block.14.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.10.layer.0.SelfAttention.k.weight', 'decoder.block.14.layer.0.SelfAttention.o.weight', 'decoder.block.0.layer.1.layer_norm.weight', 'decoder.block.9.layer.0.SelfAttention.o.weight', 'decoder.block.19.layer.2.layer_norm.weight', 'decoder.block.1.layer.2.layer_norm.weight', 'decoder.block.13.layer.1.EncDecAttention.q.weight', 'decoder.block.10.layer.2.DenseReluDense.wo.weight', 'decoder.block.14.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.6.layer.2.layer_norm.weight', 'decoder.block.11.layer.1.EncDecAttention.v.weight', 'decoder.block.5.layer.1.layer_norm.weight', 
'decoder.block.12.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.23.layer.1.EncDecAttention.k.weight', 'decoder.block.23.layer.0.layer_norm.weight', 'decoder.block.12.layer.0.SelfAttention.v.weight', 'decoder.block.13.layer.0.SelfAttention.k.weight', 'decoder.block.20.layer.2.layer_norm.weight', 'decoder.block.21.layer.1.EncDecAttention.q.weight', 'decoder.block.3.layer.2.DenseReluDense.wo.weight', 'decoder.block.1.layer.2.DenseReluDense.wo.weight', 'decoder.block.21.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.22.layer.1.layer_norm.weight', 'decoder.block.10.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.23.layer.1.EncDecAttention.v.weight', 'decoder.block.15.layer.0.SelfAttention.q.weight', 'decoder.block.2.layer.1.EncDecAttention.q.weight', 'decoder.block.10.layer.0.SelfAttention.q.weight', 'decoder.block.21.layer.0.layer_norm.weight', 'decoder.block.14.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.5.layer.1.EncDecAttention.q.weight', 'decoder.block.8.layer.2.layer_norm.weight', 'decoder.block.16.layer.2.DenseReluDense.wo.weight', 'decoder.block.19.layer.1.EncDecAttention.q.weight', 'decoder.block.12.layer.0.SelfAttention.o.weight', 'decoder.block.12.layer.1.EncDecAttention.q.weight', 'decoder.block.21.layer.0.SelfAttention.q.weight', 'decoder.block.0.layer.0.layer_norm.weight', 'decoder.block.8.layer.1.EncDecAttention.v.weight', 'decoder.block.6.layer.2.DenseReluDense.wi_1.weight']
- This IS expected if you are initializing T5EncoderModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5EncoderModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
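Every key in the unused list is either a `decoder.*` tensor or `lm_head.weight`. That matches the first bullet: a full encoder-decoder checkpoint (`google/t5-v1_1-xxl`) is being loaded into the encoder-only `T5EncoderModel`, so the decoder half is discarded. The sketch below reproduces that split by key prefix. It rests on a simplifying assumption: that the encoder-only model reads only `shared.*` and `encoder.*` tensors. `split_encoder_keys` is a hypothetical helper, not the transformers loading code.

```python
def split_encoder_keys(checkpoint_keys):
    """Split checkpoint keys into those an encoder-only model consumes
    and those it would report as unused.

    Simplifying assumption: only `shared.*` and `encoder.*` tensors are
    read, so `decoder.*` tensors and `lm_head.weight` end up unused.
    """
    used, unused = [], []
    for key in checkpoint_keys:
        if key.startswith(("shared.", "encoder.")):
            used.append(key)
        else:
            unused.append(key)
    return used, unused


keys = [
    "shared.weight",
    "encoder.block.0.layer.0.SelfAttention.q.weight",
    "decoder.block.0.layer.0.SelfAttention.q.weight",
    "lm_head.weight",
]
used, unused = split_encoder_keys(keys)
print(unused)  # ['decoder.block.0.layer.0.SelfAttention.q.weight', 'lm_head.weight']
```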
Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
Token indices sequence length is longer than the specified maximum sequence length for this model (2837981 > 512). Running this sequence through the model will result in indexing errors
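The 2,837,981-token length in this warning is most likely the whole wikitext-2 split, joined and tokenized as a single sequence. The warning only matters if that sequence reaches the model in one piece. GPTQ-style calibration loaders typically avoid this by cutting short random windows out of the token stream. The sketch below shows that sampling, assuming this script does the same. The helper and its `nsamples`/`seqlen`/`seed` parameters are hypothetical, and `seqlen=512` is chosen only to match the limit named in the warning.

```python
import random


def sample_calibration_windows(token_ids, nsamples, seqlen, seed=0):
    """Cut `nsamples` random windows of `seqlen` tokens from one long
    token stream, so no single model input exceeds `seqlen` tokens.

    Hypothetical helper; parameter values are illustrative, not taken
    from the run logged here.
    """
    if len(token_ids) <= seqlen:
        raise ValueError("token stream is shorter than one window")
    rng = random.Random(seed)  # fixed seed -> reproducible calibration set
    windows = []
    for _ in range(nsamples):
        start = rng.randint(0, len(token_ids) - seqlen - 1)
        windows.append(token_ids[start:start + seqlen])
    return windows


stream = range(2_837_981)  # stand-in sequence for the 2,837,981 token ids
batch = sample_calibration_windows(stream, nsamples=4, seqlen=512)
print([len(w) for w in batch])  # [512, 512, 512, 512]
```

Any sliceable sequence works here (a list of ids or a 1-D tensor slice). A `range` keeps the demo cheap.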
|
| 8 |
+
Starting ...
|
| 9 |
+
Ready.
|
| 10 |
+
0 layer.0.SelfAttention.q
|
| 11 |
+
Quantizing ...
|
| 12 |
+
time 2.80
|
| 13 |
+
error 137.22543334960938
|
| 14 |
+
0 layer.0.SelfAttention.k
|
| 15 |
+
Quantizing ...
|
| 16 |
+
time 1.03
|
| 17 |
+
error 11656.236328125
|
| 18 |
+
0 layer.0.SelfAttention.v
|
| 19 |
+
Quantizing ...
|
| 20 |
+
time 1.04
|
| 21 |
+
error 10592.220703125
|
| 22 |
+
0 layer.0.SelfAttention.o
|
| 23 |
+
Quantizing ...
|
| 24 |
+
time 1.03
|
| 25 |
+
error 120966.59375
|
| 26 |
+
0 layer.1.DenseReluDense.wi_0
|
| 27 |
+
Quantizing ...
|
| 28 |
+
time 1.05
|
| 29 |
+
error 38126.375
|
| 30 |
+
0 layer.1.DenseReluDense.wi_1
|
| 31 |
+
Quantizing ...
|
| 32 |
+
time 1.04
|
| 33 |
+
error 32506.427734375
|
| 34 |
+
0 layer.1.DenseReluDense.wo
|
| 35 |
+
Quantizing ...
|
| 36 |
+
time 2.81
|
| 37 |
+
error 214925.140625
|
| 38 |
+
1 layer.0.SelfAttention.q
|
| 39 |
+
Quantizing ...
|
| 40 |
+
time 2.27
|
| 41 |
+
error 253.24050903320312
|
| 42 |
+
1 layer.0.SelfAttention.k
|
| 43 |
+
Quantizing ...
|
| 44 |
+
time 1.01
|
| 45 |
+
error 15095.802734375
|
| 46 |
+
1 layer.0.SelfAttention.v
|
| 47 |
+
Quantizing ...
|
| 48 |
+
time 1.03
|
| 49 |
+
error 4179.1083984375
|
| 50 |
+
1 layer.0.SelfAttention.o
|
| 51 |
+
Quantizing ...
|
| 52 |
+
time 1.03
|
| 53 |
+
error 20773.45703125
|
| 54 |
+
1 layer.1.DenseReluDense.wi_0
|
| 55 |
+
Quantizing ...
|
| 56 |
+
time 1.03
|
| 57 |
+
error 28934.0859375
|
| 58 |
+
1 layer.1.DenseReluDense.wi_1
|
| 59 |
+
Quantizing ...
|
| 60 |
+
time 1.05
|
| 61 |
+
error 24144.3125
|
| 62 |
+
1 layer.1.DenseReluDense.wo
|
| 63 |
+
Quantizing ...
|
| 64 |
+
time 2.75
|
| 65 |
+
error 97274.90625
|
| 66 |
+
2 layer.0.SelfAttention.q
|
| 67 |
+
Quantizing ...
|
| 68 |
+
time 2.34
|
| 69 |
+
error 205.71896362304688
|
| 70 |
+
2 layer.0.SelfAttention.k
|
| 71 |
+
Quantizing ...
|
| 72 |
+
time 1.05
|
| 73 |
+
error 10929.7021484375
|
| 74 |
+
2 layer.0.SelfAttention.v
|
| 75 |
+
Quantizing ...
|
| 76 |
+
time 1.06
|
| 77 |
+
error 3825.074462890625
|
| 78 |
+
2 layer.0.SelfAttention.o
|
| 79 |
+
Quantizing ...
|
| 80 |
+
time 1.02
|
| 81 |
+
error 2498.05859375
|
| 82 |
+
2 layer.1.DenseReluDense.wi_0
|
| 83 |
+
Quantizing ...
|
| 84 |
+
time 1.03
|
| 85 |
+
error 42947.859375
|
| 86 |
+
2 layer.1.DenseReluDense.wi_1
|
| 87 |
+
Quantizing ...
|
| 88 |
+
time 1.03
|
| 89 |
+
error 36752.1171875
|
| 90 |
+
2 layer.1.DenseReluDense.wo
|
| 91 |
+
Quantizing ...
|
| 92 |
+
time 2.71
|
| 93 |
+
error 135178.4375
|
| 94 |
+
3 layer.0.SelfAttention.q
|
| 95 |
+
Quantizing ...
|
| 96 |
+
time 2.31
|
| 97 |
+
error 263.6244201660156
|
| 98 |
+
3 layer.0.SelfAttention.k
|
| 99 |
+
Quantizing ...
|
| 100 |
+
time 1.06
|
| 101 |
+
error 13956.330078125
|
| 102 |
+
3 layer.0.SelfAttention.v
|
| 103 |
+
Quantizing ...
|
| 104 |
+
time 1.06
|
| 105 |
+
error 5999.3544921875
|
| 106 |
+
3 layer.0.SelfAttention.o
|
| 107 |
+
Quantizing ...
|
| 108 |
+
time 1.05
|
| 109 |
+
error 5389.494140625
|
| 110 |
+
3 layer.1.DenseReluDense.wi_0
|
| 111 |
+
Quantizing ...
|
| 112 |
+
time 1.10
|
| 113 |
+
error 43406.984375
|
| 114 |
+
3 layer.1.DenseReluDense.wi_1
|
| 115 |
+
Quantizing ...
|
| 116 |
+
time 1.07
|
| 117 |
+
error 40294.578125
|
| 118 |
+
3 layer.1.DenseReluDense.wo
|
| 119 |
+
Quantizing ...
|
| 120 |
+
time 2.80
|
| 121 |
+
error 136006.0
|
| 122 |
+
4 layer.0.SelfAttention.q
|
| 123 |
+
Quantizing ...
|
| 124 |
+
time 2.30
|
| 125 |
+
error 300.17022705078125
|
| 126 |
+
4 layer.0.SelfAttention.k
|
| 127 |
+
Quantizing ...
|
| 128 |
+
time 1.03
|
| 129 |
+
error 16043.65234375
|
| 130 |
+
4 layer.0.SelfAttention.v
|
| 131 |
+
Quantizing ...
|
| 132 |
+
time 1.03
|
| 133 |
+
error 6112.3857421875
|
| 134 |
+
4 layer.0.SelfAttention.o
|
| 135 |
+
Quantizing ...
|
| 136 |
+
time 1.03
|
| 137 |
+
error 4162.61474609375
|
| 138 |
+
4 layer.1.DenseReluDense.wi_0
|
| 139 |
+
Quantizing ...
|
| 140 |
+
time 1.06
|
| 141 |
+
error 44532.5625
|
| 142 |
+
4 layer.1.DenseReluDense.wi_1
|
| 143 |
+
Quantizing ...
|
| 144 |
+
time 1.07
|
| 145 |
+
error 42825.140625
|
| 146 |
+
4 layer.1.DenseReluDense.wo
|
| 147 |
+
Quantizing ...
|
| 148 |
+
time 2.88
|
| 149 |
+
error 165037.09375
|
| 150 |
+
5 layer.0.SelfAttention.q
Quantizing ...
time 2.28
error 352.9566650390625
5 layer.0.SelfAttention.k
Quantizing ...
time 1.03
error 19099.544921875
5 layer.0.SelfAttention.v
Quantizing ...
time 1.02
error 6900.2197265625
5 layer.0.SelfAttention.o
Quantizing ...
time 1.03
error 14074.9541015625
5 layer.1.DenseReluDense.wi_0
Quantizing ...
time 1.05
error 38257.37109375
5 layer.1.DenseReluDense.wi_1
Quantizing ...
time 1.04
error 36839.3046875
5 layer.1.DenseReluDense.wo
Quantizing ...
time 2.76
error 132062.96875
6 layer.0.SelfAttention.q
Quantizing ...
time 2.33
error 385.77520751953125
6 layer.0.SelfAttention.k
Quantizing ...
time 1.06
error 22221.486328125
6 layer.0.SelfAttention.v
Quantizing ...
time 1.02
error 7855.71533203125
6 layer.0.SelfAttention.o
Quantizing ...
time 1.04
error 20587.6171875
6 layer.1.DenseReluDense.wi_0
Quantizing ...
time 1.05
error 34824.55078125
6 layer.1.DenseReluDense.wi_1
Quantizing ...
time 1.05
error 36079.15625
6 layer.1.DenseReluDense.wo
Quantizing ...
time 2.74
error 166183.125
7 layer.0.SelfAttention.q
Quantizing ...
time 2.32
error 304.88519287109375
7 layer.0.SelfAttention.k
Quantizing ...
time 1.05
error 21111.80859375
7 layer.0.SelfAttention.v
Quantizing ...
time 1.05
error 5978.3095703125
7 layer.0.SelfAttention.o
Quantizing ...
time 1.08
error 10927.888671875
7 layer.1.DenseReluDense.wi_0
Quantizing ...
time 1.07
error 29760.138671875
7 layer.1.DenseReluDense.wi_1
Quantizing ...
time 1.08
error 33814.875
7 layer.1.DenseReluDense.wo
Quantizing ...
time 2.73
error 175563.4375
8 layer.0.SelfAttention.q
Quantizing ...
time 2.30
error 333.85931396484375
8 layer.0.SelfAttention.k
Quantizing ...
time 1.03
error 24634.984375
8 layer.0.SelfAttention.v
Quantizing ...
time 1.03
error 7116.8212890625
8 layer.0.SelfAttention.o
Quantizing ...
time 1.07
error 15384.3369140625
8 layer.1.DenseReluDense.wi_0
Quantizing ...
time 1.07
error 28838.537109375
8 layer.1.DenseReluDense.wi_1
Quantizing ...
time 1.09
error 29991.21875
8 layer.1.DenseReluDense.wo
Quantizing ...
time 2.85
error 170053.9375
9 layer.0.SelfAttention.q
Quantizing ...
time 2.27
error 354.49725341796875
9 layer.0.SelfAttention.k
Quantizing ...
time 1.02
error 26472.80078125
9 layer.0.SelfAttention.v
Quantizing ...
time 1.02
error 9778.65234375
9 layer.0.SelfAttention.o
Quantizing ...
time 1.03
error 46135.9140625
9 layer.1.DenseReluDense.wi_0
Quantizing ...
time 1.05
error 30183.34765625
9 layer.1.DenseReluDense.wi_1
Quantizing ...
time 1.05
error 35315.9375
9 layer.1.DenseReluDense.wo
Quantizing ...
time 2.80
error 294261.34375
10 layer.0.SelfAttention.q
Quantizing ...
time 2.36
error 330.4294128417969
10 layer.0.SelfAttention.k
Quantizing ...
time 1.04
error 21810.806640625
10 layer.0.SelfAttention.v
Quantizing ...
time 1.03
error 7377.060546875
10 layer.0.SelfAttention.o
Quantizing ...
time 1.03
error 31458.453125
10 layer.1.DenseReluDense.wi_0
Quantizing ...
time 1.05
error 30981.423828125
10 layer.1.DenseReluDense.wi_1
Quantizing ...
time 1.05
error 45770.9140625
10 layer.1.DenseReluDense.wo
Quantizing ...
time 2.73
error 338105.5625
11 layer.0.SelfAttention.q
Quantizing ...
time 2.35
error 332.6951904296875
11 layer.0.SelfAttention.k
Quantizing ...
time 1.06
error 23045.384765625
11 layer.0.SelfAttention.v
Quantizing ...
time 1.07
error 9068.484375
11 layer.0.SelfAttention.o
Quantizing ...
time 1.09
error 39716.03125
11 layer.1.DenseReluDense.wi_0
Quantizing ...
time 1.05
error 29951.611328125
11 layer.1.DenseReluDense.wi_1
Quantizing ...
time 1.06
error 46667.8828125
11 layer.1.DenseReluDense.wo
Quantizing ...
time 2.76
error 458927.0
12 layer.0.SelfAttention.q
Quantizing ...
time 2.29
error 364.91387939453125
12 layer.0.SelfAttention.k
Quantizing ...
time 1.03
error 26386.5546875
12 layer.0.SelfAttention.v
Quantizing ...
time 1.08
error 10412.025390625
12 layer.0.SelfAttention.o
Quantizing ...
time 1.07
error 69506.734375
12 layer.1.DenseReluDense.wi_0
Quantizing ...
time 1.08
error 32437.169921875
12 layer.1.DenseReluDense.wi_1
Quantizing ...
time 1.13
error 54537.1328125
12 layer.1.DenseReluDense.wo
Quantizing ...
time 2.81
error 555848.125
13 layer.0.SelfAttention.q
Quantizing ...
time 2.28
error 334.4095153808594
13 layer.0.SelfAttention.k
Quantizing ...
time 1.04
error 24624.59375
13 layer.0.SelfAttention.v
Quantizing ...
time 1.04
error 11093.2373046875
13 layer.0.SelfAttention.o
Quantizing ...
time 1.02
error 73139.5859375
13 layer.1.DenseReluDense.wi_0
Quantizing ...
time 1.06
error 31185.44921875
13 layer.1.DenseReluDense.wi_1
Quantizing ...
time 1.08
error 63193.28125
13 layer.1.DenseReluDense.wo
Quantizing ...
time 2.84
error 484003.5
14 layer.0.SelfAttention.q
Quantizing ...
time 2.33
error 315.36883544921875
14 layer.0.SelfAttention.k
Quantizing ...
time 1.02
error 22693.66015625
14 layer.0.SelfAttention.v
Quantizing ...
time 1.04
error 11054.283203125
14 layer.0.SelfAttention.o
Quantizing ...
time 1.04
error 55301.96875
14 layer.1.DenseReluDense.wi_0
Quantizing ...
time 1.06
error 35040.09765625
14 layer.1.DenseReluDense.wi_1
Quantizing ...
time 1.04
error 69227.671875
14 layer.1.DenseReluDense.wo
Quantizing ...
time 2.76
error 538346.875
15 layer.0.SelfAttention.q
Quantizing ...
time 2.31
error 305.54083251953125
15 layer.0.SelfAttention.k
Quantizing ...
time 1.05
error 22575.48046875
15 layer.0.SelfAttention.v
Quantizing ...
time 1.10
error 14035.61328125
15 layer.0.SelfAttention.o
Quantizing ...
time 1.03
error 100519.5234375
15 layer.1.DenseReluDense.wi_0
Quantizing ...
time 1.04
error 34874.54296875
15 layer.1.DenseReluDense.wi_1
Quantizing ...
time 1.04
error 76981.28125
15 layer.1.DenseReluDense.wo
Quantizing ...
time 2.75
error 590792.75
16 layer.0.SelfAttention.q
Quantizing ...
time 2.30
error 292.1910095214844
16 layer.0.SelfAttention.k
Quantizing ...
time 1.10
error 24363.197265625
16 layer.0.SelfAttention.v
Quantizing ...
time 1.08
error 17756.51953125
16 layer.0.SelfAttention.o
Quantizing ...
time 1.09
error 189057.78125
16 layer.1.DenseReluDense.wi_0
Quantizing ...
time 1.07
error 35124.7109375
16 layer.1.DenseReluDense.wi_1
Quantizing ...
time 1.09
error 87091.78125
16 layer.1.DenseReluDense.wo
Quantizing ...
time 2.81
error 1044289.5625
17 layer.0.SelfAttention.q
Quantizing ...
time 2.28
error 261.1668701171875
17 layer.0.SelfAttention.k
Quantizing ...
time 1.02
error 18598.86328125
17 layer.0.SelfAttention.v
Quantizing ...
time 1.03
error 18718.98046875
17 layer.0.SelfAttention.o
Quantizing ...
time 1.04
error 254419.0625
17 layer.1.DenseReluDense.wi_0
Quantizing ...
time 1.07
error 35458.671875
17 layer.1.DenseReluDense.wi_1
Quantizing ...
time 1.10
error 88659.0390625
17 layer.1.DenseReluDense.wo
Quantizing ...
time 2.87
error 1568064.75
18 layer.0.SelfAttention.q
Quantizing ...
time 2.31
error 282.4662780761719
18 layer.0.SelfAttention.k
Quantizing ...
time 1.03
error 19631.552734375
18 layer.0.SelfAttention.v
Quantizing ...
time 1.06
error 21855.74609375
18 layer.0.SelfAttention.o
Quantizing ...
time 1.05
error 451241.28125
18 layer.1.DenseReluDense.wi_0
Quantizing ...
time 1.04
error 35819.91015625
18 layer.1.DenseReluDense.wi_1
Quantizing ...
time 1.04
error 96373.1015625
18 layer.1.DenseReluDense.wo
Quantizing ...
time 2.75
error 4121681.25
19 layer.0.SelfAttention.q
Quantizing ...
time 2.33
error 222.93960571289062
19 layer.0.SelfAttention.k
Quantizing ...
time 1.08
error 15299.37890625
19 layer.0.SelfAttention.v
Quantizing ...
time 1.04
error 25438.86328125
19 layer.0.SelfAttention.o
Quantizing ...
time 1.05
error 1097173.0
19 layer.1.DenseReluDense.wi_0
Quantizing ...
time 1.06
error 34149.09375
19 layer.1.DenseReluDense.wi_1
Quantizing ...
time 1.04
error 90188.0078125
19 layer.1.DenseReluDense.wo
Quantizing ...
time 2.74
error 6266101.0
20 layer.0.SelfAttention.q
Quantizing ...
time 2.35
error 211.04458618164062
20 layer.0.SelfAttention.k
Quantizing ...
time 1.04
error 13809.572265625
20 layer.0.SelfAttention.v
Quantizing ...
time 1.06
error 29788.564453125
20 layer.0.SelfAttention.o
Quantizing ...
time 1.05
error 1334543.125
20 layer.1.DenseReluDense.wi_0
Quantizing ...
time 1.09
error 31375.771484375
20 layer.1.DenseReluDense.wi_1
Quantizing ...
time 1.08
error 78350.203125
20 layer.1.DenseReluDense.wo
Quantizing ...
time 2.74
error 7183110.0
21 layer.0.SelfAttention.q
Quantizing ...
time 2.30
error 194.26229858398438
21 layer.0.SelfAttention.k
Quantizing ...
time 1.04
error 14619.9853515625
21 layer.0.SelfAttention.v
Quantizing ...
time 1.04
error 38181.265625
21 layer.0.SelfAttention.o
Quantizing ...
time 1.05
error 1776184.0
21 layer.1.DenseReluDense.wi_0
Quantizing ...
time 1.12
error 30981.5625
21 layer.1.DenseReluDense.wi_1
Quantizing ...
time 1.09
error 77552.046875
21 layer.1.DenseReluDense.wo
Quantizing ...
time 2.83
error 9851391.0
22 layer.0.SelfAttention.q
Quantizing ...
time 2.29
error 196.11984252929688
22 layer.0.SelfAttention.k
Quantizing ...
time 1.03
error 12573.25
22 layer.0.SelfAttention.v
Quantizing ...
time 1.04
error 43983.0703125
22 layer.0.SelfAttention.o
Quantizing ...
time 1.03
error 1969925.5
22 layer.1.DenseReluDense.wi_0
Quantizing ...
time 1.05
error 42481.56640625
22 layer.1.DenseReluDense.wi_1
Quantizing ...
time 1.04
error 106760.0078125
22 layer.1.DenseReluDense.wo
Quantizing ...
time 2.84
error 15271906.0
23 layer.0.SelfAttention.q
Quantizing ...
time 2.39
error 213.98135375976562
23 layer.0.SelfAttention.k
Quantizing ...
time 1.03
error 14789.1396484375
23 layer.0.SelfAttention.v
Quantizing ...
time 1.04
error 57604.91015625
23 layer.0.SelfAttention.o
Quantizing ...
time 1.02
error 2114846.25
23 layer.1.DenseReluDense.wi_0
Quantizing ...
time 1.05
error 41047.03125
23 layer.1.DenseReluDense.wi_1
Quantizing ...
time 1.04
error 83152.765625
23 layer.1.DenseReluDense.wo
Quantizing ...
time 2.75
error 13002426.0
728.4299275875092
Packing ...
encoder.block.0.layer.0.SelfAttention.q
encoder.block.0.layer.0.SelfAttention.k
encoder.block.0.layer.0.SelfAttention.v
encoder.block.0.layer.0.SelfAttention.o
encoder.block.0.layer.1.DenseReluDense.wi_0
encoder.block.0.layer.1.DenseReluDense.wi_1
encoder.block.0.layer.1.DenseReluDense.wo
encoder.block.1.layer.0.SelfAttention.q
encoder.block.1.layer.0.SelfAttention.k
encoder.block.1.layer.0.SelfAttention.v
encoder.block.1.layer.0.SelfAttention.o
encoder.block.1.layer.1.DenseReluDense.wi_0
encoder.block.1.layer.1.DenseReluDense.wi_1
encoder.block.1.layer.1.DenseReluDense.wo
encoder.block.2.layer.0.SelfAttention.q
encoder.block.2.layer.0.SelfAttention.k
encoder.block.2.layer.0.SelfAttention.v
encoder.block.2.layer.0.SelfAttention.o
encoder.block.2.layer.1.DenseReluDense.wi_0
encoder.block.2.layer.1.DenseReluDense.wi_1
encoder.block.2.layer.1.DenseReluDense.wo
encoder.block.3.layer.0.SelfAttention.q
encoder.block.3.layer.0.SelfAttention.k
encoder.block.3.layer.0.SelfAttention.v
encoder.block.3.layer.0.SelfAttention.o
encoder.block.3.layer.1.DenseReluDense.wi_0
encoder.block.3.layer.1.DenseReluDense.wi_1
encoder.block.3.layer.1.DenseReluDense.wo
encoder.block.4.layer.0.SelfAttention.q
encoder.block.4.layer.0.SelfAttention.k
encoder.block.4.layer.0.SelfAttention.v
encoder.block.4.layer.0.SelfAttention.o
encoder.block.4.layer.1.DenseReluDense.wi_0
encoder.block.4.layer.1.DenseReluDense.wi_1
encoder.block.4.layer.1.DenseReluDense.wo
encoder.block.5.layer.0.SelfAttention.q
encoder.block.5.layer.0.SelfAttention.k
encoder.block.5.layer.0.SelfAttention.v
encoder.block.5.layer.0.SelfAttention.o
encoder.block.5.layer.1.DenseReluDense.wi_0
encoder.block.5.layer.1.DenseReluDense.wi_1
encoder.block.5.layer.1.DenseReluDense.wo
encoder.block.6.layer.0.SelfAttention.q
encoder.block.6.layer.0.SelfAttention.k
encoder.block.6.layer.0.SelfAttention.v
encoder.block.6.layer.0.SelfAttention.o
encoder.block.6.layer.1.DenseReluDense.wi_0
encoder.block.6.layer.1.DenseReluDense.wi_1
encoder.block.6.layer.1.DenseReluDense.wo
encoder.block.7.layer.0.SelfAttention.q
encoder.block.7.layer.0.SelfAttention.k
encoder.block.7.layer.0.SelfAttention.v
encoder.block.7.layer.0.SelfAttention.o
encoder.block.7.layer.1.DenseReluDense.wi_0
encoder.block.7.layer.1.DenseReluDense.wi_1
encoder.block.7.layer.1.DenseReluDense.wo
encoder.block.8.layer.0.SelfAttention.q
encoder.block.8.layer.0.SelfAttention.k
encoder.block.8.layer.0.SelfAttention.v
encoder.block.8.layer.0.SelfAttention.o
encoder.block.8.layer.1.DenseReluDense.wi_0
encoder.block.8.layer.1.DenseReluDense.wi_1
encoder.block.8.layer.1.DenseReluDense.wo
encoder.block.9.layer.0.SelfAttention.q
encoder.block.9.layer.0.SelfAttention.k
encoder.block.9.layer.0.SelfAttention.v
encoder.block.9.layer.0.SelfAttention.o
encoder.block.9.layer.1.DenseReluDense.wi_0
encoder.block.9.layer.1.DenseReluDense.wi_1
encoder.block.9.layer.1.DenseReluDense.wo
encoder.block.10.layer.0.SelfAttention.q
encoder.block.10.layer.0.SelfAttention.k
encoder.block.10.layer.0.SelfAttention.v
encoder.block.10.layer.0.SelfAttention.o
encoder.block.10.layer.1.DenseReluDense.wi_0
encoder.block.10.layer.1.DenseReluDense.wi_1
encoder.block.10.layer.1.DenseReluDense.wo
encoder.block.11.layer.0.SelfAttention.q
encoder.block.11.layer.0.SelfAttention.k
encoder.block.11.layer.0.SelfAttention.v
encoder.block.11.layer.0.SelfAttention.o
encoder.block.11.layer.1.DenseReluDense.wi_0
encoder.block.11.layer.1.DenseReluDense.wi_1
encoder.block.11.layer.1.DenseReluDense.wo
encoder.block.12.layer.0.SelfAttention.q
encoder.block.12.layer.0.SelfAttention.k
encoder.block.12.layer.0.SelfAttention.v
encoder.block.12.layer.0.SelfAttention.o
encoder.block.12.layer.1.DenseReluDense.wi_0
encoder.block.12.layer.1.DenseReluDense.wi_1
encoder.block.12.layer.1.DenseReluDense.wo
encoder.block.13.layer.0.SelfAttention.q
encoder.block.13.layer.0.SelfAttention.k
encoder.block.13.layer.0.SelfAttention.v
encoder.block.13.layer.0.SelfAttention.o
encoder.block.13.layer.1.DenseReluDense.wi_0
encoder.block.13.layer.1.DenseReluDense.wi_1
encoder.block.13.layer.1.DenseReluDense.wo
encoder.block.14.layer.0.SelfAttention.q
encoder.block.14.layer.0.SelfAttention.k
encoder.block.14.layer.0.SelfAttention.v
encoder.block.14.layer.0.SelfAttention.o
encoder.block.14.layer.1.DenseReluDense.wi_0
encoder.block.14.layer.1.DenseReluDense.wi_1
encoder.block.14.layer.1.DenseReluDense.wo
encoder.block.15.layer.0.SelfAttention.q
encoder.block.15.layer.0.SelfAttention.k
encoder.block.15.layer.0.SelfAttention.v
encoder.block.15.layer.0.SelfAttention.o
encoder.block.15.layer.1.DenseReluDense.wi_0
encoder.block.15.layer.1.DenseReluDense.wi_1
encoder.block.15.layer.1.DenseReluDense.wo
encoder.block.16.layer.0.SelfAttention.q
encoder.block.16.layer.0.SelfAttention.k
encoder.block.16.layer.0.SelfAttention.v
encoder.block.16.layer.0.SelfAttention.o
encoder.block.16.layer.1.DenseReluDense.wi_0
encoder.block.16.layer.1.DenseReluDense.wi_1
encoder.block.16.layer.1.DenseReluDense.wo
encoder.block.17.layer.0.SelfAttention.q
encoder.block.17.layer.0.SelfAttention.k
encoder.block.17.layer.0.SelfAttention.v
encoder.block.17.layer.0.SelfAttention.o
encoder.block.17.layer.1.DenseReluDense.wi_0
encoder.block.17.layer.1.DenseReluDense.wi_1
encoder.block.17.layer.1.DenseReluDense.wo
encoder.block.18.layer.0.SelfAttention.q
encoder.block.18.layer.0.SelfAttention.k
encoder.block.18.layer.0.SelfAttention.v
encoder.block.18.layer.0.SelfAttention.o
encoder.block.18.layer.1.DenseReluDense.wi_0
encoder.block.18.layer.1.DenseReluDense.wi_1
encoder.block.18.layer.1.DenseReluDense.wo
encoder.block.19.layer.0.SelfAttention.q
encoder.block.19.layer.0.SelfAttention.k
encoder.block.19.layer.0.SelfAttention.v
encoder.block.19.layer.0.SelfAttention.o
encoder.block.19.layer.1.DenseReluDense.wi_0
encoder.block.19.layer.1.DenseReluDense.wi_1
encoder.block.19.layer.1.DenseReluDense.wo
encoder.block.20.layer.0.SelfAttention.q
encoder.block.20.layer.0.SelfAttention.k
encoder.block.20.layer.0.SelfAttention.v
encoder.block.20.layer.0.SelfAttention.o
encoder.block.20.layer.1.DenseReluDense.wi_0
encoder.block.20.layer.1.DenseReluDense.wi_1
encoder.block.20.layer.1.DenseReluDense.wo
encoder.block.21.layer.0.SelfAttention.q
encoder.block.21.layer.0.SelfAttention.k
encoder.block.21.layer.0.SelfAttention.v
encoder.block.21.layer.0.SelfAttention.o
encoder.block.21.layer.1.DenseReluDense.wi_0
encoder.block.21.layer.1.DenseReluDense.wi_1
encoder.block.21.layer.1.DenseReluDense.wo
encoder.block.22.layer.0.SelfAttention.q
encoder.block.22.layer.0.SelfAttention.k
encoder.block.22.layer.0.SelfAttention.v
encoder.block.22.layer.0.SelfAttention.o
encoder.block.22.layer.1.DenseReluDense.wi_0
encoder.block.22.layer.1.DenseReluDense.wi_1
encoder.block.22.layer.1.DenseReluDense.wo
encoder.block.23.layer.0.SelfAttention.q
encoder.block.23.layer.0.SelfAttention.k
encoder.block.23.layer.0.SelfAttention.v
encoder.block.23.layer.0.SelfAttention.o
encoder.block.23.layer.1.DenseReluDense.wi_0
encoder.block.23.layer.1.DenseReluDense.wi_1
encoder.block.23.layer.1.DenseReluDense.wo
Done.
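The quantization part of this log is a flat sequence of four-line records: a header with the encoder block index and module path (for example `3 layer.1.DenseReluDense.wo`), a `Quantizing ...` marker, a `time` line and an `error` line. The packing list and `Done.` follow. As a minimal sketch, assuming exactly this line layout, the records can be parsed so that the per-module errors can be compared across blocks. The function names `parse_quant_log` and `errors_by_module` are hypothetical helpers for illustration, not part of whatever script produced this log, and the sketch makes no claim about the units or definition of the logged values.

```python
import re
from collections import defaultdict

# Assumed record layout (taken from the log above):
#   "<block> layer.<i>.<Module>.<name>", "Quantizing ...", "time <t>", "error <e>"
HEADER = re.compile(r"^(\d+) (layer\.\d+\.\S+)$")
TIME = re.compile(r"^time (\S+)$")
ERROR = re.compile(r"^error (\S+)$")


def parse_quant_log(text):
    """Hypothetical helper: map (block, module) -> {"time": float, "error": float}."""
    records = {}
    current = None  # key of the record being filled, or None outside a record
    for raw in text.splitlines():
        line = raw.strip()
        m = HEADER.match(line)
        if m:
            current = (int(m.group(1)), m.group(2))
            records[current] = {}
            continue
        if current is None:
            # Packing lines, totals, "Done." and orphaned lines are ignored.
            continue
        m = TIME.match(line)
        if m:
            records[current]["time"] = float(m.group(1))
            continue
        m = ERROR.match(line)
        if m:
            records[current]["error"] = float(m.group(1))
            current = None  # the error line closes the record
    return records


def errors_by_module(records):
    """Hypothetical helper: group (block, error) pairs by module path, sorted by block."""
    grouped = defaultdict(list)
    for (block, module), rec in sorted(records.items()):
        if "error" in rec:
            grouped[module].append((block, rec["error"]))
    return dict(grouped)


# Usage on two records copied from the log above.
sample = """\
3 layer.1.DenseReluDense.wo
Quantizing ...
time 2.80
error 136006.0
4 layer.1.DenseReluDense.wo
Quantizing ...
time 2.88
error 165037.09375
"""
recs = parse_quant_log(sample)
print(errors_by_module(recs)["layer.1.DenseReluDense.wo"])
# [(3, 136006.0), (4, 165037.09375)]
```

Keying records by `(block, module)` keeps the lookup independent of line order, and closing a record on its `error` line means the trailing packing list is skipped without any special-casing.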