| /usr/local/lib/python3.10/dist-packages/lightning_fabric/connector.py:558: `precision=16` is supported for historical reasons but its usage is discouraged. Please set your precision to 16-mixed instead! | |
| INFO:pytorch_lightning.utilities.rank_zero:Using 16bit Automatic Mixed Precision (AMP) | |
| INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True | |
| INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores | |
| INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs | |
| INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs | |
| /usr/local/lib/python3.10/dist-packages/pytorch_lightning/loggers/wandb.py:389: There is a wandb run already in progress and newly created instances of `WandbLogger` will reuse this run. If this is not desired, call `wandb.finish()` before instantiating `WandbLogger`. | |
| INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0] | |
| INFO:pytorch_lightning.callbacks.model_summary: | |
| | Name | Type | Params | |
| ----------------------------------------------------------------- | |
| 0 | train_acc | MulticlassAccuracy | 0 | |
| 1 | valid_acc | MulticlassAccuracy | 0 | |
| 2 | test_acc | MulticlassAccuracy | 0 | |
| 3 | val_f1_score | MulticlassF1Score | 0 | |
| 4 | train_f1_score | MulticlassF1Score | 0 | |
| 5 | test_f1_score | MulticlassF1Score | 0 | |
| 6 | confusion_matrix | MulticlassConfusionMatrix | 0 | |
| 7 | gcn | SGCN | 36.5 K | |
| 8 | encoder | MoE_TransformerGraphEncoder | 6.8 M | |
| 9 | out | Sequential | 18.6 K | |
| ----------------------------------------------------------------- | |
| 6.9 M Trainable params | |
| 0 Non-trainable params | |
| 6.9 M Total params | |
| 27.527 Total estimated model params size (MB) | |
| INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved. New best score: 0.263 | |
| INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.135 >= min_delta = 1e-08. New best score: 0.398 | |
| INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.008 >= min_delta = 1e-08. New best score: 0.406 | |
| Epoch 00006: reducing learning rate of group 0 to 5.0000e-04. | |
| INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.471 >= min_delta = 1e-08. New best score: 0.877 | |
| Epoch 00010: reducing learning rate of group 0 to 2.5000e-04. | |
| INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.016 >= min_delta = 1e-08. New best score: 0.893 | |
| INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.016 >= min_delta = 1e-08. New best score: 0.909 | |
| INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.006 >= min_delta = 1e-08. New best score: 0.915 | |
| INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.006 >= min_delta = 1e-08. New best score: 0.920 | |
| INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.002 >= min_delta = 1e-08. New best score: 0.923 | |
| Epoch 00017: reducing learning rate of group 0 to 1.2500e-04. | |
| INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.002 >= min_delta = 1e-08. New best score: 0.925 | |
| Epoch 00020: reducing learning rate of group 0 to 6.2500e-05. | |
| INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.002 >= min_delta = 1e-08. New best score: 0.927 | |
| Epoch 00023: reducing learning rate of group 0 to 3.1250e-05. | |
| INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.003 >= min_delta = 1e-08. New best score: 0.930 | |
| Epoch 00026: reducing learning rate of group 0 to 1.5625e-05. | |
| INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.003 >= min_delta = 1e-08. New best score: 0.933 | |
| Epoch 00029: reducing learning rate of group 0 to 7.8125e-06. | |
| Epoch 00032: reducing learning rate of group 0 to 5.0000e-06. | |
| INFO:pytorch_lightning.callbacks.early_stopping:Monitored metric val_accuracy did not improve in the last 50 records. Best score: 0.933. Signaling Trainer to stop. | |
| model_args ModelArgs : ModelArgs(dim=128, hidden_dim=512, norm_eps=1e-05, moe=MoeArgs(num_experts=8, num_experts_per_tok=2), max_batch_size=32, max_seq_len=8) | |
| model_args =: ModelArgs(dim=128, hidden_dim=512, norm_eps=1e-05, moe=MoeArgs(num_experts=8, num_experts_per_tok=2), max_batch_size=32, max_seq_len=8) | |
| model_args ModelArgs : ModelArgs(dim=128, hidden_dim=512, norm_eps=1e-05, moe=MoeArgs(num_experts=8, num_experts_per_tok=2), max_batch_size=32, max_seq_len=8) | |
| model_args =: ModelArgs(dim=128, hidden_dim=512, norm_eps=1e-05, moe=MoeArgs(num_experts=8, num_experts_per_tok=2), max_batch_size=32, max_seq_len=8) | |
| model_args ModelArgs : ModelArgs(dim=128, hidden_dim=512, norm_eps=1e-05, moe=MoeArgs(num_experts=8, num_experts_per_tok=2), max_batch_size=32, max_seq_len=8) | |
| model_args =: ModelArgs(dim=128, hidden_dim=512, norm_eps=1e-05, moe=MoeArgs(num_experts=8, num_experts_per_tok=2), max_batch_size=32, max_seq_len=8) | |
| model_args ModelArgs : ModelArgs(dim=128, hidden_dim=512, norm_eps=1e-05, moe=MoeArgs(num_experts=8, num_experts_per_tok=2), max_batch_size=32, max_seq_len=8) | |
| model_args =: ModelArgs(dim=128, hidden_dim=512, norm_eps=1e-05, moe=MoeArgs(num_experts=8, num_experts_per_tok=2), max_batch_size=32, max_seq_len=8) | |
| INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0] | |
| 0 -> ('', MoE_GCN( | |
| (train_acc): MulticlassAccuracy() | |
| (valid_acc): MulticlassAccuracy() | |
| (test_acc): MulticlassAccuracy() | |
| (val_f1_score): MulticlassF1Score() | |
| (train_f1_score): MulticlassF1Score() | |
| (test_f1_score): MulticlassF1Score() | |
| (confusion_matrix): MulticlassConfusionMatrix() | |
| (gcn): SGCN( | |
| (conv_layers): ModuleList( | |
| (0): unit_gcn( | |
| (conv_list): ModuleList( | |
| (0-2): 3 x Conv2d(3, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) | |
| (act): Mish() | |
| ) | |
| (1): unit_gcn( | |
| (conv_list): ModuleList( | |
| (0-2): 3 x Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) | |
| (act): Mish() | |
| ) | |
| (2): unit_gcn( | |
| (conv_list): ModuleList( | |
| (0-2): 3 x Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) | |
| (act): Mish() | |
| ) | |
| ) | |
| ) | |
| (encoder): MoE_TransformerGraphEncoder( | |
| (layers): ModuleList( | |
| (0-3): 4 x MoE_TransformerGraphEncoderLayer( | |
| (attention): Residual( | |
| (sublayer): MultiHeadAttention( | |
| (heads): ModuleList( | |
| (0-7): 8 x AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| ) | |
| (linear): Linear(in_features=256, out_features=128, bias=True) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| ) | |
| (feed_forward): Residual( | |
| (sublayer): MoeLayer( | |
| (experts): ModuleList( | |
| (0-7): 8 x FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| ) | |
| ) | |
| (gate): Linear(in_features=128, out_features=8, bias=False) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| ) | |
| ) | |
| (positional_encoder): PositionalEncoder( | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| ) | |
| ) | |
| (out): Sequential( | |
| (0): Linear(in_features=128, out_features=128, bias=True) | |
| (1): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (2): Linear(in_features=128, out_features=14, bias=True) | |
| ) | |
| )) | |
| 1 -> ('train_acc', MulticlassAccuracy()) | |
| 2 -> ('valid_acc', MulticlassAccuracy()) | |
| 3 -> ('test_acc', MulticlassAccuracy()) | |
| 4 -> ('val_f1_score', MulticlassF1Score()) | |
| 5 -> ('train_f1_score', MulticlassF1Score()) | |
| 6 -> ('test_f1_score', MulticlassF1Score()) | |
| 7 -> ('confusion_matrix', MulticlassConfusionMatrix()) | |
| 8 -> ('gcn', SGCN( | |
| (conv_layers): ModuleList( | |
| (0): unit_gcn( | |
| (conv_list): ModuleList( | |
| (0-2): 3 x Conv2d(3, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) | |
| (act): Mish() | |
| ) | |
| (1): unit_gcn( | |
| (conv_list): ModuleList( | |
| (0-2): 3 x Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) | |
| (act): Mish() | |
| ) | |
| (2): unit_gcn( | |
| (conv_list): ModuleList( | |
| (0-2): 3 x Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) | |
| (act): Mish() | |
| ) | |
| ) | |
| )) | |
| 9 -> ('gcn.conv_layers', ModuleList( | |
| (0): unit_gcn( | |
| (conv_list): ModuleList( | |
| (0-2): 3 x Conv2d(3, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) | |
| (act): Mish() | |
| ) | |
| (1): unit_gcn( | |
| (conv_list): ModuleList( | |
| (0-2): 3 x Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) | |
| (act): Mish() | |
| ) | |
| (2): unit_gcn( | |
| (conv_list): ModuleList( | |
| (0-2): 3 x Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) | |
| (act): Mish() | |
| ) | |
| )) | |
| 10 -> ('gcn.conv_layers.0', unit_gcn( | |
| (conv_list): ModuleList( | |
| (0-2): 3 x Conv2d(3, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) | |
| (act): Mish() | |
| )) | |
| 11 -> ('gcn.conv_layers.0.conv_list', ModuleList( | |
| (0-2): 3 x Conv2d(3, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 12 -> ('gcn.conv_layers.0.conv_list.0', Conv2d(3, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 13 -> ('gcn.conv_layers.0.conv_list.1', Conv2d(3, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 14 -> ('gcn.conv_layers.0.conv_list.2', Conv2d(3, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 15 -> ('gcn.conv_layers.0.bn', BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)) | |
| 16 -> ('gcn.conv_layers.0.act', Mish()) | |
| 17 -> ('gcn.conv_layers.1', unit_gcn( | |
| (conv_list): ModuleList( | |
| (0-2): 3 x Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) | |
| (act): Mish() | |
| )) | |
| 18 -> ('gcn.conv_layers.1.conv_list', ModuleList( | |
| (0-2): 3 x Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 19 -> ('gcn.conv_layers.1.conv_list.0', Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1))) | |
| 20 -> ('gcn.conv_layers.1.conv_list.1', Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1))) | |
| 21 -> ('gcn.conv_layers.1.conv_list.2', Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1))) | |
| 22 -> ('gcn.conv_layers.1.bn', BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)) | |
| 23 -> ('gcn.conv_layers.1.act', Mish()) | |
| 24 -> ('gcn.conv_layers.2', unit_gcn( | |
| (conv_list): ModuleList( | |
| (0-2): 3 x Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) | |
| (act): Mish() | |
| )) | |
| 25 -> ('gcn.conv_layers.2.conv_list', ModuleList( | |
| (0-2): 3 x Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 26 -> ('gcn.conv_layers.2.conv_list.0', Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1))) | |
| 27 -> ('gcn.conv_layers.2.conv_list.1', Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1))) | |
| 28 -> ('gcn.conv_layers.2.conv_list.2', Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1))) | |
| 29 -> ('gcn.conv_layers.2.bn', BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)) | |
| 30 -> ('gcn.conv_layers.2.act', Mish()) | |
| 31 -> ('encoder', MoE_TransformerGraphEncoder( | |
| (layers): ModuleList( | |
| (0-3): 4 x MoE_TransformerGraphEncoderLayer( | |
| (attention): Residual( | |
| (sublayer): MultiHeadAttention( | |
| (heads): ModuleList( | |
| (0-7): 8 x AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| ) | |
| (linear): Linear(in_features=256, out_features=128, bias=True) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| ) | |
| (feed_forward): Residual( | |
| (sublayer): MoeLayer( | |
| (experts): ModuleList( | |
| (0-7): 8 x FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| ) | |
| ) | |
| (gate): Linear(in_features=128, out_features=8, bias=False) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| ) | |
| ) | |
| (positional_encoder): PositionalEncoder( | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| ) | |
| )) | |
| 32 -> ('encoder.layers', ModuleList( | |
| (0-3): 4 x MoE_TransformerGraphEncoderLayer( | |
| (attention): Residual( | |
| (sublayer): MultiHeadAttention( | |
| (heads): ModuleList( | |
| (0-7): 8 x AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| ) | |
| (linear): Linear(in_features=256, out_features=128, bias=True) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| ) | |
| (feed_forward): Residual( | |
| (sublayer): MoeLayer( | |
| (experts): ModuleList( | |
| (0-7): 8 x FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| ) | |
| ) | |
| (gate): Linear(in_features=128, out_features=8, bias=False) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| ) | |
| )) | |
| 33 -> ('encoder.layers.0', MoE_TransformerGraphEncoderLayer( | |
| (attention): Residual( | |
| (sublayer): MultiHeadAttention( | |
| (heads): ModuleList( | |
| (0-7): 8 x AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| ) | |
| (linear): Linear(in_features=256, out_features=128, bias=True) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| ) | |
| (feed_forward): Residual( | |
| (sublayer): MoeLayer( | |
| (experts): ModuleList( | |
| (0-7): 8 x FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| ) | |
| ) | |
| (gate): Linear(in_features=128, out_features=8, bias=False) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| )) | |
| 34 -> ('encoder.layers.0.attention', Residual( | |
| (sublayer): MultiHeadAttention( | |
| (heads): ModuleList( | |
| (0-7): 8 x AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| ) | |
| (linear): Linear(in_features=256, out_features=128, bias=True) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| )) | |
| 35 -> ('encoder.layers.0.attention.sublayer', MultiHeadAttention( | |
| (heads): ModuleList( | |
| (0-7): 8 x AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| ) | |
| (linear): Linear(in_features=256, out_features=128, bias=True) | |
| )) | |
| 36 -> ('encoder.layers.0.attention.sublayer.heads', ModuleList( | |
| (0-7): 8 x AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| )) | |
| 37 -> ('encoder.layers.0.attention.sublayer.heads.0', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 38 -> ('encoder.layers.0.attention.sublayer.heads.0.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 39 -> ('encoder.layers.0.attention.sublayer.heads.0.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 40 -> ('encoder.layers.0.attention.sublayer.heads.0.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 41 -> ('encoder.layers.0.attention.sublayer.heads.1', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 42 -> ('encoder.layers.0.attention.sublayer.heads.1.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 43 -> ('encoder.layers.0.attention.sublayer.heads.1.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 44 -> ('encoder.layers.0.attention.sublayer.heads.1.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 45 -> ('encoder.layers.0.attention.sublayer.heads.2', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 46 -> ('encoder.layers.0.attention.sublayer.heads.2.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 47 -> ('encoder.layers.0.attention.sublayer.heads.2.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 48 -> ('encoder.layers.0.attention.sublayer.heads.2.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 49 -> ('encoder.layers.0.attention.sublayer.heads.3', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 50 -> ('encoder.layers.0.attention.sublayer.heads.3.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 51 -> ('encoder.layers.0.attention.sublayer.heads.3.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 52 -> ('encoder.layers.0.attention.sublayer.heads.3.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 53 -> ('encoder.layers.0.attention.sublayer.heads.4', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 54 -> ('encoder.layers.0.attention.sublayer.heads.4.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 55 -> ('encoder.layers.0.attention.sublayer.heads.4.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 56 -> ('encoder.layers.0.attention.sublayer.heads.4.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 57 -> ('encoder.layers.0.attention.sublayer.heads.5', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 58 -> ('encoder.layers.0.attention.sublayer.heads.5.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 59 -> ('encoder.layers.0.attention.sublayer.heads.5.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 60 -> ('encoder.layers.0.attention.sublayer.heads.5.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 61 -> ('encoder.layers.0.attention.sublayer.heads.6', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 62 -> ('encoder.layers.0.attention.sublayer.heads.6.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 63 -> ('encoder.layers.0.attention.sublayer.heads.6.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 64 -> ('encoder.layers.0.attention.sublayer.heads.6.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 65 -> ('encoder.layers.0.attention.sublayer.heads.7', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 66 -> ('encoder.layers.0.attention.sublayer.heads.7.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 67 -> ('encoder.layers.0.attention.sublayer.heads.7.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 68 -> ('encoder.layers.0.attention.sublayer.heads.7.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 69 -> ('encoder.layers.0.attention.sublayer.linear', Linear(in_features=256, out_features=128, bias=True)) | |
| 70 -> ('encoder.layers.0.attention.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True)) | |
| 71 -> ('encoder.layers.0.attention.dropout', Dropout(p=0.1, inplace=False)) | |
| 72 -> ('encoder.layers.0.feed_forward', Residual( | |
| (sublayer): MoeLayer( | |
| (experts): ModuleList( | |
| (0-7): 8 x FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| ) | |
| ) | |
| (gate): Linear(in_features=128, out_features=8, bias=False) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| )) | |
| 73 -> ('encoder.layers.0.feed_forward.sublayer', MoeLayer( | |
| (experts): ModuleList( | |
| (0-7): 8 x FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| ) | |
| ) | |
| (gate): Linear(in_features=128, out_features=8, bias=False) | |
| )) | |
| 74 -> ('encoder.layers.0.feed_forward.sublayer.experts', ModuleList( | |
| (0-7): 8 x FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| ) | |
| )) | |
| 75 -> ('encoder.layers.0.feed_forward.sublayer.experts.0', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 76 -> ('encoder.layers.0.feed_forward.sublayer.experts.0.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 77 -> ('encoder.layers.0.feed_forward.sublayer.experts.0.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 78 -> ('encoder.layers.0.feed_forward.sublayer.experts.0.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 79 -> ('encoder.layers.0.feed_forward.sublayer.experts.1', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 80 -> ('encoder.layers.0.feed_forward.sublayer.experts.1.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 81 -> ('encoder.layers.0.feed_forward.sublayer.experts.1.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 82 -> ('encoder.layers.0.feed_forward.sublayer.experts.1.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 83 -> ('encoder.layers.0.feed_forward.sublayer.experts.2', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 84 -> ('encoder.layers.0.feed_forward.sublayer.experts.2.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 85 -> ('encoder.layers.0.feed_forward.sublayer.experts.2.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 86 -> ('encoder.layers.0.feed_forward.sublayer.experts.2.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 87 -> ('encoder.layers.0.feed_forward.sublayer.experts.3', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 88 -> ('encoder.layers.0.feed_forward.sublayer.experts.3.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 89 -> ('encoder.layers.0.feed_forward.sublayer.experts.3.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 90 -> ('encoder.layers.0.feed_forward.sublayer.experts.3.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 91 -> ('encoder.layers.0.feed_forward.sublayer.experts.4', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 92 -> ('encoder.layers.0.feed_forward.sublayer.experts.4.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 93 -> ('encoder.layers.0.feed_forward.sublayer.experts.4.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 94 -> ('encoder.layers.0.feed_forward.sublayer.experts.4.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 95 -> ('encoder.layers.0.feed_forward.sublayer.experts.5', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 96 -> ('encoder.layers.0.feed_forward.sublayer.experts.5.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 97 -> ('encoder.layers.0.feed_forward.sublayer.experts.5.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 98 -> ('encoder.layers.0.feed_forward.sublayer.experts.5.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 99 -> ('encoder.layers.0.feed_forward.sublayer.experts.6', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 100 -> ('encoder.layers.0.feed_forward.sublayer.experts.6.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 101 -> ('encoder.layers.0.feed_forward.sublayer.experts.6.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 102 -> ('encoder.layers.0.feed_forward.sublayer.experts.6.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 103 -> ('encoder.layers.0.feed_forward.sublayer.experts.7', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 104 -> ('encoder.layers.0.feed_forward.sublayer.experts.7.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 105 -> ('encoder.layers.0.feed_forward.sublayer.experts.7.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 106 -> ('encoder.layers.0.feed_forward.sublayer.experts.7.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 107 -> ('encoder.layers.0.feed_forward.sublayer.gate', Linear(in_features=128, out_features=8, bias=False)) | |
| 108 -> ('encoder.layers.0.feed_forward.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True)) | |
| 109 -> ('encoder.layers.0.feed_forward.dropout', Dropout(p=0.1, inplace=False)) | |
| 110 -> ('encoder.layers.0.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True)) | |
| 111 -> ('encoder.layers.1', MoE_TransformerGraphEncoderLayer( | |
| (attention): Residual( | |
| (sublayer): MultiHeadAttention( | |
| (heads): ModuleList( | |
| (0-7): 8 x AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| ) | |
| (linear): Linear(in_features=256, out_features=128, bias=True) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| ) | |
| (feed_forward): Residual( | |
| (sublayer): MoeLayer( | |
| (experts): ModuleList( | |
| (0-7): 8 x FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| ) | |
| ) | |
| (gate): Linear(in_features=128, out_features=8, bias=False) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| )) | |
| 112 -> ('encoder.layers.1.attention', Residual( | |
| (sublayer): MultiHeadAttention( | |
| (heads): ModuleList( | |
| (0-7): 8 x AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| ) | |
| (linear): Linear(in_features=256, out_features=128, bias=True) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| )) | |
| 113 -> ('encoder.layers.1.attention.sublayer', MultiHeadAttention( | |
| (heads): ModuleList( | |
| (0-7): 8 x AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| ) | |
| (linear): Linear(in_features=256, out_features=128, bias=True) | |
| )) | |
| 114 -> ('encoder.layers.1.attention.sublayer.heads', ModuleList( | |
| (0-7): 8 x AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| )) | |
| 115 -> ('encoder.layers.1.attention.sublayer.heads.0', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 116 -> ('encoder.layers.1.attention.sublayer.heads.0.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 117 -> ('encoder.layers.1.attention.sublayer.heads.0.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 118 -> ('encoder.layers.1.attention.sublayer.heads.0.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 119 -> ('encoder.layers.1.attention.sublayer.heads.1', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 120 -> ('encoder.layers.1.attention.sublayer.heads.1.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 121 -> ('encoder.layers.1.attention.sublayer.heads.1.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 122 -> ('encoder.layers.1.attention.sublayer.heads.1.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 123 -> ('encoder.layers.1.attention.sublayer.heads.2', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 124 -> ('encoder.layers.1.attention.sublayer.heads.2.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 125 -> ('encoder.layers.1.attention.sublayer.heads.2.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 126 -> ('encoder.layers.1.attention.sublayer.heads.2.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 127 -> ('encoder.layers.1.attention.sublayer.heads.3', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 128 -> ('encoder.layers.1.attention.sublayer.heads.3.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 129 -> ('encoder.layers.1.attention.sublayer.heads.3.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 130 -> ('encoder.layers.1.attention.sublayer.heads.3.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 131 -> ('encoder.layers.1.attention.sublayer.heads.4', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 132 -> ('encoder.layers.1.attention.sublayer.heads.4.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 133 -> ('encoder.layers.1.attention.sublayer.heads.4.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 134 -> ('encoder.layers.1.attention.sublayer.heads.4.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 135 -> ('encoder.layers.1.attention.sublayer.heads.5', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 136 -> ('encoder.layers.1.attention.sublayer.heads.5.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 137 -> ('encoder.layers.1.attention.sublayer.heads.5.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 138 -> ('encoder.layers.1.attention.sublayer.heads.5.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 139 -> ('encoder.layers.1.attention.sublayer.heads.6', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 140 -> ('encoder.layers.1.attention.sublayer.heads.6.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 141 -> ('encoder.layers.1.attention.sublayer.heads.6.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 142 -> ('encoder.layers.1.attention.sublayer.heads.6.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 143 -> ('encoder.layers.1.attention.sublayer.heads.7', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 144 -> ('encoder.layers.1.attention.sublayer.heads.7.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 145 -> ('encoder.layers.1.attention.sublayer.heads.7.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 146 -> ('encoder.layers.1.attention.sublayer.heads.7.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 147 -> ('encoder.layers.1.attention.sublayer.linear', Linear(in_features=256, out_features=128, bias=True)) | |
| 148 -> ('encoder.layers.1.attention.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True)) | |
| 149 -> ('encoder.layers.1.attention.dropout', Dropout(p=0.1, inplace=False)) | |
| 150 -> ('encoder.layers.1.feed_forward', Residual( | |
| (sublayer): MoeLayer( | |
| (experts): ModuleList( | |
| (0-7): 8 x FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| ) | |
| ) | |
| (gate): Linear(in_features=128, out_features=8, bias=False) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| )) | |
| 151 -> ('encoder.layers.1.feed_forward.sublayer', MoeLayer( | |
| (experts): ModuleList( | |
| (0-7): 8 x FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| ) | |
| ) | |
| (gate): Linear(in_features=128, out_features=8, bias=False) | |
| )) | |
| 152 -> ('encoder.layers.1.feed_forward.sublayer.experts', ModuleList( | |
| (0-7): 8 x FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| ) | |
| )) | |
| 153 -> ('encoder.layers.1.feed_forward.sublayer.experts.0', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 154 -> ('encoder.layers.1.feed_forward.sublayer.experts.0.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 155 -> ('encoder.layers.1.feed_forward.sublayer.experts.0.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 156 -> ('encoder.layers.1.feed_forward.sublayer.experts.0.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 157 -> ('encoder.layers.1.feed_forward.sublayer.experts.1', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 158 -> ('encoder.layers.1.feed_forward.sublayer.experts.1.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 159 -> ('encoder.layers.1.feed_forward.sublayer.experts.1.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 160 -> ('encoder.layers.1.feed_forward.sublayer.experts.1.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 161 -> ('encoder.layers.1.feed_forward.sublayer.experts.2', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 162 -> ('encoder.layers.1.feed_forward.sublayer.experts.2.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 163 -> ('encoder.layers.1.feed_forward.sublayer.experts.2.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 164 -> ('encoder.layers.1.feed_forward.sublayer.experts.2.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 165 -> ('encoder.layers.1.feed_forward.sublayer.experts.3', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 166 -> ('encoder.layers.1.feed_forward.sublayer.experts.3.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 167 -> ('encoder.layers.1.feed_forward.sublayer.experts.3.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 168 -> ('encoder.layers.1.feed_forward.sublayer.experts.3.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 169 -> ('encoder.layers.1.feed_forward.sublayer.experts.4', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 170 -> ('encoder.layers.1.feed_forward.sublayer.experts.4.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 171 -> ('encoder.layers.1.feed_forward.sublayer.experts.4.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 172 -> ('encoder.layers.1.feed_forward.sublayer.experts.4.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 173 -> ('encoder.layers.1.feed_forward.sublayer.experts.5', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 174 -> ('encoder.layers.1.feed_forward.sublayer.experts.5.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 175 -> ('encoder.layers.1.feed_forward.sublayer.experts.5.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 176 -> ('encoder.layers.1.feed_forward.sublayer.experts.5.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 177 -> ('encoder.layers.1.feed_forward.sublayer.experts.6', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 178 -> ('encoder.layers.1.feed_forward.sublayer.experts.6.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 179 -> ('encoder.layers.1.feed_forward.sublayer.experts.6.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 180 -> ('encoder.layers.1.feed_forward.sublayer.experts.6.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 181 -> ('encoder.layers.1.feed_forward.sublayer.experts.7', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 182 -> ('encoder.layers.1.feed_forward.sublayer.experts.7.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 183 -> ('encoder.layers.1.feed_forward.sublayer.experts.7.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 184 -> ('encoder.layers.1.feed_forward.sublayer.experts.7.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 185 -> ('encoder.layers.1.feed_forward.sublayer.gate', Linear(in_features=128, out_features=8, bias=False)) | |
| 186 -> ('encoder.layers.1.feed_forward.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True)) | |
| 187 -> ('encoder.layers.1.feed_forward.dropout', Dropout(p=0.1, inplace=False)) | |
| 188 -> ('encoder.layers.1.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True)) | |
| 189 -> ('encoder.layers.2', MoE_TransformerGraphEncoderLayer( | |
| (attention): Residual( | |
| (sublayer): MultiHeadAttention( | |
| (heads): ModuleList( | |
| (0-7): 8 x AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| ) | |
| (linear): Linear(in_features=256, out_features=128, bias=True) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| ) | |
| (feed_forward): Residual( | |
| (sublayer): MoeLayer( | |
| (experts): ModuleList( | |
| (0-7): 8 x FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| ) | |
| ) | |
| (gate): Linear(in_features=128, out_features=8, bias=False) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| )) | |
| 190 -> ('encoder.layers.2.attention', Residual( | |
| (sublayer): MultiHeadAttention( | |
| (heads): ModuleList( | |
| (0-7): 8 x AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| ) | |
| (linear): Linear(in_features=256, out_features=128, bias=True) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| )) | |
| 191 -> ('encoder.layers.2.attention.sublayer', MultiHeadAttention( | |
| (heads): ModuleList( | |
| (0-7): 8 x AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| ) | |
| (linear): Linear(in_features=256, out_features=128, bias=True) | |
| )) | |
| 192 -> ('encoder.layers.2.attention.sublayer.heads', ModuleList( | |
| (0-7): 8 x AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| )) | |
| 193 -> ('encoder.layers.2.attention.sublayer.heads.0', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 194 -> ('encoder.layers.2.attention.sublayer.heads.0.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 195 -> ('encoder.layers.2.attention.sublayer.heads.0.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 196 -> ('encoder.layers.2.attention.sublayer.heads.0.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 197 -> ('encoder.layers.2.attention.sublayer.heads.1', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 198 -> ('encoder.layers.2.attention.sublayer.heads.1.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 199 -> ('encoder.layers.2.attention.sublayer.heads.1.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 200 -> ('encoder.layers.2.attention.sublayer.heads.1.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 201 -> ('encoder.layers.2.attention.sublayer.heads.2', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 202 -> ('encoder.layers.2.attention.sublayer.heads.2.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 203 -> ('encoder.layers.2.attention.sublayer.heads.2.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 204 -> ('encoder.layers.2.attention.sublayer.heads.2.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 205 -> ('encoder.layers.2.attention.sublayer.heads.3', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 206 -> ('encoder.layers.2.attention.sublayer.heads.3.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 207 -> ('encoder.layers.2.attention.sublayer.heads.3.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 208 -> ('encoder.layers.2.attention.sublayer.heads.3.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 209 -> ('encoder.layers.2.attention.sublayer.heads.4', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 210 -> ('encoder.layers.2.attention.sublayer.heads.4.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 211 -> ('encoder.layers.2.attention.sublayer.heads.4.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 212 -> ('encoder.layers.2.attention.sublayer.heads.4.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 213 -> ('encoder.layers.2.attention.sublayer.heads.5', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 214 -> ('encoder.layers.2.attention.sublayer.heads.5.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 215 -> ('encoder.layers.2.attention.sublayer.heads.5.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 216 -> ('encoder.layers.2.attention.sublayer.heads.5.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 217 -> ('encoder.layers.2.attention.sublayer.heads.6', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 218 -> ('encoder.layers.2.attention.sublayer.heads.6.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 219 -> ('encoder.layers.2.attention.sublayer.heads.6.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 220 -> ('encoder.layers.2.attention.sublayer.heads.6.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 221 -> ('encoder.layers.2.attention.sublayer.heads.7', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 222 -> ('encoder.layers.2.attention.sublayer.heads.7.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 223 -> ('encoder.layers.2.attention.sublayer.heads.7.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 224 -> ('encoder.layers.2.attention.sublayer.heads.7.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 225 -> ('encoder.layers.2.attention.sublayer.linear', Linear(in_features=256, out_features=128, bias=True)) | |
| 226 -> ('encoder.layers.2.attention.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True)) | |
| 227 -> ('encoder.layers.2.attention.dropout', Dropout(p=0.1, inplace=False)) | |
| 228 -> ('encoder.layers.2.feed_forward', Residual( | |
| (sublayer): MoeLayer( | |
| (experts): ModuleList( | |
| (0-7): 8 x FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| ) | |
| ) | |
| (gate): Linear(in_features=128, out_features=8, bias=False) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| )) | |
| 229 -> ('encoder.layers.2.feed_forward.sublayer', MoeLayer( | |
| (experts): ModuleList( | |
| (0-7): 8 x FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| ) | |
| ) | |
| (gate): Linear(in_features=128, out_features=8, bias=False) | |
| )) | |
| 230 -> ('encoder.layers.2.feed_forward.sublayer.experts', ModuleList( | |
| (0-7): 8 x FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| ) | |
| )) | |
| 231 -> ('encoder.layers.2.feed_forward.sublayer.experts.0', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 232 -> ('encoder.layers.2.feed_forward.sublayer.experts.0.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 233 -> ('encoder.layers.2.feed_forward.sublayer.experts.0.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 234 -> ('encoder.layers.2.feed_forward.sublayer.experts.0.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 235 -> ('encoder.layers.2.feed_forward.sublayer.experts.1', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 236 -> ('encoder.layers.2.feed_forward.sublayer.experts.1.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 237 -> ('encoder.layers.2.feed_forward.sublayer.experts.1.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 238 -> ('encoder.layers.2.feed_forward.sublayer.experts.1.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 239 -> ('encoder.layers.2.feed_forward.sublayer.experts.2', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 240 -> ('encoder.layers.2.feed_forward.sublayer.experts.2.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 241 -> ('encoder.layers.2.feed_forward.sublayer.experts.2.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 242 -> ('encoder.layers.2.feed_forward.sublayer.experts.2.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 243 -> ('encoder.layers.2.feed_forward.sublayer.experts.3', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 244 -> ('encoder.layers.2.feed_forward.sublayer.experts.3.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 245 -> ('encoder.layers.2.feed_forward.sublayer.experts.3.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 246 -> ('encoder.layers.2.feed_forward.sublayer.experts.3.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 247 -> ('encoder.layers.2.feed_forward.sublayer.experts.4', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 248 -> ('encoder.layers.2.feed_forward.sublayer.experts.4.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 249 -> ('encoder.layers.2.feed_forward.sublayer.experts.4.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 250 -> ('encoder.layers.2.feed_forward.sublayer.experts.4.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 251 -> ('encoder.layers.2.feed_forward.sublayer.experts.5', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 252 -> ('encoder.layers.2.feed_forward.sublayer.experts.5.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 253 -> ('encoder.layers.2.feed_forward.sublayer.experts.5.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 254 -> ('encoder.layers.2.feed_forward.sublayer.experts.5.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 255 -> ('encoder.layers.2.feed_forward.sublayer.experts.6', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 256 -> ('encoder.layers.2.feed_forward.sublayer.experts.6.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 257 -> ('encoder.layers.2.feed_forward.sublayer.experts.6.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 258 -> ('encoder.layers.2.feed_forward.sublayer.experts.6.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 259 -> ('encoder.layers.2.feed_forward.sublayer.experts.7', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 260 -> ('encoder.layers.2.feed_forward.sublayer.experts.7.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 261 -> ('encoder.layers.2.feed_forward.sublayer.experts.7.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 262 -> ('encoder.layers.2.feed_forward.sublayer.experts.7.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 263 -> ('encoder.layers.2.feed_forward.sublayer.gate', Linear(in_features=128, out_features=8, bias=False)) | |
| 264 -> ('encoder.layers.2.feed_forward.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True)) | |
| 265 -> ('encoder.layers.2.feed_forward.dropout', Dropout(p=0.1, inplace=False)) | |
| 266 -> ('encoder.layers.2.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True)) | |
| 267 -> ('encoder.layers.3', MoE_TransformerGraphEncoderLayer( | |
| (attention): Residual( | |
| (sublayer): MultiHeadAttention( | |
| (heads): ModuleList( | |
| (0-7): 8 x AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| ) | |
| (linear): Linear(in_features=256, out_features=128, bias=True) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| ) | |
| (feed_forward): Residual( | |
| (sublayer): MoeLayer( | |
| (experts): ModuleList( | |
| (0-7): 8 x FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| ) | |
| ) | |
| (gate): Linear(in_features=128, out_features=8, bias=False) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| )) | |
| 268 -> ('encoder.layers.3.attention', Residual( | |
| (sublayer): MultiHeadAttention( | |
| (heads): ModuleList( | |
| (0-7): 8 x AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| ) | |
| (linear): Linear(in_features=256, out_features=128, bias=True) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| )) | |
| 269 -> ('encoder.layers.3.attention.sublayer', MultiHeadAttention( | |
| (heads): ModuleList( | |
| (0-7): 8 x AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| ) | |
| (linear): Linear(in_features=256, out_features=128, bias=True) | |
| )) | |
| 270 -> ('encoder.layers.3.attention.sublayer.heads', ModuleList( | |
| (0-7): 8 x AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| )) | |
| 271 -> ('encoder.layers.3.attention.sublayer.heads.0', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 272 -> ('encoder.layers.3.attention.sublayer.heads.0.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 273 -> ('encoder.layers.3.attention.sublayer.heads.0.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 274 -> ('encoder.layers.3.attention.sublayer.heads.0.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 275 -> ('encoder.layers.3.attention.sublayer.heads.1', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 276 -> ('encoder.layers.3.attention.sublayer.heads.1.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 277 -> ('encoder.layers.3.attention.sublayer.heads.1.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 278 -> ('encoder.layers.3.attention.sublayer.heads.1.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 279 -> ('encoder.layers.3.attention.sublayer.heads.2', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 280 -> ('encoder.layers.3.attention.sublayer.heads.2.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 281 -> ('encoder.layers.3.attention.sublayer.heads.2.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 282 -> ('encoder.layers.3.attention.sublayer.heads.2.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 283 -> ('encoder.layers.3.attention.sublayer.heads.3', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 284 -> ('encoder.layers.3.attention.sublayer.heads.3.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 285 -> ('encoder.layers.3.attention.sublayer.heads.3.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 286 -> ('encoder.layers.3.attention.sublayer.heads.3.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 287 -> ('encoder.layers.3.attention.sublayer.heads.4', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 288 -> ('encoder.layers.3.attention.sublayer.heads.4.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 289 -> ('encoder.layers.3.attention.sublayer.heads.4.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 290 -> ('encoder.layers.3.attention.sublayer.heads.4.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 291 -> ('encoder.layers.3.attention.sublayer.heads.5', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 292 -> ('encoder.layers.3.attention.sublayer.heads.5.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 293 -> ('encoder.layers.3.attention.sublayer.heads.5.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 294 -> ('encoder.layers.3.attention.sublayer.heads.5.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 295 -> ('encoder.layers.3.attention.sublayer.heads.6', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 296 -> ('encoder.layers.3.attention.sublayer.heads.6.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 297 -> ('encoder.layers.3.attention.sublayer.heads.6.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 298 -> ('encoder.layers.3.attention.sublayer.heads.6.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 299 -> ('encoder.layers.3.attention.sublayer.heads.7', AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| )) | |
| 300 -> ('encoder.layers.3.attention.sublayer.heads.7.q_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 301 -> ('encoder.layers.3.attention.sublayer.heads.7.k_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 302 -> ('encoder.layers.3.attention.sublayer.heads.7.v_conv', Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))) | |
| 303 -> ('encoder.layers.3.attention.sublayer.linear', Linear(in_features=256, out_features=128, bias=True)) | |
| 304 -> ('encoder.layers.3.attention.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True)) | |
| 305 -> ('encoder.layers.3.attention.dropout', Dropout(p=0.1, inplace=False)) | |
| 306 -> ('encoder.layers.3.feed_forward', Residual( | |
| (sublayer): MoeLayer( | |
| (experts): ModuleList( | |
| (0-7): 8 x FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| ) | |
| ) | |
| (gate): Linear(in_features=128, out_features=8, bias=False) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| )) | |
| 307 -> ('encoder.layers.3.feed_forward.sublayer', MoeLayer( | |
| (experts): ModuleList( | |
| (0-7): 8 x FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| ) | |
| ) | |
| (gate): Linear(in_features=128, out_features=8, bias=False) | |
| )) | |
| 308 -> ('encoder.layers.3.feed_forward.sublayer.experts', ModuleList( | |
| (0-7): 8 x FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| ) | |
| )) | |
| 309 -> ('encoder.layers.3.feed_forward.sublayer.experts.0', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 310 -> ('encoder.layers.3.feed_forward.sublayer.experts.0.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 311 -> ('encoder.layers.3.feed_forward.sublayer.experts.0.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 312 -> ('encoder.layers.3.feed_forward.sublayer.experts.0.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 313 -> ('encoder.layers.3.feed_forward.sublayer.experts.1', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 314 -> ('encoder.layers.3.feed_forward.sublayer.experts.1.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 315 -> ('encoder.layers.3.feed_forward.sublayer.experts.1.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 316 -> ('encoder.layers.3.feed_forward.sublayer.experts.1.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 317 -> ('encoder.layers.3.feed_forward.sublayer.experts.2', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 318 -> ('encoder.layers.3.feed_forward.sublayer.experts.2.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 319 -> ('encoder.layers.3.feed_forward.sublayer.experts.2.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 320 -> ('encoder.layers.3.feed_forward.sublayer.experts.2.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 321 -> ('encoder.layers.3.feed_forward.sublayer.experts.3', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 322 -> ('encoder.layers.3.feed_forward.sublayer.experts.3.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 323 -> ('encoder.layers.3.feed_forward.sublayer.experts.3.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 324 -> ('encoder.layers.3.feed_forward.sublayer.experts.3.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 325 -> ('encoder.layers.3.feed_forward.sublayer.experts.4', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 326 -> ('encoder.layers.3.feed_forward.sublayer.experts.4.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 327 -> ('encoder.layers.3.feed_forward.sublayer.experts.4.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 328 -> ('encoder.layers.3.feed_forward.sublayer.experts.4.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 329 -> ('encoder.layers.3.feed_forward.sublayer.experts.5', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 330 -> ('encoder.layers.3.feed_forward.sublayer.experts.5.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 331 -> ('encoder.layers.3.feed_forward.sublayer.experts.5.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 332 -> ('encoder.layers.3.feed_forward.sublayer.experts.5.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 333 -> ('encoder.layers.3.feed_forward.sublayer.experts.6', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 334 -> ('encoder.layers.3.feed_forward.sublayer.experts.6.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 335 -> ('encoder.layers.3.feed_forward.sublayer.experts.6.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 336 -> ('encoder.layers.3.feed_forward.sublayer.experts.6.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 337 -> ('encoder.layers.3.feed_forward.sublayer.experts.7', FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| )) | |
| 338 -> ('encoder.layers.3.feed_forward.sublayer.experts.7.w1', Linear(in_features=128, out_features=512, bias=False)) | |
| 339 -> ('encoder.layers.3.feed_forward.sublayer.experts.7.w2', Linear(in_features=512, out_features=128, bias=False)) | |
| 340 -> ('encoder.layers.3.feed_forward.sublayer.experts.7.w3', Linear(in_features=128, out_features=512, bias=False)) | |
| 341 -> ('encoder.layers.3.feed_forward.sublayer.gate', Linear(in_features=128, out_features=8, bias=False)) | |
| 342 -> ('encoder.layers.3.feed_forward.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True)) | |
| 343 -> ('encoder.layers.3.feed_forward.dropout', Dropout(p=0.1, inplace=False)) | |
| 344 -> ('encoder.layers.3.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True)) | |
| 345 -> ('encoder.positional_encoder', PositionalEncoder( | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| )) | |
| 346 -> ('encoder.positional_encoder.norm', LayerNorm((128,), eps=1e-05, elementwise_affine=True)) | |
| 347 -> ('out', Sequential( | |
| (0): Linear(in_features=128, out_features=128, bias=True) | |
| (1): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (2): Linear(in_features=128, out_features=14, bias=True) | |
| )) | |
| 348 -> ('out.0', Linear(in_features=128, out_features=128, bias=True)) | |
| 349 -> ('out.1', LayerNorm((128,), eps=1e-05, elementwise_affine=True)) | |
| 350 -> ('out.2', Linear(in_features=128, out_features=14, bias=True)) | |
| Model summary and number of parameters for the MoE_GCN model | |
| model_summary : | |
| model_summary | |
| Layer_name Number of Parameters | |
| ==================================================================================================== | |
| MulticlassAccuracy() 1548 | |
| MulticlassAccuracy() 128 | |
| MulticlassAccuracy() 128 | |
| MulticlassF1Score() 64 | |
| MulticlassF1Score() 1484 | |
| MulticlassF1Score() 2112 | |
| MulticlassConfusionMatrix() 2112 | |
| SGCN( | |
| (conv_layers): ModuleList( | |
| (0): unit_gcn( | |
| (conv_list): ModuleList( | |
| (0-2): 3 x Conv2d(3, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) | |
| (act): Mish() | |
| ) | |
| (1): unit_gcn( | |
| (conv_list): ModuleList( | |
| (0-2): 3 x Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) | |
| (act): Mish() | |
| ) | |
| (2): unit_gcn( | |
| (conv_list): ModuleList( | |
| (0-2): 3 x Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) | |
| (act): Mish() | |
| ) | |
| ) | |
| ) 2112 | |
| MoE_TransformerGraphEncoder( | |
| (layers): ModuleList( | |
| (0-3): 4 x MoE_TransformerGraphEncoderLayer( | |
| (attention): Residual( | |
| (sublayer): MultiHeadAttention( | |
| (heads): ModuleList( | |
| (0-7): 8 x AttentionHead( | |
| (q_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (k_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| (v_conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1)) | |
| ) | |
| ) | |
| (linear): Linear(in_features=256, out_features=128, bias=True) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| ) | |
| (feed_forward): Residual( | |
| (sublayer): MoeLayer( | |
| (experts): ModuleList( | |
| (0-7): 8 x FeedForward( | |
| (w1): Linear(in_features=128, out_features=512, bias=False) | |
| (w2): Linear(in_features=512, out_features=128, bias=False) | |
| (w3): Linear(in_features=128, out_features=512, bias=False) | |
| ) | |
| ) | |
| (gate): Linear(in_features=128, out_features=8, bias=False) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (dropout): Dropout(p=0.1, inplace=False) | |
| ) | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| ) | |
| ) | |
| (positional_encoder): PositionalEncoder( | |
| (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| ) | |
| ) 128 | |
| Sequential( | |
| (0): Linear(in_features=128, out_features=128, bias=True) | |
| (1): LayerNorm((128,), eps=1e-05, elementwise_affine=True) | |
| (2): Linear(in_features=128, out_features=14, bias=True) | |
| ) 9644 | |
| ==================================================================================================== | |
| Total Params:19460 | |
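The per-layer figures in the summary above do not appear to match the layer shapes printed in this log (the metric modules, for example, hold no trainable weights). As a cross-check, the following minimal sketch recomputes the counts by hand from the printed shapes. The helper names are illustrative and not taken from the original script; the mask size of 1452 per `unit_gcn` is read from the per-parameter table in this log, and `PositionalEncoder` is assumed to hold no parameters other than its `LayerNorm`.

```python
# Hand-computed parameter counts from the layer shapes printed in this log.
# Helper names are hypothetical; they are not part of the original code.

def conv2d_params(c_in, c_out, k=1, bias=True):
    """Conv2d: c_out * c_in * k * k weights, plus c_out biases if enabled."""
    return c_out * c_in * k * k + (c_out if bias else 0)

def linear_params(n_in, n_out, bias=True):
    """Linear: n_in * n_out weights, plus n_out biases if enabled."""
    return n_in * n_out + (n_out if bias else 0)

def norm_params(n):
    """Affine LayerNorm/BatchNorm: one weight and one bias per feature."""
    return 2 * n

MASK = 1452  # per-unit_gcn mask size, as listed in the parameter table

# SGCN: three unit_gcn blocks, each with a mask, 3 1x1 convs and a BatchNorm.
gcn = sum(
    MASK + 3 * conv2d_params(c_in, c_out) + norm_params(c_out)
    for c_in, c_out in [(3, 32), (32, 64), (64, 128)]
)

# One encoder layer: 8 heads x (q, k, v) convs, output projection, norms, MoE.
attention = 8 * 3 * conv2d_params(128, 32) + linear_params(256, 128) + norm_params(128)
expert = 2 * linear_params(128, 512, bias=False) + linear_params(512, 128, bias=False)
moe = 8 * expert + linear_params(128, 8, bias=False) + norm_params(128)
encoder_layer = attention + moe + norm_params(128)

# Four layers plus positional_encoder.norm (assumed to be its only parameters).
encoder = 4 * encoder_layer + norm_params(128)

# Output head: Linear(128, 128) -> LayerNorm(128) -> Linear(128, 14).
head = linear_params(128, 128) + norm_params(128) + linear_params(128, 14)

total = gcn + encoder + head
print(f"gcn={gcn} encoder={encoder} out={head} total={total}")
# prints: gcn=36484 encoder=6826752 out=18574 total=6881810
```

These totals agree with the 36.5 K, 6.8 M and 18.6 K that Lightning reports for `gcn`, `encoder` and `out` at the start of the log, and 6,881,810 float32 values at 4 bytes each give the reported 27.527 MB.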
| Parameter counts for the MoE_GCN model | |
| +------------------------------------------------------------+------------+ | |
| | Modules | Parameters | | |
| +------------------------------------------------------------+------------+ | |
| | gcn.conv_layers.0.mask | 1452 | | |
| | gcn.conv_layers.0.conv_list.0.weight | 96 | | |
| | gcn.conv_layers.0.conv_list.0.bias | 32 | | |
| | gcn.conv_layers.0.conv_list.1.weight | 96 | | |
| | gcn.conv_layers.0.conv_list.1.bias | 32 | | |
| | gcn.conv_layers.0.conv_list.2.weight | 96 | | |
| | gcn.conv_layers.0.conv_list.2.bias | 32 | | |
| | gcn.conv_layers.0.bn.weight | 32 | | |
| | gcn.conv_layers.0.bn.bias | 32 | | |
| | gcn.conv_layers.1.mask | 1452 | | |
| | gcn.conv_layers.1.conv_list.0.weight | 2048 | | |
| | gcn.conv_layers.1.conv_list.0.bias | 64 | | |
| | gcn.conv_layers.1.conv_list.1.weight | 2048 | | |
| | gcn.conv_layers.1.conv_list.1.bias | 64 | | |
| | gcn.conv_layers.1.conv_list.2.weight | 2048 | | |
| | gcn.conv_layers.1.conv_list.2.bias | 64 | | |
| | gcn.conv_layers.1.bn.weight | 64 | | |
| | gcn.conv_layers.1.bn.bias | 64 | | |
| | gcn.conv_layers.2.mask | 1452 | | |
| | gcn.conv_layers.2.conv_list.0.weight | 8192 | | |
| | gcn.conv_layers.2.conv_list.0.bias | 128 | | |
| | gcn.conv_layers.2.conv_list.1.weight | 8192 | | |
| | gcn.conv_layers.2.conv_list.1.bias | 128 | | |
| | gcn.conv_layers.2.conv_list.2.weight | 8192 | | |
| | gcn.conv_layers.2.conv_list.2.bias | 128 | | |
| | gcn.conv_layers.2.bn.weight | 128 | | |
| | gcn.conv_layers.2.bn.bias | 128 | | |
| | encoder.layers.0.attention.sublayer.heads.0.q_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.0.q_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.0.k_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.0.k_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.0.v_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.0.v_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.1.q_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.1.q_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.1.k_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.1.k_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.1.v_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.1.v_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.2.q_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.2.q_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.2.k_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.2.k_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.2.v_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.2.v_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.3.q_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.3.q_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.3.k_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.3.k_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.3.v_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.3.v_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.4.q_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.4.q_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.4.k_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.4.k_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.4.v_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.4.v_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.5.q_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.5.q_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.5.k_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.5.k_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.5.v_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.5.v_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.6.q_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.6.q_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.6.k_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.6.k_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.6.v_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.6.v_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.7.q_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.7.q_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.7.k_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.7.k_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.7.v_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.7.v_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.linear.weight | 32768 | | |
| | encoder.layers.0.attention.sublayer.linear.bias | 128 | | |
| | encoder.layers.0.attention.norm.weight | 128 | | |
| | encoder.layers.0.attention.norm.bias | 128 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.0.w1.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.0.w2.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.0.w3.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.1.w1.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.1.w2.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.1.w3.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.2.w1.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.2.w2.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.2.w3.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.3.w1.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.3.w2.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.3.w3.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.4.w1.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.4.w2.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.4.w3.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.5.w1.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.5.w2.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.5.w3.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.6.w1.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.6.w2.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.6.w3.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.7.w1.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.7.w2.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.7.w3.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.gate.weight | 1024 | | |
| | encoder.layers.0.feed_forward.norm.weight | 128 | | |
| | encoder.layers.0.feed_forward.norm.bias | 128 | | |
| | encoder.layers.0.norm.weight | 128 | | |
| | encoder.layers.0.norm.bias | 128 | | |
| | encoder.layers.1.attention.sublayer.heads.0.q_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.0.q_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.0.k_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.0.k_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.0.v_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.0.v_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.1.q_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.1.q_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.1.k_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.1.k_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.1.v_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.1.v_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.2.q_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.2.q_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.2.k_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.2.k_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.2.v_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.2.v_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.3.q_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.3.q_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.3.k_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.3.k_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.3.v_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.3.v_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.4.q_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.4.q_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.4.k_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.4.k_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.4.v_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.4.v_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.5.q_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.5.q_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.5.k_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.5.k_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.5.v_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.5.v_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.6.q_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.6.q_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.6.k_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.6.k_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.6.v_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.6.v_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.7.q_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.7.q_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.7.k_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.7.k_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.7.v_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.7.v_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.linear.weight | 32768 | | |
| | encoder.layers.1.attention.sublayer.linear.bias | 128 | | |
| | encoder.layers.1.attention.norm.weight | 128 | | |
| | encoder.layers.1.attention.norm.bias | 128 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.0.w1.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.0.w2.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.0.w3.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.1.w1.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.1.w2.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.1.w3.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.2.w1.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.2.w2.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.2.w3.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.3.w1.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.3.w2.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.3.w3.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.4.w1.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.4.w2.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.4.w3.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.5.w1.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.5.w2.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.5.w3.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.6.w1.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.6.w2.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.6.w3.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.7.w1.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.7.w2.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.7.w3.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.gate.weight | 1024 | | |
| | encoder.layers.1.feed_forward.norm.weight | 128 | | |
| | encoder.layers.1.feed_forward.norm.bias | 128 | | |
| | encoder.layers.1.norm.weight | 128 | | |
| | encoder.layers.1.norm.bias | 128 | | |
| | encoder.layers.2.attention.sublayer.heads.0.q_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.0.q_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.0.k_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.0.k_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.0.v_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.0.v_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.1.q_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.1.q_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.1.k_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.1.k_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.1.v_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.1.v_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.2.q_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.2.q_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.2.k_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.2.k_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.2.v_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.2.v_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.3.q_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.3.q_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.3.k_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.3.k_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.3.v_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.3.v_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.4.q_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.4.q_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.4.k_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.4.k_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.4.v_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.4.v_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.5.q_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.5.q_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.5.k_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.5.k_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.5.v_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.5.v_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.6.q_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.6.q_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.6.k_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.6.k_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.6.v_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.6.v_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.7.q_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.7.q_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.7.k_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.7.k_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.7.v_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.7.v_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.linear.weight | 32768 | | |
| | encoder.layers.2.attention.sublayer.linear.bias | 128 | | |
| | encoder.layers.2.attention.norm.weight | 128 | | |
| | encoder.layers.2.attention.norm.bias | 128 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.0.w1.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.0.w2.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.0.w3.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.1.w1.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.1.w2.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.1.w3.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.2.w1.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.2.w2.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.2.w3.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.3.w1.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.3.w2.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.3.w3.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.4.w1.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.4.w2.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.4.w3.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.5.w1.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.5.w2.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.5.w3.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.6.w1.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.6.w2.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.6.w3.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.7.w1.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.7.w2.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.7.w3.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.gate.weight | 1024 | | |
| | encoder.layers.2.feed_forward.norm.weight | 128 | | |
| | encoder.layers.2.feed_forward.norm.bias | 128 | | |
| | encoder.layers.2.norm.weight | 128 | | |
| | encoder.layers.2.norm.bias | 128 | | |
| | encoder.layers.3.attention.sublayer.heads.0.q_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.0.q_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.0.k_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.0.k_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.0.v_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.0.v_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.1.q_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.1.q_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.1.k_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.1.k_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.1.v_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.1.v_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.2.q_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.2.q_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.2.k_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.2.k_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.2.v_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.2.v_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.3.q_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.3.q_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.3.k_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.3.k_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.3.v_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.3.v_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.4.q_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.4.q_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.4.k_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.4.k_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.4.v_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.4.v_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.5.q_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.5.q_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.5.k_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.5.k_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.5.v_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.5.v_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.6.q_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.6.q_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.6.k_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.6.k_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.6.v_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.6.v_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.7.q_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.7.q_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.7.k_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.7.k_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.7.v_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.7.v_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.linear.weight | 32768 | | |
| | encoder.layers.3.attention.sublayer.linear.bias | 128 | | |
| | encoder.layers.3.attention.norm.weight | 128 | | |
| | encoder.layers.3.attention.norm.bias | 128 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.0.w1.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.0.w2.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.0.w3.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.1.w1.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.1.w2.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.1.w3.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.2.w1.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.2.w2.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.2.w3.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.3.w1.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.3.w2.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.3.w3.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.4.w1.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.4.w2.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.4.w3.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.5.w1.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.5.w2.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.5.w3.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.6.w1.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.6.w2.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.6.w3.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.7.w1.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.7.w2.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.7.w3.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.gate.weight | 1024 | | |
| | encoder.layers.3.feed_forward.norm.weight | 128 | | |
| | encoder.layers.3.feed_forward.norm.bias | 128 | | |
| | encoder.layers.3.norm.weight | 128 | | |
| | encoder.layers.3.norm.bias | 128 | | |
| | encoder.positional_encoder.norm.weight | 128 | | |
| | encoder.positional_encoder.norm.bias | 128 | | |
| | out.0.weight | 16384 | | |
| | out.0.bias | 128 | | |
| | out.1.weight | 128 | | |
| | out.1.bias | 128 | | |
| | out.2.weight | 1792 | | |
| | out.2.bias | 14 | | |
| +------------------------------------------------------------+------------+ | |
| Total Trainable Params: 6881810 | |
| | gcn.conv_layers.1.mask | 1452 | | |
| | gcn.conv_layers.1.conv_list.0.weight | 2048 | | |
| | gcn.conv_layers.1.conv_list.0.bias | 64 | | |
| | gcn.conv_layers.1.conv_list.1.weight | 2048 | | |
| | gcn.conv_layers.1.conv_list.1.bias | 64 | | |
| | gcn.conv_layers.1.conv_list.2.weight | 2048 | | |
| | gcn.conv_layers.1.conv_list.2.bias | 64 | | |
| | gcn.conv_layers.1.bn.weight | 64 | | |
| | gcn.conv_layers.1.bn.bias | 64 | | |
| | gcn.conv_layers.2.mask | 1452 | | |
| | gcn.conv_layers.2.conv_list.0.weight | 8192 | | |
| | gcn.conv_layers.2.conv_list.0.bias | 128 | | |
| | gcn.conv_layers.2.conv_list.1.weight | 8192 | | |
| | gcn.conv_layers.2.conv_list.1.bias | 128 | | |
| | gcn.conv_layers.2.conv_list.2.weight | 8192 | | |
| | gcn.conv_layers.2.conv_list.2.bias | 128 | | |
| | gcn.conv_layers.2.bn.weight | 128 | | |
| | gcn.conv_layers.2.bn.bias | 128 | | |
| | encoder.layers.0.attention.sublayer.heads.0.q_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.0.q_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.0.k_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.0.k_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.0.v_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.0.v_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.1.q_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.1.q_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.1.k_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.1.k_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.1.v_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.1.v_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.2.q_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.2.q_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.2.k_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.2.k_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.2.v_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.2.v_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.3.q_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.3.q_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.3.k_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.3.k_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.3.v_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.3.v_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.4.q_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.4.q_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.4.k_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.4.k_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.4.v_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.4.v_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.5.q_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.5.q_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.5.k_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.5.k_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.5.v_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.5.v_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.6.q_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.6.q_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.6.k_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.6.k_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.6.v_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.6.v_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.7.q_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.7.q_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.7.k_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.7.k_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.heads.7.v_conv.weight | 4096 | | |
| | encoder.layers.0.attention.sublayer.heads.7.v_conv.bias | 32 | | |
| | encoder.layers.0.attention.sublayer.linear.weight | 32768 | | |
| | encoder.layers.0.attention.sublayer.linear.bias | 128 | | |
| | encoder.layers.0.attention.norm.weight | 128 | | |
| | encoder.layers.0.attention.norm.bias | 128 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.0.w1.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.0.w2.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.0.w3.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.1.w1.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.1.w2.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.1.w3.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.2.w1.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.2.w2.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.2.w3.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.3.w1.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.3.w2.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.3.w3.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.4.w1.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.4.w2.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.4.w3.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.5.w1.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.5.w2.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.5.w3.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.6.w1.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.6.w2.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.6.w3.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.7.w1.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.7.w2.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.experts.7.w3.weight | 65536 | | |
| | encoder.layers.0.feed_forward.sublayer.gate.weight | 1024 | | |
| | encoder.layers.0.feed_forward.norm.weight | 128 | | |
| | encoder.layers.0.feed_forward.norm.bias | 128 | | |
| | encoder.layers.0.norm.weight | 128 | | |
| | encoder.layers.0.norm.bias | 128 | | |
| | encoder.layers.1.attention.sublayer.heads.0.q_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.0.q_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.0.k_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.0.k_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.0.v_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.0.v_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.1.q_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.1.q_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.1.k_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.1.k_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.1.v_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.1.v_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.2.q_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.2.q_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.2.k_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.2.k_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.2.v_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.2.v_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.3.q_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.3.q_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.3.k_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.3.k_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.3.v_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.3.v_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.4.q_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.4.q_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.4.k_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.4.k_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.4.v_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.4.v_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.5.q_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.5.q_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.5.k_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.5.k_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.5.v_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.5.v_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.6.q_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.6.q_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.6.k_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.6.k_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.6.v_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.6.v_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.7.q_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.7.q_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.7.k_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.7.k_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.heads.7.v_conv.weight | 4096 | | |
| | encoder.layers.1.attention.sublayer.heads.7.v_conv.bias | 32 | | |
| | encoder.layers.1.attention.sublayer.linear.weight | 32768 | | |
| | encoder.layers.1.attention.sublayer.linear.bias | 128 | | |
| | encoder.layers.1.attention.norm.weight | 128 | | |
| | encoder.layers.1.attention.norm.bias | 128 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.0.w1.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.0.w2.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.0.w3.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.1.w1.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.1.w2.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.1.w3.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.2.w1.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.2.w2.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.2.w3.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.3.w1.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.3.w2.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.3.w3.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.4.w1.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.4.w2.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.4.w3.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.5.w1.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.5.w2.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.5.w3.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.6.w1.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.6.w2.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.6.w3.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.7.w1.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.7.w2.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.experts.7.w3.weight | 65536 | | |
| | encoder.layers.1.feed_forward.sublayer.gate.weight | 1024 | | |
| | encoder.layers.1.feed_forward.norm.weight | 128 | | |
| | encoder.layers.1.feed_forward.norm.bias | 128 | | |
| | encoder.layers.1.norm.weight | 128 | | |
| | encoder.layers.1.norm.bias | 128 | | |
| | encoder.layers.2.attention.sublayer.heads.0.q_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.0.q_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.0.k_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.0.k_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.0.v_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.0.v_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.1.q_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.1.q_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.1.k_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.1.k_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.1.v_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.1.v_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.2.q_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.2.q_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.2.k_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.2.k_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.2.v_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.2.v_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.3.q_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.3.q_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.3.k_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.3.k_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.3.v_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.3.v_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.4.q_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.4.q_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.4.k_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.4.k_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.4.v_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.4.v_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.5.q_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.5.q_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.5.k_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.5.k_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.5.v_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.5.v_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.6.q_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.6.q_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.6.k_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.6.k_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.6.v_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.6.v_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.7.q_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.7.q_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.7.k_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.7.k_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.heads.7.v_conv.weight | 4096 | | |
| | encoder.layers.2.attention.sublayer.heads.7.v_conv.bias | 32 | | |
| | encoder.layers.2.attention.sublayer.linear.weight | 32768 | | |
| | encoder.layers.2.attention.sublayer.linear.bias | 128 | | |
| | encoder.layers.2.attention.norm.weight | 128 | | |
| | encoder.layers.2.attention.norm.bias | 128 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.0.w1.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.0.w2.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.0.w3.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.1.w1.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.1.w2.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.1.w3.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.2.w1.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.2.w2.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.2.w3.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.3.w1.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.3.w2.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.3.w3.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.4.w1.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.4.w2.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.4.w3.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.5.w1.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.5.w2.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.5.w3.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.6.w1.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.6.w2.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.6.w3.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.7.w1.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.7.w2.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.experts.7.w3.weight | 65536 | | |
| | encoder.layers.2.feed_forward.sublayer.gate.weight | 1024 | | |
| | encoder.layers.2.feed_forward.norm.weight | 128 | | |
| | encoder.layers.2.feed_forward.norm.bias | 128 | | |
| | encoder.layers.2.norm.weight | 128 | | |
| | encoder.layers.2.norm.bias | 128 | | |
| | encoder.layers.3.attention.sublayer.heads.0.q_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.0.q_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.0.k_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.0.k_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.0.v_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.0.v_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.1.q_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.1.q_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.1.k_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.1.k_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.1.v_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.1.v_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.2.q_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.2.q_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.2.k_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.2.k_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.2.v_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.2.v_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.3.q_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.3.q_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.3.k_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.3.k_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.3.v_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.3.v_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.4.q_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.4.q_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.4.k_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.4.k_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.4.v_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.4.v_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.5.q_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.5.q_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.5.k_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.5.k_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.5.v_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.5.v_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.6.q_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.6.q_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.6.k_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.6.k_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.6.v_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.6.v_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.7.q_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.7.q_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.7.k_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.7.k_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.heads.7.v_conv.weight | 4096 | | |
| | encoder.layers.3.attention.sublayer.heads.7.v_conv.bias | 32 | | |
| | encoder.layers.3.attention.sublayer.linear.weight | 32768 | | |
| | encoder.layers.3.attention.sublayer.linear.bias | 128 | | |
| | encoder.layers.3.attention.norm.weight | 128 | | |
| | encoder.layers.3.attention.norm.bias | 128 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.0.w1.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.0.w2.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.0.w3.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.1.w1.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.1.w2.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.1.w3.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.2.w1.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.2.w2.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.2.w3.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.3.w1.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.3.w2.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.3.w3.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.4.w1.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.4.w2.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.4.w3.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.5.w1.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.5.w2.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.5.w3.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.6.w1.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.6.w2.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.6.w3.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.7.w1.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.7.w2.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.experts.7.w3.weight | 65536 | | |
| | encoder.layers.3.feed_forward.sublayer.gate.weight | 1024 | | |
| | encoder.layers.3.feed_forward.norm.weight | 128 | | |
| | encoder.layers.3.feed_forward.norm.bias | 128 | | |
| | encoder.layers.3.norm.weight | 128 | | |
| | encoder.layers.3.norm.bias | 128 | | |
| | encoder.positional_encoder.norm.weight | 128 | | |
| | encoder.positional_encoder.norm.bias | 128 | | |
| | out.0.weight | 16384 | | |
| | out.0.bias | 128 | | |
| | out.1.weight | 128 | | |
| | out.1.bias | 128 | | |
| | out.2.weight | 1792 | | |
| | out.2.bias | 14 | | |
| +------------------------------------------------------------+------------+ | |
| Total Trainable Params: 6881810 | |
| FLOPs of the MoE_GCN model using OpenAI_flops : = | |
| 2083328 FLOPs | |
| FLOPs of the MoE_GCN model using DeepMind : = | |
| 20748288 FLOPs | |
| Collecting torchstat | |
| Downloading torchstat-0.0.7-py3-none-any.whl (11 kB) | |
| Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from torchstat) (2.1.0+cu121) | |
| Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from torchstat) (1.23.5) | |
| Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from torchstat) (1.5.3) | |
| Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from pandas->torchstat) (2.8.2) | |
| Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->torchstat) (2023.3.post1) | |
| Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch->torchstat) (3.13.1) | |
| Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch->torchstat) (4.5.0) | |
| Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->torchstat) (1.12) | |
| Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->torchstat) (3.2.1) | |
| Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch->torchstat) (3.1.3) | |
| Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch->torchstat) (2023.6.0) | |
| Requirement already satisfied: triton==2.1.0 in /usr/local/lib/python3.10/dist-packages (from torch->torchstat) (2.1.0) | |
| Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.1->pandas->torchstat) (1.16.0) | |
| Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->torchstat) (2.1.4) | |
| Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->torchstat) (1.3.0) | |
| +------------------------------------------------------------+------------+ | |
| | Modules | Parameters | | |
| +------------------------------------------------------------+------------+ | |
| | gcn.conv_layers.0.mask | 1452 | | |
| | gcn.conv_layers.0.conv_list.0.weight | 96 | | |
| | gcn.conv_layers.0.conv_list.0.bias | 32 | | |
| | gcn.conv_layers.0.conv_list.1.weight | 96 | | |
| | gcn.conv_layers.0.conv_list.1.bias | 32 | | |
| | gcn.conv_layers.0.conv_list.2.weight | 96 | | |
| | gcn.conv_layers.0.conv_list.2.bias | 32 | | |
| | gcn.conv_layers.0.bn.weight | 32 | | |
| | gcn.conv_layers.0.bn.bias | 32 | | |
| [ 0.4739, -0.4411, 0.5949], | |
| ..., | |
| [ 0.4923, -0.3621, 0.5645], | |
| [ 0.5081, -0.3883, 0.5798], | |
| [ 0.5182, -0.3990, 0.5934]]], | |
| [[[ 0.4553, -0.4093, 0.5347], | |
| [ 0.4465, -0.3465, 0.5095], | |
| [ 0.4286, -0.3852, 0.5241], | |
| ..., | |
| [ 0.4509, -0.3077, 0.4728], | |
| [ 0.4479, -0.3254, 0.4838], | |
| [ 0.4530, -0.3408, 0.4978]], | |
| [[ 0.4037, -0.3051, 0.4236], | |
| [ 0.3883, -0.2474, 0.4012], | |
| [ 0.3734, -0.2841, 0.4104], | |
| ..., | |
| [ 0.4092, -0.2127, 0.3748], | |
| [ 0.4185, -0.2356, 0.3855], | |
| [ 0.4252, -0.2515, 0.3991]], | |
| [[ 0.3537, -0.2618, 0.3555], | |
| [ 0.3258, -0.2090, 0.3282], | |
| [ 0.3217, -0.2452, 0.3408], | |
| ..., | |
| [ 0.3415, -0.1797, 0.2958], | |
| [ 0.3506, -0.2016, 0.3046], | |
| [ 0.3611, -0.2148, 0.3186]], | |
| ..., | |
| [[ 0.4537, -0.3549, 0.5033], | |
| [ 0.4227, -0.3063, 0.4904], | |
| [ 0.4194, -0.3419, 0.4946], | |
| ..., | |
| [ 0.4334, -0.2780, 0.4736], | |
| [ 0.4372, -0.3001, 0.4873], | |
| [ 0.4425, -0.3181, 0.4995]], | |
| [[ 0.4640, -0.3862, 0.5401], | |
| [ 0.4427, -0.3413, 0.5315], | |
| [ 0.4335, -0.3730, 0.5314], | |
| ..., | |
| [ 0.4738, -0.3113, 0.5228], | |
| [ 0.4929, -0.3358, 0.5369], | |
| [ 0.5051, -0.3468, 0.5482]], | |
| [[ 0.4655, -0.4041, 0.5552], | |
| [ 0.4422, -0.3530, 0.5404], | |
| [ 0.4380, -0.3914, 0.5487], | |
| ..., | |
| [ 0.4567, -0.3171, 0.5157], | |
| [ 0.4776, -0.3411, 0.5297], | |
| [ 0.4967, -0.3589, 0.5422]]], | |
| ..., | |
| [[[ 0.4761, -0.3570, 0.5141], | |
| [ 0.4648, -0.3141, 0.5161], | |
| [ 0.4576, -0.3426, 0.5166], | |
| ..., | |
| [ 0.5142, -0.2463, 0.5095], | |
| [ 0.5202, -0.2300, 0.5076], | |
| [ 0.5223, -0.2150, 0.5060]], | |
| [[ 0.3927, -0.2960, 0.4408], | |
| [ 0.3717, -0.2445, 0.4256], | |
| [ 0.3685, -0.2817, 0.4376], | |
| ..., | |
| [ 0.3918, -0.1666, 0.3913], | |
| [ 0.3900, -0.1506, 0.3827], | |
| [ 0.3875, -0.1388, 0.3751]], | |
| [[ 0.3311, -0.2876, 0.3770], | |
| [ 0.3134, -0.2340, 0.3532], | |
| [ 0.3020, -0.2676, 0.3640], | |
| ..., | |
| [ 0.3366, -0.1689, 0.3178], | |
| [ 0.3309, -0.1637, 0.3046], | |
| [ 0.3255, -0.1634, 0.2930]], | |
| ..., | |
| [[ 0.3970, -0.3313, 0.4459], | |
| [ 0.3739, -0.2776, 0.4274], | |
| [ 0.3668, -0.3141, 0.4362], | |
| ..., | |
| [ 0.3846, -0.2333, 0.3969], | |
| [ 0.3707, -0.2444, 0.3857], | |
| [ 0.3743, -0.2543, 0.3905]], | |
| [[ 0.4111, -0.3530, 0.4816], | |
| [ 0.3957, -0.3066, 0.4776], | |
| [ 0.3829, -0.3379, 0.4756], | |
| ..., | |
| [ 0.4256, -0.2702, 0.4716], | |
| [ 0.4197, -0.2851, 0.4655], | |
| [ 0.4262, -0.3008, 0.4746]], | |
| [[ 0.4676, -0.4057, 0.5600], | |
| [ 0.4560, -0.3730, 0.5605], | |
| [ 0.4467, -0.3971, 0.5617], | |
| ..., | |
| [ 0.4902, -0.2986, 0.5512], | |
| [ 0.4911, -0.2836, 0.5483], | |
| [ 0.5034, -0.2781, 0.5599]]], | |
| [[[ 0.4721, -0.4069, 0.5961], | |
| [ 0.4707, -0.3673, 0.6005], | |
| [ 0.4602, -0.3962, 0.6026], | |
| ..., | |
| [ 0.4986, -0.2846, 0.5858], | |
| [ 0.5057, -0.2652, 0.5839], | |
| [ 0.5095, -0.2495, 0.5822]], | |
| [[ 0.4048, -0.3119, 0.4769], | |
| [ 0.3951, -0.2639, 0.4670], | |
| [ 0.3832, -0.2988, 0.4758], | |
| ..., | |
| [ 0.4350, -0.1855, 0.4479], | |
| [ 0.4395, -0.1692, 0.4425], | |
| [ 0.4435, -0.1551, 0.4377]], | |
| [[ 0.3602, -0.2710, 0.3874], | |
| [ 0.3429, -0.2231, 0.3582], | |
| [ 0.3350, -0.2542, 0.3804], | |
| ..., | |
| [ 0.3495, -0.1652, 0.3141], | |
| [ 0.3383, -0.1565, 0.3000], | |
| [ 0.3286, -0.1488, 0.2876]], | |
| ..., | |
| [[ 0.4653, -0.3848, 0.5433], | |
| [ 0.4530, -0.3342, 0.5315], | |
| [ 0.4418, -0.3660, 0.5397], | |
| ..., | |
| [ 0.4616, -0.3031, 0.5096], | |
| [ 0.4674, -0.3284, 0.5250], | |
| [ 0.4782, -0.3437, 0.5373]], | |
| [[ 0.4766, -0.4024, 0.5700], | |
| [ 0.4672, -0.3583, 0.5674], | |
| [ 0.4549, -0.3893, 0.5701], | |
| ..., | |
| [ 0.4827, -0.3129, 0.5544], | |
| [ 0.4868, -0.3336, 0.5710], | |
| [ 0.4939, -0.3441, 0.5810]], | |
| [[ 0.4893, -0.4281, 0.5998], | |
| [ 0.4710, -0.3846, 0.5879], | |
| [ 0.4665, -0.4158, 0.5996], | |
| ..., | |
| [ 0.4830, -0.3421, 0.5596], | |
| [ 0.4835, -0.3640, 0.5749], | |
| [ 0.4894, -0.3746, 0.5868]]], | |
| [[[ 0.4661, -0.4360, 0.6053], | |
| [ 0.4644, -0.3942, 0.6059], | |
| [ 0.4469, -0.4223, 0.6035], | |
| ..., | |
| [ 0.5182, -0.3283, 0.6059], | |
| [ 0.5178, -0.3333, 0.6037], | |
| [ 0.5292, -0.3455, 0.6152]], | |
| [[ 0.4357, -0.3562, 0.5250], | |
| [ 0.4243, -0.3055, 0.5140], | |
| [ 0.4129, -0.3416, 0.5214], | |
| ..., | |
| [ 0.4613, -0.2225, 0.4948], | |
| [ 0.4654, -0.2058, 0.4900], | |
| [ 0.4691, -0.1910, 0.4858]], | |
| [[ 0.3920, -0.3078, 0.4274], | |
| [ 0.3592, -0.2533, 0.4066], | |
| [ 0.3542, -0.2891, 0.4083], | |
| ..., | |
| [ 0.3804, -0.2067, 0.3905], | |
| [ 0.3754, -0.2042, 0.3783], | |
| [ 0.3706, -0.2030, 0.3676]], | |
| ..., | |
| [[ 0.4513, -0.4002, 0.5616], | |
| [ 0.4358, -0.3517, 0.5431], | |
| [ 0.4258, -0.3837, 0.5538], | |
| ..., | |
| [ 0.4320, -0.3158, 0.5094], | |
| [ 0.4378, -0.3351, 0.5229], | |
| [ 0.4490, -0.3474, 0.5371]], | |
| [[ 0.4701, -0.4148, 0.5681], | |
| [ 0.4479, -0.3735, 0.5515], | |
| [ 0.4469, -0.4041, 0.5673], | |
| ..., | |
| [ 0.4323, -0.2901, 0.5100], | |
| [ 0.4419, -0.3081, 0.5235], | |
| [ 0.4596, -0.3278, 0.5353]], | |
| [[ 0.4701, -0.4338, 0.5783], | |
| [ 0.4548, -0.3844, 0.5665], | |
| [ 0.4445, -0.4182, 0.5731], | |
| ..., | |
| [ 0.4607, -0.3052, 0.5426], | |
| [ 0.4555, -0.3033, 0.5344], | |
| [ 0.4608, -0.3103, 0.5418]]]]), tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, | |
| 0, 0, 0, 0, 0, 0, 0, 0])]) | |
| 2 | |
| dict_keys(['skeleton', 'label']) | |
| tensor([[[[ 0.4939, -0.4103, 0.5915], | |
| [ 0.4737, -0.3553, 0.5707], | |
| [ 0.4599, -0.3896, 0.5775], | |
| ..., | |
| [ 0.4801, -0.2952, 0.5397], | |
| [ 0.4706, -0.3054, 0.5279], | |
| [ 0.4703, -0.3167, 0.5310]], | |
| [[ 0.4864, -0.3711, 0.5602], | |
| [ 0.4609, -0.3071, 0.5350], | |
| [ 0.4513, -0.3446, 0.5424], | |
| ..., | |
| [ 0.4731, -0.2591, 0.5014], | |
| [ 0.4591, -0.2642, 0.4880], | |
| [ 0.4564, -0.2782, 0.4883]], | |
| [[ 0.4212, -0.2986, 0.4492], | |
| [ 0.3987, -0.2458, 0.4205], | |
| [ 0.3858, -0.2760, 0.4282], | |
| ..., | |
| [ 0.4163, -0.2065, 0.3856], | |
| [ 0.3996, -0.2124, 0.3711], | |
| [ 0.3905, -0.2104, 0.3684]], | |
| ..., | |
| [[ 0.4576, -0.3808, 0.5104], | |
| [ 0.4329, -0.3245, 0.4925], | |
| [ 0.4201, -0.3548, 0.4903], | |
| ..., | |
| [ 0.4615, -0.2862, 0.4808], | |
| [ 0.4557, -0.3022, 0.4882], | |
| [ 0.4561, -0.3109, 0.4997]], | |
| [[ 0.4845, -0.4035, 0.5531], | |
| [ 0.4633, -0.3530, 0.5352], | |
| [ 0.4509, -0.3827, 0.5375], | |
| ..., | |
| [ 0.4786, -0.3131, 0.5140], | |
| [ 0.4782, -0.3320, 0.5252], | |
| [ 0.4785, -0.3434, 0.5350]], | |
| [[ 0.5020, -0.4166, 0.5738], | |
| [ 0.4699, -0.3629, 0.5480], | |
| [ 0.4667, -0.3985, 0.5586], | |
| ..., | |
| [ 0.4755, -0.3057, 0.5077], | |
| [ 0.4767, -0.3267, 0.5179], | |
| [ 0.4811, -0.3479, 0.5321]]], | |
| [[[ 0.4939, -0.4563, 0.5789], | |
| [ 0.4638, -0.3953, 0.5506], | |
| [ 0.4620, -0.4382, 0.5657], | |
| ..., | |
| [ 0.4632, -0.3589, 0.5063], | |
| [ 0.4661, -0.3812, 0.5163], | |
| [ 0.4745, -0.4018, 0.5311]], | |
| [[ 0.4456, -0.3599, 0.5136], | |
| [ 0.4279, -0.2952, 0.4838], | |
| [ 0.4181, -0.3321, 0.5008], | |
| ..., | |
| [ 0.4411, -0.2501, 0.4356], | |
| [ 0.4369, -0.2676, 0.4452], | |
| [ 0.4389, -0.2807, 0.4601]], | |
| [[ 0.3839, -0.2790, 0.4051], | |
| [ 0.3551, -0.2230, 0.3712], | |
| [ 0.3479, -0.2560, 0.3870], | |
| ..., | |
| [ 0.3680, -0.1768, 0.3253], | |
| [ 0.3622, -0.1935, 0.3308], | |
| [ 0.3616, -0.2029, 0.3452]], | |
| ..., | |
| [[ 0.4551, -0.3857, 0.5213], | |
| [ 0.4337, -0.3419, 0.5154], | |
| [ 0.4222, -0.3709, 0.5111], | |
| ..., | |
| [ 0.4649, -0.3003, 0.5134], | |
| [ 0.4720, -0.3259, 0.5276], | |
| [ 0.4770, -0.3510, 0.5386]], | |
| [[ 0.4612, -0.4040, 0.5458], | |
| [ 0.4535, -0.3696, 0.5527], | |
| [ 0.4339, -0.3905, 0.5403], | |
| ..., | |
| [ 0.5015, -0.3442, 0.5705], | |
| [ 0.5089, -0.3708, 0.5856], | |
| [ 0.5125, -0.3900, 0.5944]], | |
| [[ 0.5000, -0.4536, 0.5998], | |
| [ 0.4767, -0.4069, 0.5885], | |
| [ 0.4739, -0.4411, 0.5949], | |
| ..., | |
| [ 0.4923, -0.3621, 0.5645], | |
| [ 0.5081, -0.3883, 0.5798], | |
| [ 0.5182, -0.3990, 0.5934]]], | |
| [[[ 0.4553, -0.4093, 0.5347], | |
| [ 0.4465, -0.3465, 0.5095], | |
| [ 0.4286, -0.3852, 0.5241], | |
| ..., | |
| [ 0.4509, -0.3077, 0.4728], | |
| [ 0.4479, -0.3254, 0.4838], | |
| [ 0.4530, -0.3408, 0.4978]], | |
| [[ 0.4037, -0.3051, 0.4236], | |
| [ 0.3883, -0.2474, 0.4012], | |
| [ 0.3734, -0.2841, 0.4104], | |
| ..., | |
| [ 0.4092, -0.2127, 0.3748], | |
| [ 0.4185, -0.2356, 0.3855], | |
| [ 0.4252, -0.2515, 0.3991]], | |
| [[ 0.3537, -0.2618, 0.3555], | |
| [ 0.3258, -0.2090, 0.3282], | |
| [ 0.3217, -0.2452, 0.3408], | |
| ..., | |
| [ 0.3415, -0.1797, 0.2958], | |
| [ 0.3506, -0.2016, 0.3046], | |
| [ 0.3611, -0.2148, 0.3186]], | |
| ..., | |
| [[ 0.4537, -0.3549, 0.5033], | |
| [ 0.4227, -0.3063, 0.4904], | |
| [ 0.4194, -0.3419, 0.4946], | |
| ..., | |
| [ 0.4334, -0.2780, 0.4736], | |
| [ 0.4372, -0.3001, 0.4873], | |
| [ 0.4425, -0.3181, 0.4995]], | |
| [[ 0.4640, -0.3862, 0.5401], | |
| [ 0.4427, -0.3413, 0.5315], | |
| [ 0.4335, -0.3730, 0.5314], | |
| ..., | |
| [ 0.4738, -0.3113, 0.5228], | |
| [ 0.4929, -0.3358, 0.5369], | |
| [ 0.5051, -0.3468, 0.5482]], | |
| [[ 0.4655, -0.4041, 0.5552], | |
| [ 0.4422, -0.3530, 0.5404], | |
| [ 0.4380, -0.3914, 0.5487], | |
| ..., | |
| [ 0.4567, -0.3171, 0.5157], | |
| [ 0.4776, -0.3411, 0.5297], | |
| [ 0.4967, -0.3589, 0.5422]]], | |
| ..., | |
| [[[ 0.4761, -0.3570, 0.5141], | |
| [ 0.4648, -0.3141, 0.5161], | |
| [ 0.4576, -0.3426, 0.5166], | |
| ..., | |
| [ 0.5142, -0.2463, 0.5095], | |
| [ 0.5202, -0.2300, 0.5076], | |
| [ 0.5223, -0.2150, 0.5060]], | |
| [[ 0.3927, -0.2960, 0.4408], | |
| [ 0.3717, -0.2445, 0.4256], | |
| [ 0.3685, -0.2817, 0.4376], | |
| ..., | |
| [ 0.3918, -0.1666, 0.3913], | |
| [ 0.3900, -0.1506, 0.3827], | |
| [ 0.3875, -0.1388, 0.3751]], | |
| [[ 0.3311, -0.2876, 0.3770], | |
| [ 0.3134, -0.2340, 0.3532], | |
| [ 0.3020, -0.2676, 0.3640], | |
| ..., | |
| [ 0.3366, -0.1689, 0.3178], | |
| [ 0.3309, -0.1637, 0.3046], | |
| [ 0.3255, -0.1634, 0.2930]], | |
| ..., | |
| [[ 0.3970, -0.3313, 0.4459], | |
| [ 0.3739, -0.2776, 0.4274], | |
| [ 0.3668, -0.3141, 0.4362], | |
| ..., | |
| [ 0.3846, -0.2333, 0.3969], | |
| [ 0.3707, -0.2444, 0.3857], | |
| [ 0.3743, -0.2543, 0.3905]], | |
| [[ 0.4111, -0.3530, 0.4816], | |
| [ 0.3957, -0.3066, 0.4776], | |
| [ 0.3829, -0.3379, 0.4756], | |
| ..., | |
| [ 0.4256, -0.2702, 0.4716], | |
| [ 0.4197, -0.2851, 0.4655], | |
| [ 0.4262, -0.3008, 0.4746]], | |
| [[ 0.4676, -0.4057, 0.5600], | |
| [ 0.4560, -0.3730, 0.5605], | |
| [ 0.4467, -0.3971, 0.5617], | |
| ..., | |
| [ 0.4902, -0.2986, 0.5512], | |
| [ 0.4911, -0.2836, 0.5483], | |
| [ 0.5034, -0.2781, 0.5599]]], | |
| [[[ 0.4721, -0.4069, 0.5961], | |
| [ 0.4707, -0.3673, 0.6005], | |
| [ 0.4602, -0.3962, 0.6026], | |
| ..., | |
| [ 0.4986, -0.2846, 0.5858], | |
| [ 0.5057, -0.2652, 0.5839], | |
| [ 0.5095, -0.2495, 0.5822]], | |
| [[ 0.4048, -0.3119, 0.4769], | |
| [ 0.3951, -0.2639, 0.4670], | |
| [ 0.3832, -0.2988, 0.4758], | |
| ..., | |
| [ 0.4350, -0.1855, 0.4479], | |
| [ 0.4395, -0.1692, 0.4425], | |
| [ 0.4435, -0.1551, 0.4377]], | |
| [[ 0.3602, -0.2710, 0.3874], | |
| [ 0.3429, -0.2231, 0.3582], | |
| [ 0.3350, -0.2542, 0.3804], | |
| ..., | |
| [ 0.3495, -0.1652, 0.3141], | |
| [ 0.3383, -0.1565, 0.3000], | |
| [ 0.3286, -0.1488, 0.2876]], | |
| ..., | |
| [[ 0.4653, -0.3848, 0.5433], | |
| [ 0.4530, -0.3342, 0.5315], | |
| [ 0.4418, -0.3660, 0.5397], | |
| ..., | |
| [ 0.4616, -0.3031, 0.5096], | |
| [ 0.4674, -0.3284, 0.5250], | |
| [ 0.4782, -0.3437, 0.5373]], | |
| [[ 0.4766, -0.4024, 0.5700], | |
| [ 0.4672, -0.3583, 0.5674], | |
| [ 0.4549, -0.3893, 0.5701], | |
| ..., | |
| [ 0.4827, -0.3129, 0.5544], | |
| [ 0.4868, -0.3336, 0.5710], | |
| [ 0.4939, -0.3441, 0.5810]], | |
| [[ 0.4893, -0.4281, 0.5998], | |
| [ 0.4710, -0.3846, 0.5879], | |
| [ 0.4665, -0.4158, 0.5996], | |
| ..., | |
| [ 0.4830, -0.3421, 0.5596], | |
| [ 0.4835, -0.3640, 0.5749], | |
| [ 0.4894, -0.3746, 0.5868]]], | |
| [[[ 0.4661, -0.4360, 0.6053], | |
| [ 0.4644, -0.3942, 0.6059], | |
| [ 0.4469, -0.4223, 0.6035], | |
| ..., | |
| [ 0.5182, -0.3283, 0.6059], | |
| [ 0.5178, -0.3333, 0.6037], | |
| [ 0.5292, -0.3455, 0.6152]], | |
| [[ 0.4357, -0.3562, 0.5250], | |
| [ 0.4243, -0.3055, 0.5140], | |
| [ 0.4129, -0.3416, 0.5214], | |
| ..., | |
| [ 0.4613, -0.2225, 0.4948], | |
| [ 0.4654, -0.2058, 0.4900], | |
| [ 0.4691, -0.1910, 0.4858]], | |
| [[ 0.3920, -0.3078, 0.4274], | |
| [ 0.3592, -0.2533, 0.4066], | |
| [ 0.3542, -0.2891, 0.4083], | |
| ..., | |
| [ 0.3804, -0.2067, 0.3905], | |
| [ 0.3754, -0.2042, 0.3783], | |
| [ 0.3706, -0.2030, 0.3676]], | |
| ..., | |
| [[ 0.4513, -0.4002, 0.5616], | |
| [ 0.4358, -0.3517, 0.5431], | |
| [ 0.4258, -0.3837, 0.5538], | |
| ..., | |
| [ 0.4320, -0.3158, 0.5094], | |
| [ 0.4378, -0.3351, 0.5229], | |
| [ 0.4490, -0.3474, 0.5371]], | |
| [[ 0.4701, -0.4148, 0.5681], | |
| [ 0.4479, -0.3735, 0.5515], | |
| [ 0.4469, -0.4041, 0.5673], | |
| ..., | |
| [ 0.4323, -0.2901, 0.5100], | |
| [ 0.4419, -0.3081, 0.5235], | |
| [ 0.4596, -0.3278, 0.5353]], | |
| [[ 0.4701, -0.4338, 0.5783], | |
| [ 0.4548, -0.3844, 0.5665], | |
| [ 0.4445, -0.4182, 0.5731], | |
| ..., | |
| [ 0.4607, -0.3052, 0.5426], | |
| [ 0.4555, -0.3033, 0.5344], | |
| [ 0.4608, -0.3103, 0.5418]]]]) | |
| skeleton | |
| label | |
| Tensor_dataT.size() = torch.Size([32, 8, 22, 3]) | |
| Tensor_dataT [[[ 0.45649411 -0.44376922 0.64408398] | |
| [ 0.45470198 -0.38724606 0.63845301] | |
| [ 0.43893905 -0.42612109 0.64874703] | |
| [ 0.41352988 -0.38762114 0.65398598] | |
| [ 0.38978973 -0.35307541 0.65376902] | |
| [ 0.38361855 -0.32759178 0.65515703] | |
| [ 0.4284792 -0.32936773 0.63953203] | |
| [ 0.42679544 -0.27440213 0.63265598] | |
| [ 0.42789929 -0.24478968 0.62921703] | |
| [ 0.42692376 -0.22101636 0.62609202] | |
| [ 0.45129005 -0.32722124 0.63175601] | |
| [ 0.44714241 -0.26338693 0.619479 ] | |
| [ 0.50353895 -0.34977931 0.62299418] | |
| [ 0.3156789 -0.27295206 0.60414994] | |
| [ 0.47331994 -0.33073168 0.62587798] | |
| [ 0.47041377 -0.27057905 0.61708701] | |
| [ 0.42616162 -0.25311683 0.71024507] | |
| [ 0.46730407 -0.21198767 0.60850102] | |
| [ 0.49502148 -0.34408698 0.619892 ] | |
| [ 0.49964735 -0.29375331 0.61635399] | |
| [ 0.50096575 -0.2707146 0.61472797] | |
| [ 0.45703796 -0.25597774 0.53799719]] | |
| [[ 0.47197036 -0.53988416 0.62220198] | |
| [ 0.4580784 -0.48336113 0.60461497] | |
| [ 0.44658563 -0.52154911 0.619412 ] | |
| [ 0.41508607 -0.48072571 0.61257303] | |
| [ 0.38082476 -0.44323452 0.589674 ] | |
| [ 0.36590875 -0.41295902 0.57904899] | |
| [ 0.41487425 -0.41403426 0.58618098] | |
| [ 0.4140427 -0.3613101 0.57492298] | |
| [ 0.41350103 -0.33505483 0.56929302] | |
| [ 0.4128616 -0.3113708 0.56417602] | |
| [ 0.44144987 -0.41669524 0.58362198] | |
| [ 0.43370049 -0.3579911 0.57625699] | |
| [ 0.49153033 -0.44347535 0.58263618] | |
| [ 0.30505913 -0.3713189 0.56583893] | |
| [ 0.46648639 -0.42415859 0.58316201] | |
| [ 0.45920816 -0.35960788 0.55863303] | |
| [ 0.41443498 -0.33983203 0.64373803] | |
| [ 0.44764333 -0.31478569 0.53467399] | |
| [ 0.49675979 -0.44231811 0.58563203] | |
| [ 0.48508369 -0.37770261 0.560574 ] | |
| [ 0.47726461 -0.3720665 0.549061 ] | |
| [ 0.41848824 -0.3831209 0.46360517]] | |
| [[ 0.48442138 -0.62001068 0.627913 ] | |
| [ 0.49168067 -0.57390347 0.60613197] | |
| [ 0.46249449 -0.58999824 0.617248 ] | |
| [ 0.46103121 -0.54664181 0.59524101] | |
| [ 0.47081903 -0.50676016 0.58063197] | |
| [ 0.47703809 -0.47600041 0.568829 ] | |
| [ 0.46432632 -0.4830178 0.574269 ] | |
| [ 0.45322754 -0.45161847 0.54255003] | |
| [ 0.44803504 -0.43584942 0.52669102] | |
| [ 0.46494869 -0.43861626 0.532435 ] | |
| [ 0.49320529 -0.50320813 0.58013397] | |
| [ 0.4726163 -0.49277866 0.54636502] | |
| [ 0.49780852 -0.61624818 0.53734219] | |
| [ 0.31374727 -0.59338886 0.52973294] | |
| [ 0.51733116 -0.52547568 0.58699799] | |
| [ 0.49490084 -0.51341313 0.55295902] | |
| [ 0.42256253 -0.5291977 0.63319808] | |
| [ 0.45177362 -0.54055221 0.54193801] | |
| [ 0.54645841 -0.56252827 0.598836 ] | |
| [ 0.51718837 -0.54657465 0.56708002] | |
| [ 0.50381804 -0.53947116 0.55249 ] | |
| [ 0.45658055 -0.56645892 0.4813922 ]] | |
| [[ 0.42587508 -0.67696014 0.56173801] | |
| [ 0.46481211 -0.66463394 0.56095499] | |
| [ 0.42960141 -0.64680746 0.55488998] | |
| [ 0.45645956 -0.617573 0.54098803] | |
| [ 0.48047119 -0.60097324 0.52756703] | |
| [ 0.49974634 -0.59675506 0.51798499] | |
| [ 0.49845378 -0.61679659 0.55171102] | |
| [ 0.51597109 -0.64597169 0.54589701] | |
| [ 0.5046869 -0.66281403 0.54299003] | |
| [ 0.5010224 -0.68822165 0.55855399] | |
| [ 0.5076827 -0.64872309 0.56002003] | |
| [ 0.52990845 -0.67580205 0.55221999] | |
| [ 0.58918724 -0.83106116 0.5905692 ] | |
| [ 0.40330692 -0.7874534 0.58834797] | |
| [ 0.5126011 -0.67765428 0.56693202] | |
| [ 0.52397628 -0.69367545 0.55389702] | |
| [ 0.4822849 -0.73224239 0.67502707] | |
| [ 0.51807494 -0.72636618 0.59556901] | |
| [ 0.51159101 -0.71302973 0.57522798] | |
| [ 0.51961776 -0.71390895 0.562195 ] | |
| [ 0.52242911 -0.72957377 0.580863 ] | |
| [ 0.4730504 -0.7378266 0.51739216]] | |
| [[ 0.28875737 -0.70936456 0.60423601] | |
| [ 0.31463972 -0.6756929 0.588386 ] | |
| [ 0.28418101 -0.67239651 0.58934498] | |
| [ 0.30820475 -0.66839428 0.57322299] | |
| [ 0.32074619 -0.62880076 0.561692 ] | |
| [ 0.34317953 -0.6224781 0.55596298] | |
| [ 0.33162814 -0.60459652 0.556265 ] | |
| [ 0.3104165 -0.60353749 0.52642602] | |
| [ 0.28594978 -0.60147305 0.51150697] | |
| [ 0.27226425 -0.61648274 0.51690602] | |
| [ 0.34866905 -0.63791372 0.56946599] | |
| [ 0.33020677 -0.64519455 0.54184502] | |
| [ 0.38262178 -0.79788928 0.57402021] | |
| WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers). | |
| <ipython-input-186-f54b70cf0824>:8: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). | |
| Tensor_dataT = torch.tensor(dataT['skeleton']); | |
| <ipython-input-186-f54b70cf0824>:9: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). | |
| Tensor_labelsT = torch.tensor(dataT['label']); | |
| WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers). | |
| WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers). | |
| <ipython-input-187-dfd265fbff9e>:8: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). | |
| Tensor_dataT = torch.tensor(dataT['skeleton']); | |
| <ipython-input-187-dfd265fbff9e>:9: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). | |
| Tensor_labelsT = torch.tensor(dataT['label']); | |
| WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers). | |