NUM_GPUS=1 MASTER_ADDR=ip-10-0-135-126 MASTER_PORT=18935 WORLD_SIZE=1
PID of this process = 2164521
------ ARGS -------
Namespace(model_suffix='beta', hcp_flat_path='/weka/proj-medarc/shared/HCP-Flat', batch_size=256, wandb_log=True, num_epochs=50, lr_scheduler_type='cycle', save_ckpt=False, seed=42, max_lr=1e-05, target='age', num_workers=15, weight_decay=1e-05)
Input dimension: 737280
total_steps 21750
wandb_config: {'model_name': 'HCPflat_raw_age', 'batch_size': 256, 'weight_decay': 1e-05, 'num_epochs': 50, 'seed': 42, 'lr_scheduler_type': 'cycle', 'save_ckpt': False, 'max_lr': 1e-05, 'target': 'age', 'num_workers': 15}
wandb_id: HCPflat_raw_beta_age_31e54b73-122f-4c96-8d20-21ee38d0705b
Step [100/435] - Training Loss: 0.6079 - Training MSE: 0.5940
Step [200/435] - Training Loss: 0.5179 - Training MSE: 0.5895
Step [300/435] - Training Loss: 0.5806 - Training MSE: 0.5846
Step [400/435] - Training Loss: 0.4953 - Training MSE: 0.5817
Epoch [1/50] - Training Loss: 0.5784, Training MSE: 0.5813 - Validation Loss: 0.5355, Validation MSE: 0.5417
Step [100/435] - Training Loss: 0.4042 - Training MSE: 0.5476
Step [200/435] - Training Loss: 0.5508 - Training MSE: 0.5488
Step [300/435] - Training Loss: 0.4799 - Training MSE: 0.5478
Step [400/435] - Training Loss: 0.5536 - Training MSE: 0.5496
Epoch [2/50] - Training Loss: 0.4875, Training MSE: 0.5496 - Validation Loss: 0.5427, Validation MSE: 0.5495
Step [100/435] - Training Loss: 0.3078 - Training MSE: 0.5391
Step [200/435] - Training Loss: 0.3315 - Training MSE: 0.5413
Step [300/435] - Training Loss: 0.3964 - Training MSE: 0.5442
Step [400/435] - Training Loss: 0.3456 - Training MSE: 0.5453
Epoch [3/50] - Training Loss: 0.3329, Training MSE: 0.5454 - Validation Loss: 0.5490, Validation MSE: 0.5614
Step [100/435] - Training Loss: 0.2159 - Training MSE: 0.5618
Step [200/435] - Training Loss: 0.2387 - Training MSE: 0.5638
Step [300/435] - Training Loss: 0.2004 - Training MSE: 0.5652
Step [400/435] - Training Loss: 0.2158 - Training MSE: 0.5671
Epoch [4/50] - Training Loss: 0.2364, Training MSE: 0.5674 - Validation Loss: 0.5676, Validation MSE: 0.5808
Step [100/435] - Training Loss: 0.1533 - Training MSE: 0.5899
Step [200/435] - Training Loss: 0.1590 - Training MSE: 0.5937
Step [300/435] - Training Loss: 0.1987 - Training MSE: 0.5943
Step [400/435] - Training Loss: 0.2009 - Training MSE: 0.5958
Epoch [5/50] - Training Loss: 0.1795, Training MSE: 0.5955 - Validation Loss: 0.6034, Validation MSE: 0.6168
Step [100/435] - Training Loss: 0.1450 - Training MSE: 0.6214
Step [200/435] - Training Loss: 0.1467 - Training MSE: 0.6192
Step [300/435] - Training Loss: 0.1742 - Training MSE: 0.6189
Step [400/435] - Training Loss: 0.1820 - Training MSE: 0.6200
Epoch [6/50] - Training Loss: 0.1412, Training MSE: 0.6204 - Validation Loss: 0.6164, Validation MSE: 0.6309
Step [100/435] - Training Loss: 0.0961 - Training MSE: 0.6445
Step [200/435] - Training Loss: 0.1143 - Training MSE: 0.6474
Step [300/435] - Training Loss: 0.1104 - Training MSE: 0.6465
Step [400/435] - Training Loss: 0.1170 - Training MSE: 0.6448
Epoch [7/50] - Training Loss: 0.1132, Training MSE: 0.6442 - Validation Loss: 0.6420, Validation MSE: 0.6572
Step [100/435] - Training Loss: 0.0859 - Training MSE: 0.6744
Step [200/435] - Training Loss: 0.0967 - Training MSE: 0.6682
Step [300/435] - Training Loss: 0.0962 - Training MSE: 0.6668
Step [400/435] - Training Loss: 0.1131 - Training MSE: 0.6671
Epoch [8/50] - Training Loss: 0.0912, Training MSE: 0.6655 - Validation Loss: 0.6658, Validation MSE: 0.6790
Step [100/435] - Training Loss: 0.0650 - Training MSE: 0.6852
Step [200/435] - Training Loss: 0.0806 - Training MSE: 0.6835
Step [300/435] - Training Loss: 0.0789 - Training MSE: 0.6826
Step [400/435] - Training Loss: 0.0915 - Training MSE: 0.6824
Epoch [9/50] - Training Loss: 0.0739, Training MSE: 0.6820 - Validation Loss: 0.6844, Validation MSE: 0.7001
Step [100/435] - Training Loss: 0.0604 - Training MSE: 0.6952
Step [200/435] - Training Loss: 0.0608 - Training MSE: 0.6973
Step [300/435] - Training Loss: 0.0601 - Training MSE: 0.6990
Step [400/435] - Training Loss: 0.0888 - Training MSE: 0.6998
Epoch [10/50] - Training Loss: 0.0602, Training MSE: 0.6998 - Validation Loss: 0.7134, Validation MSE: 0.7278
Step [100/435] - Training Loss: 0.0403 - Training MSE: 0.7351
Step [200/435] - Training Loss: 0.0519 - Training MSE: 0.7232
Step [300/435] - Training Loss: 0.0490 - Training MSE: 0.7203
Step [400/435] - Training Loss: 0.0611 - Training MSE: 0.7156
Epoch [11/50] - Training Loss: 0.0492, Training MSE: 0.7141 - Validation Loss: 0.7246, Validation MSE: 0.7388
Step [100/435] - Training Loss: 0.0310 - Training MSE: 0.7381
Step [200/435] - Training Loss: 0.0368 - Training MSE: 0.7302
Step [300/435] - Training Loss: 0.0446 - Training MSE: 0.7282
Step [400/435] - Training Loss: 0.0474 - Training MSE: 0.7276
Epoch [12/50] - Training Loss: 0.0400, Training MSE: 0.7267 - Validation Loss: 0.7409, Validation MSE: 0.7569
Step [100/435] - Training Loss: 0.0315 - Training MSE: 0.7505
Step [200/435] - Training Loss: 0.0310 - Training MSE: 0.7421
Step [300/435] - Training Loss: 0.0356 - Training MSE: 0.7409
Step [400/435] - Training Loss: 0.0428 - Training MSE: 0.7382
Epoch [13/50] - Training Loss: 0.0324, Training MSE: 0.7377 - Validation Loss: 0.7593, Validation MSE: 0.7739
Step [100/435] - Training Loss: 0.0229 - Training MSE: 0.7523
Step [200/435] - Training Loss: 0.0279 - Training MSE: 0.7534
Step [300/435] - Training Loss: 0.0317 - Training MSE: 0.7516
Step [400/435] - Training Loss: 0.0314 - Training MSE: 0.7493
Epoch [14/50] - Training Loss: 0.0268, Training MSE: 0.7477 - Validation Loss: 0.7724, Validation MSE: 0.7878
Step [100/435] - Training Loss: 0.0163 - Training MSE: 0.7665
Step [200/435] - Training Loss: 0.0220 - Training MSE: 0.7628
Step [300/435] - Training Loss: 0.0242 - Training MSE: 0.7608
Step [400/435] - Training Loss: 0.0288 - Training MSE: 0.7579
Epoch [15/50] - Training Loss: 0.0215, Training MSE: 0.7572 - Validation Loss: 0.7895, Validation MSE: 0.8040
Step [100/435] - Training Loss: 0.0134 - Training MSE: 0.7748
Step [200/435] - Training Loss: 0.0160 - Training MSE: 0.7727
Step [300/435] - Training Loss: 0.0197 - Training MSE: 0.7660
Step [400/435] - Training Loss: 0.0230 - Training MSE: 0.7650
Epoch [16/50] - Training Loss: 0.0180, Training MSE: 0.7649 - Validation Loss: 0.8056, Validation MSE: 0.8198
Step [100/435] - Training Loss: 0.0127 - Training MSE: 0.7814
Step [200/435] - Training Loss: 0.0157 - Training MSE: 0.7747
Step [300/435] - Training Loss: 0.0165 - Training MSE: 0.7720
Step [400/435] - Training Loss: 0.0159 - Training MSE: 0.7728
Epoch [17/50] - Training Loss: 0.0145, Training MSE: 0.7717 - Validation Loss: 0.8114, Validation MSE: 0.8293
Step [100/435] - Training Loss: 0.0095 - Training MSE: 0.7832
Step [200/435] - Training Loss: 0.0129 - Training MSE: 0.7875
Step [300/435] - Training Loss: 0.0104 - Training MSE: 0.7807
Step [400/435] - Training Loss: 0.0119 - Training MSE: 0.7786
Epoch [18/50] - Training Loss: 0.0119, Training MSE: 0.7773 - Validation Loss: 0.8136, Validation MSE: 0.8289
Step [100/435] - Training Loss: 0.0080 - Training MSE: 0.7895
Step [200/435] - Training Loss: 0.0097 - Training MSE: 0.7878
Step [300/435] - Training Loss: 0.0100 - Training MSE: 0.7860
Step [400/435] - Training Loss: 0.0121 - Training MSE: 0.7824
Epoch [19/50] - Training Loss: 0.0100, Training MSE: 0.7826 - Validation Loss: 0.8201, Validation MSE: 0.8367
Step [100/435] - Training Loss: 0.0101 - Training MSE: 0.7886
Step [200/435] - Training Loss: 0.0097 - Training MSE: 0.7900
Step [300/435] - Training Loss: 0.0096 - Training MSE: 0.7883
Step [400/435] - Training Loss: 0.0120 - Training MSE: 0.7878
Epoch [20/50] - Training Loss: 0.0084, Training MSE: 0.7878 - Validation Loss: 0.8246, Validation MSE: 0.8405
Step [100/435] - Training Loss: 0.0061 - Training MSE: 0.7964
Step [200/435] - Training Loss: 0.0061 - Training MSE: 0.7913
Step [300/435] - Training Loss: 0.0081 - Training MSE: 0.7896
Step [400/435] - Training Loss: 0.0067 - Training MSE: 0.7924
Epoch [21/50] - Training Loss: 0.0071, Training MSE: 0.7912 - Validation Loss: 0.8344, Validation MSE: 0.8493
Step [100/435] - Training Loss: 0.0067 - Training MSE: 0.8002
Step [200/435] - Training Loss: 0.0071 - Training MSE: 0.8009
Step [300/435] - Training Loss: 0.0070 - Training MSE: 0.7963
Step [400/435] - Training Loss: 0.0053 - Training MSE: 0.7946
Epoch [22/50] - Training Loss: 0.0060, Training MSE: 0.7941 - Validation Loss: 0.8400, Validation MSE: 0.8555
Step [100/435] - Training Loss: 0.0046 - Training MSE: 0.7930
Step [200/435] - Training Loss: 0.0054 - Training MSE: 0.7987
Step [300/435] - Training Loss: 0.0048 - Training MSE: 0.7986
Step [400/435] - Training Loss: 0.0058 - Training MSE: 0.7976
Epoch [23/50] - Training Loss: 0.0051, Training MSE: 0.7968 - Validation Loss: 0.8398, Validation MSE: 0.8552
Step [100/435] - Training Loss: 0.0046 - Training MSE: 0.8091
Step [200/435] - Training Loss: 0.0044 - Training MSE: 0.8047
Step [300/435] - Training Loss: 0.0041 - Training MSE: 0.7978
Step [400/435] - Training Loss: 0.0041 - Training MSE: 0.7975
Epoch [24/50] - Training Loss: 0.0046, Training MSE: 0.7982 - Validation Loss: 0.8454, Validation MSE: 0.8614
Step [100/435] - Training Loss: 0.0043 - Training MSE: 0.8051
Step [200/435] - Training Loss: 0.0041 - Training MSE: 0.8029
Step [300/435] - Training Loss: 0.0036 - Training MSE: 0.8008
Step [400/435] - Training Loss: 0.0034 - Training MSE: 0.8006
Epoch [25/50] - Training Loss: 0.0040, Training MSE: 0.8000 - Validation Loss: 0.8471, Validation MSE: 0.8635
Step [100/435] - Training Loss: 0.0035 - Training MSE: 0.7997
Step [200/435] - Training Loss: 0.0028 - Training MSE: 0.8036
Step [300/435] - Training Loss: 0.0036 - Training MSE: 0.8002
Step [400/435] - Training Loss: 0.0042 - Training MSE: 0.8029
Epoch [26/50] - Training Loss: 0.0037, Training MSE: 0.8015 - Validation Loss: 0.8479, Validation MSE: 0.8643
Step [100/435] - Training Loss: 0.0035 - Training MSE: 0.8106
Step [200/435] - Training Loss: 0.0054 - Training MSE: 0.8063
Step [300/435] - Training Loss: 0.0046 - Training MSE: 0.8047
Step [400/435] - Training Loss: 0.0040 - Training MSE: 0.8028
Epoch [27/50] - Training Loss: 0.0043, Training MSE: 0.8031 - Validation Loss: 0.8483, Validation MSE: 0.8642
Step [100/435] - Training Loss: 0.0028 - Training MSE: 0.8015
Step [200/435] - Training Loss: 0.0040 - Training MSE: 0.8030
Step [300/435] - Training Loss: 0.0036 - Training MSE: 0.8030
Step [400/435] - Training Loss: 0.0079 - Training MSE: 0.8025
Epoch [28/50] - Training Loss: 0.0037, Training MSE: 0.8037 - Validation Loss: 0.8482, Validation MSE: 0.8644
Step [100/435] - Training Loss: 0.0133 - Training MSE: 0.8092
Step [200/435] - Training Loss: 0.0036 - Training MSE: 0.8067
Step [300/435] - Training Loss: 0.0033 - Training MSE: 0.8063
Step [400/435] - Training Loss: 0.0020 - Training MSE: 0.8067
Epoch [29/50] - Training Loss: 0.0044, Training MSE: 0.8054 - Validation Loss: 0.8503, Validation MSE: 0.8654
Step [100/435] - Training Loss: 0.0021 - Training MSE: 0.8117
Step [200/435] - Training Loss: 0.0121 - Training MSE: 0.8105
Step [300/435] - Training Loss: 0.0028 - Training MSE: 0.8060
Step [400/435] - Training Loss: 0.0025 - Training MSE: 0.8052
Epoch [30/50] - Training Loss: 0.0038, Training MSE: 0.8053 - Validation Loss: 0.8541, Validation MSE: 0.8688
Step [100/435] - Training Loss: 0.0014 - Training MSE: 0.7938
Step [200/435] - Training Loss: 0.0031 - Training MSE: 0.8074
Step [300/435] - Training Loss: 0.0015 - Training MSE: 0.8058
Step [400/435] - Training Loss: 0.0017 - Training MSE: 0.8041
Epoch [31/50] - Training Loss: 0.0025, Training MSE: 0.8045 - Validation Loss: 0.8527, Validation MSE: 0.8680
Step [100/435] - Training Loss: 0.0031 - Training MSE: 0.8159
Step [200/435] - Training Loss: 0.0014 - Training MSE: 0.8139
Step [300/435] - Training Loss: 0.0012 - Training MSE: 0.8103
Step [400/435] - Training Loss: 0.0011 - Training MSE: 0.8056
Epoch [32/50] - Training Loss: 0.0019, Training MSE: 0.8038 - Validation Loss: 0.8513, Validation MSE: 0.8665
Step [100/435] - Training Loss: 0.0102 - Training MSE: 0.8013
Step [200/435] - Training Loss: 0.0009 - Training MSE: 0.8001
Step [300/435] - Training Loss: 0.0006 - Training MSE: 0.8021
Step [400/435] - Training Loss: 0.0008 - Training MSE: 0.8021
Epoch [33/50] - Training Loss: 0.0008, Training MSE: 0.8031 - Validation Loss: 0.8515, Validation MSE: 0.8670
Step [100/435] - Training Loss: 0.0004 - Training MSE: 0.7997
Step [200/435] - Training Loss: 0.0004 - Training MSE: 0.8031
Step [300/435] - Training Loss: 0.0004 - Training MSE: 0.8047
Step [400/435] - Training Loss: 0.0005 - Training MSE: 0.8031
Epoch [34/50] - Training Loss: 0.0004, Training MSE: 0.8029 - Validation Loss: 0.8521, Validation MSE: 0.8676
Step [100/435] - Training Loss: 0.0002 - Training MSE: 0.7941
Step [200/435] - Training Loss: 0.0005 - Training MSE: 0.7996
Step [300/435] - Training Loss: 0.0003 - Training MSE: 0.8016
Step [400/435] - Training Loss: 0.0003 - Training MSE: 0.8022
Epoch [35/50] - Training Loss: 0.0003, Training MSE: 0.8028 - Validation Loss: 0.8521, Validation MSE: 0.8676
Step [100/435] - Training Loss: 0.0006 - Training MSE: 0.8030
Step [200/435] - Training Loss: 0.0002 - Training MSE: 0.8033
Step [300/435] - Training Loss: 0.0002 - Training MSE: 0.8046
Step [400/435] - Training Loss: 0.0002 - Training MSE: 0.8030
Epoch [36/50] - Training Loss: 0.0002, Training MSE: 0.8028 - Validation Loss: 0.8526, Validation MSE: 0.8683
Step [100/435] - Training Loss: 0.0001 - Training MSE: 0.8003
Step [200/435] - Training Loss: 0.0002 - Training MSE: 0.8045
Step [300/435] - Training Loss: 0.0002 - Training MSE: 0.8042
Step [400/435] - Training Loss: 0.0002 - Training MSE: 0.8047
Epoch [37/50] - Training Loss: 0.0002, Training MSE: 0.8033 - Validation Loss: 0.8518, Validation MSE: 0.8675
Step [100/435] - Training Loss: 0.0001 - Training MSE: 0.8073
Step [200/435] - Training Loss: 0.0002 - Training MSE: 0.8066
Step [300/435] - Training Loss: 0.0001 - Training MSE: 0.8049
Step [400/435] - Training Loss: 0.0001 - Training MSE: 0.8029
Epoch [38/50] - Training Loss: 0.0002, Training MSE: 0.8030 - Validation Loss: 0.8526, Validation MSE: 0.8680
Step [100/435] - Training Loss: 0.0001 - Training MSE: 0.8039
Step [200/435] - Training Loss: 0.0002 - Training MSE: 0.7996
Step [300/435] - Training Loss: 0.0001 - Training MSE: 0.8031
Step [400/435] - Training Loss: 0.0001 - Training MSE: 0.8036
Epoch [39/50] - Training Loss: 0.0001, Training MSE: 0.8035 - Validation Loss: 0.8525, Validation MSE: 0.8681
Step [100/435] - Training Loss: 0.0001 - Training MSE: 0.8010
Step [200/435] - Training Loss: 0.0001 - Training MSE: 0.8027
Step [300/435] - Training Loss: 0.0002 - Training MSE: 0.8015
Step [400/435] - Training Loss: 0.0001 - Training MSE: 0.8031
Epoch [40/50] - Training Loss: 0.0001, Training MSE: 0.8036 - Validation Loss: 0.8533, Validation MSE: 0.8687
Step [100/435] - Training Loss: 0.0001 - Training MSE: 0.8101
Step [200/435] - Training Loss: 0.0001 - Training MSE: 0.8026
Step [300/435] - Training Loss: 0.0001 - Training MSE: 0.8033
Step [400/435] - Training Loss: 0.0001 - Training MSE: 0.8038
Epoch [41/50] - Training Loss: 0.0001, Training MSE: 0.8033 - Validation Loss: 0.8533, Validation MSE: 0.8688
Step [100/435] - Training Loss: 0.0001 - Training MSE: 0.8005
Step [200/435] - Training Loss: 0.0001 - Training MSE: 0.8006
Step [300/435] - Training Loss: 0.0000 - Training MSE: 0.8055
Step [400/435] - Training Loss: 0.0000 - Training MSE: 0.8031
Epoch [42/50] - Training Loss: 0.0001, Training MSE: 0.8035 - Validation Loss: 0.8533, Validation MSE: 0.8689
Step [100/435] - Training Loss: 0.0000 - Training MSE: 0.7973
Step [200/435] - Training Loss: 0.0000 - Training MSE: 0.8005
Step [300/435] - Training Loss: 0.0000 - Training MSE: 0.8027
Step [400/435] - Training Loss: 0.0000 - Training MSE: 0.8034
Epoch [43/50] - Training Loss: 0.0000, Training MSE: 0.8033 - Validation Loss: 0.8532, Validation MSE: 0.8688
Step [100/435] - Training Loss: 0.0000 - Training MSE: 0.7967
Step [200/435] - Training Loss: 0.0000 - Training MSE: 0.7960
Step [300/435] - Training Loss: 0.0000 - Training MSE: 0.7976
Step [400/435] - Training Loss: 0.0000 - Training MSE: 0.8024
Epoch [44/50] - Training Loss: 0.0000, Training MSE: 0.8032 - Validation Loss: 0.8533, Validation MSE: 0.8688
Step [100/435] - Training Loss: 0.0000 - Training MSE: 0.8043
Step [200/435] - Training Loss: 0.0000 - Training MSE: 0.8054
Step [300/435] - Training Loss: 0.0000 - Training MSE: 0.8052
Step [400/435] - Training Loss: 0.0000 - Training MSE: 0.8047
Epoch [45/50] - Training Loss: 0.0000, Training MSE: 0.8037 - Validation Loss: 0.8533, Validation MSE: 0.8689
Step [100/435] - Training Loss: 0.0000 - Training MSE: 0.8019
Step [200/435] - Training Loss: 0.0000 - Training MSE: 0.8023
Step [300/435] - Training Loss: 0.0000 - Training MSE: 0.8026
Step [400/435] - Training Loss: 0.0000 - Training MSE: 0.8041
Epoch [46/50] - Training Loss: 0.0000, Training MSE: 0.8032 - Validation Loss: 0.8533, Validation MSE: 0.8689
Step [100/435] - Training Loss: 0.0000 - Training MSE: 0.7993
Step [200/435] - Training Loss: 0.0000 - Training MSE: 0.8029
Step [300/435] - Training Loss: 0.0000 - Training MSE: 0.8066
Step [400/435] - Training Loss: 0.0000 - Training MSE: 0.8051
Epoch [47/50] - Training Loss: 0.0000, Training MSE: 0.8036 - Validation Loss: 0.8533, Validation MSE: 0.8689
Step [100/435] - Training Loss: 0.0000 - Training MSE: 0.7992
Step [200/435] - Training Loss: 0.0000 - Training MSE: 0.8020
Step [300/435] - Training Loss: 0.0000 - Training MSE: 0.8030
Step [400/435] - Training Loss: 0.0000 - Training MSE: 0.8036
Epoch [48/50] - Training Loss: 0.0000, Training MSE: 0.8032 - Validation Loss: 0.8533, Validation MSE: 0.8689
Step [100/435] - Training Loss: 0.0000 - Training MSE: 0.7972
Step [200/435] - Training Loss: 0.0000 - Training MSE: 0.7993
Step [300/435] - Training Loss: 0.0000 - Training MSE: 0.8023
Step [400/435] - Training Loss: 0.0000 - Training MSE: 0.8016
Epoch [49/50] - Training Loss: 0.0000, Training MSE: 0.8035 - Validation Loss: 0.8533, Validation MSE: 0.8689
Step [100/435] - Training Loss: 0.0000 - Training MSE: 0.8021
Step [200/435] - Training Loss: 0.0000 - Training MSE: 0.8040
Step [300/435] - Training Loss: 0.0000 - Training MSE: 0.8023
Step [400/435] - Training Loss: 0.0000 - Training MSE: 0.8036
Epoch [50/50] - Training Loss: 0.0000, Training MSE: 0.8037 - Validation Loss: 0.8533, Validation MSE: 0.8689
wandb: 🚀 View run HCPflat_raw_beta_age at: https://stability.wandb.io/ckadirt/fMRI-foundation-model/runs/HCPflat_raw_beta_age_31e54b73-122f-4c96-8d20-21ee38d0705b
wandb: Find logs at: wandb/run-20241127_021238-HCPflat_raw_beta_age_31e54b73-122f-4c96-8d20-21ee38d0705b/logs
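As a sanity check on the log's bookkeeping: with 435 steps per epoch and num_epochs=50, the run's "total_steps 21750" is exactly 435 × 50. The sketch below reproduces that arithmetic and a simplified linear warmup/decay learning-rate curve as one plausible reading of `lr_scheduler_type='cycle'` with `max_lr=1e-05`; the `pct_start` and `start_div` parameters, the function name `one_cycle_lr`, and the triangular shape are assumptions for illustration, not details taken from this run's code.

```python
# Schedule bookkeeping implied by the log above.
# STEPS_PER_EPOCH (435) and NUM_EPOCHS (50) come from the log;
# the one-cycle shape below is an assumed simplification.
MAX_LR = 1e-5
NUM_EPOCHS = 50
STEPS_PER_EPOCH = 435

total_steps = NUM_EPOCHS * STEPS_PER_EPOCH  # 21750, matching "total_steps 21750"

def one_cycle_lr(step, total=total_steps, max_lr=MAX_LR,
                 pct_start=0.3, start_div=25.0):
    """Linear ramp to max_lr over the first pct_start of training,
    then linear decay back toward zero (simplified one-cycle policy)."""
    warmup = int(total * pct_start)
    if step < warmup:
        init_lr = max_lr / start_div
        return init_lr + (max_lr - init_lr) * step / warmup
    frac = (step - warmup) / (total - warmup)  # decay phase
    return max_lr * (1.0 - frac)
```

Under these assumptions the learning rate peaks at `max_lr` about 30% of the way through the 21750 steps and falls back to zero by the final step, which is consistent with the training loss continuing to shrink long after validation MSE plateaus near 0.8689.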