2024-09-02 06:43:53,395 INFO Namespace(n_epoch=250, lr_schedule=[50], lr=0.0002, gpu='0', out_dir='/data/work-gcp-europe-west4-a/yuqian_fu/Ego/checkpoints/egoexo_v2_480x480', train_dir=['/data/work-gcp-europe-west4-a/yuqian_fu/Ego/data_segswap'], prob_dir=[0.5, 0.5], batch_pos=32, batch_neg=15, feat_pth='../evalBrueghel/Moco_resnet50_feat_1Scale_640p.pkl', warp_mask=False, warmUpIter=1000, resume_pth=None, resume_epoch=0, mode='small', pos_weight=0.1, feat_weight=1, dropout=0.1, activation='relu', prob_style=0.5, layer_type=['I', 'C', 'I', 'C', 'I', 'N'], drop_feat=0.1, tps_grid=[4, 6], eta_corr=8, iter_epoch=1000, iter_epoch_val=100, weight_decay=0, reverse=False)
2024-09-02 06:43:53,396 INFO Load MocoV2 pre-trained ResNet-50 feature...
LOADING:  train_egoexo_pairs.json
LOADING:  val_egoexo_pairs.json

  0%|          | 0/1000 [00:00<?, ?it/s]
  0%|          | 0/1000 [00:07<?, ?it/s]
Traceback (most recent call last):
  File "/home/yuqian_fu/Projects/ego-exo4d-relation/correspondence/SegSwap/train/Main.py", line 188, in <module>
    backbone, netEncoder, optimizer, history = Train.trainEpoch(trainLoader, backbone, netEncoder, optimizer, history, Loss, ClsLoss, args.batch_pos, args.batch_neg, args.warp_mask, logger, args.eta_corr, args.warmUpIter, 0, args.lr, writer, warmup=True)
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yuqian_fu/Projects/ego-exo4d-relation/correspondence/SegSwap/train/Train.py", line 80, in trainEpoch
    O1, O2, O3 = netEncoder(X, Y, FMTX, RS, RT)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/yuqian_fu/micromamba/envs/auto-zap7rdp2jlp7/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yuqian_fu/Projects/ego-exo4d-relation/correspondence/SegSwap/model/transformer.py", line 342, in forward
    outx, outy, out_cls = self.net(x, y, fmask, x_mask, y_mask)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/yuqian_fu/micromamba/envs/auto-zap7rdp2jlp7/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yuqian_fu/Projects/ego-exo4d-relation/correspondence/SegSwap/model/transformer.py", line 291, in forward
    featx, featy, x_mask, y_mask = self.encoder_blocks[i](featx, featy, featmask, x_mask, y_mask)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/yuqian_fu/micromamba/envs/auto-zap7rdp2jlp7/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yuqian_fu/Projects/ego-exo4d-relation/correspondence/SegSwap/model/transformer.py", line 205, in forward
    featx, featy, x_mask, y_mask = self.layer1(featx, featy, featmask, x_mask, y_mask)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/yuqian_fu/micromamba/envs/auto-zap7rdp2jlp7/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yuqian_fu/Projects/ego-exo4d-relation/correspondence/SegSwap/model/transformer.py", line 105, in forward
    output = self.inner_encoder_layer(output, src_mask=src_mask, src_key_padding_mask=src_key_padding_mask)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/yuqian_fu/micromamba/envs/auto-zap7rdp2jlp7/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/yuqian_fu/micromamba/envs/auto-zap7rdp2jlp7/lib/python3.11/site-packages/torch/nn/modules/transformer.py", line 591, in forward
    x = self.norm1(x + self._sa_block(x, src_mask, src_key_padding_mask, is_causal=is_causal))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/yuqian_fu/micromamba/envs/auto-zap7rdp2jlp7/lib/python3.11/site-packages/torch/nn/modules/transformer.py", line 599, in _sa_block
    x = self.self_attn(x, x, x,
        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/yuqian_fu/micromamba/envs/auto-zap7rdp2jlp7/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/yuqian_fu/micromamba/envs/auto-zap7rdp2jlp7/lib/python3.11/site-packages/torch/nn/modules/activation.py", line 1205, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/yuqian_fu/micromamba/envs/auto-zap7rdp2jlp7/lib/python3.11/site-packages/torch/nn/functional.py", line 5373, in multi_head_attention_forward
    attn_output = scaled_dot_product_attention(q, k, v, attn_mask, dropout_p, is_causal)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 792.00 MiB (GPU 0; 21.95 GiB total capacity; 20.03 GiB already allocated; 790.12 MiB free; 20.94 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
srun: error: gcpl4-eu-0: task 0: Exited with exit code 1
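The OOM message above itself suggests setting `max_split_size_mb` via `PYTORCH_CUDA_ALLOC_CONF` to reduce allocator fragmentation. A minimal mitigation sketch for rerunning the job — the split size and reduced batch values are illustrative assumptions, not from the original run; `--batch_pos`/`--batch_neg` are the flags visible in the logged Namespace:

```shell
# Let the caching allocator split large blocks to limit fragmentation
# (128 MiB is a common starting point; tune per workload).
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Retry with smaller attention batches so the ~792 MiB allocation fits;
# halving batch_pos/batch_neg from the logged 32/15 is an assumption.
srun python Main.py --batch_pos 16 --batch_neg 8
```

If fragmentation is not the cause (reserved ≈ allocated here, 20.94 GiB vs 20.03 GiB), reducing the batch size or input resolution is the more likely fix than the allocator setting alone.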