[10/14 08:56:03] detectron2 INFO: Rank of current process: 0. World size: 2
[10/14 08:56:07] detectron2 INFO: Environment info:
-------------------------------  ------------------------------------------------------------------------------------------
sys.platform                     linux
Python                           3.10.15 | packaged by conda-forge | (main, Sep 30 2024, 17:51:04) [GCC 13.3.0]
numpy                            1.26.4
detectron2                       0.6 @/home/chaeyun/.conda/envs/videonemo/lib/python3.10/site-packages/detectron2
Compiler                         GCC 12.3
CUDA compiler                    not available
DETECTRON2_ENV_MODULE
PyTorch                          2.1.0+cu118 @/home/chaeyun/.conda/envs/videonemo/lib/python3.10/site-packages/torch
PyTorch debug build              False
torch._C._GLIBCXX_USE_CXX11_ABI  False
GPU available                    Yes
GPU 0,1                          NVIDIA RTX A6000 (arch=8.6)
Driver version                   555.42.02
CUDA_HOME                        /opt/ohpc/pub/apps/cuda/11.3
Pillow                           10.2.0
torchvision                      0.16.0+cu118 @/home/chaeyun/.conda/envs/videonemo/lib/python3.10/site-packages/torchvision
torchvision arch flags           3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore                           0.1.5.post20221221
iopath                           0.1.9
cv2                              4.10.0
-------------------------------  ------------------------------------------------------------------------------------------
PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.7
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
[10/14 08:56:07] detectron2 INFO: Command line arguments: Namespace(config_file='configs/dshmp_swin_tiny.yaml', resume=False, eval_only=True, num_gpus=2, num_machines=1, machine_rank=0, dist_url='auto', opts=['MODEL.WEIGHTS', './weights/repro_bs8_ddp_ft/model_final.pth', 'OUTPUT_DIR', './output/valid_u_bs8ckpt'])
[10/14 08:56:07] detectron2 INFO: Contents of args.config_file=configs/dshmp_swin_tiny.yaml:
_BASE_: ./Base-YouTubeVIS-VideoInstanceSegmentation.yaml
MODEL:
  BACKBONE:
    NAME: "D2SwinTransformer"
  SWIN:
    EMBED_DIM: 96
    DEPTHS: [2, 2, 6, 2]
    NUM_HEADS: [3, 6, 12, 24]
    WINDOW_SIZE: 7
    APE: False
    DROP_PATH_RATE: 0.3
    PATCH_NORM: True
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.37]
  META_ARCHITECTURE: "DsHmp"
  MASK_ON: True
  SEM_SEG_HEAD:
    NAME: "MaskFormerHead"
    IGNORE_VALUE: 255
    NUM_CLASSES: 1
    LOSS_WEIGHT: 1.0
    CONVS_DIM: 256
    MASK_DIM: 256
    NORM: "GN"
    # pixel decoder
    PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
    IN_FEATURES: ["res2", "res3", "res4", "res5"]
    DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
    COMMON_STRIDE: 4
    TRANSFORMER_ENC_LAYERS: 6
  MASK_FORMER:
    TRANSFORMER_DECODER_NAME: "DshmpMultiScaleMaskedTransformerDecoder"
    TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
    DEEP_SUPERVISION: True
    NO_OBJECT_WEIGHT: 0.1
    CLASS_WEIGHT: 2.0
    MASK_WEIGHT: 5.0
    DICE_WEIGHT: 5.0
    HIDDEN_DIM: 256
    NUM_OBJECT_QUERIES: 20
    NHEADS: 8
    DROPOUT: 0.0
    DIM_FEEDFORWARD: 2048
    ENC_LAYERS: 0
    PRE_NORM: False
    ENFORCE_INPUT_PROJ: False
    SIZE_DIVISIBILITY: 32
    DEC_LAYERS: 10  # 9 decoder layers, add one for the loss on learnable query
    TRAIN_NUM_POINTS: 12544
    OVERSAMPLE_RATIO: 3.0
    IMPORTANCE_SAMPLE_RATIO: 0.75
    TEST:
      SEMANTIC_ON: False
      INSTANCE_ON: True
      PANOPTIC_ON: False
      OVERLAP_THRESHOLD: 0.8
      OBJECT_MASK_THRESHOLD: 0.8
  VITA:
    ENC_WINDOW_SIZE: 8
    SIM_WEIGHT: 1.0
    NUM_OBJECT_QUERIES: 10
    TEST_OUTPUT_THRESHOLD: 0.5
DATASETS:
  DATASET_RATIO: [1.0, 0.25]
  TRAIN: ("mevis_train",)
  TEST: ("mevis_val",)
SOLVER:
  IMS_PER_BATCH: 8
  BASE_LR: 0.00005
  STEPS: (40000, 50000)
  MAX_ITER: 55000
  WARMUP_FACTOR: 1.0
  WARMUP_ITERS: 10
  WEIGHT_DECAY: 0.05
  OPTIMIZER: "ADAMW"
  BACKBONE_MULTIPLIER: 0.1
  CLIP_GRADIENTS:
    ENABLED: True
    CLIP_TYPE: "full_model"
    CLIP_VALUE: 0.01
    NORM_TYPE: 2.0
INPUT:
  SAMPLING_FRAME_NUM: 8
  SAMPLING_FRAME_RANGE: 10
  SAMPLING_FRAME_SHUFFLE: False
  # MIN_SIZE_TRAIN_SAMPLING : ["range", "choice", "range_by_clip", "choice_by_clip"]
  MIN_SIZE_TRAIN_SAMPLING: "choice_by_clip"
  # RANDOM_FLIP : ["none", "horizontal", "flip_by_clip"]. "horizontal" is set by default.
  RANDOM_FLIP: "flip_by_clip"
  AUGMENTATIONS: []
  MIN_SIZE_TRAIN: (288, 320, 352, 384, 416, 448, 480, 512)
  MAX_SIZE_TRAIN: 768
  MIN_SIZE_TEST: 448
  FORMAT: "RGB"
  CROP:
    ENABLED: True
    TYPE: "absolute_range"
    SIZE: (384, 600)
  # For pseudo videos
  PSEUDO:
    AUGMENTATIONS: ['rotation']
    MIN_SIZE_TRAIN: (288, 320, 352, 384, 416, 448, 480, 512)
    MAX_SIZE_TRAIN: 768
    CROP:
      ENABLED: True
      TYPE: "absolute_range"
      SIZE: (384, 600)
  LSJ_AUG:
    ENABLED: False
    IMAGE_SIZE: 768
    MIN_SCALE: 0.1
    MAX_SCALE: 2.0
DATALOADER:
  FILTER_EMPTY_ANNOTATIONS: True
  NUM_WORKERS: 8
TEST:
  DETECTIONS_PER_IMAGE: 1
  EVAL_PERIOD: 50000
OUTPUT_DIR: 'output/dshmp_model'
[10/14 08:56:07] detectron2 INFO: Running with full config:
CUDNN_BENCHMARK: false
DATALOADER:
  ASPECT_RATIO_GROUPING: true
  FILTER_EMPTY_ANNOTATIONS: true
  NUM_WORKERS: 8
  REPEAT_SQRT: true
  REPEAT_THRESHOLD: 0.0
  SAMPLER_TRAIN: TrainingSampler
DATASETS:
  DATASET_RATIO:
  - 1.0
  - 0.25
  PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
  PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
  PROPOSAL_FILES_TEST: []
  PROPOSAL_FILES_TRAIN: []
  TEST:
  - mevis_val
  TRAIN:
  - mevis_train
FLOAT32_PRECISION: ''
GLOBAL:
  HACK: 1.0
INPUT:
  AUGMENTATIONS: []
  COLOR_AUG_SSD: false
  CROP:
    ENABLED: true
    SINGLE_CATEGORY_MAX_AREA: 1.0
    SIZE:
    - 384
    - 600
    TYPE: absolute_range
  DATASET_MAPPER_NAME: mask_former_semantic
  FORMAT: RGB
  IMAGE_SIZE: 1024
  LSJ_AUG:
    ENABLED: false
    IMAGE_SIZE: 768
    MAX_SCALE: 2.0
    MIN_SCALE: 0.1
  MASK_FORMAT: polygon
  MAX_SCALE: 2.0
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 768
  MIN_SCALE: 0.1
  MIN_SIZE_TEST: 448
  MIN_SIZE_TRAIN:
  - 288
  - 320
  - 352
  - 384
  - 416
  - 448
  - 480
  - 512
  MIN_SIZE_TRAIN_SAMPLING: choice_by_clip
  PSEUDO:
    AUGMENTATIONS:
    - rotation
    CROP:
      ENABLED: true
      SIZE:
      - 384
      - 600
      TYPE: absolute_range
    MAX_SIZE_TRAIN: 768
    MIN_SIZE_TRAIN:
    - 288
    - 320
    - 352
    - 384
    - 416
    - 448
    - 480
    - 512
    MIN_SIZE_TRAIN_SAMPLING: choice_by_clip
  RANDOM_FLIP: flip_by_clip
  SAMPLING_FRAME_NUM: 8
  SAMPLING_FRAME_RANGE: 10
  SAMPLING_FRAME_SHUFFLE: false
  SIZE_DIVISIBILITY: -1
MODEL:
  ANCHOR_GENERATOR:
    ANGLES:
    - - -90
      - 0
      - 90
    ASPECT_RATIOS:
    - - 0.5
      - 1.0
      - 2.0
    NAME: DefaultAnchorGenerator
    OFFSET: 0.0
    SIZES:
    - - 32
      - 64
      - 128
      - 256
      - 512
  BACKBONE:
    FREEZE_AT: 0
    NAME: D2SwinTransformer
  DEVICE: cuda
  FPN:
    FUSE_TYPE: sum
    IN_FEATURES: []
    NORM: ''
    OUT_CHANNELS: 256
  KEYPOINT_ON: false
  LOAD_PROPOSALS: false
  MASK_FORMER:
    CLASS_WEIGHT: 2.0
    DEC_LAYERS: 10
    DEEP_SUPERVISION: true
    DICE_WEIGHT: 5.0
    DIM_FEEDFORWARD: 2048
    DROPOUT: 0.0
    ENC_LAYERS: 0
    ENFORCE_INPUT_PROJ: false
    HIDDEN_DIM: 256
    IMPORTANCE_SAMPLE_RATIO: 0.75
    MASK_WEIGHT: 5.0
    NHEADS: 8
    NO_OBJECT_WEIGHT: 0.1
    NUM_OBJECT_QUERIES: 20
    OVERSAMPLE_RATIO: 3.0
    PRE_NORM: false
    SIZE_DIVISIBILITY: 32
    TEST:
      INSTANCE_ON: true
      OBJECT_MASK_THRESHOLD: 0.8
      OVERLAP_THRESHOLD: 0.8
      PANOPTIC_ON: false
      SEMANTIC_ON: false
      SEM_SEG_POSTPROCESSING_BEFORE_INFERENCE: false
    TRAIN_NUM_POINTS: 12544
    TRANSFORMER_DECODER_NAME: DshmpMultiScaleMaskedTransformerDecoder
    TRANSFORMER_IN_FEATURE: multi_scale_pixel_decoder
  MASK_ON: true
  META_ARCHITECTURE: DsHmp
  PANOPTIC_FPN:
    COMBINE:
      ENABLED: true
      INSTANCES_CONFIDENCE_THRESH: 0.5
      OVERLAP_THRESH: 0.5
      STUFF_AREA_LIMIT: 4096
    INSTANCE_LOSS_WEIGHT: 1.0
  PIXEL_MEAN:
  - 123.675
  - 116.28
  - 103.53
  PIXEL_STD:
  - 58.395
  - 57.12
  - 57.37
  PROPOSAL_GENERATOR:
    MIN_SIZE: 0
    NAME: RPN
  RESNETS:
    DEFORM_MODULATED: false
    DEFORM_NUM_GROUPS: 1
    DEFORM_ON_PER_STAGE:
    - false
    - false
    - false
    - false
    DEPTH: 50
    NORM: FrozenBN
    NUM_GROUPS: 1
    OUT_FEATURES:
    - res2
    - res3
    - res4
    - res5
    RES2_OUT_CHANNELS: 256
    RES4_DILATION: 1
    RES5_DILATION: 1
    RES5_MULTI_GRID:
    - 1
    - 1
    - 1
    STEM_OUT_CHANNELS: 64
    STEM_TYPE: basic
    STRIDE_IN_1X1: false
    WIDTH_PER_GROUP: 64
  RETINANET:
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_WEIGHTS: &id002
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    FOCAL_LOSS_ALPHA: 0.25
    FOCAL_LOSS_GAMMA: 2.0
    IN_FEATURES:
    - p3
    - p4
    - p5
    - p6
    - p7
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.4
    - 0.5
    NMS_THRESH_TEST: 0.5
    NORM: ''
    NUM_CLASSES: 80
    NUM_CONVS: 4
    PRIOR_PROB: 0.01
    SCORE_THRESH_TEST: 0.05
    SMOOTH_L1_LOSS_BETA: 0.1
    TOPK_CANDIDATES_TEST: 1000
  ROI_BOX_CASCADE_HEAD:
    BBOX_REG_WEIGHTS:
    - &id001
      - 10.0
      - 10.0
      - 5.0
      - 5.0
    - - 20.0
      - 20.0
      - 10.0
      - 10.0
    - - 30.0
      - 30.0
      - 15.0
      - 15.0
    IOUS:
    - 0.5
    - 0.6
    - 0.7
  ROI_BOX_HEAD:
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS: *id001
    CLS_AGNOSTIC_BBOX_REG: false
    CONV_DIM: 256
    FC_DIM: 1024
    FED_LOSS_FREQ_WEIGHT_POWER: 0.5
    FED_LOSS_NUM_CLASSES: 50
    NAME: ''
    NORM: ''
    NUM_CONV: 0
    NUM_FC: 0
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
    SMOOTH_L1_BETA: 0.0
    TRAIN_ON_PRED_BOXES: false
    USE_FED_LOSS: false
    USE_SIGMOID_CE: false
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    IN_FEATURES:
    - res4
    IOU_LABELS:
    - 0
    - 1
    IOU_THRESHOLDS:
    - 0.5
    NAME: Res5ROIHeads
    NMS_THRESH_TEST: 0.5
    NUM_CLASSES: 80
    POSITIVE_FRACTION: 0.25
    PROPOSAL_APPEND_GT: true
    SCORE_THRESH_TEST: 0.05
  ROI_KEYPOINT_HEAD:
    CONV_DIMS:
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    LOSS_WEIGHT: 1.0
    MIN_KEYPOINTS_PER_IMAGE: 1
    NAME: KRCNNConvDeconvUpsampleHead
    NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
    NUM_KEYPOINTS: 17
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  ROI_MASK_HEAD:
    CLS_AGNOSTIC_MASK: false
    CONV_DIM: 256
    NAME: MaskRCNNConvUpsampleHead
    NORM: ''
    NUM_CONV: 0
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  RPN:
    BATCH_SIZE_PER_IMAGE: 256
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS: *id002
    BOUNDARY_THRESH: -1
    CONV_DIMS:
    - -1
    HEAD_NAME: StandardRPNHead
    IN_FEATURES:
    - res4
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.3
    - 0.7
    LOSS_WEIGHT: 1.0
    NMS_THRESH: 0.7
    POSITIVE_FRACTION: 0.5
    POST_NMS_TOPK_TEST: 1000
    POST_NMS_TOPK_TRAIN: 2000
    PRE_NMS_TOPK_TEST: 6000
    PRE_NMS_TOPK_TRAIN: 12000
    SMOOTH_L1_BETA: 0.0
  SEM_SEG_HEAD:
    ASPP_CHANNELS: 256
    ASPP_DILATIONS:
    - 6
    - 12
    - 18
    ASPP_DROPOUT: 0.1
    COMMON_STRIDE: 4
    CONVS_DIM: 256
    DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES:
    - res3
    - res4
    - res5
    DEFORMABLE_TRANSFORMER_ENCODER_N_HEADS: 8
    DEFORMABLE_TRANSFORMER_ENCODER_N_POINTS: 4
    IGNORE_VALUE: 255
    IN_FEATURES:
    - res2
    - res3
    - res4
    - res5
    LOSS_TYPE: hard_pixel_mining
    LOSS_WEIGHT: 1.0
    MASK_DIM: 256
    NAME: MaskFormerHead
    NORM: GN
    NUM_CLASSES: 1
    PIXEL_DECODER_NAME: MSDeformAttnPixelDecoder
    PROJECT_CHANNELS:
    - 48
    PROJECT_FEATURES:
    - res2
    TRANSFORMER_ENC_LAYERS: 6
    USE_DEPTHWISE_SEPARABLE_CONV: false
  SWIN:
    APE: false
    ATTN_DROP_RATE: 0.0
    DEPTHS:
    - 2
    - 2
    - 6
    - 2
    DROP_PATH_RATE: 0.3
    DROP_RATE: 0.0
    EMBED_DIM: 96
    MLP_RATIO: 4.0
    NUM_HEADS:
    - 3
    - 6
    - 12
    - 24
    OUT_FEATURES:
    - res2
    - res3
    - res4
    - res5
    PATCH_NORM: true
    PATCH_SIZE: 4
    PRETRAIN_IMG_SIZE: 224
    QKV_BIAS: true
    QK_SCALE: null
    USE_CHECKPOINT: false
    WINDOW_SIZE: 7
  VITA:
    APPLY_CLS_THRES: 0.01
    DEC_LAYERS: 3
    DEEP_SUPERVISION: true
    DIM_FEEDFORWARD: 2048
    DROPOUT: 0.0
    ENC_LAYERS: 6
    ENC_WINDOW_SIZE: 8
    ENFORCE_INPUT_PROJ: true
    FREEZE_DETECTOR: false
    FREEZE_TEXT_ENCODER: true
    HIDDEN_DIM: 256
    LAST_LAYER_NUM: 3
    MULTI_CLS_ON: true
    NHEADS: 8
    NO_OBJECT_WEIGHT: 0.1
    NUM_OBJECT_QUERIES: 10
    PRE_NORM: false
    SIM_USE_CLIP: true
    SIM_WEIGHT: 1.0
    TEST_INTERPOLATE_CHUNK_SIZE: 5
    TEST_OUTPUT_THRESHOLD: 0.5
    TEST_RUN_CHUNK_SIZE: 18
  WEIGHTS: ./weights/repro_bs8_ddp_ft/model_final.pth
OUTPUT_DIR: ./output/valid_u_bs8ckpt
SEED: -1
SOLVER:
  AMP:
    ENABLED: true
  BACKBONE_MULTIPLIER: 0.1
  BASE_LR: 5.0e-05
  BASE_LR_END: 0.0
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 5000
  CLIP_GRADIENTS:
    CLIP_TYPE: full_model
    CLIP_VALUE: 0.01
    ENABLED: true
    NORM_TYPE: 2.0
  GAMMA: 0.1
  IMS_PER_BATCH: 8
  LR_SCHEDULER_NAME: WarmupMultiStepLR
  MAX_ITER: 55000
  MOMENTUM: 0.9
  NESTEROV: false
  NUM_DECAYS: 3
  OPTIMIZER: ADAMW
  POLY_LR_CONSTANT_ENDING: 0.0
  POLY_LR_POWER: 0.9
  REFERENCE_WORLD_SIZE: 0
  RESCALE_INTERVAL: false
  STEPS:
  - 40000
  - 50000
  WARMUP_FACTOR: 1.0
  WARMUP_ITERS: 10
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.05
  WEIGHT_DECAY_BIAS: null
  WEIGHT_DECAY_EMBED: 0.0
  WEIGHT_DECAY_NORM: 0.0
TEST:
  AUG:
    ENABLED: false
    FLIP: true
    MAX_SIZE: 4000
    MIN_SIZES:
    - 400
    - 500
    - 600
    - 700
    - 800
    - 900
    - 1000
    - 1100
    - 1200
  DETECTIONS_PER_IMAGE: 1
  EVAL_PERIOD: 50000
  EXPECTED_RESULTS: []
  KEYPOINT_OKS_SIGMAS: []
  PRECISE_BN:
    ENABLED: false
    NUM_ITER: 200
VERSION: 2
VIS_PERIOD: 0
[10/14 08:56:07] detectron2 INFO: Full config saved to ./output/valid_u_bs8ckpt/config.yaml
[10/14 08:56:08] d2.utils.env INFO: Using a generated random seed 9097300
[10/14 08:56:14] d2.checkpoint.detection_checkpoint INFO: [DetectionCheckpointer] Loading from ./weights/repro_bs8_ddp_ft/model_final.pth ...
[10/14 08:56:14] fvcore.common.checkpoint INFO: [Checkpointer] Loading from ./weights/repro_bs8_ddp_ft/model_final.pth ...
[10/14 08:56:21] d2.data.common INFO: Serializing the dataset using:
[10/14 08:56:21] d2.data.common INFO: Serializing 793 elements to byte tensors and concatenating them all ...
[10/14 08:56:21] d2.data.common INFO: Serialized dataset takes 3.22 MiB