PropVG

2025-07-07 11:27:50,676 - PropVG - INFO - dataset = 'MixedSeg'
data_root = './data/seqtr_type/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375])
train_pipeline = [
    dict(
        type='LoadImageAnnotationsFromFile_TO',
        max_token=20,
        with_mask=True,
        with_bbox=True,
        dataset='MixedSeg',
        use_token_type='beit3',
        refer_file='data/seqtr_type/annotations/mixed-seg/coco_all.json',
        object_area_filter=100,
        object_area_rate_filter=[0.05, 0.8]),
    dict(type='Resize', img_scale=(384, 384), keep_ratio=False),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375]),
    dict(type='DefaultFormatBundle'),
    dict(
        type='CollectData',
        keys=[
            'img', 'ref_expr_inds', 'text_attention_mask', 'gt_mask_rle',
            'gt_bbox'
        ],
        meta_keys=[
            'filename', 'expression', 'ori_shape', 'img_shape', 'pad_shape',
            'scale_factor', 'gt_ori_mask', 'target', 'empty',
            'refer_target_index'
        ])
]
val_pipeline = [
    dict(
        type='LoadImageAnnotationsFromFile_TO',
        max_token=20,
        with_mask=True,
        with_bbox=True,
        dataset='MixedSeg',
        use_token_type='beit3',
        refer_file='data/seqtr_type/annotations/mixed-seg/coco_all.json',
        object_area_filter=100,
        object_area_rate_filter=[0.05, 0.8]),
    dict(type='Resize', img_scale=(384, 384), keep_ratio=False),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375]),
    dict(type='DefaultFormatBundle'),
    dict(
        type='CollectData',
        keys=[
            'img', 'ref_expr_inds', 'text_attention_mask', 'gt_mask_rle',
            'gt_bbox'
        ],
        meta_keys=[
            'filename', 'expression', 'ori_shape', 'img_shape', 'pad_shape',
            'scale_factor', 'gt_ori_mask', 'target', 'empty',
            'refer_target_index'
        ])
]
test_pipeline = [
    dict(
        type='LoadImageAnnotationsFromFile_TO',
        max_token=20,
        with_mask=True,
        with_bbox=True,
        dataset='MixedSeg',
        use_token_type='beit3',
        refer_file='data/seqtr_type/annotations/mixed-seg/coco_all.json',
        object_area_filter=100,
        object_area_rate_filter=[0.05, 0.8]),
    dict(type='Resize', img_scale=(384, 384), keep_ratio=False),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375]),
    dict(type='DefaultFormatBundle'),
    dict(
        type='CollectData',
        keys=[
            'img', 'ref_expr_inds', 'text_attention_mask', 'gt_mask_rle',
            'gt_bbox'
        ],
        meta_keys=[
            'filename', 'expression', 'ori_shape', 'img_shape', 'pad_shape',
            'scale_factor', 'gt_ori_mask', 'target', 'empty',
            'refer_target_index'
        ])
]
word_emb_cfg = dict(type='GloVe')
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=4,
    train=dict(
        type='MixedSeg',
        which_set='train',
        img_source=['coco'],
        annsfile=
        './data/seqtr_type/annotations/mixed-seg/instances_nogoogle_withid.json',
        imgsfile='./data/seqtr_type/images/mscoco/train2014',
        pipeline=[
            dict(
                type='LoadImageAnnotationsFromFile_TO',
                max_token=20,
                with_mask=True,
                with_bbox=True,
                dataset='MixedSeg',
                use_token_type='beit3',
                refer_file=
                'data/seqtr_type/annotations/mixed-seg/coco_all.json',
                object_area_filter=100,
                object_area_rate_filter=[0.05, 0.8]),
            dict(type='Resize', img_scale=(384, 384), keep_ratio=False),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375]),
            dict(type='DefaultFormatBundle'),
            dict(
                type='CollectData',
                keys=[
                    'img', 'ref_expr_inds', 'text_attention_mask',
                    'gt_mask_rle', 'gt_bbox'
                ],
                meta_keys=[
                    'filename', 'expression', 'ori_shape', 'img_shape',
                    'pad_shape', 'scale_factor', 'gt_ori_mask', 'target',
                    'empty', 'refer_target_index'
                ])
        ],
        word_emb_cfg=dict(type='GloVe')),
    val_refcoco_unc=dict(
        type='MixedSeg',
        which_set='val_refcoco_unc',
        img_source=['coco'],
        annsfile=
        './data/seqtr_type/annotations/mixed-seg/instances_nogoogle_withid.json',
        imgsfile='./data/seqtr_type/images/mscoco/train2014',
        pipeline=[
            dict(
                type='LoadImageAnnotationsFromFile_TO',
                max_token=20,
                with_mask=True,
                with_bbox=True,
                dataset='MixedSeg',
                use_token_type='beit3',
                refer_file=
                'data/seqtr_type/annotations/mixed-seg/coco_all.json',
                object_area_filter=100,
                object_area_rate_filter=[0.05, 0.8]),
            dict(type='Resize', img_scale=(384, 384), keep_ratio=False),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375]),
            dict(type='DefaultFormatBundle'),
            dict(
                type='CollectData',
                keys=[
                    'img', 'ref_expr_inds', 'text_attention_mask',
                    'gt_mask_rle', 'gt_bbox'
                ],
                meta_keys=[
                    'filename', 'expression', 'ori_shape', 'img_shape',
                    'pad_shape', 'scale_factor', 'gt_ori_mask', 'target',
                    'empty', 'refer_target_index'
                ])
        ],
        word_emb_cfg=dict(type='GloVe')),
    testA_refcoco_unc=dict(
        type='MixedSeg',
        which_set='testA_refcoco_unc',
        img_source=['coco'],
        annsfile=
        './data/seqtr_type/annotations/mixed-seg/instances_nogoogle_withid.json',
        imgsfile='./data/seqtr_type/images/mscoco/train2014',
        pipeline=[
            dict(
                type='LoadImageAnnotationsFromFile_TO',
                max_token=20,
                with_mask=True,
                with_bbox=True,
                dataset='MixedSeg',
                use_token_type='beit3',
                refer_file=
                'data/seqtr_type/annotations/mixed-seg/coco_all.json',
                object_area_filter=100,
                object_area_rate_filter=[0.05, 0.8]),
            dict(type='Resize', img_scale=(384, 384), keep_ratio=False),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375]),
            dict(type='DefaultFormatBundle'),
            dict(
                type='CollectData',
                keys=[
                    'img', 'ref_expr_inds', 'text_attention_mask',
                    'gt_mask_rle', 'gt_bbox'
                ],
                meta_keys=[
                    'filename', 'expression', 'ori_shape', 'img_shape',
                    'pad_shape', 'scale_factor', 'gt_ori_mask', 'target',
                    'empty', 'refer_target_index'
                ])
        ],
        word_emb_cfg=dict(type='GloVe')),
    testB_refcoco_unc=dict(
        type='MixedSeg',
        which_set='testB_refcoco_unc',
        img_source=['coco'],
        annsfile=
        './data/seqtr_type/annotations/mixed-seg/instances_nogoogle_withid.json',
        imgsfile='./data/seqtr_type/images/mscoco/train2014',
        pipeline=[
            dict(
                type='LoadImageAnnotationsFromFile_TO',
                max_token=20,
                with_mask=True,
                with_bbox=True,
                dataset='MixedSeg',
                use_token_type='beit3',
                refer_file=
                'data/seqtr_type/annotations/mixed-seg/coco_all.json',
                object_area_filter=100,
                object_area_rate_filter=[0.05, 0.8]),
            dict(type='Resize', img_scale=(384, 384), keep_ratio=False),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375]),
            dict(type='DefaultFormatBundle'),
            dict(
                type='CollectData',
                keys=[
                    'img', 'ref_expr_inds', 'text_attention_mask',
                    'gt_mask_rle', 'gt_bbox'
                ],
                meta_keys=[
                    'filename', 'expression', 'ori_shape', 'img_shape',
                    'pad_shape', 'scale_factor', 'gt_ori_mask', 'target',
                    'empty', 'refer_target_index'
                ])
        ],
        word_emb_cfg=dict(type='GloVe')),
    val_refcocoplus_unc=dict(
        type='MixedSeg',
        which_set='val_refcocoplus_unc',
        img_source=['coco'],
        annsfile=
        './data/seqtr_type/annotations/mixed-seg/instances_nogoogle_withid.json',
        imgsfile='./data/seqtr_type/images/mscoco/train2014',
        pipeline=[
            dict(
                type='LoadImageAnnotationsFromFile_TO',
                max_token=20,
                with_mask=True,
                with_bbox=True,
                dataset='MixedSeg',
                use_token_type='beit3',
                refer_file=
                'data/seqtr_type/annotations/mixed-seg/coco_all.json',
                object_area_filter=100,
                object_area_rate_filter=[0.05, 0.8]),
            dict(type='Resize', img_scale=(384, 384), keep_ratio=False),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375]),
            dict(type='DefaultFormatBundle'),
            dict(
                type='CollectData',
                keys=[
                    'img', 'ref_expr_inds', 'text_attention_mask',
                    'gt_mask_rle', 'gt_bbox'
                ],
                meta_keys=[
                    'filename', 'expression', 'ori_shape', 'img_shape',
                    'pad_shape', 'scale_factor', 'gt_ori_mask', 'target',
                    'empty', 'refer_target_index'
                ])
        ],
        word_emb_cfg=dict(type='GloVe')),
    testA_refcocoplus_unc=dict(
        type='MixedSeg',
        which_set='testA_refcocoplus_unc',
        img_source=['coco'],
        annsfile=
        './data/seqtr_type/annotations/mixed-seg/instances_nogoogle_withid.json',
        imgsfile='./data/seqtr_type/images/mscoco/train2014',
        pipeline=[
            dict(
                type='LoadImageAnnotationsFromFile_TO',
                max_token=20,
                with_mask=True,
                with_bbox=True,
                dataset='MixedSeg',
                use_token_type='beit3',
                refer_file=
                'data/seqtr_type/annotations/mixed-seg/coco_all.json',
                object_area_filter=100,
                object_area_rate_filter=[0.05, 0.8]),
            dict(type='Resize', img_scale=(384, 384), keep_ratio=False),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375]),
            dict(type='DefaultFormatBundle'),
            dict(
                type='CollectData',
                keys=[
                    'img', 'ref_expr_inds', 'text_attention_mask',
                    'gt_mask_rle', 'gt_bbox'
                ],
                meta_keys=[
                    'filename', 'expression', 'ori_shape', 'img_shape',
                    'pad_shape', 'scale_factor', 'gt_ori_mask', 'target',
                    'empty', 'refer_target_index'
                ])
        ],
        word_emb_cfg=dict(type='GloVe')),
    testB_refcocoplus_unc=dict(
        type='MixedSeg',
        which_set='testB_refcocoplus_unc',
        img_source=['coco'],
        annsfile=
        './data/seqtr_type/annotations/mixed-seg/instances_nogoogle_withid.json',
        imgsfile='./data/seqtr_type/images/mscoco/train2014',
        pipeline=[
            dict(
                type='LoadImageAnnotationsFromFile_TO',
                max_token=20,
                with_mask=True,
                with_bbox=True,
                dataset='MixedSeg',
                use_token_type='beit3',
                refer_file=
                'data/seqtr_type/annotations/mixed-seg/coco_all.json',
                object_area_filter=100,
                object_area_rate_filter=[0.05, 0.8]),
            dict(type='Resize', img_scale=(384, 384), keep_ratio=False),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375]),
            dict(type='DefaultFormatBundle'),
            dict(
                type='CollectData',
                keys=[
                    'img', 'ref_expr_inds', 'text_attention_mask',
                    'gt_mask_rle', 'gt_bbox'
                ],
                meta_keys=[
                    'filename', 'expression', 'ori_shape', 'img_shape',
                    'pad_shape', 'scale_factor', 'gt_ori_mask', 'target',
                    'empty', 'refer_target_index'
                ])
        ],
        word_emb_cfg=dict(type='GloVe')),
    val_refcocog_umd=dict(
        type='MixedSeg',
        which_set='val_refcocog_umd',
        img_source=['coco'],
        annsfile=
        './data/seqtr_type/annotations/mixed-seg/instances_nogoogle_withid.json',
        imgsfile='./data/seqtr_type/images/mscoco/train2014',
        pipeline=[
            dict(
                type='LoadImageAnnotationsFromFile_TO',
                max_token=20,
                with_mask=True,
                with_bbox=True,
                dataset='MixedSeg',
                use_token_type='beit3',
                refer_file=
                'data/seqtr_type/annotations/mixed-seg/coco_all.json',
                object_area_filter=100,
                object_area_rate_filter=[0.05, 0.8]),
            dict(type='Resize', img_scale=(384, 384), keep_ratio=False),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375]),
            dict(type='DefaultFormatBundle'),
            dict(
                type='CollectData',
                keys=[
                    'img', 'ref_expr_inds', 'text_attention_mask',
                    'gt_mask_rle', 'gt_bbox'
                ],
                meta_keys=[
                    'filename', 'expression', 'ori_shape', 'img_shape',
                    'pad_shape', 'scale_factor', 'gt_ori_mask', 'target',
                    'empty', 'refer_target_index'
                ])
        ],
        word_emb_cfg=dict(type='GloVe')),
    test_refcocog_umd=dict(
        type='MixedSeg',
        which_set='test_refcocog_umd',
        img_source=['coco'],
        annsfile=
        './data/seqtr_type/annotations/mixed-seg/instances_nogoogle_withid.json',
        imgsfile='./data/seqtr_type/images/mscoco/train2014',
        pipeline=[
            dict(
                type='LoadImageAnnotationsFromFile_TO',
                max_token=20,
                with_mask=True,
                with_bbox=True,
                dataset='MixedSeg',
                use_token_type='beit3',
                refer_file=
                'data/seqtr_type/annotations/mixed-seg/coco_all.json',
                object_area_filter=100,
                object_area_rate_filter=[0.05, 0.8]),
            dict(type='Resize', img_scale=(384, 384), keep_ratio=False),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375]),
            dict(type='DefaultFormatBundle'),
            dict(
                type='CollectData',
                keys=[
                    'img', 'ref_expr_inds', 'text_attention_mask',
                    'gt_mask_rle', 'gt_bbox'
                ],
                meta_keys=[
                    'filename', 'expression', 'ori_shape', 'img_shape',
                    'pad_shape', 'scale_factor', 'gt_ori_mask', 'target',
                    'empty', 'refer_target_index'
                ])
        ],
        word_emb_cfg=dict(type='GloVe')))
ema = False
ema_factor = 0.999
use_fp16 = False
seed = 6666
deterministic = True
log_level = 'INFO'
log_interval = 50
save_interval = -1
resume_from = None
load_from = 'work_dir/refcoco-mix/PropVG-refcoco-mix.pth'
finetune_from = None
evaluate_interval = 1
start_evaluate_epoch = 0
start_save_checkpoint = 20
max_token = 20
img_size = 384
patch_size = 16
model = dict(
    type='MIXRefUniModel_OMG',
    vis_enc=dict(
        type='BEIT3',
        img_size=384,
        patch_size=16,
        vit_type='base',
        drop_path_rate=0.1,
        vocab_size=64010,
        freeze_layer=-1,
        vision_embed_proj_interpolate=False,
        pretrain='pretrain_weights/beit3_base_patch16_224.zip'),
    lan_enc=None,
    fusion=None,
    head=dict(
        type='REFHead',
        input_channels=768,
        hidden_channels=256,
        num_queries=20,
        detr_loss=dict(
            criterion=dict(loss_class=1.0, loss_bbox=5.0, loss_giou=2.0),
            matcher=dict(cost_class=1.0, cost_bbox=5.0, cost_giou=2.0)),
        loss_weight=dict(
            mask=dict(dice=1.0, bce=1.0, nt=0.2, neg=0),
            bbox=0.1,
            allbbox=0.1,
            refer=1.0),
        MTD=dict(K=100)),
    post_params=dict(
        score_weighted=False,
        mask_threshold=0.5,
        score_threshold=0.7,
        with_nms=False,
        with_mask=True),
    process_visual=False,
    visualize_params=dict(row_columns=(4, 5)),
    visual_mode='test')
grad_norm_clip = 0.15
lr = 0.0005
optimizer_config = dict(
    type='Adam',
    lr=0.0005,
    lr_vis_enc=5e-05,
    lr_lan_enc=0.0005,
    betas=(0.9, 0.98),
    eps=1e-09,
    weight_decay=0,
    amsgrad=True)
scheduler_config = dict(
    type='MultiStepLRWarmUp',
    warmup_epochs=1,
    decay_steps=[21, 27],
    decay_ratio=0.1,
    max_epoch=30)
launcher = 'pytorch'
distributed = True
rank = 0
world_size = 1

2025-07-07 11:27:58,403 - PropVG - INFO - Mixed-val_refcoco_unc size: 10834
2025-07-07 11:28:06,594 - PropVG - INFO - Mixed-testA_refcoco_unc size: 5657
2025-07-07 11:28:15,164 - PropVG - INFO - Mixed-testB_refcoco_unc size: 5095
2025-07-07 11:28:23,677 - PropVG - INFO - Mixed-val_refcocoplus_unc size: 10758
2025-07-07 11:28:30,907 - PropVG - INFO - Mixed-testA_refcocoplus_unc size: 5726
2025-07-07 11:28:38,494 - PropVG - INFO - Mixed-testB_refcocoplus_unc size: 4889
2025-07-07 11:28:49,090 - PropVG - INFO - Mixed-val_refcocog_umd size: 4896
2025-07-07 11:28:54,576 - PropVG - INFO - Mixed-test_refcocog_umd size: 9602
2025-07-07 11:29:02,664 - PropVG - INFO - loaded checkpoint from work_dir/refcoco-mix/PropVG-refcoco-mix.pth

2025-07-07 11:29:02,665 - PropVG - INFO - PropVG - evaluating set val_refcoco_unc
2025-07-07 11:32:39,213 - PropVG - INFO - ------------ validate ------------  time: 216.54, DetACC: 92.70, mIoU: 81.96, oIoU: 81.80, MaskACC@0.5-0.9: [92.24, 90.71, 87.59,  79.79,  46.59]DetACC@0.5-0.9: [92.70, 91.43, 88.90,  83.85,  66.30]
2025-07-07 11:32:43,474 - PropVG - INFO - PropVG - evaluating set testA_refcoco_unc
2025-07-07 11:34:47,838 - PropVG - INFO - ------------ validate ------------  time: 124.36, DetACC: 95.07, mIoU: 83.58, oIoU: 83.74, MaskACC@0.5-0.9: [94.56, 93.48, 90.93,  82.91,  46.61]DetACC@0.5-0.9: [95.07, 93.99, 92.17,  88.17,  69.29]
2025-07-07 11:34:53,297 - PropVG - INFO - PropVG - evaluating set testB_refcoco_unc
2025-07-07 11:36:51,290 - PropVG - INFO - ------------ validate ------------  time: 117.99, DetACC: 89.58, mIoU: 80.02, oIoU: 79.33, MaskACC@0.5-0.9: [89.19, 86.99, 83.45,  76.76,  51.07]DetACC@0.5-0.9: [89.58, 87.56, 84.61,  79.14,  61.83]
2025-07-07 11:36:56,652 - PropVG - INFO - PropVG - evaluating set val_refcocoplus_unc
2025-07-07 11:40:28,540 - PropVG - INFO - ------------ validate ------------  time: 211.88, DetACC: 87.27, mIoU: 77.14, oIoU: 74.81, MaskACC@0.5-0.9: [86.67, 85.36, 82.52,  75.28,  44.34]DetACC@0.5-0.9: [87.27, 86.30, 84.09,  79.64,  63.62]
2025-07-07 11:40:36,392 - PropVG - INFO - PropVG - evaluating set testA_refcocoplus_unc
2025-07-07 11:42:43,800 - PropVG - INFO - ------------ validate ------------  time: 127.40, DetACC: 90.87, mIoU: 79.83, oIoU: 78.72, MaskACC@0.5-0.9: [90.13, 88.79, 86.57,  79.46,  45.04]DetACC@0.5-0.9: [90.87, 89.82, 87.81,  83.92,  66.33]
2025-07-07 11:42:48,169 - PropVG - INFO - PropVG - evaluating set testB_refcocoplus_unc
2025-07-07 11:44:41,261 - PropVG - INFO - ------------ validate ------------  time: 113.09, DetACC: 81.26, mIoU: 72.18, oIoU: 69.15, MaskACC@0.5-0.9: [80.18, 78.20, 74.78,  68.68,  45.88]DetACC@0.5-0.9: [81.26, 79.40, 76.95,  72.20,  56.78]
2025-07-07 11:44:45,751 - PropVG - INFO - PropVG - evaluating set val_refcocog_umd
2025-07-07 11:46:42,173 - PropVG - INFO - ------------ validate ------------  time: 116.42, DetACC: 88.15, mIoU: 76.97, oIoU: 75.54, MaskACC@0.5-0.9: [86.17, 83.58, 79.43,  72.16,  44.87]DetACC@0.5-0.9: [88.15, 85.97, 82.90,  78.00,  63.09]
2025-07-07 11:46:46,257 - PropVG - INFO - PropVG - evaluating set test_refcocog_umd
2025-07-07 11:50:06,821 - PropVG - INFO - ------------ validate ------------  time: 200.56, DetACC: 88.30, mIoU: 77.72, oIoU: 77.40, MaskACC@0.5-0.9: [87.14, 85.01, 80.84,  72.78,  45.79]DetACC@0.5-0.9: [88.30, 86.71, 83.98,  79.07,  65.00]
2025-07-07 11:50:11,168 - PropVG - INFO - sucessfully save the results to work_dir/refcoco-mix/refer_output_thr0.7_no-nms_no-sw_0.5_100.xlsx !!!
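
The per-split summary lines above can be collected into a dictionary for comparison across splits. The following is a minimal sketch, assuming the line layout seen in this log (including the missing separator between the MaskACC list and the second DetACC label). The names `parse_eval_log`, `SET_RE`, and `LINE_RE` are hypothetical helpers for illustration and are not part of PropVG.

```python
import re

# Matches the line that announces which split is being evaluated.
SET_RE = re.compile(r"evaluating set (\S+)")

# Matches a "validate" summary line. Assumes the field order seen in this log;
# \s* tolerates the missing separator between "]" and "DetACC@0.5-0.9".
LINE_RE = re.compile(
    r"DetACC: (?P<det>[\d.]+), mIoU: (?P<miou>[\d.]+), oIoU: (?P<oiou>[\d.]+), "
    r"MaskACC@0\.5-0\.9: \[(?P<mask>[^\]]*)\]\s*"
    r"DetACC@0\.5-0\.9: \[(?P<dets>[^\]]*)\]"
)


def parse_eval_log(text):
    """Return {split_name: metrics} for every validate line found in text."""
    results = {}
    current = None  # split named by the most recent "evaluating set" line
    for line in text.splitlines():
        m = SET_RE.search(line)
        if m:
            current = m.group(1)
            continue
        m = LINE_RE.search(line)
        if m and current is not None:
            results[current] = {
                "DetACC": float(m["det"]),
                "mIoU": float(m["miou"]),
                "oIoU": float(m["oiou"]),
                # Accuracy at IoU thresholds 0.5, 0.6, 0.7, 0.8, 0.9
                "MaskACC": [float(x) for x in m["mask"].split(",")],
                "DetACC_at": [float(x) for x in m["dets"].split(",")],
            }
    return results


# Two lines copied from the log above, used as sample input.
sample = (
    "2025-07-07 11:29:02,665 - PropVG - INFO - PropVG - evaluating set val_refcoco_unc\n"
    "2025-07-07 11:32:39,213 - PropVG - INFO - ------------ validate ------------  "
    "time: 216.54, DetACC: 92.70, mIoU: 81.96, oIoU: 81.80, "
    "MaskACC@0.5-0.9: [92.24, 90.71, 87.59,  79.79,  46.59]"
    "DetACC@0.5-0.9: [92.70, 91.43, 88.90,  83.85,  66.30]"
)

results = parse_eval_log(sample)
print(results["val_refcoco_unc"])
```

Passing the full log text instead of `sample` should yield one entry per evaluated split, provided every summary line follows the same layout.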