| Model: /fsx_0/user/imzyc/proact_exps/20240821-L4096-I1-ep4-NOSEP-nr0.1-klgmix-1s-lora-bs256 | |
| {'ego4d/narration_val_L4096_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.2}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.3}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.4}]}} | |
| Evaluation datasets: | |
| * ego4d/narration_val | num samples: 65 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.2 | |
| ER@0.05: 0.863 (S=0, C=1462, M=7129, R=283) | |
| ER@0.10: 0.863 (S=0, C=1462, M=7129, R=283) | |
| ER@0.15: 0.863 (S=0, C=1462, M=7129, R=283) | |
| ER@0.20: 0.863 (S=0, C=1462, M=7129, R=283) | |
| ER@0.25: 0.863 (S=0, C=1462, M=7129, R=283) | |
| ER@0.30: 0.863 (S=1, C=1461, M=7129, R=283) | |
| ER@0.35: 0.863 (S=3, C=1459, M=7129, R=283) | |
| ER@0.40: 0.865 (S=15, C=1447, M=7129, R=283) | |
| ER@0.45: 0.867 (S=34, C=1428, M=7129, R=283) | |
| ER@0.50: 0.870 (S=65, C=1397, M=7129, R=283) | |
| ER@0.55: 0.882 (S=165, C=1297, M=7129, R=283) | |
| ER@0.60: 0.898 (S=299, C=1163, M=7129, R=283) | |
| ER@0.65: 0.915 (S=448, C=1014, M=7129, R=283) | |
| ER@0.70: 0.936 (S=626, C=836, M=7129, R=283) | |
| ER@0.75: 0.955 (S=790, C=672, M=7129, R=283) | |
| ER@0.80: 0.978 (S=994, C=468, M=7129, R=283) | |
| ER@0.85: 0.998 (S=1166, C=296, M=7129, R=283) | |
| ER@0.90: 1.014 (S=1303, C=159, M=7129, R=283) | |
| ER@0.95: 1.024 (S=1381, C=81, M=7129, R=283) | |
| ER@1.00: 1.027 (S=1411, C=51, M=7129, R=283) | |
| Evalulation: ego4d-narration_val_L4096_I1/stream/notalk0.2-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.8298 | |
| redundant_rate: 0.1622 | |
| match_cost: 0.3210 | |
| semantic_score: 0.7264 | |
| jaccard_index: 0.1648 | |
| Bleu_1: 0.4403 | |
| Bleu_1_w: 0.0725 | |
| Bleu_2: 0.2713 | |
| Bleu_2_w: 0.0447 | |
| Bleu_3: 0.1742 | |
| Bleu_3_w: 0.0287 | |
| Bleu_4: 0.1155 | |
| Bleu_4_w: 0.0190 | |
| CIDEr: 1.0878 | |
| CIDEr_w: 0.1792 | |
| METEOR: 0.2163 | |
| METEOR_w: 0.0356 | |
| mean_error_rate: 0.9134 | |
| Updating eval setup: not_talk_threshold: 0.2 -> 0.3 | |
| ER@0.05: 1.026 (S=0, C=5981, M=2610, R=6203) | |
| ER@0.10: 1.026 (S=0, C=5981, M=2610, R=6203) | |
| ER@0.15: 1.026 (S=0, C=5981, M=2610, R=6203) | |
| ER@0.20: 1.026 (S=0, C=5981, M=2610, R=6203) | |
| ER@0.25: 1.026 (S=0, C=5981, M=2610, R=6203) | |
| ER@0.30: 1.026 (S=5, C=5976, M=2610, R=6203) | |
| ER@0.35: 1.028 (S=22, C=5959, M=2610, R=6203) | |
| ER@0.40: 1.035 (S=77, C=5904, M=2610, R=6203) | |
| ER@0.45: 1.054 (S=246, C=5735, M=2610, R=6203) | |
| ER@0.50: 1.090 (S=554, C=5427, M=2610, R=6203) | |
| ER@0.55: 1.143 (S=1004, C=4977, M=2610, R=6203) | |
| ER@0.60: 1.215 (S=1628, C=4353, M=2610, R=6203) | |
| ER@0.65: 1.291 (S=2279, C=3702, M=2610, R=6203) | |
| ER@0.70: 1.367 (S=2931, C=3050, M=2610, R=6203) | |
| ER@0.75: 1.446 (S=3613, C=2368, M=2610, R=6203) | |
| ER@0.80: 1.535 (S=4376, C=1605, M=2610, R=6203) | |
| ER@0.85: 1.606 (S=4981, C=1000, M=2610, R=6203) | |
| ER@0.90: 1.655 (S=5407, C=574, M=2610, R=6203) | |
| ER@0.95: 1.683 (S=5644, C=337, M=2610, R=6203) | |
| ER@1.00: 1.699 (S=5786, C=195, M=2610, R=6203) | |
| Evalulation: ego4d-narration_val_L4096_I1/stream/notalk0.3-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.3038 | |
| redundant_rate: 0.5091 | |
| match_cost: 0.3448 | |
| semantic_score: 0.7023 | |
| jaccard_index: 0.4043 | |
| Bleu_1: 0.4528 | |
| Bleu_1_w: 0.1830 | |
| Bleu_2: 0.2831 | |
| Bleu_2_w: 0.1145 | |
| Bleu_3: 0.1875 | |
| Bleu_3_w: 0.0758 | |
| Bleu_4: 0.1271 | |
| Bleu_4_w: 0.0514 | |
| CIDEr: 1.1200 | |
| CIDEr_w: 0.4528 | |
| METEOR: 0.2080 | |
| METEOR_w: 0.0841 | |
| mean_error_rate: 1.2502 | |
| Updating eval setup: not_talk_threshold: 0.3 -> 0.4 | |
| ER@0.05: 2.927 (S=0, C=8019, M=572, R=24571) | |
| ER@0.10: 2.927 (S=0, C=8019, M=572, R=24571) | |
| ER@0.15: 2.927 (S=0, C=8019, M=572, R=24571) | |
| ER@0.20: 2.927 (S=0, C=8019, M=572, R=24571) | |
| ER@0.25: 2.927 (S=4, C=8015, M=572, R=24571) | |
| ER@0.30: 2.929 (S=20, C=7999, M=572, R=24571) | |
| ER@0.35: 2.934 (S=62, C=7957, M=572, R=24571) | |
| ER@0.40: 2.947 (S=177, C=7842, M=572, R=24571) | |
| ER@0.45: 2.977 (S=433, C=7586, M=572, R=24571) | |
| ER@0.50: 3.029 (S=876, C=7143, M=572, R=24571) | |
| ER@0.55: 3.105 (S=1528, C=6491, M=572, R=24571) | |
| ER@0.60: 3.198 (S=2329, C=5690, M=572, R=24571) | |
| ER@0.65: 3.297 (S=3181, C=4838, M=572, R=24571) | |
| ER@0.70: 3.402 (S=4083, C=3936, M=572, R=24571) | |
| ER@0.75: 3.517 (S=5068, C=2951, M=572, R=24571) | |
| ER@0.80: 3.624 (S=5992, C=2027, M=572, R=24571) | |
| ER@0.85: 3.718 (S=6797, C=1222, M=572, R=24571) | |
| ER@0.90: 3.776 (S=7295, C=724, M=572, R=24571) | |
| ER@0.95: 3.809 (S=7578, C=441, M=572, R=24571) | |
| ER@1.00: 3.831 (S=7772, C=247, M=572, R=24571) | |
| Evalulation: ego4d-narration_val_L4096_I1/stream/notalk0.4-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.0666 | |
| redundant_rate: 0.7539 | |
| match_cost: 0.3386 | |
| semantic_score: 0.6923 | |
| jaccard_index: 0.2418 | |
| Bleu_1: 0.4251 | |
| Bleu_1_w: 0.1028 | |
| Bleu_2: 0.2628 | |
| Bleu_2_w: 0.0635 | |
| Bleu_3: 0.1732 | |
| Bleu_3_w: 0.0419 | |
| Bleu_4: 0.1153 | |
| Bleu_4_w: 0.0279 | |
| CIDEr: 1.0783 | |
| CIDEr_w: 0.2608 | |
| METEOR: 0.1964 | |
| METEOR_w: 0.0475 | |
| mean_error_rate: 3.2363 | |
| All Finished! Time: 0.18 minutes | |
| Model: /fsx_0/user/imzyc/proact_exps/20240821-L4096-I1-ep4-NOSEP-nr0.1-klgmix-1s-lora-bs256 | |
| {'assembly101/dialog_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.3}]}, | |
| 'ego4d/dialog_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.3}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.3}]}, | |
| 'egoexolearn/dialog_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.3}]}, | |
| 'epickitchens/dialog_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.3}]}, | |
| 'holoassist/dialog_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.3}]}, | |
| 'wtag/dialog_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.3}]}} | |
| Evaluation datasets: | |
| * ego4d/dialog_val | num samples: 96 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.3 | |
| ER@0.05: 0.874 (S=0, C=1598, M=3216, R=991) | |
| ER@0.10: 0.874 (S=0, C=1598, M=3216, R=991) | |
| ER@0.15: 0.875 (S=3, C=1595, M=3216, R=991) | |
| ER@0.20: 0.877 (S=13, C=1585, M=3216, R=991) | |
| ER@0.25: 0.880 (S=28, C=1570, M=3216, R=991) | |
| ER@0.30: 0.888 (S=67, C=1531, M=3216, R=991) | |
| ER@0.35: 0.898 (S=117, C=1481, M=3216, R=991) | |
| ER@0.40: 0.917 (S=208, C=1390, M=3216, R=991) | |
| ER@0.45: 0.942 (S=327, C=1271, M=3216, R=991) | |
| ER@0.50: 0.974 (S=482, C=1116, M=3216, R=991) | |
| ER@0.55: 1.005 (S=632, C=966, M=3216, R=991) | |
| ER@0.60: 1.044 (S=820, C=778, M=3216, R=991) | |
| ER@0.65: 1.080 (S=994, C=604, M=3216, R=991) | |
| ER@0.70: 1.112 (S=1147, C=451, M=3216, R=991) | |
| ER@0.75: 1.139 (S=1278, C=320, M=3216, R=991) | |
| ER@0.80: 1.163 (S=1390, C=208, M=3216, R=991) | |
| ER@0.85: 1.180 (S=1473, C=125, M=3216, R=991) | |
| ER@0.90: 1.194 (S=1539, C=59, M=3216, R=991) | |
| ER@0.95: 1.201 (S=1576, C=22, M=3216, R=991) | |
| ER@1.00: 1.206 (S=1598, C=0, M=3216, R=991) | |
| Evalulation: ego4d-dialog_val_L0_I1/stream/notalk0.3-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.6681 | |
| redundant_rate: 0.3828 | |
| match_cost: 0.4448 | |
| semantic_score: 0.5959 | |
| jaccard_index: 0.2753 | |
| Bleu_1: 0.3391 | |
| Bleu_1_w: 0.0933 | |
| Bleu_2: 0.2263 | |
| Bleu_2_w: 0.0623 | |
| Bleu_3: 0.1656 | |
| Bleu_3_w: 0.0456 | |
| Bleu_4: 0.1275 | |
| Bleu_4_w: 0.0351 | |
| CIDEr: 0.6738 | |
| CIDEr_w: 0.1855 | |
| METEOR: 0.1632 | |
| METEOR_w: 0.0449 | |
| mean_error_rate: 1.0161 | |
| ER@0.05: 0.874 (S=0, C=1598, M=3216, R=991) | |
| ER@0.10: 0.874 (S=0, C=1598, M=3216, R=991) | |
| ER@0.15: 0.875 (S=3, C=1595, M=3216, R=991) | |
| ER@0.20: 0.877 (S=13, C=1585, M=3216, R=991) | |
| ER@0.25: 0.880 (S=28, C=1570, M=3216, R=991) | |
| ER@0.30: 0.888 (S=67, C=1531, M=3216, R=991) | |
| ER@0.35: 0.898 (S=117, C=1481, M=3216, R=991) | |
| ER@0.40: 0.917 (S=208, C=1390, M=3216, R=991) | |
| ER@0.45: 0.942 (S=327, C=1271, M=3216, R=991) | |
| ER@0.50: 0.974 (S=482, C=1116, M=3216, R=991) | |
| ER@0.55: 1.005 (S=632, C=966, M=3216, R=991) | |
| ER@0.60: 1.044 (S=820, C=778, M=3216, R=991) | |
| ER@0.65: 1.080 (S=994, C=604, M=3216, R=991) | |
| ER@0.70: 1.112 (S=1147, C=451, M=3216, R=991) | |
| ER@0.75: 1.139 (S=1278, C=320, M=3216, R=991) | |
| ER@0.80: 1.163 (S=1390, C=208, M=3216, R=991) | |
| ER@0.85: 1.180 (S=1473, C=125, M=3216, R=991) | |
| ER@0.90: 1.194 (S=1539, C=59, M=3216, R=991) | |
| ER@0.95: 1.201 (S=1576, C=22, M=3216, R=991) | |
| ER@1.00: 1.206 (S=1598, C=0, M=3216, R=991) | |
| Evalulation: ego4d-dialog_val_L0_I1/stream/notalk0.3-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.6681 | |
| redundant_rate: 0.3828 | |
| match_cost: 0.4448 | |
| semantic_score: 0.5959 | |
| jaccard_index: 0.2753 | |
| Bleu_1: 0.3391 | |
| Bleu_1_w: 0.0933 | |
| Bleu_2: 0.2263 | |
| Bleu_2_w: 0.0623 | |
| Bleu_3: 0.1656 | |
| Bleu_3_w: 0.0456 | |
| Bleu_4: 0.1275 | |
| Bleu_4_w: 0.0351 | |
| CIDEr: 0.6738 | |
| CIDEr_w: 0.1855 | |
| METEOR: 0.1632 | |
| METEOR_w: 0.0449 | |
| mean_error_rate: 1.0161 | |
| Evaluation datasets: | |
| * holoassist/dialog_val | num samples: 291 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.3 | |
| ER@0.05: 0.607 (S=9, C=6349, M=8903, R=348) | |
| ER@0.10: 0.607 (S=15, C=6343, M=8903, R=348) | |
| ER@0.15: 0.609 (S=46, C=6312, M=8903, R=348) | |
| ER@0.20: 0.613 (S=109, C=6249, M=8903, R=348) | |
| ER@0.25: 0.622 (S=245, C=6113, M=8903, R=348) | |
| ER@0.30: 0.634 (S=427, C=5931, M=8903, R=348) | |
| ER@0.35: 0.652 (S=703, C=5655, M=8903, R=348) | |
| ER@0.40: 0.674 (S=1031, C=5327, M=8903, R=348) | |
| ER@0.45: 0.703 (S=1475, C=4883, M=8903, R=348) | |
| ER@0.50: 0.735 (S=1963, C=4395, M=8903, R=348) | |
| ER@0.55: 0.773 (S=2547, C=3811, M=8903, R=348) | |
| ER@0.60: 0.811 (S=3119, C=3239, M=8903, R=348) | |
| ER@0.65: 0.851 (S=3741, C=2617, M=8903, R=348) | |
| ER@0.70: 0.893 (S=4379, C=1979, M=8903, R=348) | |
| ER@0.75: 0.932 (S=4970, C=1388, M=8903, R=348) | |
| ER@0.80: 0.965 (S=5473, C=885, M=8903, R=348) | |
| ER@0.85: 0.989 (S=5845, C=513, M=8903, R=348) | |
| ER@0.90: 1.008 (S=6125, C=233, M=8903, R=348) | |
| ER@0.95: 1.019 (S=6295, C=63, M=8903, R=348) | |
| ER@1.00: 1.023 (S=6354, C=4, M=8903, R=348) | |
| Evalulation: holoassist-dialog_val_L0_I1/stream/notalk0.3-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.5834 | |
| redundant_rate: 0.0519 | |
| match_cost: 0.4314 | |
| semantic_score: 0.5936 | |
| jaccard_index: 0.4073 | |
| Bleu_1: 0.3692 | |
| Bleu_1_w: 0.1504 | |
| Bleu_2: 0.2517 | |
| Bleu_2_w: 0.1025 | |
| Bleu_3: 0.1857 | |
| Bleu_3_w: 0.0757 | |
| Bleu_4: 0.1423 | |
| Bleu_4_w: 0.0580 | |
| CIDEr: 0.7988 | |
| CIDEr_w: 0.3254 | |
| METEOR: 0.1755 | |
| METEOR_w: 0.0715 | |
| mean_error_rate: 0.7860 | |
| Evaluation datasets: | |
| * epickitchens/dialog_val | num samples: 150 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.3 | |
| ER@0.05: 0.960 (S=1, C=3882, M=2549, R=3624) | |
| ER@0.10: 0.962 (S=14, C=3869, M=2549, R=3624) | |
| ER@0.15: 0.967 (S=44, C=3839, M=2549, R=3624) | |
| ER@0.20: 0.977 (S=108, C=3775, M=2549, R=3624) | |
| ER@0.25: 0.992 (S=205, C=3678, M=2549, R=3624) | |
| ER@0.30: 1.021 (S=395, C=3488, M=2549, R=3624) | |
| ER@0.35: 1.063 (S=667, C=3216, M=2549, R=3624) | |
| ER@0.40: 1.112 (S=982, C=2901, M=2549, R=3624) | |
| ER@0.45: 1.174 (S=1376, C=2507, M=2549, R=3624) | |
| ER@0.50: 1.237 (S=1785, C=2098, M=2549, R=3624) | |
| ER@0.55: 1.302 (S=2204, C=1679, M=2549, R=3624) | |
| ER@0.60: 1.358 (S=2563, C=1320, M=2549, R=3624) | |
| ER@0.65: 1.412 (S=2908, C=975, M=2549, R=3624) | |
| ER@0.70: 1.463 (S=3236, C=647, M=2549, R=3624) | |
| ER@0.75: 1.497 (S=3458, C=425, M=2549, R=3624) | |
| ER@0.80: 1.526 (S=3643, C=240, M=2549, R=3624) | |
| ER@0.85: 1.545 (S=3762, C=121, M=2549, R=3624) | |
| ER@0.90: 1.555 (S=3828, C=55, M=2549, R=3624) | |
| ER@0.95: 1.560 (S=3860, C=23, M=2549, R=3624) | |
| ER@1.00: 1.563 (S=3883, C=0, M=2549, R=3624) | |
| Evalulation: epickitchens-dialog_val_L0_I1/stream/notalk0.3-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.3963 | |
| redundant_rate: 0.4827 | |
| match_cost: 0.5297 | |
| semantic_score: 0.5236 | |
| jaccard_index: 0.3861 | |
| Bleu_1: 0.3089 | |
| Bleu_1_w: 0.1193 | |
| Bleu_2: 0.1870 | |
| Bleu_2_w: 0.0722 | |
| Bleu_3: 0.1258 | |
| Bleu_3_w: 0.0486 | |
| Bleu_4: 0.0911 | |
| Bleu_4_w: 0.0352 | |
| CIDEr: 0.5723 | |
| CIDEr_w: 0.2210 | |
| METEOR: 0.1391 | |
| METEOR_w: 0.0537 | |
| mean_error_rate: 1.2623 | |
| Evaluation datasets: | |
| * egoexolearn/dialog_val | num samples: 123 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.3 | |
| ER@0.05: 0.844 (S=0, C=2590, M=9401, R=720) | |
| ER@0.10: 0.844 (S=2, C=2588, M=9401, R=720) | |
| ER@0.15: 0.845 (S=13, C=2577, M=9401, R=720) | |
| ER@0.20: 0.846 (S=25, C=2565, M=9401, R=720) | |
| ER@0.25: 0.848 (S=51, C=2539, M=9401, R=720) | |
| ER@0.30: 0.852 (S=98, C=2492, M=9401, R=720) | |
| ER@0.35: 0.859 (S=183, C=2407, M=9401, R=720) | |
| ER@0.40: 0.870 (S=308, C=2282, M=9401, R=720) | |
| ER@0.45: 0.885 (S=493, C=2097, M=9401, R=720) | |
| ER@0.50: 0.903 (S=709, C=1881, M=9401, R=720) | |
| ER@0.55: 0.922 (S=931, C=1659, M=9401, R=720) | |
| ER@0.60: 0.944 (S=1197, C=1393, M=9401, R=720) | |
| ER@0.65: 0.968 (S=1481, C=1109, M=9401, R=720) | |
| ER@0.70: 0.989 (S=1741, C=849, M=9401, R=720) | |
| ER@0.75: 1.012 (S=2013, C=577, M=9401, R=720) | |
| ER@0.80: 1.028 (S=2200, C=390, M=9401, R=720) | |
| ER@0.85: 1.041 (S=2363, C=227, M=9401, R=720) | |
| ER@0.90: 1.052 (S=2495, C=95, M=9401, R=720) | |
| ER@0.95: 1.057 (S=2554, C=36, M=9401, R=720) | |
| ER@1.00: 1.060 (S=2590, C=0, M=9401, R=720) | |
| Evalulation: egoexolearn-dialog_val_L0_I1/stream/notalk0.3-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.7840 | |
| redundant_rate: 0.2175 | |
| match_cost: 0.4205 | |
| semantic_score: 0.6107 | |
| jaccard_index: 0.2038 | |
| Bleu_1: 0.3775 | |
| Bleu_1_w: 0.0769 | |
| Bleu_2: 0.2554 | |
| Bleu_2_w: 0.0520 | |
| Bleu_3: 0.1866 | |
| Bleu_3_w: 0.0380 | |
| Bleu_4: 0.1422 | |
| Bleu_4_w: 0.0290 | |
| CIDEr: 0.7699 | |
| CIDEr_w: 0.1569 | |
| METEOR: 0.1708 | |
| METEOR_w: 0.0348 | |
| mean_error_rate: 0.9335 | |
| Evaluation datasets: | |
| * wtag/dialog_val | num samples: 21 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.3 | |
| ER@0.05: 0.738 (S=6, C=622, M=445, R=341) | |
| ER@0.10: 0.752 (S=21, C=607, M=445, R=341) | |
| ER@0.15: 0.781 (S=52, C=576, M=445, R=341) | |
| ER@0.20: 0.811 (S=84, C=544, M=445, R=341) | |
| ER@0.25: 0.851 (S=127, C=501, M=445, R=341) | |
| ER@0.30: 0.882 (S=160, C=468, M=445, R=341) | |
| ER@0.35: 0.921 (S=202, C=426, M=445, R=341) | |
| ER@0.40: 0.951 (S=234, C=394, M=445, R=341) | |
| ER@0.45: 0.993 (S=279, C=349, M=445, R=341) | |
| ER@0.50: 1.052 (S=343, C=285, M=445, R=341) | |
| ER@0.55: 1.096 (S=390, C=238, M=445, R=341) | |
| ER@0.60: 1.129 (S=425, C=203, M=445, R=341) | |
| ER@0.65: 1.176 (S=476, C=152, M=445, R=341) | |
| ER@0.70: 1.212 (S=514, C=114, M=445, R=341) | |
| ER@0.75: 1.252 (S=557, C=71, M=445, R=341) | |
| ER@0.80: 1.273 (S=580, C=48, M=445, R=341) | |
| ER@0.85: 1.287 (S=595, C=33, M=445, R=341) | |
| ER@0.90: 1.302 (S=611, C=17, M=445, R=341) | |
| ER@0.95: 1.312 (S=622, C=6, M=445, R=341) | |
| ER@1.00: 1.318 (S=628, C=0, M=445, R=341) | |
| Evalulation: wtag-dialog_val_L0_I1/stream/notalk0.3-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.4147 | |
| redundant_rate: 0.3519 | |
| match_cost: 0.5530 | |
| semantic_score: 0.4763 | |
| jaccard_index: 0.4441 | |
| Bleu_1: 0.2360 | |
| Bleu_1_w: 0.1048 | |
| Bleu_2: 0.1536 | |
| Bleu_2_w: 0.0682 | |
| Bleu_3: 0.1080 | |
| Bleu_3_w: 0.0480 | |
| Bleu_4: 0.0791 | |
| Bleu_4_w: 0.0351 | |
| CIDEr: 0.4533 | |
| CIDEr_w: 0.2013 | |
| METEOR: 0.1613 | |
| METEOR_w: 0.0717 | |
| mean_error_rate: 1.0543 | |
| Evaluation datasets: | |
| * assembly101/dialog_val | num samples: 336 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.3 | |
| ER@0.05: 0.667 (S=2, C=3483, M=4833, R=711) | |
| ER@0.10: 0.668 (S=9, C=3476, M=4833, R=711) | |
| ER@0.15: 0.670 (S=26, C=3459, M=4833, R=711) | |
| ER@0.20: 0.673 (S=53, C=3432, M=4833, R=711) | |
| ER@0.25: 0.679 (S=106, C=3379, M=4833, R=711) | |
| ER@0.30: 0.688 (S=179, C=3306, M=4833, R=711) | |
| ER@0.35: 0.707 (S=337, C=3148, M=4833, R=711) | |
| ER@0.40: 0.733 (S=556, C=2929, M=4833, R=711) | |
| ER@0.45: 0.767 (S=832, C=2653, M=4833, R=711) | |
| ER@0.50: 0.806 (S=1159, C=2326, M=4833, R=711) | |
| ER@0.55: 0.851 (S=1533, C=1952, M=4833, R=711) | |
| ER@0.60: 0.895 (S=1897, C=1588, M=4833, R=711) | |
| ER@0.65: 0.937 (S=2251, C=1234, M=4833, R=711) | |
| ER@0.70: 0.966 (S=2495, C=990, M=4833, R=711) | |
| ER@0.75: 1.000 (S=2770, C=715, M=4833, R=711) | |
| ER@0.80: 1.027 (S=2999, C=486, M=4833, R=711) | |
| ER@0.85: 1.049 (S=3183, C=302, M=4833, R=711) | |
| ER@0.90: 1.069 (S=3348, C=137, M=4833, R=711) | |
| ER@0.95: 1.079 (S=3429, C=56, M=4833, R=711) | |
| ER@1.00: 1.085 (S=3482, C=3, M=4833, R=711) | |
| Evalulation: assembly101-dialog_val_L0_I1/stream/notalk0.3-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.5810 | |
| redundant_rate: 0.1694 | |
| match_cost: 0.4482 | |
| semantic_score: 0.5854 | |
| jaccard_index: 0.3860 | |
| Bleu_1: 0.3927 | |
| Bleu_1_w: 0.1516 | |
| Bleu_2: 0.2798 | |
| Bleu_2_w: 0.1080 | |
| Bleu_3: 0.2103 | |
| Bleu_3_w: 0.0812 | |
| Bleu_4: 0.1654 | |
| Bleu_4_w: 0.0638 | |
| CIDEr: 0.7986 | |
| CIDEr_w: 0.3083 | |
| METEOR: 0.1856 | |
| METEOR_w: 0.0716 | |
| mean_error_rate: 0.8507 | |
| All Finished! Time: 0.50 minutes | |
| File "/opt/hpcaas/.mounts/fs-036153e63d56f4dc2/home/imzyc/project/proactive-assist/mmassist/eval/eval.py", line 118 | |
| print(f"Runs:\n{'\n'.join(eval_args.inference_setups.split(','))}") | |
| ^ | |
| SyntaxError: f-string expression part cannot include a backslash | |
| File "/opt/hpcaas/.mounts/fs-036153e63d56f4dc2/home/imzyc/project/proactive-assist/mmassist/eval/eval.py", line 118 | |
| print(f"Runs:\n{'\n'.join(eval_args.inference_setups.split(','))}") | |
| ^ | |
| SyntaxError: f-string expression part cannot include a backslash | |
| Model: /fsx_0/user/imzyc/proact_exps/20240821-L4096-I1-ep4-NOSEP-nr0.1-klgmix-1s-lora-bs256 | |
| {'assembly101/dialog_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.35}]}, | |
| 'ego4d/dialog_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.35}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.35}]}, | |
| 'egoexolearn/dialog_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.35}]}, | |
| 'epickitchens/dialog_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.35}]}, | |
| 'holoassist/dialog_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.35}]}, | |
| 'wtag/dialog_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.35}]}} | |
| Evaluation datasets: | |
| * ego4d/dialog_val | num samples: 96 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.35 | |
| ER@0.05: 0.970 (S=0, C=1782, M=3032, R=1636) | |
| ER@0.10: 0.970 (S=0, C=1782, M=3032, R=1636) | |
| ER@0.15: 0.971 (S=5, C=1777, M=3032, R=1636) | |
| ER@0.20: 0.974 (S=19, C=1763, M=3032, R=1636) | |
| ER@0.25: 0.977 (S=37, C=1745, M=3032, R=1636) | |
| ER@0.30: 0.988 (S=87, C=1695, M=3032, R=1636) | |
| ER@0.35: 1.002 (S=156, C=1626, M=3032, R=1636) | |
| ER@0.40: 1.023 (S=255, C=1527, M=3032, R=1636) | |
| ER@0.45: 1.050 (S=386, C=1396, M=3032, R=1636) | |
| ER@0.50: 1.086 (S=559, C=1223, M=3032, R=1636) | |
| ER@0.55: 1.120 (S=723, C=1059, M=3032, R=1636) | |
| ER@0.60: 1.158 (S=905, C=877, M=3032, R=1636) | |
| ER@0.65: 1.201 (S=1113, C=669, M=3032, R=1636) | |
| ER@0.70: 1.240 (S=1300, C=482, M=3032, R=1636) | |
| ER@0.75: 1.274 (S=1467, C=315, M=3032, R=1636) | |
| ER@0.80: 1.298 (S=1580, C=202, M=3032, R=1636) | |
| ER@0.85: 1.317 (S=1672, C=110, M=3032, R=1636) | |
| ER@0.90: 1.328 (S=1726, C=56, M=3032, R=1636) | |
| ER@0.95: 1.336 (S=1764, C=18, M=3032, R=1636) | |
| ER@1.00: 1.340 (S=1782, C=0, M=3032, R=1636) | |
| Evalulation: ego4d-dialog_val_L0_I1/stream/notalk0.35-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.6298 | |
| redundant_rate: 0.4786 | |
| match_cost: 0.4571 | |
| semantic_score: 0.5891 | |
| jaccard_index: 0.2763 | |
| Bleu_1: 0.3298 | |
| Bleu_1_w: 0.0911 | |
| Bleu_2: 0.2159 | |
| Bleu_2_w: 0.0597 | |
| Bleu_3: 0.1562 | |
| Bleu_3_w: 0.0431 | |
| Bleu_4: 0.1194 | |
| Bleu_4_w: 0.0330 | |
| CIDEr: 0.6364 | |
| CIDEr_w: 0.1758 | |
| METEOR: 0.1607 | |
| METEOR_w: 0.0444 | |
| mean_error_rate: 1.1310 | |
| ER@0.05: 0.970 (S=0, C=1782, M=3032, R=1636) | |
| ER@0.10: 0.970 (S=0, C=1782, M=3032, R=1636) | |
| ER@0.15: 0.971 (S=5, C=1777, M=3032, R=1636) | |
| ER@0.20: 0.974 (S=19, C=1763, M=3032, R=1636) | |
| ER@0.25: 0.977 (S=37, C=1745, M=3032, R=1636) | |
| ER@0.30: 0.988 (S=87, C=1695, M=3032, R=1636) | |
| ER@0.35: 1.002 (S=156, C=1626, M=3032, R=1636) | |
| ER@0.40: 1.023 (S=255, C=1527, M=3032, R=1636) | |
| ER@0.45: 1.050 (S=386, C=1396, M=3032, R=1636) | |
| ER@0.50: 1.086 (S=559, C=1223, M=3032, R=1636) | |
| ER@0.55: 1.120 (S=723, C=1059, M=3032, R=1636) | |
| ER@0.60: 1.158 (S=905, C=877, M=3032, R=1636) | |
| ER@0.65: 1.201 (S=1113, C=669, M=3032, R=1636) | |
| ER@0.70: 1.240 (S=1300, C=482, M=3032, R=1636) | |
| ER@0.75: 1.274 (S=1467, C=315, M=3032, R=1636) | |
| ER@0.80: 1.298 (S=1580, C=202, M=3032, R=1636) | |
| ER@0.85: 1.317 (S=1672, C=110, M=3032, R=1636) | |
| ER@0.90: 1.328 (S=1726, C=56, M=3032, R=1636) | |
| ER@0.95: 1.336 (S=1764, C=18, M=3032, R=1636) | |
| ER@1.00: 1.340 (S=1782, C=0, M=3032, R=1636) | |
| Evalulation: ego4d-dialog_val_L0_I1/stream/notalk0.35-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.6298 | |
| redundant_rate: 0.4786 | |
| match_cost: 0.4571 | |
| semantic_score: 0.5891 | |
| jaccard_index: 0.2763 | |
| Bleu_1: 0.3298 | |
| Bleu_1_w: 0.0911 | |
| Bleu_2: 0.2159 | |
| Bleu_2_w: 0.0597 | |
| Bleu_3: 0.1562 | |
| Bleu_3_w: 0.0431 | |
| Bleu_4: 0.1194 | |
| Bleu_4_w: 0.0330 | |
| CIDEr: 0.6364 | |
| CIDEr_w: 0.1758 | |
| METEOR: 0.1607 | |
| METEOR_w: 0.0444 | |
| mean_error_rate: 1.1310 | |
| Evaluation datasets: | |
| * holoassist/dialog_val | num samples: 291 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.35 | |
| ER@0.05: 0.641 (S=10, C=7813, M=7438, R=2337) | |
| ER@0.10: 0.642 (S=29, C=7794, M=7438, R=2337) | |
| ER@0.15: 0.645 (S=70, C=7753, M=7438, R=2337) | |
| ER@0.20: 0.652 (S=173, C=7650, M=7438, R=2337) | |
| ER@0.25: 0.667 (S=401, C=7422, M=7438, R=2337) | |
| ER@0.30: 0.685 (S=679, C=7144, M=7438, R=2337) | |
| ER@0.35: 0.712 (S=1089, C=6734, M=7438, R=2337) | |
| ER@0.40: 0.740 (S=1518, C=6305, M=7438, R=2337) | |
| ER@0.45: 0.776 (S=2060, C=5763, M=7438, R=2337) | |
| ER@0.50: 0.819 (S=2723, C=5100, M=7438, R=2337) | |
| ER@0.55: 0.863 (S=3388, C=4435, M=7438, R=2337) | |
| ER@0.60: 0.912 (S=4143, C=3680, M=7438, R=2337) | |
| ER@0.65: 0.962 (S=4909, C=2914, M=7438, R=2337) | |
| ER@0.70: 1.011 (S=5658, C=2165, M=7438, R=2337) | |
| ER@0.75: 1.055 (S=6321, C=1502, M=7438, R=2337) | |
| ER@0.80: 1.090 (S=6864, C=959, M=7438, R=2337) | |
| ER@0.85: 1.118 (S=7292, C=531, M=7438, R=2337) | |
| ER@0.90: 1.138 (S=7590, C=233, M=7438, R=2337) | |
| ER@0.95: 1.149 (S=7757, C=66, M=7438, R=2337) | |
| ER@1.00: 1.153 (S=7816, C=7, M=7438, R=2337) | |
| Evalulation: holoassist-dialog_val_L0_I1/stream/notalk0.35-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.4874 | |
| redundant_rate: 0.2300 | |
| match_cost: 0.4575 | |
| semantic_score: 0.5744 | |
| jaccard_index: 0.4445 | |
| Bleu_1: 0.3553 | |
| Bleu_1_w: 0.1579 | |
| Bleu_2: 0.2373 | |
| Bleu_2_w: 0.1055 | |
| Bleu_3: 0.1731 | |
| Bleu_3_w: 0.0769 | |
| Bleu_4: 0.1312 | |
| Bleu_4_w: 0.0583 | |
| CIDEr: 0.7173 | |
| CIDEr_w: 0.3189 | |
| METEOR: 0.1641 | |
| METEOR_w: 0.0729 | |
| mean_error_rate: 0.8715 | |
| Evaluation datasets: | |
| * epickitchens/dialog_val | num samples: 150 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.35 | |
| ER@0.05: 1.405 (S=3, C=4273, M=2156, R=6878) | |
| ER@0.10: 1.407 (S=18, C=4258, M=2156, R=6878) | |
| ER@0.15: 1.411 (S=40, C=4236, M=2156, R=6878) | |
| ER@0.20: 1.420 (S=101, C=4175, M=2156, R=6878) | |
| ER@0.25: 1.439 (S=224, C=4052, M=2156, R=6878) | |
| ER@0.30: 1.468 (S=407, C=3869, M=2156, R=6878) | |
| ER@0.35: 1.510 (S=680, C=3596, M=2156, R=6878) | |
| ER@0.40: 1.569 (S=1059, C=3217, M=2156, R=6878) | |
| ER@0.45: 1.632 (S=1460, C=2816, M=2156, R=6878) | |
| ER@0.50: 1.703 (S=1920, C=2356, M=2156, R=6878) | |
| ER@0.55: 1.777 (S=2394, C=1882, M=2156, R=6878) | |
| ER@0.60: 1.845 (S=2833, C=1443, M=2156, R=6878) | |
| ER@0.65: 1.907 (S=3233, C=1043, M=2156, R=6878) | |
| ER@0.70: 1.958 (S=3563, C=713, M=2156, R=6878) | |
| ER@0.75: 1.997 (S=3812, C=464, M=2156, R=6878) | |
| ER@0.80: 2.025 (S=3993, C=283, M=2156, R=6878) | |
| ER@0.85: 2.045 (S=4121, C=155, M=2156, R=6878) | |
| ER@0.90: 2.059 (S=4212, C=64, M=2156, R=6878) | |
| ER@0.95: 2.067 (S=4259, C=17, M=2156, R=6878) | |
| ER@1.00: 2.069 (S=4274, C=2, M=2156, R=6878) | |
| Evalulation: epickitchens-dialog_val_L0_I1/stream/notalk0.35-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.3352 | |
| redundant_rate: 0.6166 | |
| match_cost: 0.5227 | |
| semantic_score: 0.5264 | |
| jaccard_index: 0.3213 | |
| Bleu_1: 0.2964 | |
| Bleu_1_w: 0.0952 | |
| Bleu_2: 0.1765 | |
| Bleu_2_w: 0.0567 | |
| Bleu_3: 0.1170 | |
| Bleu_3_w: 0.0376 | |
| Bleu_4: 0.0833 | |
| Bleu_4_w: 0.0268 | |
| CIDEr: 0.5245 | |
| CIDEr_w: 0.1685 | |
| METEOR: 0.1381 | |
| METEOR_w: 0.0444 | |
| mean_error_rate: 1.7357 | |
| Evaluation datasets: | |
| * egoexolearn/dialog_val | num samples: 123 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.35 | |
| ER@0.05: 0.909 (S=1, C=3265, M=8725, R=2174) | |
| ER@0.10: 0.909 (S=4, C=3262, M=8725, R=2174) | |
| ER@0.15: 0.910 (S=16, C=3250, M=8725, R=2174) | |
| ER@0.20: 0.912 (S=32, C=3234, M=8725, R=2174) | |
| ER@0.25: 0.915 (S=76, C=3190, M=8725, R=2174) | |
| ER@0.30: 0.921 (S=145, C=3121, M=8725, R=2174) | |
| ER@0.35: 0.933 (S=288, C=2978, M=8725, R=2174) | |
| ER@0.40: 0.949 (S=479, C=2787, M=8725, R=2174) | |
| ER@0.45: 0.968 (S=708, C=2558, M=8725, R=2174) | |
| ER@0.50: 0.996 (S=1047, C=2219, M=8725, R=2174) | |
| ER@0.55: 1.025 (S=1392, C=1874, M=8725, R=2174) | |
| ER@0.60: 1.051 (S=1702, C=1564, M=8725, R=2174) | |
| ER@0.65: 1.078 (S=2023, C=1243, M=8725, R=2174) | |
| ER@0.70: 1.105 (S=2351, C=915, M=8725, R=2174) | |
| ER@0.75: 1.129 (S=2635, C=631, M=8725, R=2174) | |
| ER@0.80: 1.148 (S=2863, C=403, M=8725, R=2174) | |
| ER@0.85: 1.163 (S=3042, C=224, M=8725, R=2174) | |
| ER@0.90: 1.174 (S=3181, C=85, M=8725, R=2174) | |
| ER@0.95: 1.179 (S=3237, C=29, M=8725, R=2174) | |
| ER@1.00: 1.181 (S=3266, C=0, M=8725, R=2174) | |
| Evalulation: egoexolearn-dialog_val_L0_I1/stream/notalk0.35-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.7276 | |
| redundant_rate: 0.3996 | |
| match_cost: 0.4578 | |
| semantic_score: 0.5889 | |
| jaccard_index: 0.2306 | |
| Bleu_1: 0.3660 | |
| Bleu_1_w: 0.0844 | |
| Bleu_2: 0.2423 | |
| Bleu_2_w: 0.0559 | |
| Bleu_3: 0.1734 | |
| Bleu_3_w: 0.0400 | |
| Bleu_4: 0.1299 | |
| Bleu_4_w: 0.0299 | |
| CIDEr: 0.6624 | |
| CIDEr_w: 0.1527 | |
| METEOR: 0.1616 | |
| METEOR_w: 0.0373 | |
| mean_error_rate: 1.0277 | |
| Evaluation datasets: | |
| * wtag/dialog_val | num samples: 21 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.35 | |
| ER@0.05: 0.940 (S=12, C=661, M=400, R=597) | |
| ER@0.10: 0.952 (S=24, C=649, M=400, R=597) | |
| ER@0.15: 0.974 (S=48, C=625, M=400, R=597) | |
| ER@0.20: 0.998 (S=74, C=599, M=400, R=597) | |
| ER@0.25: 1.034 (S=112, C=561, M=400, R=597) | |
| ER@0.30: 1.075 (S=156, C=517, M=400, R=597) | |
| ER@0.35: 1.109 (S=193, C=480, M=400, R=597) | |
| ER@0.40: 1.151 (S=238, C=435, M=400, R=597) | |
| ER@0.45: 1.190 (S=280, C=393, M=400, R=597) | |
| ER@0.50: 1.256 (S=351, C=322, M=400, R=597) | |
| ER@0.55: 1.310 (S=409, C=264, M=400, R=597) | |
| ER@0.60: 1.354 (S=456, C=217, M=400, R=597) | |
| ER@0.65: 1.409 (S=515, C=158, M=400, R=597) | |
| ER@0.70: 1.452 (S=561, C=112, M=400, R=597) | |
| ER@0.75: 1.497 (S=609, C=64, M=400, R=597) | |
| ER@0.80: 1.515 (S=629, C=44, M=400, R=597) | |
| ER@0.85: 1.527 (S=641, C=32, M=400, R=597) | |
| ER@0.90: 1.544 (S=660, C=13, M=400, R=597) | |
| ER@0.95: 1.555 (S=671, C=2, M=400, R=597) | |
| ER@1.00: 1.556 (S=673, C=0, M=400, R=597) | |
| Evalulation: wtag-dialog_val_L0_I1/stream/notalk0.35-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.3728 | |
| redundant_rate: 0.4701 | |
| match_cost: 0.5487 | |
| semantic_score: 0.4821 | |
| jaccard_index: 0.4030 | |
| Bleu_1: 0.2399 | |
| Bleu_1_w: 0.0967 | |
| Bleu_2: 0.1548 | |
| Bleu_2_w: 0.0624 | |
| Bleu_3: 0.1089 | |
| Bleu_3_w: 0.0439 | |
| Bleu_4: 0.0791 | |
| Bleu_4_w: 0.0319 | |
| CIDEr: 0.4224 | |
| CIDEr_w: 0.1702 | |
| METEOR: 0.1591 | |
| METEOR_w: 0.0641 | |
| mean_error_rate: 1.2699 | |
| Evaluation datasets: | |
| * assembly101/dialog_val | num samples: 336 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.35 | |
| ER@0.05: 0.700 (S=5, C=4658, M=3655, R=2163) | |
| ER@0.10: 0.701 (S=12, C=4651, M=3655, R=2163) | |
| ER@0.15: 0.703 (S=33, C=4630, M=3655, R=2163) | |
| ER@0.20: 0.708 (S=75, C=4588, M=3655, R=2163) | |
| ER@0.25: 0.717 (S=146, C=4517, M=3655, R=2163) | |
| ER@0.30: 0.734 (S=287, C=4376, M=3655, R=2163) | |
| ER@0.35: 0.761 (S=511, C=4152, M=3655, R=2163) | |
| ER@0.40: 0.800 (S=839, C=3824, M=3655, R=2163) | |
| ER@0.45: 0.846 (S=1217, C=3446, M=3655, R=2163) | |
| ER@0.50: 0.907 (S=1730, C=2933, M=3655, R=2163) | |
| ER@0.55: 0.975 (S=2289, C=2374, M=3655, R=2163) | |
| ER@0.60: 1.034 (S=2783, C=1880, M=3655, R=2163) | |
| ER@0.65: 1.090 (S=3252, C=1411, M=3655, R=2163) | |
| ER@0.70: 1.130 (S=3579, C=1084, M=3655, R=2163) | |
| ER@0.75: 1.169 (S=3906, C=757, M=3655, R=2163) | |
| ER@0.80: 1.198 (S=4149, C=514, M=3655, R=2163) | |
| ER@0.85: 1.221 (S=4335, C=328, M=3655, R=2163) | |
| ER@0.90: 1.242 (S=4512, C=151, M=3655, R=2163) | |
| ER@0.95: 1.253 (S=4608, C=55, M=3655, R=2163) | |
| ER@1.00: 1.259 (S=4658, C=5, M=3655, R=2163) | |
| Evalulation: assembly101-dialog_val_L0_I1/stream/notalk0.35-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.4394 | |
| redundant_rate: 0.3169 | |
| match_cost: 0.4822 | |
| semantic_score: 0.5648 | |
| jaccard_index: 0.4449 | |
| Bleu_1: 0.3689 | |
| Bleu_1_w: 0.1641 | |
| Bleu_2: 0.2544 | |
| Bleu_2_w: 0.1132 | |
| Bleu_3: 0.1867 | |
| Bleu_3_w: 0.0830 | |
| Bleu_4: 0.1441 | |
| Bleu_4_w: 0.0641 | |
| CIDEr: 0.6552 | |
| CIDEr_w: 0.2915 | |
| METEOR: 0.1687 | |
| METEOR_w: 0.0751 | |
| mean_error_rate: 0.9575 | |
| All Finished! Time: 44.05 minutes | |
| Model: /fsx_0/user/imzyc/proact_exps/20240821-L4096-I1-ep4-NOSEP-nr0.1-klgmix-1s-lora-bs256 | |
| Runs: | |
| ego4d/dialog_val_L0_I1|stream|4k|0.35|summarize_and_drop | |
| ego4d/dialog_val_L0_I1|stream|4k|0.35|summarize_and_drop | |
| holoassist/dialog_val_L0_I1|stream|4k|0.35|summarize_and_drop | |
| epickitchens/dialog_val_L0_I1|stream|4k|0.35|summarize_and_drop | |
| egoexolearn/dialog_val_L0_I1|stream|4k|0.35|summarize_and_drop | |
| wtag/dialog_val_L0_I1|stream|4k|0.35|summarize_and_drop | |
| assembly101/dialog_val_L0_I1|stream|4k|0.35|summarize_and_drop | |
| scripts/eval/Aug_eval_stream.sh: 75: des: not found | |
| Model: /fsx_0/user/imzyc/proact_exps/20240821-L4096-I1-ep4-NOSEP-nr0.1-klgmix-1s-lora-bs256 | |
| {'assembly101/dialog_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.1}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.2}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.3}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.4}]}, | |
| 'ego4d/dialog_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.05}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.1}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.2}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.3}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.4}]}, | |
| 'egoexolearn/dialog_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.1}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.2}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.3}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.4}]}, | |
| 'epickitchens/dialog_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.1}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.2}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.3}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.4}]}, | |
| 'holoassist/dialog_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.1}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.2}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.3}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.4}]}, | |
| 'wtag/dialog_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.1}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.2}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.3}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.4}]}} | |
| Evaluation datasets: | |
| * ego4d/dialog_val | num samples: 96 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.05 | |
| ER@0.50: 0.867 (S=167, C=669, M=3978, R=30) | |
| ER@0.55: 0.878 (S=217, C=619, M=3978, R=30) | |
| ER@0.60: 0.896 (S=307, C=529, M=3978, R=30) | |
| ER@0.65: 0.916 (S=403, C=433, M=3978, R=30) | |
| ER@0.70: 0.932 (S=481, C=355, M=3978, R=30) | |
| ER@0.75: 0.957 (S=601, C=235, M=3978, R=30) | |
| ER@0.80: 0.972 (S=671, C=165, M=3978, R=30) | |
| ER@0.85: 0.985 (S=734, C=102, M=3978, R=30) | |
| ER@0.90: 0.996 (S=786, C=50, M=3978, R=30) | |
| ER@0.95: 1.003 (S=820, C=16, M=3978, R=30) | |
| ER@1.00: 1.006 (S=836, C=0, M=3978, R=30) | |
| Evalulation: ego4d-dialog_val_L0_I1/stream/notalk0.05-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.8263 | |
| redundant_rate: 0.0346 | |
| match_cost: 0.3603 | |
| semantic_score: 0.6504 | |
| mean_error_rate: 0.9463 | |
| mean_error_rate_v2: 0.9405 | |
| jaccard_index: 0.1726 | |
| jaccard_index_v2: 0.0595 | |
| AP: 0.3331 | |
| AR: 0.0599 | |
| Avg-F1: 0.1016 | |
| num_matched: 836.0000 | |
| num_missed: 3978.0000 | |
| num_redundant: 30.0000 | |
| num_correct_5: 669.0000 | |
| Bleu_1: 0.4009 | |
| Bleu_1_w: 0.0692 | |
| Bleu_2: 0.2932 | |
| Bleu_2_w: 0.0506 | |
| Bleu_3: 0.2283 | |
| Bleu_3_w: 0.0394 | |
| Bleu_4: 0.1844 | |
| Bleu_4_w: 0.0318 | |
| CIDEr: 1.1238 | |
| CIDEr_w: 0.1940 | |
| METEOR: 0.2092 | |
| METEOR_w: 0.0361 | |
| Updating eval setup: not_talk_threshold: 0.05 -> 0.1 | |
| ER@0.50: 0.874 (S=193, C=711, M=3910, R=104) | |
| ER@0.55: 0.886 (S=251, C=653, M=3910, R=104) | |
| ER@0.60: 0.905 (S=343, C=561, M=3910, R=104) | |
| ER@0.65: 0.925 (S=437, C=467, M=3910, R=104) | |
| ER@0.70: 0.944 (S=532, C=372, M=3910, R=104) | |
| ER@0.75: 0.970 (S=656, C=248, M=3910, R=104) | |
| ER@0.80: 0.985 (S=730, C=174, M=3910, R=104) | |
| ER@0.85: 1.000 (S=802, C=102, M=3910, R=104) | |
| ER@0.90: 1.011 (S=853, C=51, M=3910, R=104) | |
| ER@0.95: 1.018 (S=888, C=16, M=3910, R=104) | |
| ER@1.00: 1.022 (S=904, C=0, M=3910, R=104) | |
| Evalulation: ego4d-dialog_val_L0_I1/stream/notalk0.1-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.8122 | |
| redundant_rate: 0.1032 | |
| match_cost: 0.3671 | |
| semantic_score: 0.6444 | |
| mean_error_rate: 0.9582 | |
| mean_error_rate_v2: 0.9380 | |
| jaccard_index: 0.1838 | |
| jaccard_index_v2: 0.0620 | |
| AP: 0.3026 | |
| AR: 0.0634 | |
| Avg-F1: 0.1048 | |
| num_matched: 904.0000 | |
| num_missed: 3910.0000 | |
| num_redundant: 104.0000 | |
| num_correct_5: 711.0000 | |
| Bleu_1: 0.3992 | |
| Bleu_1_w: 0.0734 | |
| Bleu_2: 0.2919 | |
| Bleu_2_w: 0.0537 | |
| Bleu_3: 0.2278 | |
| Bleu_3_w: 0.0419 | |
| Bleu_4: 0.1841 | |
| Bleu_4_w: 0.0338 | |
| CIDEr: 1.1193 | |
| CIDEr_w: 0.2058 | |
| METEOR: 0.2092 | |
| METEOR_w: 0.0385 | |
| Updating eval setup: not_talk_threshold: 0.1 -> 0.2 | |
| ER@0.50: 0.877 (S=324, C=938, M=3552, R=344) | |
| ER@0.55: 0.899 (S=434, C=828, M=3552, R=344) | |
| ER@0.60: 0.927 (S=565, C=697, M=3552, R=344) | |
| ER@0.65: 0.956 (S=708, C=554, M=3552, R=344) | |
| ER@0.70: 0.984 (S=839, C=423, M=3552, R=344) | |
| ER@0.75: 1.010 (S=968, C=294, M=3552, R=344) | |
| ER@0.80: 1.032 (S=1070, C=192, M=3552, R=344) | |
| ER@0.85: 1.048 (S=1150, C=112, M=3552, R=344) | |
| ER@0.90: 1.061 (S=1212, C=50, M=3552, R=344) | |
| ER@0.95: 1.068 (S=1243, C=19, M=3552, R=344) | |
| ER@1.00: 1.071 (S=1262, C=0, M=3552, R=344) | |
| Evalulation: ego4d-dialog_val_L0_I1/stream/notalk0.2-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.7378 | |
| redundant_rate: 0.2142 | |
| match_cost: 0.4114 | |
| semantic_score: 0.6185 | |
| mean_error_rate: 0.9939 | |
| mean_error_rate_v2: 0.9276 | |
| jaccard_index: 0.2447 | |
| jaccard_index_v2: 0.0724 | |
| AP: 0.2325 | |
| AR: 0.0776 | |
| Avg-F1: 0.1163 | |
| num_matched: 1262.0000 | |
| num_missed: 3552.0000 | |
| num_redundant: 344.0000 | |
| num_correct_5: 938.0000 | |
| Bleu_1: 0.3718 | |
| Bleu_1_w: 0.0910 | |
| Bleu_2: 0.2621 | |
| Bleu_2_w: 0.0641 | |
| Bleu_3: 0.1980 | |
| Bleu_3_w: 0.0485 | |
| Bleu_4: 0.1562 | |
| Bleu_4_w: 0.0382 | |
| CIDEr: 0.9479 | |
| CIDEr_w: 0.2319 | |
| METEOR: 0.1951 | |
| METEOR_w: 0.0477 | |
| Updating eval setup: not_talk_threshold: 0.2 -> 0.3 | |
| ER@0.50: 0.974 (S=482, C=1116, M=3216, R=991) | |
| ER@0.55: 1.005 (S=632, C=966, M=3216, R=991) | |
| ER@0.60: 1.044 (S=820, C=778, M=3216, R=991) | |
| ER@0.65: 1.080 (S=994, C=604, M=3216, R=991) | |
| ER@0.70: 1.112 (S=1147, C=451, M=3216, R=991) | |
| ER@0.75: 1.139 (S=1278, C=320, M=3216, R=991) | |
| ER@0.80: 1.163 (S=1390, C=208, M=3216, R=991) | |
| ER@0.85: 1.180 (S=1473, C=125, M=3216, R=991) | |
| ER@0.90: 1.194 (S=1539, C=59, M=3216, R=991) | |
| ER@0.95: 1.201 (S=1576, C=22, M=3216, R=991) | |
| ER@1.00: 1.206 (S=1598, C=0, M=3216, R=991) | |
| Evalulation: ego4d-dialog_val_L0_I1/stream/notalk0.3-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.6681 | |
| redundant_rate: 0.3828 | |
| match_cost: 0.4448 | |
| semantic_score: 0.5959 | |
| mean_error_rate: 1.1181 | |
| mean_error_rate_v2: 0.9272 | |
| jaccard_index: 0.2753 | |
| jaccard_index_v2: 0.0728 | |
| AP: 0.1632 | |
| AR: 0.0878 | |
| Avg-F1: 0.1142 | |
| num_matched: 1598.0000 | |
| num_missed: 3216.0000 | |
| num_redundant: 991.0000 | |
| num_correct_5: 1116.0000 | |
| Bleu_1: 0.3748 | |
| Bleu_1_w: 0.1032 | |
| Bleu_2: 0.2619 | |
| Bleu_2_w: 0.0721 | |
| Bleu_3: 0.1972 | |
| Bleu_3_w: 0.0543 | |
| Bleu_4: 0.1551 | |
| Bleu_4_w: 0.0427 | |
| CIDEr: 0.8899 | |
| CIDEr_w: 0.2450 | |
| METEOR: 0.1878 | |
| METEOR_w: 0.0517 | |
| Updating eval setup: not_talk_threshold: 0.3 -> 0.4 | |
| ER@0.50: 1.446 (S=651, C=1370, M=2793, R=3516) | |
| ER@0.55: 1.488 (S=856, C=1165, M=2793, R=3516) | |
| ER@0.60: 1.533 (S=1073, C=948, M=2793, R=3516) | |
| ER@0.65: 1.583 (S=1310, C=711, M=2793, R=3516) | |
| ER@0.70: 1.625 (S=1512, C=509, M=2793, R=3516) | |
| ER@0.75: 1.661 (S=1687, C=334, M=2793, R=3516) | |
| ER@0.80: 1.687 (S=1813, C=208, M=2793, R=3516) | |
| ER@0.85: 1.706 (S=1904, C=117, M=2793, R=3516) | |
| ER@0.90: 1.718 (S=1962, C=59, M=2793, R=3516) | |
| ER@0.95: 1.727 (S=2006, C=15, M=2793, R=3516) | |
| ER@1.00: 1.730 (S=2021, C=0, M=2793, R=3516) | |
| Evalulation: ego4d-dialog_val_L0_I1/stream/notalk0.4-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.5802 | |
| redundant_rate: 0.6350 | |
| match_cost: 0.4652 | |
| semantic_score: 0.5812 | |
| mean_error_rate: 1.6277 | |
| mean_error_rate_v2: 0.9407 | |
| jaccard_index: 0.2426 | |
| jaccard_index_v2: 0.0593 | |
| AP: 0.0893 | |
| AR: 0.1027 | |
| Avg-F1: 0.0955 | |
| num_matched: 2021.0000 | |
| num_missed: 2793.0000 | |
| num_redundant: 3516.0000 | |
| num_correct_5: 1370.0000 | |
| Bleu_1: 0.3489 | |
| Bleu_1_w: 0.0846 | |
| Bleu_2: 0.2356 | |
| Bleu_2_w: 0.0572 | |
| Bleu_3: 0.1724 | |
| Bleu_3_w: 0.0418 | |
| Bleu_4: 0.1331 | |
| Bleu_4_w: 0.0323 | |
| CIDEr: 0.7083 | |
| CIDEr_w: 0.1718 | |
| METEOR: 0.1732 | |
| METEOR_w: 0.0420 | |
| Evaluation datasets: | |
| * holoassist/dialog_val | num samples: 291 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.1 | |
| ER@0.50: 0.893 (S=615, C=1652, M=12994, R=20) | |
| ER@0.55: 0.907 (S=829, C=1438, M=12994, R=20) | |
| ER@0.60: 0.923 (S=1067, C=1200, M=12994, R=20) | |
| ER@0.65: 0.937 (S=1288, C=979, M=12994, R=20) | |
| ER@0.70: 0.952 (S=1508, C=759, M=12994, R=20) | |
| ER@0.75: 0.966 (S=1723, C=544, M=12994, R=20) | |
| ER@0.80: 0.978 (S=1909, C=358, M=12994, R=20) | |
| ER@0.85: 0.987 (S=2055, C=212, M=12994, R=20) | |
| ER@0.90: 0.995 (S=2170, C=97, M=12994, R=20) | |
| ER@0.95: 0.999 (S=2237, C=30, M=12994, R=20) | |
| ER@1.00: 1.001 (S=2265, C=2, M=12994, R=20) | |
| Evalulation: holoassist-dialog_val_L0_I1/stream/notalk0.1-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.8515 | |
| redundant_rate: 0.0087 | |
| match_cost: 0.4095 | |
| semantic_score: 0.6075 | |
| mean_error_rate: 0.9580 | |
| mean_error_rate_v2: 0.9567 | |
| jaccard_index: 0.1484 | |
| jaccard_index_v2: 0.0433 | |
| AP: 0.2890 | |
| AR: 0.0433 | |
| Avg-F1: 0.0753 | |
| num_matched: 2267.0000 | |
| num_missed: 12994.0000 | |
| num_redundant: 20.0000 | |
| num_correct_5: 1652.0000 | |
| Bleu_1: 0.4318 | |
| Bleu_1_w: 0.0641 | |
| Bleu_2: 0.3170 | |
| Bleu_2_w: 0.0470 | |
| Bleu_3: 0.2456 | |
| Bleu_3_w: 0.0364 | |
| Bleu_4: 0.1959 | |
| Bleu_4_w: 0.0291 | |
| CIDEr: 1.1810 | |
| CIDEr_w: 0.1752 | |
| METEOR: 0.2112 | |
| METEOR_w: 0.0313 | |
| Updating eval setup: not_talk_threshold: 0.1 -> 0.2 | |
| ER@0.50: 0.761 (S=1643, C=3730, M=9888, R=88) | |
| ER@0.55: 0.794 (S=2147, C=3226, M=9888, R=88) | |
| ER@0.60: 0.825 (S=2613, C=2760, M=9888, R=88) | |
| ER@0.65: 0.859 (S=3137, C=2236, M=9888, R=88) | |
| ER@0.70: 0.896 (S=3698, C=1675, M=9888, R=88) | |
| ER@0.75: 0.929 (S=4201, C=1172, M=9888, R=88) | |
| ER@0.80: 0.956 (S=4611, C=762, M=9888, R=88) | |
| ER@0.85: 0.978 (S=4955, C=418, M=9888, R=88) | |
| ER@0.90: 0.993 (S=5182, C=191, M=9888, R=88) | |
| ER@0.95: 1.002 (S=5318, C=55, M=9888, R=88) | |
| ER@1.00: 1.005 (S=5366, C=7, M=9888, R=88) | |
| Evalulation: holoassist-dialog_val_L0_I1/stream/notalk0.2-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.6479 | |
| redundant_rate: 0.0161 | |
| match_cost: 0.4281 | |
| semantic_score: 0.5936 | |
| mean_error_rate: 0.9091 | |
| mean_error_rate_v2: 0.9039 | |
| jaccard_index: 0.3501 | |
| jaccard_index_v2: 0.0961 | |
| AP: 0.2702 | |
| AR: 0.0967 | |
| Avg-F1: 0.1424 | |
| num_matched: 5373.0000 | |
| num_missed: 9888.0000 | |
| num_redundant: 88.0000 | |
| num_correct_5: 3730.0000 | |
| Bleu_1: 0.4253 | |
| Bleu_1_w: 0.1489 | |
| Bleu_2: 0.3072 | |
| Bleu_2_w: 0.1075 | |
| Bleu_3: 0.2343 | |
| Bleu_3_w: 0.0820 | |
| Bleu_4: 0.1839 | |
| Bleu_4_w: 0.0644 | |
| CIDEr: 1.0880 | |
| CIDEr_w: 0.3809 | |
| METEOR: 0.2063 | |
| METEOR_w: 0.0722 | |
| Updating eval setup: not_talk_threshold: 0.2 -> 0.3 | |
| ER@0.50: 0.735 (S=1963, C=4395, M=8903, R=348) | |
| ER@0.55: 0.773 (S=2547, C=3811, M=8903, R=348) | |
| ER@0.60: 0.811 (S=3119, C=3239, M=8903, R=348) | |
| ER@0.65: 0.851 (S=3741, C=2617, M=8903, R=348) | |
| ER@0.70: 0.893 (S=4379, C=1979, M=8903, R=348) | |
| ER@0.75: 0.932 (S=4970, C=1388, M=8903, R=348) | |
| ER@0.80: 0.965 (S=5473, C=885, M=8903, R=348) | |
| ER@0.85: 0.989 (S=5845, C=513, M=8903, R=348) | |
| ER@0.90: 1.008 (S=6125, C=233, M=8903, R=348) | |
| ER@0.95: 1.019 (S=6295, C=63, M=8903, R=348) | |
| ER@1.00: 1.023 (S=6354, C=4, M=8903, R=348) | |
| Evalulation: holoassist-dialog_val_L0_I1/stream/notalk0.3-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.5834 | |
| redundant_rate: 0.0519 | |
| match_cost: 0.4314 | |
| semantic_score: 0.5936 | |
| mean_error_rate: 0.9089 | |
| mean_error_rate_v2: 0.8886 | |
| jaccard_index: 0.4073 | |
| jaccard_index_v2: 0.1114 | |
| AP: 0.2593 | |
| AR: 0.1139 | |
| Avg-F1: 0.1583 | |
| num_matched: 6358.0000 | |
| num_missed: 8903.0000 | |
| num_redundant: 348.0000 | |
| num_correct_5: 4395.0000 | |
| Bleu_1: 0.4187 | |
| Bleu_1_w: 0.1705 | |
| Bleu_2: 0.2991 | |
| Bleu_2_w: 0.1218 | |
| Bleu_3: 0.2264 | |
| Bleu_3_w: 0.0922 | |
| Bleu_4: 0.1765 | |
| Bleu_4_w: 0.0719 | |
| CIDEr: 1.0479 | |
| CIDEr_w: 0.4269 | |
| METEOR: 0.2032 | |
| METEOR_w: 0.0828 | |
| Updating eval setup: not_talk_threshold: 0.3 -> 0.4 | |
| ER@0.50: 1.353 (S=3884, C=6464, M=4913, R=11853) | |
| ER@0.55: 1.412 (S=4778, C=5570, M=4913, R=11853) | |
| ER@0.60: 1.474 (S=5734, C=4614, M=4913, R=11853) | |
| ER@0.65: 1.539 (S=6727, C=3621, M=4913, R=11853) | |
| ER@0.70: 1.599 (S=7636, C=2712, M=4913, R=11853) | |
| ER@0.75: 1.655 (S=8491, C=1857, M=4913, R=11853) | |
| ER@0.80: 1.703 (S=9223, C=1125, M=4913, R=11853) | |
| ER@0.85: 1.738 (S=9760, C=588, M=4913, R=11853) | |
| ER@0.90: 1.761 (S=10105, C=243, M=4913, R=11853) | |
| ER@0.95: 1.772 (S=10269, C=79, M=4913, R=11853) | |
| ER@1.00: 1.776 (S=10343, C=5, M=4913, R=11853) | |
| Evalulation: holoassist-dialog_val_L0_I1/stream/notalk0.4-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.3219 | |
| redundant_rate: 0.5339 | |
| match_cost: 0.4723 | |
| semantic_score: 0.5619 | |
| mean_error_rate: 1.6166 | |
| mean_error_rate_v2: 0.9099 | |
| jaccard_index: 0.3816 | |
| jaccard_index_v2: 0.0901 | |
| AP: 0.1101 | |
| AR: 0.1601 | |
| Avg-F1: 0.1304 | |
| num_matched: 10348.0000 | |
| num_missed: 4913.0000 | |
| num_redundant: 11853.0000 | |
| num_correct_5: 6464.0000 | |
| Bleu_1: 0.3951 | |
| Bleu_1_w: 0.1508 | |
| Bleu_2: 0.2737 | |
| Bleu_2_w: 0.1044 | |
| Bleu_3: 0.2024 | |
| Bleu_3_w: 0.0772 | |
| Bleu_4: 0.1539 | |
| Bleu_4_w: 0.0587 | |
| CIDEr: 0.8902 | |
| CIDEr_w: 0.3397 | |
| METEOR: 0.1863 | |
| METEOR_w: 0.0711 | |
| Evaluation datasets: | |
| * epickitchens/dialog_val | num samples: 150 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.1 | |
| ER@0.50: 0.848 (S=451, C=1091, M=4890, R=115) | |
| ER@0.55: 0.874 (S=615, C=927, M=4890, R=115) | |
| ER@0.60: 0.900 (S=782, C=760, M=4890, R=115) | |
| ER@0.65: 0.924 (S=938, C=604, M=4890, R=115) | |
| ER@0.70: 0.952 (S=1120, C=422, M=4890, R=115) | |
| ER@0.75: 0.972 (S=1245, C=297, M=4890, R=115) | |
| ER@0.80: 0.988 (S=1349, C=193, M=4890, R=115) | |
| ER@0.85: 1.002 (S=1437, C=105, M=4890, R=115) | |
| ER@0.90: 1.010 (S=1490, C=52, M=4890, R=115) | |
| ER@0.95: 1.016 (S=1527, C=15, M=4890, R=115) | |
| ER@1.00: 1.018 (S=1542, C=0, M=4890, R=115) | |
| Evalulation: epickitchens-dialog_val_L0_I1/stream/notalk0.1-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.7603 | |
| redundant_rate: 0.0694 | |
| match_cost: 0.4235 | |
| semantic_score: 0.5959 | |
| mean_error_rate: 0.9548 | |
| mean_error_rate_v2: 0.9380 | |
| jaccard_index: 0.2355 | |
| jaccard_index_v2: 0.0620 | |
| AP: 0.2450 | |
| AR: 0.0631 | |
| Avg-F1: 0.1004 | |
| num_matched: 1542.0000 | |
| num_missed: 4890.0000 | |
| num_redundant: 115.0000 | |
| num_correct_5: 1091.0000 | |
| Bleu_1: 0.4046 | |
| Bleu_1_w: 0.0953 | |
| Bleu_2: 0.2934 | |
| Bleu_2_w: 0.0691 | |
| Bleu_3: 0.2259 | |
| Bleu_3_w: 0.0532 | |
| Bleu_4: 0.1808 | |
| Bleu_4_w: 0.0426 | |
| CIDEr: 1.2252 | |
| CIDEr_w: 0.2886 | |
| METEOR: 0.2066 | |
| METEOR_w: 0.0487 | |
| Updating eval setup: not_talk_threshold: 0.1 -> 0.2 | |
| ER@0.50: 0.849 (S=795, C=1417, M=4220, R=448) | |
| ER@0.55: 0.884 (S=1017, C=1195, M=4220, R=448) | |
| ER@0.60: 0.919 (S=1242, C=970, M=4220, R=448) | |
| ER@0.65: 0.952 (S=1457, C=755, M=4220, R=448) | |
| ER@0.70: 0.985 (S=1670, C=542, M=4220, R=448) | |
| ER@0.75: 1.012 (S=1839, C=373, M=4220, R=448) | |
| ER@0.80: 1.033 (S=1974, C=238, M=4220, R=448) | |
| ER@0.85: 1.050 (S=2087, C=125, M=4220, R=448) | |
| ER@0.90: 1.061 (S=2156, C=56, M=4220, R=448) | |
| ER@0.95: 1.067 (S=2195, C=17, M=4220, R=448) | |
| ER@1.00: 1.070 (S=2212, C=0, M=4220, R=448) | |
| Evalulation: epickitchens-dialog_val_L0_I1/stream/notalk0.2-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.6561 | |
| redundant_rate: 0.1684 | |
| match_cost: 0.4707 | |
| semantic_score: 0.5700 | |
| mean_error_rate: 0.9893 | |
| mean_error_rate_v2: 0.9248 | |
| jaccard_index: 0.3215 | |
| jaccard_index_v2: 0.0752 | |
| AP: 0.1944 | |
| AR: 0.0804 | |
| Avg-F1: 0.1137 | |
| num_matched: 2212.0000 | |
| num_missed: 4220.0000 | |
| num_redundant: 448.0000 | |
| num_correct_5: 1417.0000 | |
| Bleu_1: 0.3931 | |
| Bleu_1_w: 0.1264 | |
| Bleu_2: 0.2744 | |
| Bleu_2_w: 0.0882 | |
| Bleu_3: 0.2034 | |
| Bleu_3_w: 0.0654 | |
| Bleu_4: 0.1582 | |
| Bleu_4_w: 0.0509 | |
| CIDEr: 1.1124 | |
| CIDEr_w: 0.3576 | |
| METEOR: 0.1942 | |
| METEOR_w: 0.0625 | |
| Updating eval setup: not_talk_threshold: 0.2 -> 0.3 | |
| ER@0.50: 1.237 (S=1785, C=2098, M=2549, R=3624) | |
| ER@0.55: 1.302 (S=2204, C=1679, M=2549, R=3624) | |
| ER@0.60: 1.358 (S=2563, C=1320, M=2549, R=3624) | |
| ER@0.65: 1.412 (S=2908, C=975, M=2549, R=3624) | |
| ER@0.70: 1.463 (S=3236, C=647, M=2549, R=3624) | |
| ER@0.75: 1.497 (S=3458, C=425, M=2549, R=3624) | |
| ER@0.80: 1.526 (S=3643, C=240, M=2549, R=3624) | |
| ER@0.85: 1.545 (S=3762, C=121, M=2549, R=3624) | |
| ER@0.90: 1.555 (S=3828, C=55, M=2549, R=3624) | |
| ER@0.95: 1.560 (S=3860, C=23, M=2549, R=3624) | |
| ER@1.00: 1.563 (S=3883, C=0, M=2549, R=3624) | |
| Evalulation: epickitchens-dialog_val_L0_I1/stream/notalk0.3-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.3963 | |
| redundant_rate: 0.4827 | |
| match_cost: 0.5297 | |
| semantic_score: 0.5236 | |
| mean_error_rate: 1.4563 | |
| mean_error_rate_v2: 0.9314 | |
| jaccard_index: 0.3861 | |
| jaccard_index_v2: 0.0686 | |
| AP: 0.0918 | |
| AR: 0.1072 | |
| Avg-F1: 0.0989 | |
| num_matched: 3883.0000 | |
| num_missed: 2549.0000 | |
| num_redundant: 3624.0000 | |
| num_correct_5: 2098.0000 | |
| Bleu_1: 0.3749 | |
| Bleu_1_w: 0.1448 | |
| Bleu_2: 0.2515 | |
| Bleu_2_w: 0.0971 | |
| Bleu_3: 0.1796 | |
| Bleu_3_w: 0.0693 | |
| Bleu_4: 0.1355 | |
| Bleu_4_w: 0.0523 | |
| CIDEr: 0.9289 | |
| CIDEr_w: 0.3587 | |
| METEOR: 0.1781 | |
| METEOR_w: 0.0688 | |
| Updating eval setup: not_talk_threshold: 0.3 -> 0.4 | |
| ER@0.50: 2.488 (S=2307, C=2666, M=1459, R=12234) | |
| ER@0.55: 2.574 (S=2864, C=2109, M=1459, R=12234) | |
| ER@0.60: 2.652 (S=3364, C=1609, M=1459, R=12234) | |
| ER@0.65: 2.718 (S=3790, C=1183, M=1459, R=12234) | |
| ER@0.70: 2.782 (S=4203, C=770, M=1459, R=12234) | |
| ER@0.75: 2.828 (S=4495, C=478, M=1459, R=12234) | |
| ER@0.80: 2.859 (S=4695, C=278, M=1459, R=12234) | |
| ER@0.85: 2.881 (S=4835, C=138, M=1459, R=12234) | |
| ER@0.90: 2.892 (S=4909, C=64, M=1459, R=12234) | |
| ER@0.95: 2.900 (S=4957, C=16, M=1459, R=12234) | |
| ER@1.00: 2.902 (S=4970, C=3, M=1459, R=12234) | |
| Evalulation: epickitchens-dialog_val_L0_I1/stream/notalk0.4-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.2268 | |
| redundant_rate: 0.7110 | |
| match_cost: 0.5235 | |
| semantic_score: 0.5205 | |
| mean_error_rate: 2.7704 | |
| mean_error_rate_v2: 0.9546 | |
| jaccard_index: 0.2664 | |
| jaccard_index_v2: 0.0454 | |
| AP: 0.0492 | |
| AR: 0.1316 | |
| Avg-F1: 0.0716 | |
| num_matched: 4973.0000 | |
| num_missed: 1459.0000 | |
| num_redundant: 12234.0000 | |
| num_correct_5: 2666.0000 | |
| Bleu_1: 0.3546 | |
| Bleu_1_w: 0.0945 | |
| Bleu_2: 0.2300 | |
| Bleu_2_w: 0.0613 | |
| Bleu_3: 0.1577 | |
| Bleu_3_w: 0.0420 | |
| Bleu_4: 0.1148 | |
| Bleu_4_w: 0.0306 | |
| CIDEr: 0.7842 | |
| CIDEr_w: 0.2089 | |
| METEOR: 0.1704 | |
| METEOR_w: 0.0454 | |
| Evaluation datasets: | |
| * egoexolearn/dialog_val | num samples: 123 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.1 | |
| ER@0.50: 0.874 (S=423, C=1564, M=10004, R=58) | |
| ER@0.55: 0.889 (S=599, C=1388, M=10004, R=58) | |
| ER@0.60: 0.908 (S=820, C=1167, M=10004, R=58) | |
| ER@0.65: 0.926 (S=1043, C=944, M=10004, R=58) | |
| ER@0.70: 0.944 (S=1261, C=726, M=10004, R=58) | |
| ER@0.75: 0.961 (S=1467, C=520, M=10004, R=58) | |
| ER@0.80: 0.976 (S=1645, C=342, M=10004, R=58) | |
| ER@0.85: 0.988 (S=1784, C=203, M=10004, R=58) | |
| ER@0.90: 0.997 (S=1892, C=95, M=10004, R=58) | |
| ER@0.95: 1.003 (S=1961, C=26, M=10004, R=58) | |
| ER@1.00: 1.005 (S=1987, C=0, M=10004, R=58) | |
| Evalulation: egoexolearn-dialog_val_L0_I1/stream/notalk0.1-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.8343 | |
| redundant_rate: 0.0284 | |
| match_cost: 0.3814 | |
| semantic_score: 0.6332 | |
| mean_error_rate: 0.9520 | |
| mean_error_rate_v2: 0.9474 | |
| jaccard_index: 0.1649 | |
| jaccard_index_v2: 0.0526 | |
| AP: 0.3101 | |
| AR: 0.0529 | |
| Avg-F1: 0.0904 | |
| num_matched: 1987.0000 | |
| num_missed: 10004.0000 | |
| num_redundant: 58.0000 | |
| num_correct_5: 1564.0000 | |
| Bleu_1: 0.4243 | |
| Bleu_1_w: 0.0700 | |
| Bleu_2: 0.3063 | |
| Bleu_2_w: 0.0505 | |
| Bleu_3: 0.2337 | |
| Bleu_3_w: 0.0385 | |
| Bleu_4: 0.1837 | |
| Bleu_4_w: 0.0303 | |
| CIDEr: 1.1154 | |
| CIDEr_w: 0.1839 | |
| METEOR: 0.2019 | |
| METEOR_w: 0.0333 | |
| Updating eval setup: not_talk_threshold: 0.1 -> 0.2 | |
| ER@0.50: 0.874 (S=442, C=1660, M=9889, R=149) | |
| ER@0.55: 0.889 (S=620, C=1482, M=9889, R=149) | |
| ER@0.60: 0.911 (S=886, C=1216, M=9889, R=149) | |
| ER@0.65: 0.929 (S=1102, C=1000, M=9889, R=149) | |
| ER@0.70: 0.948 (S=1324, C=778, M=9889, R=149) | |
| ER@0.75: 0.966 (S=1545, C=557, M=9889, R=149) | |
| ER@0.80: 0.982 (S=1734, C=368, M=9889, R=149) | |
| ER@0.85: 0.995 (S=1889, C=213, M=9889, R=149) | |
| ER@0.90: 1.004 (S=1997, C=105, M=9889, R=149) | |
| ER@0.95: 1.010 (S=2071, C=31, M=9889, R=149) | |
| ER@1.00: 1.012 (S=2101, C=1, M=9889, R=149) | |
| Evalulation: egoexolearn-dialog_val_L0_I1/stream/notalk0.2-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.8247 | |
| redundant_rate: 0.0662 | |
| match_cost: 0.3861 | |
| semantic_score: 0.6338 | |
| mean_error_rate: 0.9562 | |
| mean_error_rate_v2: 0.9445 | |
| jaccard_index: 0.1731 | |
| jaccard_index_v2: 0.0555 | |
| AP: 0.2993 | |
| AR: 0.0562 | |
| Avg-F1: 0.0946 | |
| num_matched: 2102.0000 | |
| num_missed: 9889.0000 | |
| num_redundant: 149.0000 | |
| num_correct_5: 1660.0000 | |
| Bleu_1: 0.4253 | |
| Bleu_1_w: 0.0736 | |
| Bleu_2: 0.3033 | |
| Bleu_2_w: 0.0525 | |
| Bleu_3: 0.2291 | |
| Bleu_3_w: 0.0397 | |
| Bleu_4: 0.1786 | |
| Bleu_4_w: 0.0309 | |
| CIDEr: 1.0711 | |
| CIDEr_w: 0.1855 | |
| METEOR: 0.1994 | |
| METEOR_w: 0.0345 | |
| Updating eval setup: not_talk_threshold: 0.2 -> 0.3 | |
| ER@0.50: 0.903 (S=709, C=1881, M=9401, R=720) | |
| ER@0.55: 0.922 (S=931, C=1659, M=9401, R=720) | |
| ER@0.60: 0.944 (S=1197, C=1393, M=9401, R=720) | |
| ER@0.65: 0.968 (S=1481, C=1109, M=9401, R=720) | |
| ER@0.70: 0.989 (S=1741, C=849, M=9401, R=720) | |
| ER@0.75: 1.012 (S=2013, C=577, M=9401, R=720) | |
| ER@0.80: 1.028 (S=2200, C=390, M=9401, R=720) | |
| ER@0.85: 1.041 (S=2363, C=227, M=9401, R=720) | |
| ER@0.90: 1.052 (S=2495, C=95, M=9401, R=720) | |
| ER@0.95: 1.057 (S=2554, C=36, M=9401, R=720) | |
| ER@1.00: 1.060 (S=2590, C=0, M=9401, R=720) | |
| Evalulation: egoexolearn-dialog_val_L0_I1/stream/notalk0.3-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.7840 | |
| redundant_rate: 0.2175 | |
| match_cost: 0.4205 | |
| semantic_score: 0.6107 | |
| mean_error_rate: 0.9978 | |
| mean_error_rate_v2: 0.9412 | |
| jaccard_index: 0.2038 | |
| jaccard_index_v2: 0.0588 | |
| AP: 0.2257 | |
| AR: 0.0623 | |
| Avg-F1: 0.0976 | |
| num_matched: 2590.0000 | |
| num_missed: 9401.0000 | |
| num_redundant: 720.0000 | |
| num_correct_5: 1881.0000 | |
| Bleu_1: 0.4183 | |
| Bleu_1_w: 0.0852 | |
| Bleu_2: 0.2954 | |
| Bleu_2_w: 0.0602 | |
| Bleu_3: 0.2212 | |
| Bleu_3_w: 0.0451 | |
| Bleu_4: 0.1718 | |
| Bleu_4_w: 0.0350 | |
| CIDEr: 0.9797 | |
| CIDEr_w: 0.1996 | |
| METEOR: 0.1946 | |
| METEOR_w: 0.0397 | |
| Updating eval setup: not_talk_threshold: 0.3 -> 0.4 | |
| ER@0.50: 1.334 (S=1620, C=2807, M=7564, R=6816) | |
| ER@0.55: 1.374 (S=2096, C=2331, M=7564, R=6816) | |
| ER@0.60: 1.411 (S=2535, C=1892, M=7564, R=6816) | |
| ER@0.65: 1.448 (S=2977, C=1450, M=7564, R=6816) | |
| ER@0.70: 1.480 (S=3371, C=1056, M=7564, R=6816) | |
| ER@0.75: 1.507 (S=3695, C=732, M=7564, R=6816) | |
| ER@0.80: 1.531 (S=3982, C=445, M=7564, R=6816) | |
| ER@0.85: 1.548 (S=4186, C=241, M=7564, R=6816) | |
| ER@0.90: 1.560 (S=4327, C=100, M=7564, R=6816) | |
| ER@0.95: 1.565 (S=4391, C=36, M=7564, R=6816) | |
| ER@1.00: 1.568 (S=4426, C=1, M=7564, R=6816) | |
| Evalulation: egoexolearn-dialog_val_L0_I1/stream/notalk0.4-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.6308 | |
| redundant_rate: 0.6062 | |
| match_cost: 0.4840 | |
| semantic_score: 0.5691 | |
| mean_error_rate: 1.4843 | |
| mean_error_rate_v2: 0.9464 | |
| jaccard_index: 0.2354 | |
| jaccard_index_v2: 0.0536 | |
| AP: 0.0897 | |
| AR: 0.0841 | |
| Avg-F1: 0.0868 | |
| num_matched: 4427.0000 | |
| num_missed: 7564.0000 | |
| num_redundant: 6816.0000 | |
| num_correct_5: 2807.0000 | |
| Bleu_1: 0.3987 | |
| Bleu_1_w: 0.0939 | |
| Bleu_2: 0.2722 | |
| Bleu_2_w: 0.0641 | |
| Bleu_3: 0.1974 | |
| Bleu_3_w: 0.0465 | |
| Bleu_4: 0.1489 | |
| Bleu_4_w: 0.0350 | |
| CIDEr: 0.8196 | |
| CIDEr_w: 0.1929 | |
| METEOR: 0.1795 | |
| METEOR_w: 0.0423 | |
| Evaluation datasets: | |
| * wtag/dialog_val | num samples: 21 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.1 | |
| ER@0.50: 0.968 (S=61, C=74, M=938, R=40) | |
| ER@0.55: 0.975 (S=68, C=67, M=938, R=40) | |
| ER@0.60: 0.984 (S=78, C=57, M=938, R=40) | |
| ER@0.65: 0.992 (S=86, C=49, M=938, R=40) | |
| ER@0.70: 1.003 (S=98, C=37, M=938, R=40) | |
| ER@0.75: 1.014 (S=110, C=25, M=938, R=40) | |
| ER@0.80: 1.022 (S=119, C=16, M=938, R=40) | |
| ER@0.85: 1.027 (S=124, C=11, M=938, R=40) | |
| ER@0.90: 1.033 (S=130, C=5, M=938, R=40) | |
| ER@0.95: 1.036 (S=134, C=1, M=938, R=40) | |
| ER@1.00: 1.037 (S=135, C=0, M=938, R=40) | |
| Evalulation: wtag-dialog_val_L0_I1/stream/notalk0.1-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.8742 | |
| redundant_rate: 0.2286 | |
| match_cost: 0.5259 | |
| semantic_score: 0.5400 | |
| mean_error_rate: 1.0083 | |
| mean_error_rate_v2: 0.9721 | |
| jaccard_index: 0.1213 | |
| jaccard_index_v2: 0.0279 | |
| AP: 0.1777 | |
| AR: 0.0290 | |
| Avg-F1: 0.0498 | |
| num_matched: 135.0000 | |
| num_missed: 938.0000 | |
| num_redundant: 40.0000 | |
| num_correct_5: 74.0000 | |
| Bleu_1: 0.3643 | |
| Bleu_1_w: 0.0442 | |
| Bleu_2: 0.2780 | |
| Bleu_2_w: 0.0337 | |
| Bleu_3: 0.2230 | |
| Bleu_3_w: 0.0270 | |
| Bleu_4: 0.1830 | |
| Bleu_4_w: 0.0222 | |
| CIDEr: 0.9718 | |
| CIDEr_w: 0.1179 | |
| METEOR: 0.2276 | |
| METEOR_w: 0.0276 | |
| Updating eval setup: not_talk_threshold: 0.1 -> 0.2 | |
| ER@0.50: 0.898 (S=286, C=250, M=537, R=141) | |
| ER@0.55: 0.942 (S=333, C=203, M=537, R=141) | |
| ER@0.60: 0.968 (S=361, C=175, M=537, R=141) | |
| ER@0.65: 1.010 (S=406, C=130, M=537, R=141) | |
| ER@0.70: 1.042 (S=440, C=96, M=537, R=141) | |
| ER@0.75: 1.073 (S=473, C=63, M=537, R=141) | |
| ER@0.80: 1.092 (S=494, C=42, M=537, R=141) | |
| ER@0.85: 1.105 (S=508, C=28, M=537, R=141) | |
| ER@0.90: 1.117 (S=521, C=15, M=537, R=141) | |
| ER@0.95: 1.125 (S=529, C=7, M=537, R=141) | |
| ER@1.00: 1.131 (S=536, C=0, M=537, R=141) | |
| Evalulation: wtag-dialog_val_L0_I1/stream/notalk0.2-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.5005 | |
| redundant_rate: 0.2083 | |
| match_cost: 0.5574 | |
| semantic_score: 0.4763 | |
| mean_error_rate: 1.0459 | |
| mean_error_rate_v2: 0.9244 | |
| jaccard_index: 0.4415 | |
| jaccard_index_v2: 0.0756 | |
| AP: 0.1355 | |
| AR: 0.0855 | |
| Avg-F1: 0.1048 | |
| num_matched: 536.0000 | |
| num_missed: 537.0000 | |
| num_redundant: 141.0000 | |
| num_correct_5: 250.0000 | |
| Bleu_1: 0.3320 | |
| Bleu_1_w: 0.1466 | |
| Bleu_2: 0.2397 | |
| Bleu_2_w: 0.1058 | |
| Bleu_3: 0.1773 | |
| Bleu_3_w: 0.0783 | |
| Bleu_4: 0.1347 | |
| Bleu_4_w: 0.0595 | |
| CIDEr: 0.9619 | |
| CIDEr_w: 0.4247 | |
| METEOR: 0.2105 | |
| METEOR_w: 0.0930 | |
| Updating eval setup: not_talk_threshold: 0.2 -> 0.3 | |
| ER@0.50: 1.052 (S=343, C=285, M=445, R=341) | |
| ER@0.55: 1.096 (S=390, C=238, M=445, R=341) | |
| ER@0.60: 1.129 (S=425, C=203, M=445, R=341) | |
| ER@0.65: 1.176 (S=476, C=152, M=445, R=341) | |
| ER@0.70: 1.212 (S=514, C=114, M=445, R=341) | |
| ER@0.75: 1.252 (S=557, C=71, M=445, R=341) | |
| ER@0.80: 1.273 (S=580, C=48, M=445, R=341) | |
| ER@0.85: 1.287 (S=595, C=33, M=445, R=341) | |
| ER@0.90: 1.302 (S=611, C=17, M=445, R=341) | |
| ER@0.95: 1.312 (S=622, C=6, M=445, R=341) | |
| ER@1.00: 1.318 (S=628, C=0, M=445, R=341) | |
| Evalulation: wtag-dialog_val_L0_I1/stream/notalk0.3-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.4147 | |
| redundant_rate: 0.3519 | |
| match_cost: 0.5530 | |
| semantic_score: 0.4763 | |
| mean_error_rate: 1.2189 | |
| mean_error_rate_v2: 0.9250 | |
| jaccard_index: 0.4441 | |
| jaccard_index_v2: 0.0750 | |
| AP: 0.1095 | |
| AR: 0.0989 | |
| Avg-F1: 0.1039 | |
| num_matched: 628.0000 | |
| num_missed: 445.0000 | |
| num_redundant: 341.0000 | |
| num_correct_5: 285.0000 | |
| Bleu_1: 0.3256 | |
| Bleu_1_w: 0.1446 | |
| Bleu_2: 0.2305 | |
| Bleu_2_w: 0.1024 | |
| Bleu_3: 0.1717 | |
| Bleu_3_w: 0.0762 | |
| Bleu_4: 0.1319 | |
| Bleu_4_w: 0.0586 | |
| CIDEr: 0.8370 | |
| CIDEr_w: 0.3717 | |
| METEOR: 0.2128 | |
| METEOR_w: 0.0945 | |
| Updating eval setup: not_talk_threshold: 0.3 -> 0.4 | |
| ER@0.50: 1.342 (S=373, C=317, M=383, R=684) | |
| ER@0.55: 1.386 (S=420, C=270, M=383, R=684) | |
| ER@0.60: 1.429 (S=466, C=224, M=383, R=684) | |
| ER@0.65: 1.486 (S=527, C=163, M=383, R=684) | |
| ER@0.70: 1.535 (S=580, C=110, M=383, R=684) | |
| ER@0.75: 1.578 (S=626, C=64, M=383, R=684) | |
| ER@0.80: 1.593 (S=642, C=48, M=383, R=684) | |
| ER@0.85: 1.607 (S=657, C=33, M=383, R=684) | |
| ER@0.90: 1.626 (S=678, C=12, M=383, R=684) | |
| ER@0.95: 1.636 (S=688, C=2, M=383, R=684) | |
| ER@1.00: 1.637 (S=690, C=0, M=383, R=684) | |
| Evalulation: wtag-dialog_val_L0_I1/stream/notalk0.4-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.3569 | |
| redundant_rate: 0.4978 | |
| match_cost: 0.5560 | |
| semantic_score: 0.4779 | |
| mean_error_rate: 1.5322 | |
| mean_error_rate_v2: 0.9357 | |
| jaccard_index: 0.3927 | |
| jaccard_index_v2: 0.0643 | |
| AP: 0.0822 | |
| AR: 0.1053 | |
| Avg-F1: 0.0924 | |
| num_matched: 690.0000 | |
| num_missed: 383.0000 | |
| num_redundant: 684.0000 | |
| num_correct_5: 317.0000 | |
| Bleu_1: 0.3075 | |
| Bleu_1_w: 0.1208 | |
| Bleu_2: 0.2124 | |
| Bleu_2_w: 0.0834 | |
| Bleu_3: 0.1528 | |
| Bleu_3_w: 0.0600 | |
| Bleu_4: 0.1136 | |
| Bleu_4_w: 0.0446 | |
| CIDEr: 0.6633 | |
| CIDEr_w: 0.2605 | |
| METEOR: 0.2026 | |
| METEOR_w: 0.0796 | |
| Evaluation datasets: | |
| * assembly101/dialog_val | num samples: 336 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.1 | |
| ER@0.50: 0.821 (S=543, C=1519, M=6256, R=32) | |
| ER@0.55: 0.841 (S=710, C=1352, M=6256, R=32) | |
| ER@0.60: 0.865 (S=907, C=1155, M=6256, R=32) | |
| ER@0.65: 0.887 (S=1094, C=968, M=6256, R=32) | |
| ER@0.70: 0.908 (S=1263, C=799, M=6256, R=32) | |
| ER@0.75: 0.933 (S=1472, C=590, M=6256, R=32) | |
| ER@0.80: 0.952 (S=1633, C=429, M=6256, R=32) | |
| ER@0.85: 0.967 (S=1759, C=303, M=6256, R=32) | |
| ER@0.90: 0.986 (S=1912, C=150, M=6256, R=32) | |
| ER@0.95: 0.996 (S=2000, C=62, M=6256, R=32) | |
| ER@1.00: 1.003 (S=2058, C=4, M=6256, R=32) | |
| Evalulation: assembly101-dialog_val_L0_I1/stream/notalk0.1-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.7521 | |
| redundant_rate: 0.0153 | |
| match_cost: 0.3791 | |
| semantic_score: 0.6275 | |
| mean_error_rate: 0.9237 | |
| mean_error_rate_v2: 0.9202 | |
| jaccard_index: 0.2469 | |
| jaccard_index_v2: 0.0798 | |
| AP: 0.3183 | |
| AR: 0.0801 | |
| Avg-F1: 0.1280 | |
| num_matched: 2062.0000 | |
| num_missed: 6256.0000 | |
| num_redundant: 32.0000 | |
| num_correct_5: 1519.0000 | |
| Bleu_1: 0.4728 | |
| Bleu_1_w: 0.1168 | |
| Bleu_2: 0.3662 | |
| Bleu_2_w: 0.0904 | |
| Bleu_3: 0.2930 | |
| Bleu_3_w: 0.0723 | |
| Bleu_4: 0.2422 | |
| Bleu_4_w: 0.0598 | |
| CIDEr: 1.3705 | |
| CIDEr_w: 0.3384 | |
| METEOR: 0.2369 | |
| METEOR_w: 0.0585 | |
| Updating eval setup: not_talk_threshold: 0.1 -> 0.2 | |
| ER@0.50: 0.809 (S=698, C=1751, M=5869, R=161) | |
| ER@0.55: 0.837 (S=932, C=1517, M=5869, R=161) | |
| ER@0.60: 0.864 (S=1158, C=1291, M=5869, R=161) | |
| ER@0.65: 0.895 (S=1411, C=1038, M=5869, R=161) | |
| ER@0.70: 0.917 (S=1597, C=852, M=5869, R=161) | |
| ER@0.75: 0.943 (S=1814, C=635, M=5869, R=161) | |
| ER@0.80: 0.966 (S=2007, C=442, M=5869, R=161) | |
| ER@0.85: 0.982 (S=2139, C=310, M=5869, R=161) | |
| ER@0.90: 1.002 (S=2302, C=147, M=5869, R=161) | |
| ER@0.95: 1.012 (S=2387, C=62, M=5869, R=161) | |
| ER@1.00: 1.019 (S=2444, C=5, M=5869, R=161) | |
| Evalulation: assembly101-dialog_val_L0_I1/stream/notalk0.2-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.7056 | |
| redundant_rate: 0.0617 | |
| match_cost: 0.4023 | |
| semantic_score: 0.6119 | |
| mean_error_rate: 0.9314 | |
| mean_error_rate_v2: 0.9137 | |
| jaccard_index: 0.2888 | |
| jaccard_index_v2: 0.0863 | |
| AP: 0.2804 | |
| AR: 0.0880 | |
| Avg-F1: 0.1339 | |
| num_matched: 2449.0000 | |
| num_missed: 5869.0000 | |
| num_redundant: 161.0000 | |
| num_correct_5: 1751.0000 | |
| Bleu_1: 0.4639 | |
| Bleu_1_w: 0.1340 | |
| Bleu_2: 0.3552 | |
| Bleu_2_w: 0.1026 | |
| Bleu_3: 0.2819 | |
| Bleu_3_w: 0.0814 | |
| Bleu_4: 0.2313 | |
| Bleu_4_w: 0.0668 | |
| CIDEr: 1.2826 | |
| CIDEr_w: 0.3704 | |
| METEOR: 0.2301 | |
| METEOR_w: 0.0665 | |
| Updating eval setup: not_talk_threshold: 0.2 -> 0.3 | |
| ER@0.50: 0.806 (S=1159, C=2326, M=4833, R=711) | |
| ER@0.55: 0.851 (S=1533, C=1952, M=4833, R=711) | |
| ER@0.60: 0.895 (S=1897, C=1588, M=4833, R=711) | |
| ER@0.65: 0.937 (S=2251, C=1234, M=4833, R=711) | |
| ER@0.70: 0.966 (S=2495, C=990, M=4833, R=711) | |
| ER@0.75: 1.000 (S=2770, C=715, M=4833, R=711) | |
| ER@0.80: 1.027 (S=2999, C=486, M=4833, R=711) | |
| ER@0.85: 1.049 (S=3183, C=302, M=4833, R=711) | |
| ER@0.90: 1.069 (S=3348, C=137, M=4833, R=711) | |
| ER@0.95: 1.079 (S=3429, C=56, M=4833, R=711) | |
| ER@1.00: 1.085 (S=3482, C=3, M=4833, R=711) | |
| Evalulation: assembly101-dialog_val_L0_I1/stream/notalk0.3-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.5810 | |
| redundant_rate: 0.1694 | |
| match_cost: 0.4482 | |
| semantic_score: 0.5854 | |
| mean_error_rate: 0.9785 | |
| mean_error_rate_v2: 0.9014 | |
| jaccard_index: 0.3860 | |
| jaccard_index_v2: 0.0986 | |
| AP: 0.2121 | |
| AR: 0.1070 | |
| Avg-F1: 0.1422 | |
| num_matched: 3485.0000 | |
| num_missed: 4833.0000 | |
| num_redundant: 711.0000 | |
| num_correct_5: 2326.0000 | |
| Bleu_1: 0.4411 | |
| Bleu_1_w: 0.1703 | |
| Bleu_2: 0.3273 | |
| Bleu_2_w: 0.1263 | |
| Bleu_3: 0.2531 | |
| Bleu_3_w: 0.0977 | |
| Bleu_4: 0.2032 | |
| Bleu_4_w: 0.0784 | |
| CIDEr: 1.0574 | |
| CIDEr_w: 0.4081 | |
| METEOR: 0.2128 | |
| METEOR_w: 0.0821 | |
| Updating eval setup: not_talk_threshold: 0.3 -> 0.4 | |
| ER@0.50: 1.241 (S=2486, C=3515, M=2317, R=5522) | |
| ER@0.55: 1.328 (S=3209, C=2792, M=2317, R=5522) | |
| ER@0.60: 1.404 (S=3837, C=2164, M=2317, R=5522) | |
| ER@0.65: 1.469 (S=4381, C=1620, M=2317, R=5522) | |
| ER@0.70: 1.516 (S=4771, C=1230, M=2317, R=5522) | |
| ER@0.75: 1.560 (S=5136, C=865, M=2317, R=5522) | |
| ER@0.80: 1.597 (S=5449, C=552, M=2317, R=5522) | |
| ER@0.85: 1.623 (S=5663, C=338, M=2317, R=5522) | |
| ER@0.90: 1.646 (S=5851, C=150, M=2317, R=5522) | |
| ER@0.95: 1.657 (S=5947, C=54, M=2317, R=5522) | |
| ER@1.00: 1.664 (S=5998, C=3, M=2317, R=5522) | |
| Evalulation: assembly101-dialog_val_L0_I1/stream/notalk0.4-maxlen_4k | |
| Metrics: | |
| missing_rate: 0.2786 | |
| redundant_rate: 0.4792 | |
| match_cost: 0.5017 | |
| semantic_score: 0.5476 | |
| mean_error_rate: 1.5187 | |
| mean_error_rate_v2: 0.9127 | |
| jaccard_index: 0.4336 | |
| jaccard_index_v2: 0.0873 | |
| AP: 0.1048 | |
| AR: 0.1452 | |
| Avg-F1: 0.1217 | |
| num_matched: 6001.0000 | |
| num_missed: 2317.0000 | |
| num_redundant: 5522.0000 | |
| num_correct_5: 3515.0000 | |
| Bleu_1: 0.4078 | |
| Bleu_1_w: 0.1768 | |
| Bleu_2: 0.2904 | |
| Bleu_2_w: 0.1259 | |
| Bleu_3: 0.2163 | |
| Bleu_3_w: 0.0938 | |
| Bleu_4: 0.1677 | |
| Bleu_4_w: 0.0727 | |
| CIDEr: 0.8164 | |
| CIDEr_w: 0.3540 | |
| METEOR: 0.1891 | |
| METEOR_w: 0.0820 | |
| All Finished! Time: 100.57 minutes | |
| Model: /fsx_0/user/imzyc/proact_exps/20240821-L4096-I1-ep4-NOSEP-nr0.1-klgmix-1s-lora-bs256 | |
| Runs: | |
| ego4d/dialog_val_L0_I1|stream|4k|0.05|summarize_and_drop | |
| ego4d/dialog_val_L0_I1|stream|4k|0.1|summarize_and_drop | |
| holoassist/dialog_val_L0_I1|stream|4k|0.1|summarize_and_drop | |
| epickitchens/dialog_val_L0_I1|stream|4k|0.1|summarize_and_drop | |
| egoexolearn/dialog_val_L0_I1|stream|4k|0.1|summarize_and_drop | |
| wtag/dialog_val_L0_I1|stream|4k|0.1|summarize_and_drop | |
| assembly101/dialog_val_L0_I1|stream|4k|0.1|summarize_and_drop | |
| ego4d/dialog_val_L0_I1|stream|4k|0.2|summarize_and_drop | |
| holoassist/dialog_val_L0_I1|stream|4k|0.2|summarize_and_drop | |
| epickitchens/dialog_val_L0_I1|stream|4k|0.2|summarize_and_drop | |
| egoexolearn/dialog_val_L0_I1|stream|4k|0.2|summarize_and_drop | |
| wtag/dialog_val_L0_I1|stream|4k|0.2|summarize_and_drop | |
| assembly101/dialog_val_L0_I1|stream|4k|0.2|summarize_and_drop | |
| ego4d/dialog_val_L0_I1|stream|4k|0.3|summarize_and_drop | |
| holoassist/dialog_val_L0_I1|stream|4k|0.3|summarize_and_drop | |
| epickitchens/dialog_val_L0_I1|stream|4k|0.3|summarize_and_drop | |
| egoexolearn/dialog_val_L0_I1|stream|4k|0.3|summarize_and_drop | |
| wtag/dialog_val_L0_I1|stream|4k|0.3|summarize_and_drop | |
| assembly101/dialog_val_L0_I1|stream|4k|0.3|summarize_and_drop | |
| ego4d/dialog_val_L0_I1|stream|4k|0.4|summarize_and_drop | |
| holoassist/dialog_val_L0_I1|stream|4k|0.4|summarize_and_drop | |
| epickitchens/dialog_val_L0_I1|stream|4k|0.4|summarize_and_drop | |
| egoexolearn/dialog_val_L0_I1|stream|4k|0.4|summarize_and_drop | |
| wtag/dialog_val_L0_I1|stream|4k|0.4|summarize_and_drop | |
| assembly101/dialog_val_L0_I1|stream|4k|0.4|summarize_and_drop | |
| usage: eval.py [-h] --model_path MODEL_PATH --inference_setups | |
| INFERENCE_SETUPS [--data_root_dir DATA_ROOT_DIR] | |
| [--force_rerun [FORCE_RERUN]] [--fps FPS] | |
| [--sts_model_type STS_MODEL_TYPE] | |
| [--match_window_time MATCH_WINDOW_TIME] | |
| [--match_dist_func_factor MATCH_DIST_FUNC_FACTOR] | |
| [--match_dist_func_power MATCH_DIST_FUNC_POWER] --job_name | |
| JOB_NAME [--num_nodes NUM_NODES] | |
| [--tasks_per_node TASKS_PER_NODE] | |
| [--gpus_per_node GPUS_PER_NODE] [--cpus_per_node CPUS_PER_NODE] | |
| [--mem_gb MEM_GB] [--timeout_min TIMEOUT_MIN] | |
| [--partition PARTITION] [--account ACCOUNT] [--log_dir LOG_DIR] | |
| [--slurm_exclude SLURM_EXCLUDE] | |
| eval.py: error: argument --match_window_time: invalid float value: 'auto' | |
| Traceback (most recent call last): | |
| File "/opt/hpcaas/.mounts/fs-036153e63d56f4dc2/home/imzyc/project/proactive-assist/mmassist/eval/eval.py", line 144, in <module> | |
| main(eval_args, slurm_args) | |
| File "/opt/hpcaas/.mounts/fs-036153e63d56f4dc2/home/imzyc/project/proactive-assist/mmassist/eval/eval.py", line 133, in main | |
| job.results() # wait for the job to finish | |
| File "/data/home/imzyc/miniconda3/envs/mm/lib/python3.10/site-packages/submitit/core/core.py", line 287, in results | |
| return [tp.cast(R, sub_job.result()) for sub_job in self._sub_jobs] | |
| File "/data/home/imzyc/miniconda3/envs/mm/lib/python3.10/site-packages/submitit/core/core.py", line 287, in <listcomp> | |
| return [tp.cast(R, sub_job.result()) for sub_job in self._sub_jobs] | |
| File "/data/home/imzyc/miniconda3/envs/mm/lib/python3.10/site-packages/submitit/core/core.py", line 266, in result | |
| r = self.results() | |
| File "/data/home/imzyc/miniconda3/envs/mm/lib/python3.10/site-packages/submitit/core/core.py", line 289, in results | |
| outcome, result = self._get_outcome_and_result() | |
| File "/data/home/imzyc/miniconda3/envs/mm/lib/python3.10/site-packages/submitit/core/core.py", line 384, in _get_outcome_and_result | |
| raise utils.UncompletedJobError("\n".join(message)) | |
| submitit.core.utils.UncompletedJobError: Job 14225 (task: 0) with path /opt/hpcaas/.mounts/fs-036153e63d56f4dc2/home/imzyc/project/proactive-assist/slurm_logs/14225/14225_0_result.pkl | |
| has not produced any output (state: CANCELLED by 649731) | |
| Error stream produced: | |
| ---------------------------------------- | |
| Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s] | |
| Loading checkpoint shards: 25%|██▌ | 1/4 [00:13<00:40, 13.65s/it] | |
| Loading checkpoint shards: 50%|█████ | 2/4 [00:26<00:25, 12.93s/it] | |
| Loading checkpoint shards: 75%|███████▌ | 3/4 [00:38<00:12, 12.76s/it] | |
| Loading checkpoint shards: 100%|██████████| 4/4 [00:41<00:00, 8.82s/it] | |
| Loading checkpoint shards: 100%|██████████| 4/4 [00:41<00:00, 10.35s/it] | |
| Run predictions: 0%| | 0/2 [00:00<?, ?it/s]We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache) | |
| Run predictions: 50%|█████ | 1/2 [00:33<00:33, 33.94s/it] | |
| Run predictions: 100%|██████████| 2/2 [01:06<00:00, 33.34s/it] | |
| Run predictions: 100%|██████████| 2/2 [01:06<00:00, 33.43s/it] | |
| Run predictions: 0%| | 0/2 [00:00<?, ?it/s] | |
| Run predictions: 50%|█████ | 1/2 [00:36<00:36, 36.09s/it] | |
| Run predictions: 100%|██████████| 2/2 [01:10<00:00, 34.84s/it] | |
| Run predictions: 100%|██████████| 2/2 [01:10<00:00, 35.02s/it] | |
| Run predictions: 0%| | 0/2 [00:00<?, ?it/s] | |
| Run predictions: 50%|█████ | 1/2 [00:38<00:38, 38.92s/it] | |
| Run predictions: 100%|██████████| 2/2 [01:17<00:00, 38.74s/it] | |
| Run predictions: 100%|██████████| 2/2 [01:17<00:00, 38.76s/it] | |
| Run predictions: 0%| | 0/2 [00:00<?, ?it/s] | |
| Run predictions: 50%|█████ | 1/2 [00:48<00:48, 48.44s/it] | |
| Run predictions: 100%|██████████| 2/2 [01:44<00:00, 53.04s/it] | |
| Run predictions: 100%|██████████| 2/2 [01:44<00:00, 52.35s/it] | |
| Run predictions: 0%| | 0/2 [00:00<?, ?it/s] | |
| Run predictions: 50%|█████ | 1/2 [00:44<00:44, 44.90s/it] | |
| Run predictions: 100%|██████████| 2/2 [02:41<00:00, 87.27s/it] | |
| Run predictions: 100%|██████████| 2/2 [02:41<00:00, 80.92s/it] | |
| Run predictions: 0%| | 0/2 [00:00<?, ?it/s] | |
| Run predictions: 50%|█████ | 1/2 [01:52<01:52, 112.41s/it] | |
| Run predictions: 100%|██████████| 2/2 [05:45<00:00, 183.67s/it] | |
| Run predictions: 100%|██████████| 2/2 [05:45<00:00, 172.98s/it] | |
| Run predictions: 0%| | 0/5 [00:00<?, ?it/s] | |
| Run predictions: 20%|██ | 1/5 [00:26<01:45, 26.27s/it] | |
| Run predictions: 40%|████ | 2/5 [00:41<00:59, 19.97s/it] | |
| Run predictions: 60%|██████ | 3/5 [01:07<00:45, 22.68s/it] | |
| Run predictions: 80%|████████ | 4/5 [01:26<00:21, 21.22s/it]submitit WARNING (2024-08-22 11:42:38,079) - Bypassing signal SIGCONT | |
| slurmstepd: error: *** STEP 14225.0 ON h100-st-p548xlarge-173 CANCELLED AT 2024-08-22T11:42:38 *** | |
| submitit WARNING (2024-08-22 11:42:38,080) - Bypassing signal SIGTERM | |
| slurmstepd: error: *** JOB 14225 ON h100-st-p548xlarge-173 CANCELLED AT 2024-08-22T11:42:38 *** | |
| Run predictions: 100%|██████████| 5/5 [01:49<00:00, 21.81s/it] | |
| Run predictions: 100%|██████████| 5/5 [01:49<00:00, 21.92s/it] | |
| Run predictions: 0%| | 0/5 [00:00<?, ?it/s]slurmstepd: error: Failed to send MESSAGE_TASK_EXIT: Connection refused | |
| slurmstepd: error: Failed to send MESSAGE_TASK_EXIT: Connection refused | |
| Traceback (most recent call last): | |
| File "/opt/hpcaas/.mounts/fs-036153e63d56f4dc2/home/imzyc/project/proactive-assist/mmassist/eval/eval.py", line 144, in <module> | |
| main(eval_args, slurm_args) | |
| File "/opt/hpcaas/.mounts/fs-036153e63d56f4dc2/home/imzyc/project/proactive-assist/mmassist/eval/eval.py", line 133, in main | |
| job.results() # wait for the job to finish | |
| File "/data/home/imzyc/miniconda3/envs/mm/lib/python3.10/site-packages/submitit/core/core.py", line 287, in results | |
| return [tp.cast(R, sub_job.result()) for sub_job in self._sub_jobs] | |
| File "/data/home/imzyc/miniconda3/envs/mm/lib/python3.10/site-packages/submitit/core/core.py", line 287, in <listcomp> | |
| return [tp.cast(R, sub_job.result()) for sub_job in self._sub_jobs] | |
| File "/data/home/imzyc/miniconda3/envs/mm/lib/python3.10/site-packages/submitit/core/core.py", line 266, in result | |
| r = self.results() | |
| File "/data/home/imzyc/miniconda3/envs/mm/lib/python3.10/site-packages/submitit/core/core.py", line 294, in results | |
| raise job_exception # pylint: disable=raising-bad-type | |
| submitit.core.utils.FailedJobError: Job (task=0) failed during processing with trace: | |
| ---------------------- | |
| Traceback (most recent call last): | |
| File "/data/home/imzyc/miniconda3/envs/mm/lib/python3.10/site-packages/submitit/core/submission.py", line 55, in process_job | |
| result = delayed.result() | |
| File "/data/home/imzyc/miniconda3/envs/mm/lib/python3.10/site-packages/submitit/core/utils.py", line 133, in result | |
| self._result = self.function(*self.args, **self.kwargs) | |
| File "/opt/hpcaas/.mounts/fs-036153e63d56f4dc2/home/imzyc/project/proactive-assist/mmassist/eval/eval.py", line 77, in run_eval | |
| evaluator_cls = evaluator_name_to_cls[evaluator_name] | |
| KeyError: '' | |
| ---------------------- | |
| You can check full logs with 'job.stderr(0)' and 'job.stdout(0)'or at paths: | |
| - /opt/hpcaas/.mounts/fs-036153e63d56f4dc2/home/imzyc/project/proactive-assist/slurm_logs/14289/14289_0_log.err | |
| - /opt/hpcaas/.mounts/fs-036153e63d56f4dc2/home/imzyc/project/proactive-assist/slurm_logs/14289/14289_0_log.out | |
| Model: /fsx_0/user/imzyc/proact_exps/20240821-L4096-I1-ep4-NOSEP-nr0.1-klgmix-1s-lora-bs256 | |
| {'assembly101/dialog-klg_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.2}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.3}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.4}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.5}]}, | |
| 'ego4d/dialog-klg_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.2}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.3}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.4}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.5}]}, | |
| 'ego4d/dialog_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.05}]}, | |
| 'egoexolearn/dialog-klg_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.2}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.3}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.4}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.5}]}, | |
| 'epickitchens/dialog-klg_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.2}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.3}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.4}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.5}]}, | |
| 'holoassist/dialog-klg_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.2}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.3}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.4}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.5}]}, | |
| 'wtag/dialog-klg_val_L0_I1': {'stream': [{'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.2}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.3}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.4}, | |
| {'context_handling_method': 'summarize_and_drop', | |
| 'eval_max_seq_len': 4096, | |
| 'eval_max_seq_len_str': '4k', | |
| 'inference_runner_type': 'stream', | |
| 'not_talk_threshold': 0.5}]}} | |
| Evaluation datasets: | |
| * ego4d/dialog_val | num samples: 96 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.05 | |
| Evalulation: ego4d-dialog_val_L0_I1/stream/notalk0.05-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.1410 | |
| missing_rate: 0.8247 | |
| redundant_rate: 0.0254 | |
| semantic_score: 0.7123 | |
| time_diff: 0.2867 | |
| precision: 0.7875 | |
| recall: 0.1417 | |
| F1: 0.2401 | |
| num_matched: 682.0000 | |
| num_mismatched: 162.0000 | |
| num_missed: 3970.0000 | |
| num_redundant: 22.0000 | |
| Bleu_1: 0.3991 | |
| Bleu_1_w: 0.0563 | |
| Bleu_2: 0.2909 | |
| Bleu_2_w: 0.0410 | |
| Bleu_3: 0.2258 | |
| Bleu_3_w: 0.0319 | |
| Bleu_4: 0.1818 | |
| Bleu_4_w: 0.0256 | |
| CIDEr: 1.1137 | |
| CIDEr_w: 0.1571 | |
| METEOR: 0.2074 | |
| METEOR_w: 0.0293 | |
| Evaluation datasets: | |
| * ego4d/dialog-klg_val | num samples: 96 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.2 | |
| Evalulation: ego4d-dialog-klg_val_L0_I1/stream/notalk0.2-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.2468 | |
| missing_rate: 0.6348 | |
| redundant_rate: 0.2078 | |
| semantic_score: 0.6974 | |
| time_diff: 1.1156 | |
| precision: 0.5868 | |
| recall: 0.2705 | |
| F1: 0.3703 | |
| num_matched: 1302.0000 | |
| num_mismatched: 456.0000 | |
| num_missed: 3056.0000 | |
| num_redundant: 461.0000 | |
| Bleu_1: 0.3854 | |
| Bleu_1_w: 0.0951 | |
| Bleu_2: 0.2700 | |
| Bleu_2_w: 0.0666 | |
| Bleu_3: 0.2003 | |
| Bleu_3_w: 0.0494 | |
| Bleu_4: 0.1556 | |
| Bleu_4_w: 0.0384 | |
| CIDEr: 0.8833 | |
| CIDEr_w: 0.2180 | |
| METEOR: 0.1913 | |
| METEOR_w: 0.0472 | |
| Updating eval setup: not_talk_threshold: 0.2 -> 0.3 | |
| Evalulation: ego4d-dialog-klg_val_L0_I1/stream/notalk0.3-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.2743 | |
| missing_rate: 0.4904 | |
| redundant_rate: 0.3830 | |
| semantic_score: 0.6882 | |
| time_diff: 1.3104 | |
| precision: 0.4371 | |
| recall: 0.3610 | |
| F1: 0.3954 | |
| num_matched: 1738.0000 | |
| num_mismatched: 715.0000 | |
| num_missed: 2361.0000 | |
| num_redundant: 1523.0000 | |
| Bleu_1: 0.3652 | |
| Bleu_1_w: 0.1002 | |
| Bleu_2: 0.2507 | |
| Bleu_2_w: 0.0688 | |
| Bleu_3: 0.1837 | |
| Bleu_3_w: 0.0504 | |
| Bleu_4: 0.1416 | |
| Bleu_4_w: 0.0388 | |
| CIDEr: 0.7549 | |
| CIDEr_w: 0.2070 | |
| METEOR: 0.1825 | |
| METEOR_w: 0.0500 | |
| Updating eval setup: not_talk_threshold: 0.3 -> 0.4 | |
| Evalulation: ego4d-dialog-klg_val_L0_I1/stream/notalk0.4-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.2248 | |
| missing_rate: 0.3635 | |
| redundant_rate: 0.6168 | |
| semantic_score: 0.6899 | |
| time_diff: 1.4115 | |
| precision: 0.2740 | |
| recall: 0.4551 | |
| F1: 0.3421 | |
| num_matched: 2191.0000 | |
| num_mismatched: 873.0000 | |
| num_missed: 1750.0000 | |
| num_redundant: 4932.0000 | |
| Bleu_1: 0.3759 | |
| Bleu_1_w: 0.0845 | |
| Bleu_2: 0.2574 | |
| Bleu_2_w: 0.0579 | |
| Bleu_3: 0.1876 | |
| Bleu_3_w: 0.0422 | |
| Bleu_4: 0.1437 | |
| Bleu_4_w: 0.0323 | |
| CIDEr: 0.7896 | |
| CIDEr_w: 0.1775 | |
| METEOR: 0.1780 | |
| METEOR_w: 0.0400 | |
| Updating eval setup: not_talk_threshold: 0.4 -> 0.5 | |
| Evalulation: ego4d-dialog-klg_val_L0_I1/stream/notalk0.5-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.1401 | |
| missing_rate: 0.2532 | |
| redundant_rate: 0.7929 | |
| semantic_score: 0.6834 | |
| time_diff: 1.3554 | |
| precision: 0.1499 | |
| recall: 0.5407 | |
| F1: 0.2348 | |
| num_matched: 2603.0000 | |
| num_mismatched: 992.0000 | |
| num_missed: 1219.0000 | |
| num_redundant: 13767.0000 | |
| Bleu_1: 0.3662 | |
| Bleu_1_w: 0.0513 | |
| Bleu_2: 0.2504 | |
| Bleu_2_w: 0.0351 | |
| Bleu_3: 0.1822 | |
| Bleu_3_w: 0.0255 | |
| Bleu_4: 0.1395 | |
| Bleu_4_w: 0.0195 | |
| CIDEr: 0.7510 | |
| CIDEr_w: 0.1052 | |
| METEOR: 0.1762 | |
| METEOR_w: 0.0247 | |
| Evaluation datasets: | |
| * holoassist/dialog-klg_val | num samples: 291 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.2 | |
| Evalulation: holoassist-dialog-klg_val_L0_I1/stream/notalk0.2-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.2309 | |
| missing_rate: 0.6792 | |
| redundant_rate: 0.0274 | |
| semantic_score: 0.7122 | |
| time_diff: 0.1928 | |
| precision: 0.7064 | |
| recall: 0.2330 | |
| F1: 0.3504 | |
| num_matched: 3556.0000 | |
| num_mismatched: 1340.0000 | |
| num_missed: 10365.0000 | |
| num_redundant: 138.0000 | |
| Bleu_1: 0.4523 | |
| Bleu_1_w: 0.1044 | |
| Bleu_2: 0.3360 | |
| Bleu_2_w: 0.0776 | |
| Bleu_3: 0.2622 | |
| Bleu_3_w: 0.0605 | |
| Bleu_4: 0.2096 | |
| Bleu_4_w: 0.0484 | |
| CIDEr: 1.3210 | |
| CIDEr_w: 0.3050 | |
| METEOR: 0.2230 | |
| METEOR_w: 0.0515 | |
| Updating eval setup: not_talk_threshold: 0.2 -> 0.3 | |
| Evalulation: holoassist-dialog-klg_val_L0_I1/stream/notalk0.3-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.2914 | |
| missing_rate: 0.5893 | |
| redundant_rate: 0.0740 | |
| semantic_score: 0.7093 | |
| time_diff: 0.2722 | |
| precision: 0.6785 | |
| recall: 0.3010 | |
| F1: 0.4170 | |
| num_matched: 4593.0000 | |
| num_mismatched: 1675.0000 | |
| num_missed: 8993.0000 | |
| num_redundant: 501.0000 | |
| Bleu_1: 0.4412 | |
| Bleu_1_w: 0.1286 | |
| Bleu_2: 0.3238 | |
| Bleu_2_w: 0.0944 | |
| Bleu_3: 0.2501 | |
| Bleu_3_w: 0.0729 | |
| Bleu_4: 0.1980 | |
| Bleu_4_w: 0.0577 | |
| CIDEr: 1.2303 | |
| CIDEr_w: 0.3585 | |
| METEOR: 0.2185 | |
| METEOR_w: 0.0637 | |
| Updating eval setup: not_talk_threshold: 0.3 -> 0.4 | |
| Evalulation: holoassist-dialog-klg_val_L0_I1/stream/notalk0.4-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.2395 | |
| missing_rate: 0.3510 | |
| redundant_rate: 0.5117 | |
| semantic_score: 0.6945 | |
| time_diff: 0.3960 | |
| precision: 0.3028 | |
| recall: 0.4025 | |
| F1: 0.3456 | |
| num_matched: 6142.0000 | |
| num_mismatched: 3763.0000 | |
| num_missed: 5356.0000 | |
| num_redundant: 10381.0000 | |
| Bleu_1: 0.4184 | |
| Bleu_1_w: 0.1002 | |
| Bleu_2: 0.2997 | |
| Bleu_2_w: 0.0718 | |
| Bleu_3: 0.2278 | |
| Bleu_3_w: 0.0546 | |
| Bleu_4: 0.1783 | |
| Bleu_4_w: 0.0427 | |
| CIDEr: 1.0872 | |
| CIDEr_w: 0.2604 | |
| METEOR: 0.1996 | |
| METEOR_w: 0.0478 | |
| Updating eval setup: not_talk_threshold: 0.4 -> 0.5 | |
| Evalulation: holoassist-dialog-klg_val_L0_I1/stream/notalk0.5-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.1543 | |
| missing_rate: 0.1765 | |
| redundant_rate: 0.7351 | |
| semantic_score: 0.6862 | |
| time_diff: 0.4228 | |
| precision: 0.1630 | |
| recall: 0.5068 | |
| F1: 0.2467 | |
| num_matched: 7735.0000 | |
| num_mismatched: 4833.0000 | |
| num_missed: 2693.0000 | |
| num_redundant: 34874.0000 | |
| Bleu_1: 0.3959 | |
| Bleu_1_w: 0.0611 | |
| Bleu_2: 0.2795 | |
| Bleu_2_w: 0.0431 | |
| Bleu_3: 0.2109 | |
| Bleu_3_w: 0.0325 | |
| Bleu_4: 0.1632 | |
| Bleu_4_w: 0.0252 | |
| CIDEr: 0.9431 | |
| CIDEr_w: 0.1455 | |
| METEOR: 0.1880 | |
| METEOR_w: 0.0290 | |
| Evaluation datasets: | |
| * epickitchens/dialog-klg_val | num samples: 150 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.2 | |
| Evalulation: epickitchens-dialog-klg_val_L0_I1/stream/notalk0.2-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.2117 | |
| missing_rate: 0.6014 | |
| redundant_rate: 0.2852 | |
| semantic_score: 0.6767 | |
| time_diff: 0.4677 | |
| precision: 0.4399 | |
| recall: 0.2453 | |
| F1: 0.3150 | |
| num_matched: 1578.0000 | |
| num_mismatched: 986.0000 | |
| num_missed: 3868.0000 | |
| num_redundant: 1023.0000 | |
| Bleu_1: 0.3939 | |
| Bleu_1_w: 0.0834 | |
| Bleu_2: 0.2761 | |
| Bleu_2_w: 0.0584 | |
| Bleu_3: 0.2048 | |
| Bleu_3_w: 0.0433 | |
| Bleu_4: 0.1586 | |
| Bleu_4_w: 0.0336 | |
| CIDEr: 1.0914 | |
| CIDEr_w: 0.2310 | |
| METEOR: 0.1975 | |
| METEOR_w: 0.0418 | |
| Updating eval setup: not_talk_threshold: 0.2 -> 0.3 | |
| Evalulation: epickitchens-dialog-klg_val_L0_I1/stream/notalk0.3-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.2132 | |
| missing_rate: 0.3789 | |
| redundant_rate: 0.5009 | |
| semantic_score: 0.6657 | |
| time_diff: 0.6098 | |
| precision: 0.2781 | |
| recall: 0.3461 | |
| F1: 0.3084 | |
| num_matched: 2226.0000 | |
| num_mismatched: 1769.0000 | |
| num_missed: 2437.0000 | |
| num_redundant: 4010.0000 | |
| Bleu_1: 0.3712 | |
| Bleu_1_w: 0.0791 | |
| Bleu_2: 0.2508 | |
| Bleu_2_w: 0.0535 | |
| Bleu_3: 0.1796 | |
| Bleu_3_w: 0.0383 | |
| Bleu_4: 0.1355 | |
| Bleu_4_w: 0.0289 | |
| CIDEr: 0.9349 | |
| CIDEr_w: 0.1993 | |
| METEOR: 0.1834 | |
| METEOR_w: 0.0391 | |
| Updating eval setup: not_talk_threshold: 0.3 -> 0.4 | |
| Evalulation: epickitchens-dialog-klg_val_L0_I1/stream/notalk0.4-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.1429 | |
| missing_rate: 0.1957 | |
| redundant_rate: 0.7132 | |
| semantic_score: 0.6608 | |
| time_diff: 0.5977 | |
| precision: 0.1529 | |
| recall: 0.4288 | |
| F1: 0.2254 | |
| num_matched: 2758.0000 | |
| num_mismatched: 2415.0000 | |
| num_missed: 1259.0000 | |
| num_redundant: 12864.0000 | |
| Bleu_1: 0.3656 | |
| Bleu_1_w: 0.0523 | |
| Bleu_2: 0.2397 | |
| Bleu_2_w: 0.0343 | |
| Bleu_3: 0.1664 | |
| Bleu_3_w: 0.0238 | |
| Bleu_4: 0.1222 | |
| Bleu_4_w: 0.0175 | |
| CIDEr: 0.8853 | |
| CIDEr_w: 0.1265 | |
| METEOR: 0.1771 | |
| METEOR_w: 0.0253 | |
| Updating eval setup: not_talk_threshold: 0.4 -> 0.5 | |
| Evalulation: epickitchens-dialog-klg_val_L0_I1/stream/notalk0.5-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.1049 | |
| missing_rate: 0.0973 | |
| redundant_rate: 0.8029 | |
| semantic_score: 0.6564 | |
| time_diff: 0.5558 | |
| precision: 0.1071 | |
| recall: 0.4905 | |
| F1: 0.1758 | |
| num_matched: 3155.0000 | |
| num_mismatched: 2651.0000 | |
| num_missed: 626.0000 | |
| num_redundant: 23656.0000 | |
| Bleu_1: 0.3567 | |
| Bleu_1_w: 0.0374 | |
| Bleu_2: 0.2318 | |
| Bleu_2_w: 0.0243 | |
| Bleu_3: 0.1588 | |
| Bleu_3_w: 0.0167 | |
| Bleu_4: 0.1153 | |
| Bleu_4_w: 0.0121 | |
| CIDEr: 0.8132 | |
| CIDEr_w: 0.0853 | |
| METEOR: 0.1724 | |
| METEOR_w: 0.0181 | |
| Evaluation datasets: | |
| * egoexolearn/dialog-klg_val | num samples: 123 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.2 | |
| Evalulation: egoexolearn-dialog-klg_val_L0_I1/stream/notalk0.2-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.1387 | |
| missing_rate: 0.8104 | |
| redundant_rate: 0.1285 | |
| semantic_score: 0.7007 | |
| time_diff: 0.2868 | |
| precision: 0.6557 | |
| recall: 0.1426 | |
| F1: 0.2343 | |
| num_matched: 1710.0000 | |
| num_mismatched: 563.0000 | |
| num_missed: 9718.0000 | |
| num_redundant: 335.0000 | |
| Bleu_1: 0.4345 | |
| Bleu_1_w: 0.0603 | |
| Bleu_2: 0.3128 | |
| Bleu_2_w: 0.0434 | |
| Bleu_3: 0.2377 | |
| Bleu_3_w: 0.0330 | |
| Bleu_4: 0.1861 | |
| Bleu_4_w: 0.0258 | |
| CIDEr: 1.0884 | |
| CIDEr_w: 0.1510 | |
| METEOR: 0.2044 | |
| METEOR_w: 0.0284 | |
| Updating eval setup: not_talk_threshold: 0.2 -> 0.3 | |
| Evalulation: egoexolearn-dialog-klg_val_L0_I1/stream/notalk0.3-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.1610 | |
| missing_rate: 0.7390 | |
| redundant_rate: 0.2805 | |
| semantic_score: 0.6830 | |
| time_diff: 0.5484 | |
| precision: 0.4890 | |
| recall: 0.1774 | |
| F1: 0.2603 | |
| num_matched: 2127.0000 | |
| num_mismatched: 1003.0000 | |
| num_missed: 8861.0000 | |
| num_redundant: 1220.0000 | |
| Bleu_1: 0.4062 | |
| Bleu_1_w: 0.0654 | |
| Bleu_2: 0.2817 | |
| Bleu_2_w: 0.0454 | |
| Bleu_3: 0.2067 | |
| Bleu_3_w: 0.0333 | |
| Bleu_4: 0.1572 | |
| Bleu_4_w: 0.0253 | |
| CIDEr: 0.8980 | |
| CIDEr_w: 0.1446 | |
| METEOR: 0.1898 | |
| METEOR_w: 0.0306 | |
| Updating eval setup: not_talk_threshold: 0.3 -> 0.4 | |
| Evalulation: egoexolearn-dialog-klg_val_L0_I1/stream/notalk0.4-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.1324 | |
| missing_rate: 0.5542 | |
| redundant_rate: 0.6966 | |
| semantic_score: 0.6658 | |
| time_diff: 0.6759 | |
| precision: 0.1823 | |
| recall: 0.2679 | |
| F1: 0.2170 | |
| num_matched: 3212.0000 | |
| num_mismatched: 2133.0000 | |
| num_missed: 6646.0000 | |
| num_redundant: 12271.0000 | |
| Bleu_1: 0.3942 | |
| Bleu_1_w: 0.0522 | |
| Bleu_2: 0.2667 | |
| Bleu_2_w: 0.0353 | |
| Bleu_3: 0.1900 | |
| Bleu_3_w: 0.0252 | |
| Bleu_4: 0.1404 | |
| Bleu_4_w: 0.0186 | |
| CIDEr: 0.7886 | |
| CIDEr_w: 0.1044 | |
| METEOR: 0.1764 | |
| METEOR_w: 0.0234 | |
| Updating eval setup: not_talk_threshold: 0.4 -> 0.5 | |
| Evalulation: egoexolearn-dialog-klg_val_L0_I1/stream/notalk0.5-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.0834 | |
| missing_rate: 0.2899 | |
| redundant_rate: 0.8431 | |
| semantic_score: 0.6500 | |
| time_diff: 0.6787 | |
| precision: 0.0887 | |
| recall: 0.4014 | |
| F1: 0.1453 | |
| num_matched: 4813.0000 | |
| num_mismatched: 3702.0000 | |
| num_missed: 3476.0000 | |
| num_redundant: 45741.0000 | |
| Bleu_1: 0.3714 | |
| Bleu_1_w: 0.0310 | |
| Bleu_2: 0.2428 | |
| Bleu_2_w: 0.0202 | |
| Bleu_3: 0.1678 | |
| Bleu_3_w: 0.0140 | |
| Bleu_4: 0.1212 | |
| Bleu_4_w: 0.0101 | |
| CIDEr: 0.6385 | |
| CIDEr_w: 0.0532 | |
| METEOR: 0.1630 | |
| METEOR_w: 0.0136 | |
| Evaluation datasets: | |
| * wtag/dialog-klg_val | num samples: 21 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.2 | |
| Evalulation: wtag-dialog-klg_val_L0_I1/stream/notalk0.2-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.2066 | |
| missing_rate: 0.4949 | |
| redundant_rate: 0.1597 | |
| semantic_score: 0.6875 | |
| time_diff: 0.4794 | |
| precision: 0.3767 | |
| recall: 0.2265 | |
| F1: 0.2829 | |
| num_matched: 243.0000 | |
| num_mismatched: 299.0000 | |
| num_missed: 531.0000 | |
| num_redundant: 103.0000 | |
| Bleu_1: 0.3329 | |
| Bleu_1_w: 0.0688 | |
| Bleu_2: 0.2397 | |
| Bleu_2_w: 0.0495 | |
| Bleu_3: 0.1807 | |
| Bleu_3_w: 0.0373 | |
| Bleu_4: 0.1397 | |
| Bleu_4_w: 0.0289 | |
| CIDEr: 0.9238 | |
| CIDEr_w: 0.1909 | |
| METEOR: 0.2236 | |
| METEOR_w: 0.0462 | |
| Updating eval setup: not_talk_threshold: 0.2 -> 0.3 | |
| Evalulation: wtag-dialog-klg_val_L0_I1/stream/notalk0.3-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.2219 | |
| missing_rate: 0.4473 | |
| redundant_rate: 0.2238 | |
| semantic_score: 0.6947 | |
| time_diff: 0.5145 | |
| precision: 0.3613 | |
| recall: 0.2572 | |
| F1: 0.3005 | |
| num_matched: 276.0000 | |
| num_mismatched: 317.0000 | |
| num_missed: 480.0000 | |
| num_redundant: 171.0000 | |
| Bleu_1: 0.3682 | |
| Bleu_1_w: 0.0817 | |
| Bleu_2: 0.2718 | |
| Bleu_2_w: 0.0603 | |
| Bleu_3: 0.2092 | |
| Bleu_3_w: 0.0464 | |
| Bleu_4: 0.1651 | |
| Bleu_4_w: 0.0366 | |
| CIDEr: 1.1512 | |
| CIDEr_w: 0.2554 | |
| METEOR: 0.2267 | |
| METEOR_w: 0.0503 | |
| Updating eval setup: not_talk_threshold: 0.3 -> 0.4 | |
| Evalulation: wtag-dialog-klg_val_L0_I1/stream/notalk0.4-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.2564 | |
| missing_rate: 0.3327 | |
| redundant_rate: 0.2701 | |
| semantic_score: 0.6876 | |
| time_diff: 0.8907 | |
| precision: 0.3496 | |
| recall: 0.3197 | |
| F1: 0.3340 | |
| num_matched: 343.0000 | |
| num_mismatched: 373.0000 | |
| num_missed: 357.0000 | |
| num_redundant: 265.0000 | |
| Bleu_1: 0.3570 | |
| Bleu_1_w: 0.0915 | |
| Bleu_2: 0.2619 | |
| Bleu_2_w: 0.0671 | |
| Bleu_3: 0.2009 | |
| Bleu_3_w: 0.0515 | |
| Bleu_4: 0.1585 | |
| Bleu_4_w: 0.0406 | |
| CIDEr: 0.9873 | |
| CIDEr_w: 0.2531 | |
| METEOR: 0.2162 | |
| METEOR_w: 0.0554 | |
| Updating eval setup: not_talk_threshold: 0.4 -> 0.5 | |
| Evalulation: wtag-dialog-klg_val_L0_I1/stream/notalk0.5-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.2339 | |
| missing_rate: 0.3094 | |
| redundant_rate: 0.4072 | |
| semantic_score: 0.6828 | |
| time_diff: 0.9459 | |
| precision: 0.2960 | |
| recall: 0.3448 | |
| F1: 0.3186 | |
| num_matched: 370.0000 | |
| num_mismatched: 371.0000 | |
| num_missed: 332.0000 | |
| num_redundant: 509.0000 | |
| Bleu_1: 0.3504 | |
| Bleu_1_w: 0.0820 | |
| Bleu_2: 0.2562 | |
| Bleu_2_w: 0.0599 | |
| Bleu_3: 0.1955 | |
| Bleu_3_w: 0.0457 | |
| Bleu_4: 0.1535 | |
| Bleu_4_w: 0.0359 | |
| CIDEr: 0.9531 | |
| CIDEr_w: 0.2229 | |
| METEOR: 0.2106 | |
| METEOR_w: 0.0493 | |
| Evaluation datasets: | |
| * assembly101/dialog-klg_val | num samples: 336 | |
| Updating eval setup: inference_runner_type: None -> stream | |
| Updating eval setup: not_talk_threshold: 0.5 -> 0.2 | |
| Evalulation: assembly101-dialog-klg_val_L0_I1/stream/notalk0.2-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.2184 | |
| missing_rate: 0.7032 | |
| redundant_rate: 0.0746 | |
| semantic_score: 0.7307 | |
| time_diff: 0.1704 | |
| precision: 0.6972 | |
| recall: 0.2236 | |
| F1: 0.3386 | |
| num_matched: 1860.0000 | |
| num_mismatched: 609.0000 | |
| num_missed: 5849.0000 | |
| num_redundant: 199.0000 | |
| Bleu_1: 0.4928 | |
| Bleu_1_w: 0.1076 | |
| Bleu_2: 0.3895 | |
| Bleu_2_w: 0.0851 | |
| Bleu_3: 0.3180 | |
| Bleu_3_w: 0.0695 | |
| Bleu_4: 0.2670 | |
| Bleu_4_w: 0.0583 | |
| CIDEr: 1.5431 | |
| CIDEr_w: 0.3370 | |
| METEOR: 0.2440 | |
| METEOR_w: 0.0533 | |
| Updating eval setup: not_talk_threshold: 0.2 -> 0.3 | |
| Evalulation: assembly101-dialog-klg_val_L0_I1/stream/notalk0.3-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.2826 | |
| missing_rate: 0.5628 | |
| redundant_rate: 0.1831 | |
| semantic_score: 0.7158 | |
| time_diff: 0.4370 | |
| precision: 0.5797 | |
| recall: 0.3103 | |
| F1: 0.4042 | |
| num_matched: 2581.0000 | |
| num_mismatched: 1056.0000 | |
| num_missed: 4681.0000 | |
| num_redundant: 815.0000 | |
| Bleu_1: 0.4638 | |
| Bleu_1_w: 0.1311 | |
| Bleu_2: 0.3574 | |
| Bleu_2_w: 0.1010 | |
| Bleu_3: 0.2848 | |
| Bleu_3_w: 0.0805 | |
| Bleu_4: 0.2343 | |
| Bleu_4_w: 0.0662 | |
| CIDEr: 1.3007 | |
| CIDEr_w: 0.3676 | |
| METEOR: 0.2275 | |
| METEOR_w: 0.0643 | |
| Updating eval setup: not_talk_threshold: 0.3 -> 0.4 | |
| Evalulation: assembly101-dialog-klg_val_L0_I1/stream/notalk0.4-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.2835 | |
| missing_rate: 0.2888 | |
| redundant_rate: 0.4715 | |
| semantic_score: 0.7024 | |
| time_diff: 0.7532 | |
| precision: 0.3443 | |
| recall: 0.4633 | |
| F1: 0.3951 | |
| num_matched: 3854.0000 | |
| num_mismatched: 2062.0000 | |
| num_missed: 2402.0000 | |
| num_redundant: 5277.0000 | |
| Bleu_1: 0.4325 | |
| Bleu_1_w: 0.1226 | |
| Bleu_2: 0.3229 | |
| Bleu_2_w: 0.0915 | |
| Bleu_3: 0.2512 | |
| Bleu_3_w: 0.0712 | |
| Bleu_4: 0.2029 | |
| Bleu_4_w: 0.0575 | |
| CIDEr: 1.1226 | |
| CIDEr_w: 0.3182 | |
| METEOR: 0.2074 | |
| METEOR_w: 0.0588 | |
| Updating eval setup: not_talk_threshold: 0.4 -> 0.5 | |
| Evalulation: assembly101-dialog-klg_val_L0_I1/stream/notalk0.5-maxlen_4k | |
| Metrics: | |
| jaccard_index: 0.1584 | |
| missing_rate: 0.1077 | |
| redundant_rate: 0.7417 | |
| semantic_score: 0.6834 | |
| time_diff: 0.7472 | |
| precision: 0.1634 | |
| recall: 0.5644 | |
| F1: 0.2534 | |
| num_matched: 4695.0000 | |
| num_mismatched: 2727.0000 | |
| num_missed: 896.0000 | |
| num_redundant: 21314.0000 | |
| Bleu_1: 0.4066 | |
| Bleu_1_w: 0.0644 | |
| Bleu_2: 0.2959 | |
| Bleu_2_w: 0.0469 | |
| Bleu_3: 0.2251 | |
| Bleu_3_w: 0.0357 | |
| Bleu_4: 0.1783 | |
| Bleu_4_w: 0.0282 | |
| CIDEr: 0.9545 | |
| CIDEr_w: 0.1512 | |
| METEOR: 0.1920 | |
| METEOR_w: 0.0304 | |
| All Finished! Time: 70.23 minutes | |
| Model: /fsx_0/user/imzyc/proact_exps/20240821-L4096-I1-ep4-NOSEP-nr0.1-klgmix-1s-lora-bs256 | |
| Runs: | |
| ego4d/dialog_val_L0_I1|stream|4k|0.05|summarize_and_drop | |
| ego4d/dialog-klg_val_L0_I1|stream|4k|0.2|summarize_and_drop | |
| holoassist/dialog-klg_val_L0_I1|stream|4k|0.2|summarize_and_drop | |
| epickitchens/dialog-klg_val_L0_I1|stream|4k|0.2|summarize_and_drop | |
| egoexolearn/dialog-klg_val_L0_I1|stream|4k|0.2|summarize_and_drop | |
| wtag/dialog-klg_val_L0_I1|stream|4k|0.2|summarize_and_drop | |
| assembly101/dialog-klg_val_L0_I1|stream|4k|0.2|summarize_and_drop | |
| ego4d/dialog-klg_val_L0_I1|stream|4k|0.3|summarize_and_drop | |
| holoassist/dialog-klg_val_L0_I1|stream|4k|0.3|summarize_and_drop | |
| epickitchens/dialog-klg_val_L0_I1|stream|4k|0.3|summarize_and_drop | |
| egoexolearn/dialog-klg_val_L0_I1|stream|4k|0.3|summarize_and_drop | |
| wtag/dialog-klg_val_L0_I1|stream|4k|0.3|summarize_and_drop | |
| assembly101/dialog-klg_val_L0_I1|stream|4k|0.3|summarize_and_drop | |
| ego4d/dialog-klg_val_L0_I1|stream|4k|0.4|summarize_and_drop | |
| holoassist/dialog-klg_val_L0_I1|stream|4k|0.4|summarize_and_drop | |
| epickitchens/dialog-klg_val_L0_I1|stream|4k|0.4|summarize_and_drop | |
| egoexolearn/dialog-klg_val_L0_I1|stream|4k|0.4|summarize_and_drop | |
| wtag/dialog-klg_val_L0_I1|stream|4k|0.4|summarize_and_drop | |
| assembly101/dialog-klg_val_L0_I1|stream|4k|0.4|summarize_and_drop | |
| ego4d/dialog-klg_val_L0_I1|stream|4k|0.5|summarize_and_drop | |
| holoassist/dialog-klg_val_L0_I1|stream|4k|0.5|summarize_and_drop | |
| epickitchens/dialog-klg_val_L0_I1|stream|4k|0.5|summarize_and_drop | |
| egoexolearn/dialog-klg_val_L0_I1|stream|4k|0.5|summarize_and_drop | |
| wtag/dialog-klg_val_L0_I1|stream|4k|0.5|summarize_and_drop | |
| assembly101/dialog-klg_val_L0_I1|stream|4k|0.5|summarize_and_drop | |