| <html><head><meta charset="utf-8"><title>MIC Error Analysis — 30 cases</title> | |
| <style> | |
| body{font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",sans-serif;max-width:1200px;margin:30px auto;padding:0 20px;color:#222;line-height:1.45;background:#f9f9f9;} | |
| h1{border-bottom:2px solid #333;padding-bottom:6px;} | |
| code{background:#eee;padding:1px 5px;border-radius:3px;font-size:13px;} | |
| a{color:#36a;text-decoration:none;} a:hover{text-decoration:underline;} | |
| .nav{position:sticky;top:0;background:#fff;padding:10px 14px;border:1px solid #ddd;border-radius:6px;margin-bottom:20px;font-size:14px;box-shadow:0 2px 6px rgba(0,0,0,0.04);z-index:10;} | |
| </style></head> | |
| <body> | |
| <h1>MIC (Ours-SFT-GRPO) — Error Analysis · 30 cases</h1> | |
| <p style="color:#555;">All 30 cases are <b>genuine MIC errors</b> (no schema-only disagreements). Sampled from <code>test_id_edit</code> and <code>test_ood_edit</code> (canonical prompt). Cases with weak / ambiguous ground truth (architectural-style swaps, universal gestures, vague "X-style" descriptors) have been filtered out.</p> | |
| <div style="background:#fff8e1;border:1px solid #f0c970;border-radius:6px;padding:10px 14px;margin:10px 0 20px 0;font-size:14px;color:#444;"> | |
| <b>How to read each card.</b> MIC was given <b>only the edited (right) image</b> and the caption; the original (left) image is shown for human comparison only. The ground-truth verdict for every case is <code>INCONSISTENT</code> (the image is edited). The right-hand panel shows MIC's <code>&lt;verdict&gt; / &lt;type&gt; / &lt;grounding&gt; / &lt;knowledge&gt;</code> output on the edited image. The error is whichever part diverges from the ground truth in the left-hand panel. | |
| </div> | |
| <div class="nav"><b>Jump to:</b> <a href="#mode-A">Mode A (15)</a> · <a href="#mode-B">Mode B (8)</a> · <a href="#mode-C">Mode C (7)</a> · <span style="color:#888;">A = verdict miss · B = fabricated evidence (g_score &lt; 0.3) · C = wrong attribution (g_score ≥ 0.5)</span></div> | |
| <h2 id="mode-A" style="background:#fde2e2;padding:12px 16px;border-radius:6px;border-left:5px solid #e88;">Mode A — Perceptually subtle / locally-plausible edits (verdict miss) <span style="float:right;color:#555;font-size:14px;">15 cases</span></h2> | |
| <div class="card" id="case-1" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fde2e2;padding:2px 8px;border-radius:4px;border-left:3px solid #e88;">Mode A</span> Case #1</h3> | |
| <span style="color:#666;font-size:13px;">test_id_edit · gt_type=<code>environmental</code> · difficulty=<code>hard</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> Japan’s Virus Success Has Puzzled the World. Is Its Luck Running Out?</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> People were out and about in Osaka, Japan, on Wednesday. Japan has seemed to contain the coronavirus without broad lockdowns or widespread testing.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/724094106865_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/724094106865_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Right-side cherry-blossom-style ornamental display → Mediterranean Italian cypress with dry golden grass and olive leaves</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> A Mediterranean cypress-and-dry-grass planting suggests a southern European dry-climate landscape cue rather than the springtime urban Japanese setting described in Osaka.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>CONSISTENT</code> <b>type:</b> <code><em>None</em></code></div> | |
| <div style="margin-top:4px;"><b>&lt;grounding&gt;:</b> <em>None</em></div> | |
| <div style="margin-top:4px;"><b>&lt;knowledge&gt;:</b> <em>None</em></div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.0</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-2" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fde2e2;padding:2px 8px;border-radius:4px;border-left:3px solid #e88;">Mode A</span> Case #2</h3> | |
| <span style="color:#666;font-size:13px;">test_id_edit · gt_type=<code>clothing</code> · difficulty=<code>medium</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> Denial and Defiance: Trump and His Base Downplay the Virus Ahead of the Election</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> President Trump defied the governor of Nevada by holding an indoor rally near Las Vegas last week. The state has been devastated by the pandemic and its economic toll.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/24c179a0da59_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/24c179a0da59_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Trump-style dark suit, white shirt, and red tie → formal Filipino ivory barong Tagalog</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> A U.S. presidential campaign rally in Nevada would not naturally feature the candidate dressed in formal Filipino national attire, which shifts the political and cultural context away from the caption’s American event.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>CONSISTENT</code> <b>type:</b> <code><em>None</em></code></div> | |
| <div style="margin-top:4px;"><b>&lt;grounding&gt;:</b> <em>None</em></div> | |
| <div style="margin-top:4px;"><b>&lt;knowledge&gt;:</b> <em>None</em></div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.0</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-3" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fde2e2;padding:2px 8px;border-radius:4px;border-left:3px solid #e88;">Mode A</span> Case #3</h3> | |
| <span style="color:#666;font-size:13px;">test_id_edit · gt_type=<code>clothing</code> · difficulty=<code>hard</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> Deadly Explosion Hits Luxury Hotel in Pakistan</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> A bomb exploded outside the Serena Hotel in the city of Quetta on Wednesday, killing at least four people. A Chinese delegation, including the country’s ambassador, was staying at the hotel but wasn’t present at the time.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/344bd57b6951_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/344bd57b6951_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Central man’s light Pakistani shalwar kameez → Uzbek blue striped chapan with black doppa skullcap</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> The caption describes a bombing scene in Quetta, Pakistan, where the visible local civilian clothing would plausibly be Pakistani attire rather than distinctly Uzbek traditional dress.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>CONSISTENT</code> <b>type:</b> <code><em>None</em></code></div> | |
| <div style="margin-top:4px;"><b>&lt;grounding&gt;:</b> <em>None</em></div> | |
| <div style="margin-top:4px;"><b>&lt;knowledge&gt;:</b> <em>None</em></div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.0</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-4" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fde2e2;padding:2px 8px;border-radius:4px;border-left:3px solid #e88;">Mode A</span> Case #4</h3> | |
| <span style="color:#666;font-size:13px;">test_ood_edit · gt_type=<code>clothing</code> · difficulty=<code>hard</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> Paramedics, Strained in the Hot Zone, Pull Back From CPR</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> Emergency workers in Newark last month transported a patient with Covid-19 symptoms.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/1fd2e67bd666_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/1fd2e67bd666_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Blue U.S. medical PPE gowns and gloves → French SAMU/SMUR-style navy emergency uniforms with reflective bands and SAMU insignia</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> The caption describes emergency workers transporting a Covid-19 patient in Newark, New Jersey, so showing French SAMU personnel implies the responders are from France rather than a Newark-area U.S. service.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>INCONSISTENT</code> <b>type:</b> <code>clothing</code></div> | |
| <div style="margin-top:4px;"><b>&lt;grounding&gt;:</b> Brazilian SAMU-style navy emergency uniforms with 'SAMU' patches and reflective yellow bands</div> | |
| <div style="margin-top:4px;"><b>&lt;knowledge&gt;:</b> The caption identifies the scene as emergency workers in Newark, New Jersey, but Brazilian SAMU uniforms would indicate emergency personnel from Brazil instead.</div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.0</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-5" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fde2e2;padding:2px 8px;border-radius:4px;border-left:3px solid #e88;">Mode A</span> Case #5</h3> | |
| <span style="color:#666;font-size:13px;">test_id_edit · gt_type=<code>technology</code> · difficulty=<code>medium</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> Trump Travels to Iowa to Energize Supporters for Caucuses Next Week</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> President Trump on Thursday at a rally in Des Moines.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/2c60476d7e9d_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/2c60476d7e9d_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Audience members’ smartphones → German Leica M6 35mm film cameras</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> A 2020 Trump rally in Des Moines would naturally be documented by spectators using smartphones, so replacing them with German analog film cameras creates a subtle but incorrect time-and-context cue.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>CONSISTENT</code> <b>type:</b> <code><em>None</em></code></div> | |
| <div style="margin-top:4px;"><b>&lt;grounding&gt;:</b> <em>None</em></div> | |
| <div style="margin-top:4px;"><b>&lt;knowledge&gt;:</b> <em>None</em></div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.0</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-6" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fde2e2;padding:2px 8px;border-radius:4px;border-left:3px solid #e88;">Mode A</span> Case #6</h3> | |
| <span style="color:#666;font-size:13px;">test_id_edit · gt_type=<code>text_language</code> · difficulty=<code>medium</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> Myanmar Soldiers, Aiming to Silence Protests, Target Journalists</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> Covering a protest battle in Yangon, Myanmar, on Sunday. Three photojournalists have been shot and wounded while taking photographs of the anti-coup demonstrations.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/0ec2b88c069a_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/0ec2b88c069a_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> English cement-bag branding ('CROWN cement' / 'CEMENT') → Thai cement-bag branding ('ปูนซีเมนต์ตรามงกุฎ' / 'ปูนซีเมนต์')</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> Thai-language cement packaging suggests the barricade materials are from Thailand, which conflicts with the caption identifying the protest scene as taking place in Yangon, Myanmar.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>INCONSISTENT</code> <b>type:</b> <code>ads_anachronism</code></div> | |
| <div style="margin-top:4px;"><b>&lt;grounding&gt;:</b> black Adidas backpack with large white 'adidas' logo</div> | |
| <div style="margin-top:4px;"><b>&lt;knowledge&gt;:</b> The caption describes journalists covering anti-coup protests in Myanmar, so replacing their press gear with a prominent contemporary Adidas-branded backpack subtly shifts the scene toward consumer sportswear branding rather than authentic press equipment.</div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.0</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-7" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fde2e2;padding:2px 8px;border-radius:4px;border-left:3px solid #e88;">Mode A</span> Case #7</h3> | |
| <span style="color:#666;font-size:13px;">test_id_edit · gt_type=<code>clothing</code> · difficulty=<code>medium</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> Deadly Explosion Hits Kabul Tutoring Center</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> Residents in Kabul gathered near the site of an attack on an education center on Saturday.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/5cd221150f15_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/5cd221150f15_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Gray T-shirt and dark pants on central young man → Pakistani cream shalwar kameez with charcoal waistcoat and Pakistani flag patch</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> The caption describes residents gathered after an attack in Kabul, Afghanistan, so dressing a prominent civilian in explicitly Pakistani national-style clothing with a Pakistan flag patch introduces the wrong national context.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>CONSISTENT</code> <b>type:</b> <code><em>None</em></code></div> | |
| <div style="margin-top:4px;"><b>&lt;grounding&gt;:</b> <em>None</em></div> | |
| <div style="margin-top:4px;"><b>&lt;knowledge&gt;:</b> <em>None</em></div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.0</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-8" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fde2e2;padding:2px 8px;border-radius:4px;border-left:3px solid #e88;">Mode A</span> Case #8</h3> | |
| <span style="color:#666;font-size:13px;">test_ood_edit · gt_type=<code>environmental</code> · difficulty=<code>medium</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> In Wisconsin Poll, Unrest Concerns Don’t Translate Into Surge for Trump</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> Residents of Kenosha, Wis., watched President Trump’s motorcade this month. A Times poll found that Mr. Trump has failed so far in his attempt to capitalize politically on his inflammatory remarks about the unrest there.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/b002090647ad_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/b002090647ad_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Sidewalk crack grass and weeds → shallow muddy monsoon water with rice paddy seedlings</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> Flooded rice-seedling ground cover suggests a South Asian monsoon agricultural environment, which conflicts with the caption’s setting in Kenosha, Wisconsin during a presidential motorcade.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>CONSISTENT</code> <b>type:</b> <code><em>None</em></code></div> | |
| <div style="margin-top:4px;"><b>&lt;grounding&gt;:</b> <em>None</em></div> | |
| <div style="margin-top:4px;"><b>&lt;knowledge&gt;:</b> <em>None</em></div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.0</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-9" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fde2e2;padding:2px 8px;border-radius:4px;border-left:3px solid #e88;">Mode A</span> Case #9</h3> | |
| <span style="color:#666;font-size:13px;">test_ood_edit · gt_type=<code>clothing</code> · difficulty=<code>medium</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> Migrant Boat Sinks Near Greek Island, Killing at Least 12</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> Medical staff transport a survivor at a hospital in the northwestern Greek port town of Preveza on Saturday.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/ccf4a52d3b73_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/ccf4a52d3b73_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Greek EKAB responder jacket marking → Italian 118 emergency medical service marking</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> The caption places the scene at a hospital in Preveza, Greece, so showing an Italian emergency-service uniform identifier instead of Greece’s EKAB contradicts the stated location and responders.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>CONSISTENT</code> <b>type:</b> <code><em>None</em></code></div> | |
| <div style="margin-top:4px;"><b>&lt;grounding&gt;:</b> <em>None</em></div> | |
| <div style="margin-top:4px;"><b>&lt;knowledge&gt;:</b> <em>None</em></div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.0</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-10" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fde2e2;padding:2px 8px;border-radius:4px;border-left:3px solid #e88;">Mode A</span> Case #10</h3> | |
| <span style="color:#666;font-size:13px;">test_ood_edit · gt_type=<code>social_behavior</code> · difficulty=<code>medium</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> South Dakota Attorney General Faces Impeachment and Calls to Resign Over Crash</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> Attorney General Jason Ravnsborg of South Dakota in 2019 in front of the U.S. Supreme Court.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/7d29a57b9eb7_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/7d29a57b9eb7_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Man speaking at microphones → man performing a Chinese baoquan fist-palm salute</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> A South Dakota attorney general making a public statement outside the U.S. Supreme Court would be expected to speak or gesture in standard American political press-conference style, not use a formal Chinese martial/cultural greeting.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>INCONSISTENT</code> <b>type:</b> <code>social_behavior</code></div> | |
| <div style="margin-top:4px;"><b>&lt;grounding&gt;:</b> West African prostration-style greeting with palms pressed together</div> | |
| <div style="margin-top:4px;"><b>&lt;knowledge&gt;:</b> The caption describes South Dakota Attorney General Jason Ravnsborg making a public statement before the U.S. Supreme Court, but the edited gesture makes him appear to be performing a devotional West African greeting instead of addressing reporters.</div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.0</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-11" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fde2e2;padding:2px 8px;border-radius:4px;border-left:3px solid #e88;">Mode A</span> Case #11</h3> | |
| <span style="color:#666;font-size:13px;">test_ood_edit · gt_type=<code>text_language</code> · difficulty=<code>hard</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> Germany’s Far-Right Party Wins Suit Against Interior Minister</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> Tino Chrupalla, left, and Jörg Meuthen, leaders of the far-right Alternative for Germany party, addressing the media on Tuesday after the court decision was released.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/838d35dd6453_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/838d35dd6453_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Blue microphone text 'SWR3' → Dutch broadcaster text 'NOS'</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> The caption describes German far-right party leaders speaking to the media in Germany, but a Dutch-language broadcaster identifier subtly suggests the press scene is tied to the Netherlands instead.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>INCONSISTENT</code> <b>type:</b> <code>technology</code></div> | |
| <div style="margin-top:4px;"><b>&lt;grounding&gt;:</b> Dutch NOS-branded broadcast microphones and camera monitor</div> | |
| <div style="margin-top:4px;"><b>&lt;knowledge&gt;:</b> The caption identifies the scene as a German political media event, so prominently Dutch NOS broadcast equipment implies the media infrastructure belongs to the Netherlands rather than Germany.</div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.0</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-12" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fde2e2;padding:2px 8px;border-radius:4px;border-left:3px solid #e88;">Mode A</span> Case #12</h3> | |
| <span style="color:#666;font-size:13px;">test_ood_edit · gt_type=<code>social_behavior</code> · difficulty=<code>medium</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> From Hiding, Kyrgyzstan’s Leader Declares State of Emergency</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> Protesters from rival political groups rallied in Bishkek, Kyrgyzstan’s capital, on Friday.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/46c9e72fcb0c_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/46c9e72fcb0c_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Raised protest fists and rallying arms → formal German-style right-handed handshakes between adjacent men</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> A crowd at a rival political protest in Bishkek would be expected to display confrontational or solidarity protest gestures, not orderly mutual formal greetings that suggest cordial reception.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>CONSISTENT</code> <b>type:</b> <code><em>None</em></code></div> | |
| <div style="margin-top:4px;"><b>&lt;grounding&gt;:</b> <em>None</em></div> | |
| <div style="margin-top:4px;"><b>&lt;knowledge&gt;:</b> <em>None</em></div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.0</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-13" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fde2e2;padding:2px 8px;border-radius:4px;border-left:3px solid #e88;">Mode A</span> Case #13</h3> | |
| <span style="color:#666;font-size:13px;">test_ood_edit · gt_type=<code>clothing</code> · difficulty=<code>hard</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> In Xi’s Homage to Korean War, a Jab at the U.S.</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> An event marking the 70th anniversary of China’s participation in the Korean War at the Great Hall of the People in Beijing on Friday.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/af3f3c181f22_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/af3f3c181f22_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Chinese officials’ dark business suits → French Army dark navy dress uniforms with gold insignia and small French tricolor sleeve patches</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> The caption describes a Chinese state commemoration of the Korean War in Beijing, so visible French military dress uniforms among the principal attendees would be the wrong national military identity.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>INCONSISTENT</code> <b>type:</b> <code>clothing</code></div> | |
| <div style="margin-top:4px;"><b>&lt;grounding&gt;:</b> formal Chinese PLA officer uniforms with red collar tabs</div> | |
| <div style="margin-top:4px;"><b>&lt;knowledge&gt;:</b> The caption describes a 70th-anniversary commemorative event in Beijing, so replacing attendees with PLA officers changes the scene from a civilian political ceremony into a military-political one.</div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.0</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-14" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fde2e2;padding:2px 8px;border-radius:4px;border-left:3px solid #e88;">Mode A</span> Case #14</h3> | |
| <span style="color:#666;font-size:13px;">test_ood_edit · gt_type=<code>social_behavior</code> · difficulty=<code>medium</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> A Few Thousand Protest Stay-at-Home Order at Wisconsin State Capitol</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> People protested Gov. Tony Evers’s extended stay-at-home order at the Capitol in Madison, Wis., on Friday.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/5bbd9bd57598_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/5bbd9bd57598_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Raised-hands protest gesture → Islamic qiyam prayer posture with arms folded over the torso</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> The caption describes people protesting a Wisconsin stay-at-home order, so depicting a participant in formal Islamic prayer changes the social meaning of the gathering away from a political demonstration.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>CONSISTENT</code> <b>type:</b> <code><em>None</em></code></div> | |
| <div style="margin-top:4px;"><b>&lt;grounding&gt;:</b> <em>None</em></div> | |
| <div style="margin-top:4px;"><b>&lt;knowledge&gt;:</b> <em>None</em></div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.0</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-15" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fde2e2;padding:2px 8px;border-radius:4px;border-left:3px solid #e88;">Mode A</span> Case #15</h3> | |
| <span style="color:#666;font-size:13px;">test_ood_edit · gt_type=<code>environmental</code> · difficulty=<code>hard</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> ‘We Can Bring Down the Regime’: Myanmar’s Protesting Workers Are Unbowed</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> Thousands of protesters gathered in Yangon, Myanmar, on Monday.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/494754ac5081_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/494754ac5081_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Generic Yangon street trees → mature South Asian monsoon banyan trees with aerial roots</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> The caption places the protest in Yangon, but replacing the streetside vegetation with iconic banyan-lined South Asian monsoon streets subtly shifts the environmental setting toward the Indian subcontinent rather than urban Myanmar.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>CONSISTENT</code> <b>type:</b> <code><em>None</em></code></div> | |
| <div style="margin-top:4px;"><b>&lt;grounding&gt;:</b> <em>None</em></div> | |
| <div style="margin-top:4px;"><b>&lt;knowledge&gt;:</b> <em>None</em></div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.0</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <h2 id="mode-B" style="background:#fff4d6;padding:12px 16px;border-radius:6px;border-left:5px solid #dc6;">Mode B — Hallucinated visual grounding (verdict right, evidence fabricated) <span style="float:right;color:#555;font-size:14px;">8 cases</span></h2> | |
| <div class="card" id="case-16" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fff4d6;padding:2px 8px;border-radius:4px;border-left:3px solid #dc6;">Mode B</span> Case #16</h3> | |
| <span style="color:#666;font-size:13px;">test_ood_edit · gt_type=<code>text_language</code> · difficulty=<code>medium</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> Turkey Court Sentences U.S. Consulate Employee to Over 8 Years in Prison</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> The courthouse in Istanbul last year during the trial of Metin Topuz, an American consulate employee accused of spying and attempting to overthrow the Turkish government.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/3f85eb9613f2_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/3f85eb9613f2_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Turkish courthouse and police signage ('ISTANBUL ADALET SARAYI', 'POLIS') → Georgian Mkhedruli signage ('თბილისის საქალაქო სასამართლო', 'პოლიცია')</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> The caption identifies the scene as a courthouse in Istanbul, Turkey, but Georgian-language official signage implies the building is in Georgia instead.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>INCONSISTENT</code> <b>type:</b> <code>text_language</code></div> | |
| <div style="margin-top:4px;"><b>&lt;grounding&gt;:</b> Khmer-script courthouse signs ('ស្ថាប័នការងារ', 'រដ្ឋបាលទីកន្លែងការងារ')</div> | |
| <div style="margin-top:4px;"><b>&lt;knowledge&gt;:</b> The caption identifies the scene as a courthouse in Istanbul, but Khmer administrative signage implies a Cambodian governmental setting instead.</div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.2</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-17" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fff4d6;padding:2px 8px;border-radius:4px;border-left:3px solid #dc6;">Mode B</span> Case #17</h3> | |
| <span style="color:#666;font-size:13px;">test_id_edit · gt_type=<code>ads_anachronism</code> · difficulty=<code>medium</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> Mother in Custody After 3 Children Are Found Dead, Police Say</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> Firefighters pronounced the three children dead at an apartment complex in Los Angeles on Saturday, the police said.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/dd4220c54f77_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/dd4220c54f77_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Apartment leasing sign 'Royal Villa Apartments / Now Leasing' → 'OCCUPY WALL STREET / WE ARE THE 99%' social movement banner</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> An Occupy Wall Street banner is tied to the 2011 protest movement, which is anachronistic for a Los Angeles apartment homicide scene from this later news event.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>INCONSISTENT</code> <b>type:</b> <code>architecture</code></div> | |
| <div style="margin-top:4px;"><b>&lt;grounding&gt;:</b> Spanish colonial-style arched stucco facade with terracotta tile roofline</div> | |
| <div style="margin-top:4px;"><b>&lt;knowledge&gt;:</b> The caption describes a police response at an apartment complex in Los Angeles, but the edited architecture suggests a Latin American Spanish colonial setting rather than a typical California urban residential building.</div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.2</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-18" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fff4d6;padding:2px 8px;border-radius:4px;border-left:3px solid #dc6;">Mode B</span> Case #18</h3> | |
| <span style="color:#666;font-size:13px;">test_id_edit · gt_type=<code>text_language</code> · difficulty=<code>medium</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> Palestinians, Slammed for Suppressing Dissent, Free Protest Organizers</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> Activists were arrested for planning to stage an anti-corruption demonstration in Manara Square in Ramallah this month.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/1dd24804de03_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/1dd24804de03_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Arabic storefront signs in Ramallah → Urdu Nastaliq storefront signs reading 'احتساب مخالف احتجاج' and 'راولپنڈی الیکٹرانکس'</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> Urdu commercial signage implies a Pakistani urban setting rather than Ramallah in the West Bank, contradicting the caption's claim that the scene is in Manara Square.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>INCONSISTENT</code> <b>type:</b> <code>ads_anachronism</code></div> | |
| <div style="margin-top:4px;"><b>&lt;grounding&gt;:</b> Dell-branded storefront/logo</div> | |
| <div style="margin-top:4px;"><b>&lt;knowledge&gt;:</b> A Dell storefront logo introduces a distinctly American consumer-tech brand into a captioned scene about a Palestinian anti-corruption protest in Ramallah, creating a subtle but significant cultural and temporal mismatch.</div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.2</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-19" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fff4d6;padding:2px 8px;border-radius:4px;border-left:3px solid #dc6;">Mode B</span> Case #19</h3> | |
| <span style="color:#666;font-size:13px;">test_id_edit · gt_type=<code>ads_anachronism</code> · difficulty=<code>medium</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> Protests Upend Mayor’s Race, and Eyes Turn to Jumaane Williams</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> Jumaane Williams, center, has been one of the few elected officials in New York City to lead protests, as he did on Tuesday, across the Brooklyn Bridge.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/757a6940bf28_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/757a6940bf28_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> “Wilson” cap branding → white OpenAI knot + “ChatGPT” logo on the cap</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> A ChatGPT-branded cap is anachronistic in a photo of the 2020 George Floyd protests in New York because ChatGPT did not exist yet.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>CONSISTENT</code> <b>type:</b> <code><em>None</em></code></div> | |
| <div style="margin-top:4px;"><b>&lt;grounding&gt;:</b> <em>None</em></div> | |
| <div style="margin-top:4px;"><b>&lt;knowledge&gt;:</b> <em>None</em></div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.2</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-20" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fff4d6;padding:2px 8px;border-radius:4px;border-left:3px solid #dc6;">Mode B</span> Case #20</h3> | |
| <span style="color:#666;font-size:13px;">test_ood_edit · gt_type=<code>flag</code> · difficulty=<code>easy</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> Antivaccination Activists Are Growing Force at Virus Protests</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> Heidi Muñoz Gleisner, left center, and Tara Thornton, right center, were removed from a demonstration in Sacramento on Friday.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/675bc2ec25a2_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/675bc2ec25a2_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> U.S. flag → French blue-white-red tricolour flag</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> The caption describes a demonstration in Sacramento, California, so a French national flag shifts the scene's implied political setting away from a specifically American state protest.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>INCONSISTENT</code> <b>type:</b> <code>flag</code></div> | |
| <div style="margin-top:4px;"><b>&lt;grounding&gt;:</b> French tricolor flag (blue-white-red vertical bands)</div> | |
| <div style="margin-top:4px;"><b>&lt;knowledge&gt;:</b> The caption describes a Sacramento antivaccination protest in the United States, so a French national flag subtly shifts the implied political setting away from a local American demonstration.</div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.2</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-21" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fff4d6;padding:2px 8px;border-radius:4px;border-left:3px solid #dc6;">Mode B</span> Case #21</h3> | |
| <span style="color:#666;font-size:13px;">test_ood_edit · gt_type=<code>social_behavior</code> · difficulty=<code>medium</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> Phoenix Police Kill Man in Parked Car, Igniting New Protests</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> Protesters at a Black Lives Matter march in Phoenix last month.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/9f923b7cfccd_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/9f923b7cfccd_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Marching Black Lives Matter protest behavior → coordinated Islamic prayer postures (standing, bowing, prostrating)</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> The caption describes protesters at a Black Lives Matter march in Phoenix, but the edited crowd is behaving like participants in a communal Muslim prayer gathering rather than a protest march.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>INCONSISTENT</code> <b>type:</b> <code>social_behavior</code></div> | |
| <div style="margin-top:4px;"><b><grounding>:</b> participants performing Islamic sujud prostration</div> | |
| <div style="margin-top:4px;"><b><knowledge>:</b> The caption describes protesters at a Black Lives Matter march in Phoenix, but the edited behavior makes the gathering look like a Muslim prayer ritual instead of a civil-rights protest.</div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.2</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-22" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fff4d6;padding:2px 8px;border-radius:4px;border-left:3px solid #dc6;">Mode B</span> Case #22</h3> | |
| <span style="color:#666;font-size:13px;">test_id_edit · gt_type=<code>technology</code> · difficulty=<code>hard</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> Myanmar Military Storms Universities and Hospitals and Revokes Press Licenses</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> Police used tear gas, sound grenades and rubber bullets to crack down on demonstrations in Yangon on Monday.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/f714fb16e745_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/f714fb16e745_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Improvised protest shield → Brazilian Engesa EE-11 Urutu armored-vehicle vision block in olive-drab frame</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> The caption describes demonstrators in Yangon during Myanmar’s crackdown, but a conspicuous component from a Brazilian Engesa armored vehicle subtly implies unrelated Brazilian military hardware in the scene.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>INCONSISTENT</code> <b>type:</b> <code>environmental</code></div> | |
| <div style="margin-top:4px;"><b><grounding>:</b> Banyan-style tropical monsoon foliage with banyan-like leaf clusters and a small sapling</div> | |
| <div style="margin-top:4px;"><b><knowledge>:</b> The caption places the scene in Yangon, Myanmar, but the edited vegetation suggests a distinctly South Asian tropical monsoon environment associated with Banyan trees rather than the more typical urban street trees seen there.</div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.2</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-23" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#fff4d6;padding:2px 8px;border-radius:4px;border-left:3px solid #dc6;">Mode B</span> Case #23</h3> | |
| <span style="color:#666;font-size:13px;">test_id_edit · gt_type=<code>flag</code> · difficulty=<code>hard</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> Divided Kingdom: Jordan Shaken by Split Between King and Ex-Crown Prince</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> Jordan’s capital, Amman, on Sunday, a day after the authorities moved against high-profile figures.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/301ddabbaeee_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/301ddabbaeee_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Small multicolored street banner/flag → Indonesian red-over-white flag</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> The caption identifies the scene as Amman, Jordan, so displaying an Indonesian national flag in the street decorations subtly shifts the location/national context away from Jordan.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>INCONSISTENT</code> <b>type:</b> <code>infrastructure</code></div> | |
| <div style="margin-top:4px;"><b><grounding>:</b> South African-style black-pole speed-limit sign and bollard system</div> | |
| <div style="margin-top:4px;"><b><knowledge>:</b> The caption identifies the scene as Amman, Jordan, so replacing key roadside infrastructure with distinctly South African street hardware implies the wrong national setting.</div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.2</b> | | |
| grounding_score = <b style="color:#c00;">0.0</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <h2 id="mode-C" style="background:#e2f0fd;padding:12px 16px;border-radius:6px;border-left:5px solid #7ad;">Mode C — Misidentified entity origin (right object, wrong country/era) <span style="float:right;color:#555;font-size:14px;">7 cases</span></h2> | |
| <div class="card" id="case-24" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#e2f0fd;padding:2px 8px;border-radius:4px;border-left:3px solid #7ad;">Mode C</span> Case #24</h3> | |
| <span style="color:#666;font-size:13px;">test_id_edit · gt_type=<code>technology</code> · difficulty=<code>hard</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> At Least Eight Killed in Haiti Prison Break</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> Recaptured inmates are led by the police outside the Croix-des-Bouquets Civil Prison after an attempted breakout, in Port-au-Prince, Haiti, on Thursday.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/6ed0229454ee_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/6ed0229454ee_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Officer’s visible long gun → Brazilian IMBEL IA2 assault rifle</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> The caption describes Haitian police escorting recaptured inmates, so a distinctly Brazilian service rifle subtly shifts the implied security force identity away from Haiti.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>INCONSISTENT</code> <b>type:</b> <code>flag</code></div> | |
| <div style="margin-top:4px;"><b><grounding>:</b> Philippine flag patch</div> | |
| <div style="margin-top:4px;"><b><knowledge>:</b> The caption describes Haitian police recapturing inmates outside a prison in Port-au-Prince, so a Philippine flag on the officer would imply the wrong national security force.</div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.2</b> | | |
| grounding_score = <b style="color:#080;">0.8</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-25" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#e2f0fd;padding:2px 8px;border-radius:4px;border-left:3px solid #7ad;">Mode C</span> Case #25</h3> | |
| <span style="color:#666;font-size:13px;">test_id_edit · gt_type=<code>technology</code> · difficulty=<code>hard</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> Lone Wolves Connected Online: A History of Modern White Supremacy</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> In 1981, Louis Beam, then Grand Dragon of the Texas chapter of the Ku Klux Klan, inspecting armed members of the group’s security force.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/7089935d80ec_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/7089935d80ec_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> One visible rifle in the right-side line → South African Vektor R4 assault rifle</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> A South African Vektor R4 was introduced later in the 1980s and would not fit a 1981 Texas Klan militia inspection scene described in the caption.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>INCONSISTENT</code> <b>type:</b> <code>technology</code></div> | |
| <div style="margin-top:4px;"><b><grounding>:</b> Israeli IMI TAR-21 bullpup assault rifles</div> | |
| <div style="margin-top:4px;"><b><knowledge>:</b> The caption describes a 1981 Klan security force in the United States, so equipping them with Israeli-made TAR-21 rifles introduces a subtly wrong technological context tied to a later Middle Eastern military context.</div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.2</b> | | |
| grounding_score = <b style="color:#a60;">0.6</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-26" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#e2f0fd;padding:2px 8px;border-radius:4px;border-left:3px solid #7ad;">Mode C</span> Case #26</h3> | |
| <span style="color:#666;font-size:13px;">test_id_edit · gt_type=<code>infrastructure</code> · difficulty=<code>hard</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> Italy Locks Down Much of the Country’s North Over the Coronavirus</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> The Duomo Square in Milan on Sunday, after the Italian government imposed a lockdown on much of the country’s north because of the coronavirus.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/cb42a79085c6_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/cb42a79085c6_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Unmarked Milan plaza infrastructure → Addis Ababa Light Rail-style station entrance with Amharic sign "ለገሀር / Leghar"</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> The caption identifies the location as Duomo Square in Milan, but the added transit signage and station design indicate Ethiopian Addis Ababa rail infrastructure instead of Italian urban infrastructure.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>INCONSISTENT</code> <b>type:</b> <code>infrastructure</code></div> | |
| <div style="margin-top:4px;"><b><grounding>:</b> Tokyo Metro-style sign reading 'レガーレ / Leghar'</div> | |
| <div style="margin-top:4px;"><b><knowledge>:</b> The caption places the scene in Milan during Italy's coronavirus lockdown, so a Japanese transit stop sign identifying a Tokyo Metro station would be geographically inconsistent.</div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.2</b> | | |
| grounding_score = <b style="color:#080;">0.8</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-27" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#e2f0fd;padding:2px 8px;border-radius:4px;border-left:3px solid #7ad;">Mode C</span> Case #27</h3> | |
| <span style="color:#666;font-size:13px;">test_ood_edit · gt_type=<code>architecture</code> · difficulty=<code>medium</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> Italy’s Vaccine Drive Runs Up Against a Sacred Institution: Summer Vacation</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> Dining in Florence, Italy, this week. Pressure has built on the government to be more flexible to save the tourism season and to allow Italians to get vaccinated in sun-and-surf regions far from home.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/d8e716b48be1_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/d8e716b48be1_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Florentine cathedral dome and upper facade details → Mughal white onion dome with jali-screened cusped-arch detailing</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> The caption places the scene in Florence, Italy, but the edited landmark would display distinctive Mughal South Asian architecture rather than Florentine historic architecture.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>CONSISTENT</code> <b>type:</b> <code><em>None</em></code></div> | |
| <div style="margin-top:4px;"><b><grounding>:</b> <em>None</em></div> | |
| <div style="margin-top:4px;"><b><knowledge>:</b> <em>None</em></div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.2</b> | | |
| grounding_score = <b style="color:#a60;">0.6</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-28" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#e2f0fd;padding:2px 8px;border-radius:4px;border-left:3px solid #7ad;">Mode C</span> Case #28</h3> | |
| <span style="color:#666;font-size:13px;">test_ood_edit · gt_type=<code>ads_anachronism</code> · difficulty=<code>medium</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> Buffeted by Trump, China Has Little Hope for Warmer Relations With Biden</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> A state media broadcast in Beijing on Sunday showing President-elect Joseph R. Biden Jr. delivering his victory speech.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/4e21dac4a643_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/4e21dac4a643_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Lower-right fashion storefront sign → illuminated COVID-19 mask-and-QR-code entry notice in Chinese</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> The caption describes a 2020 Biden victory-speech broadcast, but a mall sign requiring mask-wearing and QR-code registration evokes the later COVID-control period in China rather than that specific moment.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>INCONSISTENT</code> <b>type:</b> <code>ads_anachronism</code></div> | |
| <div style="margin-top:4px;"><b><grounding>:</b> COVID-19 mask-and-QR-code public-health notice</div> | |
| <div style="margin-top:4px;"><b><knowledge>:</b> A COVID-era mask-and-QR-code notice is anachronistic for a 2016 election-night broadcast shown in Beijing.</div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.2</b> | | |
| grounding_score = <b style="color:#080;">0.8</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-29" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#e2f0fd;padding:2px 8px;border-radius:4px;border-left:3px solid #7ad;">Mode C</span> Case #29</h3> | |
| <span style="color:#666;font-size:13px;">test_id_edit · gt_type=<code>technology</code> · difficulty=<code>hard</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> A Lucky Country Says Goodbye to the World’s Longest Boom</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> A restaurant in Sydney last week.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/b766dd461c40_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/b766dd461c40_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Sydney skyline construction crane → South African Ratel 20 armored vehicle</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> A South African military armored vehicle appearing as construction equipment in a restaurant scene identified as Sydney last week conflicts with the expected civilian urban infrastructure of contemporary Australia.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>INCONSISTENT</code> <b>type:</b> <code>technology</code></div> | |
| <div style="margin-top:4px;"><b><grounding>:</b> Turkish Anka-style unmanned turret-equipped armored vehicle</div> | |
| <div style="margin-top:4px;"><b><knowledge>:</b> The caption describes a restaurant scene in Sydney, so suspending a Turkish military unmanned turret vehicle over the harbor conflicts with the expected civilian setting and changes the implied news context.</div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.2</b> | | |
| grounding_score = <b style="color:#a60;">0.6</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <div class="card" id="case-30" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;"> | |
| <div style="display:flex;justify-content:space-between;align-items:baseline;border-bottom:1px solid #eee;padding-bottom:6px;margin-bottom:10px;"> | |
| <h3 style="margin:0;"><span style="background:#e2f0fd;padding:2px 8px;border-radius:4px;border-left:3px solid #7ad;">Mode C</span> Case #30</h3> | |
| <span style="color:#666;font-size:13px;">test_ood_edit · gt_type=<code>ads_anachronism</code> · difficulty=<code>hard</code></span> | |
| </div> | |
| <div style="font-size:13px;color:#444;margin-bottom:6px;"><b>Headline:</b> After Breonna Taylor’s Death, Black Engagement in Kentucky Politics Soared</div> | |
| <div style="font-size:13px;color:#444;margin-bottom:10px;"><b>Caption:</b> Protesters gathered in downtown Louisville, Ky., on Saturday to commemorate the anniversary of the killing of Breonna Taylor in a botched raid by Louisville police officers.</div> | |
| <div style="display:flex;gap:10px;margin:10px 0;"> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Original (reference only)</div><img src="images/f33b88aa8794_jumbo.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| <div style="flex:1;"><div style="font-size:12px;color:#888;text-align:center;">Edited (what MIC saw)</div><img src="images/f33b88aa8794_jumbo_edited.png" style="width:100%;border:1px solid #ccc;border-radius:4px;"></div> | |
| </div> | |
| <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;font-size:14px;"> | |
| <div style="background:#f6f6f6;padding:10px;border-radius:6px;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">Ground truth</div> | |
| <div><b>what_changed:</b> Camera 'Sony' branding and cap 'NewsShare' text → COVID-era 'SCAN FOR CONTACT TRACING' QR sticker and 'Mask Up Louisville' text</div> | |
| <div style="margin-top:4px;"><b>why_contradicts:</b> The caption describes a Louisville protest commemorating Breonna Taylor with visible masks already placing it in the pandemic period, so adding a formal contact-tracing QR code campaign and 'Mask Up Louisville' branding injects a very specific later-pandemic public-health visual program that does not fit the event as depicted.</div> | |
| </div> | |
| <div style="background:#fafdff;padding:10px;border-radius:6px;border-left:3px solid #58c;"> | |
| <div style="font-weight:bold;color:#333;margin-bottom:6px;">MIC prediction</div> | |
| <div><b>verdict:</b> <code>INCONSISTENT</code> <b>type:</b> <code>ads_anachronism</code></div> | |
| <div style="margin-top:4px;"><b><grounding>:</b> COVID-era QR-code 'SCAN FOR CONTACT TRACING' plate</div> | |
| <div style="margin-top:4px;"><b><knowledge>:</b> A COVID contact-tracing QR plate introduces a pandemic-era public-health cue that conflicts with the captioned 2020 protest commemorating Breonna Taylor.</div> | |
| <div style="margin-top:8px;font-size:12px;"> | |
| knowledge_score = <b style="color:#c00;">0.2</b> | | |
| grounding_score = <b style="color:#080;">0.8</b> | |
| </div> | |
| </div> | |
| </div> | |
| <textarea placeholder="Your notes / disagreement with the auto-tag..." style="width:100%;margin-top:10px;min-height:50px;border:1px dashed #bbb;border-radius:4px;padding:6px;font-family:inherit;font-size:13px;"></textarea> | |
| </div> | |
| <p style="margin-top:40px;color:#888;font-size:12px;">Generated from <code>/tmp/mic_errors_30.json</code> · source data in <code>experiments/eval/results/scored/Ours-SFT-GRPO__*__canonical.jsonl</code></p> | |
| </body></html> | |