metadata
title: MIC Error Analysis
emoji: π
colorFrom: red
colorTo: yellow
sdk: static
pinned: false
MIC Error Analysis: 30 cases
Interactive viewer for 30 sampled errors of the MIC model (Ours-SFT-GRPO) on the TARABench test splits, grouped into three failure modes:
- Mode A: Perceptually subtle / locally-plausible edits (verdict miss)
- Mode B: Hallucinated visual grounding (verdict right, evidence fabricated)
- Mode C: Misidentified entity origin (right object, wrong country/era)
Open index.html for the interactive viewer.