title: MIC Error Analysis
emoji: 🔍
colorFrom: red
colorTo: yellow
sdk: static
pinned: false

MIC Error Analysis: 30 cases

Interactive viewer for 30 sampled errors of the MIC model (Ours-SFT-GRPO) on the TARABench test splits, grouped into three failure modes:

  • Mode A: Perceptually subtle / locally-plausible edits (verdict miss)
  • Mode B: Hallucinated visual grounding (verdict right, evidence fabricated)
  • Mode C: Misidentified entity origin (right object, wrong country/era)

Open index.html for the interactive viewer.
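
If opening index.html directly from disk shows a blank or partial page, serving the folder over a local HTTP server may help. This is only a sketch under an assumption this README does not confirm: that the viewer loads its case data through requests that some browsers restrict on file:// URLs. The helper name `make_viewer_server` is hypothetical and uses only the Python standard library.

```python
import functools
import http.server
from pathlib import Path


def make_viewer_server(directory=".", port=0):
    """Build a static-file server rooted at `directory`.

    Hypothetical helper, not part of this repository.
    """
    root = Path(directory).resolve()
    # Bind the handler to the chosen folder instead of the current working directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(root)
    )
    # Port 0 lets the OS pick a free port; listen on localhost only.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

For example, running `make_viewer_server(".", 8000).serve_forever()` from the folder that contains index.html and then visiting http://127.0.0.1:8000/index.html should display the viewer; stop the server with Ctrl+C.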