Commit 481d227 · committed by zengrh3 · verified · 1 Parent(s): 64286bc

Upload index.html with huggingface_hub

Files changed (1)
  1. index.html +3 -0
index.html CHANGED
@@ -10,6 +10,9 @@ a{color:#36a;text-decoration:none;} a:hover{text-decoration:underline;}
10
  <body>
11
  <h1>MIC (Ours-SFT-GRPO) — Error Analysis · 30 cases</h1>
12
  <p style="color:#555;">All 30 cases are <b>genuine MIC errors</b> (no schema-only disagreements). Sampled from <code>test_id_edit</code> and <code>test_ood_edit</code> (canonical prompt). Pre-grouped by 3 large failure modes — disagree freely in the notes box on each card.</p>
13
+ <div style="background:#fff8e1;border:1px solid #f0c970;border-radius:6px;padding:10px 14px;margin:10px 0 20px 0;font-size:14px;color:#444;">
14
+ <b>How to read each card.</b> MIC was given <b>only the edited (right) image</b> + the caption; the original (left) is shown for human comparison only. Ground truth verdict for every case is <code>INCONSISTENT</code> (the image is edited). The right-hand panel shows MIC's <code>&lt;verdict&gt; / &lt;type&gt; / &lt;grounding&gt; / &lt;knowledge&gt;</code> output on the edited image. The error is whichever part diverges from the ground truth on the left.
15
+ </div>
16
  <div class="nav"><b>Jump to:</b> &nbsp; <a href="#mode-A">Mode A (15)</a> &nbsp;·&nbsp; <a href="#mode-B">Mode B (8)</a> &nbsp;·&nbsp; <a href="#mode-C">Mode C (7)</a> &nbsp; · &nbsp; <span style="color:#888;">A = verdict miss · B = fabricated evidence (g_score &lt; 0.3) · C = wrong attribution (g_score ≥ 0.5)</span></div>
17
  <h2 id="mode-A" style="background:#fde2e2;padding:12px 16px;border-radius:6px;border-left:5px solid #e88;">Mode A — Perceptually subtle / locally-plausible edits (verdict miss) <span style="float:right;color:#555;font-size:14px;">15 cases</span></h2>
18
  <div class="card" id="case-1" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;">
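Reviewer note: the nav legend in this hunk (A = verdict miss, B = fabricated evidence with g_score < 0.3, C = wrong attribution with g_score ≥ 0.5) reads as a small decision rule. The following is a minimal Python sketch of that rule, not the code used to build the page. The function name `failure_mode`, its arguments, and the constant names are hypothetical; only the two thresholds and the `INCONSISTENT` ground-truth verdict come from the page text. The legend does not say how a g_score in [0.3, 0.5) is grouped, so the sketch returns `None` there rather than guessing.

```python
from typing import Optional

# Thresholds copied from the nav legend; the names are assumptions.
FABRICATED_MAX = 0.3   # Mode B: g_score < 0.3
ATTRIBUTION_MIN = 0.5  # Mode C: g_score >= 0.5


def failure_mode(pred_verdict: str, g_score: float) -> Optional[str]:
    """Return 'A', 'B', 'C', or None for one error case (hypothetical helper)."""
    # The ground truth is INCONSISTENT for every case, so any other
    # predicted verdict counts as a verdict miss.
    if pred_verdict != "INCONSISTENT":
        return "A"
    if g_score < FABRICATED_MAX:
        return "B"
    if g_score >= ATTRIBUTION_MIN:
        return "C"
    # The legend leaves scores in [0.3, 0.5) unassigned.
    return None
```

If the 30 cases were in fact grouped by hand, the counts in the nav (15 / 8 / 7) need not match what this rule would produce.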