Upload index.html with huggingface_hub
index.html +3 -0
index.html
CHANGED
|
@@ -10,6 +10,9 @@ a{color:#36a;text-decoration:none;} a:hover{text-decoration:underline;}
|
|
| 10 |
<body>
|
| 11 |
<h1>MIC (Ours-SFT-GRPO) — Error Analysis · 30 cases</h1>
|
| 12 |
<p style="color:#555;">All 30 cases are <b>genuine MIC errors</b> (no schema-only disagreements). Sampled from <code>test_id_edit</code> and <code>test_ood_edit</code> (canonical prompt). Pre-grouped by 3 large failure modes — disagree freely in the notes box on each card.</p>
| 13 |
<div class="nav"><b>Jump to:</b> <a href="#mode-A">Mode A (15)</a> · <a href="#mode-B">Mode B (8)</a> · <a href="#mode-C">Mode C (7)</a> · <span style="color:#888;">A = verdict miss · B = fabricated evidence (g_score < 0.3) · C = wrong attribution (g_score ≥ 0.5)</span></div>
|
| 14 |
<h2 id="mode-A" style="background:#fde2e2;padding:12px 16px;border-radius:6px;border-left:5px solid #e88;">Mode A — Perceptually subtle / locally-plausible edits (verdict miss) <span style="float:right;color:#555;font-size:14px;">15 cases</span></h2>
|
| 15 |
<div class="card" id="case-1" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;">
| 10 |
<body>
|
| 11 |
<h1>MIC (Ours-SFT-GRPO) — Error Analysis · 30 cases</h1>
|
| 12 |
<p style="color:#555;">All 30 cases are <b>genuine MIC errors</b> (no schema-only disagreements). Sampled from <code>test_id_edit</code> and <code>test_ood_edit</code> (canonical prompt). Pre-grouped by 3 large failure modes — disagree freely in the notes box on each card.</p>
|
| 13 |
+
<div style="background:#fff8e1;border:1px solid #f0c970;border-radius:6px;padding:10px 14px;margin:10px 0 20px 0;font-size:14px;color:#444;">
|
| 14 |
+
<b>How to read each card.</b> MIC was given <b>only the edited (right) image</b> and the caption; the original (left) is shown for human comparison only. The ground-truth verdict for every case is <code>INCONSISTENT</code> (the image is edited). The right-hand panel shows MIC's <code>&lt;verdict&gt; / &lt;type&gt; / &lt;grounding&gt; / &lt;knowledge&gt;</code> output on the edited image. The error is whichever part of that output diverges from the ground truth shown on the left.
|
| 15 |
+
</div>
|
| 16 |
<div class="nav"><b>Jump to:</b> <a href="#mode-A">Mode A (15)</a> · <a href="#mode-B">Mode B (8)</a> · <a href="#mode-C">Mode C (7)</a> · <span style="color:#888;">A = verdict miss · B = fabricated evidence (g_score < 0.3) · C = wrong attribution (g_score ≥ 0.5)</span></div>
|
| 17 |
<h2 id="mode-A" style="background:#fde2e2;padding:12px 16px;border-radius:6px;border-left:5px solid #e88;">Mode A — Perceptually subtle / locally-plausible edits (verdict miss) <span style="float:right;color:#555;font-size:14px;">15 cases</span></h2>
|
| 18 |
<div class="card" id="case-1" style="border:1px solid #ddd;border-radius:8px;margin:18px 0;padding:14px;background:#fff;">
|