Add scope section: full Video-MME mini 2700Q result (+0.22 pp)

Following the 300Q release, the eval was extended to the full 2700Q split, where the overall Δ is +0.22 pp (stock 53.11 %, QueryFrames 53.33 %). The README adds: (1) a Scope note callout under TL;DR, (2) an updated Caveat in the Reproducibility section, pointing to (3) a new 'Scope on the full Video-MME mini (2700Q)' section. The original 300Q numbers are unchanged and remain reproducible by recipe; this addition characterizes the design envelope (short-clip, low-frame-budget) on the full balanced split.

Files changed (1)

README.md +21 -5

@@ -62,6 +62,11 @@ product-specific CCTV model.
 **12 of 12 task buckets non-negative; 8 strongly positive (≥ 5 pp);
 0 regressions** in task-aware MCQ mode (task_type from Video-MME dataset).
+> **Scope note.** This method targets short-clip, low-frame-budget
+> video QA. The 300 Q numbers above are inside that design envelope.
+> On the full 2700 Q split, overall Δ is **+0.22 pp** — see
+> [Scope on the full Video-MME mini (2700 Q)](#scope-on-the-full-video-mme-mini-2700-q) below.
 ## Why it works
 Stock Qwen3-VL-2B at 8 frames lags itself at 64 frames by ~17 pp.
@@ -235,11 +240,22 @@ This artifact is **fully deterministic** at greedy decoding —
 re-running on the same 300 questions reproduces the same 199 / 300 = 66.3 %
 in task-aware MCQ mode.
-> **Caveat — sample size and split.** All numbers above are on the
-> Video-MME *mini* split (the 300 questions whose videos ship in
-> `videos_chunked_01.zip`). They are **not** the full 2700-question
-> Video-MME benchmark and are not a leaderboard submission. A full-
-> benchmark eval is on the future-work list.
+> **Caveat — sample size and split.** The 300 Q numbers above are on
+> the `videos_chunked_01.zip` mini subset, which happens to be mostly
+> short clips. For full-split numbers on Video-MME mini 2700 Q
+> (balanced short / medium / long), see
+> [Scope on the full Video-MME mini (2700 Q)](#scope-on-the-full-video-mme-mini-2700-q)
+> below. This release is not a leaderboard submission.
+## Scope on the full Video-MME mini (2700 Q)
+After the 300 Q release, the eval was extended to the full 2700 Q
+split (MCQ mode without `task_type`). Stock 53.11 %, QueryFrames
+53.33 %, **Δ +0.22 pp**.
+This method targets short-clip, low-frame-budget video QA. The
+2700 Q split is balanced across short / medium / long-form clips;
+averaging across that range dilutes the gain to roughly neutral.
 ## Acknowledgements / Related Work
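
To put the reported numbers in concrete terms, here is a minimal Python sketch that re-derives the percentages quoted in this change. Only 199 / 300 is stated as a raw count in the README. The 2700 Q correct-answer counts are not given there, so the sketch infers them from the rounded percentages (53.11 % and 53.33 %), and they should be treated as an assumption. The helper names `accuracy_pct` and `implied_correct` are hypothetical and not part of the repository.

```python
"""Sanity-check the accuracy figures quoted in this change (illustrative only)."""


def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy as a percentage."""
    return 100.0 * correct / total


def implied_correct(reported_pct: float, total: int, tol: float = 0.005) -> list[int]:
    """Correct-answer counts whose accuracy lies within `tol` of a percentage
    that was reported rounded to two decimals."""
    return [
        k for k in range(total + 1)
        if abs(accuracy_pct(k, total) - reported_pct) < tol
    ]


# Stated directly in the README: 199 / 300 on the 300 Q subset.
acc_300 = accuracy_pct(199, 300)

# ASSUMPTION: raw counts for the 2700 Q split are inferred from the
# rounded percentages; the README reports only the percentages.
stock_counts = implied_correct(53.11, 2700)
queryframes_counts = implied_correct(53.33, 2700)

# Each list holds exactly one candidate, so the inference is unambiguous.
stock_correct = stock_counts[0]
queryframes_correct = queryframes_counts[0]

delta_questions = queryframes_correct - stock_correct
delta_pp = accuracy_pct(queryframes_correct, 2700) - accuracy_pct(stock_correct, 2700)

print(f"300 Q accuracy:    {acc_300:.1f} %")
print(f"2700 Q stock:       {stock_correct} / 2700")
print(f"2700 Q QueryFrames: {queryframes_correct} / 2700")
print(f"Delta: {delta_questions} questions = {delta_pp:+.2f} pp")
```

Under that inference, the +0.22 pp gap on the 2700 Q split corresponds to 6 more correct answers out of 2700, which is consistent with the "roughly neutral" wording in the new Scope section.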