Spaces: UII-AI/MedVidBench-Leaderboard
MedGRPO Team, Claude Opus 4.7 (1M context) committed 3 days ago

Commit 3bf5e65

1 Parent(s): 892e0a6

Clarify LLM Judge is unavailable in Evaluate Only mode

Step 1 already runs with --skip-llm-judge, and Step 2 (LLM Judge) requires
a published leaderboard row to look up the model and write scores back.
Evaluate Only never creates that row, so DVC_llm / VS_llm / RC_llm
will always read as 0.0 in eval-only runs. Surface this so users
expecting captioning scores don't get confused:

- Inline info text on the Evaluate Only checkbox.
- Blockquoted "Heads up" note in the eval-only success message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (1)

app.py +6 -1

@@ -1637,6 +1637,11 @@ To publish, submit again with **"Evaluate Only"** unchecked.
     if eval_only:
         success_msg += f"\n### 🏆 Would-Be Ranking\n**Rank**: #{rank} out of {total} models *(if published)*\n"
+        success_msg += (
+            "\n> **Heads up:** LLM-judge metrics (DVC_llm, VS_llm, RC_llm) are **0.0** in Evaluate Only mode "
+            "because Step 2 (LLM Judge) requires a published row to read predictions from. "
+            "Submit normally (Evaluate Only unchecked) when you want those scored.\n"
+        )
         success_msg += "\nNothing has been saved. Submit again with Evaluate Only unchecked to publish."
     else:
         success_msg += f"\n### 🏆 Ranking\n**Rank**: #{rank} out of {total} models\n"
@@ -2452,7 +2457,7 @@ with gr.Blocks(title="MedVidBench Leaderboard", theme=gr.themes.Soft()) as demo:
                     eval_only_checkbox = gr.Checkbox(
                         label="Evaluate Only (don't publish to the leaderboard)",
                         value=False,
-                        info="Run evaluation and see your would-be rank without adding a row to the public table. Disabled for ECCV submissions."
+                        info="Run evaluation and see your would-be rank without adding a row to the public table. LLM-judge metrics (DVC_llm, VS_llm, RC_llm) cannot be computed in this mode — they require a published row. Disabled for ECCV submissions."
                     )
                     model_name_input = gr.Textbox(
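
For reviewers who want to check the new message branch in isolation, the sketch below pulls the ranking part of the success message into a standalone function. The helper name `build_ranking_section` and the `LLM_JUDGE_METRICS` constant are hypothetical and do not exist in `app.py`. The message strings are copied from the diff, and the surrounding submission flow is assumed rather than shown.

```python
# Hypothetical constant: the three LLM-judge metrics named in the commit message.
LLM_JUDGE_METRICS = ("DVC_llm", "VS_llm", "RC_llm")


def build_ranking_section(rank: int, total: int, eval_only: bool) -> str:
    """Return the ranking part of the success message (illustrative helper only)."""
    if not eval_only:
        # Published submissions keep the original, unchanged ranking text.
        return f"\n### 🏆 Ranking\n**Rank**: #{rank} out of {total} models\n"

    msg = f"\n### 🏆 Would-Be Ranking\n**Rank**: #{rank} out of {total} models *(if published)*\n"
    metrics = ", ".join(LLM_JUDGE_METRICS)
    # New in this commit: explain why the LLM-judge columns read 0.0 in eval-only runs.
    msg += (
        f"\n> **Heads up:** LLM-judge metrics ({metrics}) are **0.0** in Evaluate Only mode "
        "because Step 2 (LLM Judge) requires a published row to read predictions from. "
        "Submit normally (Evaluate Only unchecked) when you want those scored.\n"
    )
    msg += "\nNothing has been saved. Submit again with Evaluate Only unchecked to publish."
    return msg


if __name__ == "__main__":
    print(build_ranking_section(rank=3, total=12, eval_only=True))
```

Calling the function with `eval_only=False` returns only the plain ranking text, so the note never appears for published submissions.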