MedGRPO Team Claude Opus 4.7 (1M context) committed on
Commit · 3bf5e65
1 Parent(s): 892e0a6
Clarify LLM Judge is unavailable in Evaluate Only mode
Step 1 already runs with --skip-llm-judge, and Step 2 (LLM Judge) requires
a published leaderboard row to look up the model and write scores back to.
Neither is available in Evaluate Only mode, so DVC_llm / VS_llm / RC_llm
will always read as 0.0 in eval-only runs. Surface this so users
expecting captioning scores don't get confused:
- Inline info text on the Evaluate Only checkbox.
- Blockquoted "Heads up" note in the eval-only success message.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
app.py
@@ -1637,6 +1637,11 @@ To publish, submit again with **"Evaluate Only"** unchecked.
 
 if eval_only:
     success_msg += f"\n### π Would-Be Ranking\n**Rank**: #{rank} out of {total} models *(if published)*\n"
+    success_msg += (
+        "\n> **Heads up:** LLM-judge metrics (DVC_llm, VS_llm, RC_llm) are **0.0** in Evaluate Only mode "
+        "because Step 2 (LLM Judge) requires a published row to read predictions from. "
+        "Submit normally (Evaluate Only unchecked) when you want those scored.\n"
+    )
     success_msg += "\nNothing has been saved. Submit again with Evaluate Only unchecked to publish."
 else:
     success_msg += f"\n### π Ranking\n**Rank**: #{rank} out of {total} models\n"
@@ -2452,7 +2457,7 @@ with gr.Blocks(title="MedVidBench Leaderboard", theme=gr.themes.Soft()) as demo:
 eval_only_checkbox = gr.Checkbox(
     label="Evaluate Only (don't publish to the leaderboard)",
     value=False,
-    info="Run evaluation and see your would-be rank without adding a row to the public table. Disabled for ECCV submissions."
+    info="Run evaluation and see your would-be rank without adding a row to the public table. LLM-judge metrics (DVC_llm, VS_llm, RC_llm) cannot be computed in this mode — they require a published row. Disabled for ECCV submissions."
 )
 
 model_name_input = gr.Textbox(
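The change above spells out the metric list (DVC_llm, VS_llm, RC_llm) in two places, the success message and the checkbox info text. As a possible follow-up, the sketch below shows one way both strings could be built from a single constant so they cannot drift apart. It is illustrative only: `LLM_JUDGE_METRICS`, `eval_only_heads_up`, and `eval_only_checkbox_info` are hypothetical names that do not appear in app.py, and the info wording is lightly paraphrased.

```python
# Hypothetical refactor sketch; none of these names exist in app.py.
LLM_JUDGE_METRICS = ("DVC_llm", "VS_llm", "RC_llm")


def _metric_names() -> str:
    """Comma-separated metric names, as shown to users."""
    return ", ".join(LLM_JUDGE_METRICS)


def eval_only_heads_up(eval_only: bool) -> str:
    """Return the blockquoted note for eval-only runs, or "" otherwise."""
    if not eval_only:
        return ""
    return (
        f"\n> **Heads up:** LLM-judge metrics ({_metric_names()}) are **0.0** "
        "in Evaluate Only mode because Step 2 (LLM Judge) requires a published "
        "row to read predictions from. "
        "Submit normally (Evaluate Only unchecked) when you want those scored.\n"
    )


def eval_only_checkbox_info() -> str:
    """Return info text for the Evaluate Only checkbox (paraphrased wording)."""
    return (
        "Run evaluation and see your would-be rank without adding a row to the "
        f"public table. LLM-judge metrics ({_metric_names()}) cannot be computed "
        "in this mode; they require a published row. "
        "Disabled for ECCV submissions."
    )
```

With helpers like these, the eval-only branch could append `eval_only_heads_up(eval_only)` to `success_msg`, and the checkbox could take `info=eval_only_checkbox_info()`. Adding or renaming an LLM-judge metric would then touch only the constant.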