MedGRPO Team and Claude Opus 4.7 (1M context) committed on
Commit 3bf5e65 · 1 Parent(s): 892e0a6

Clarify LLM Judge is unavailable in Evaluate Only mode

Step 1 already runs with --skip-llm-judge, and Step 2 (LLM Judge) needs a
published leaderboard row to look up the model and write scores back to.
No such row exists in Evaluate Only mode, so DVC_llm / VS_llm / RC_llm
always read as 0.0 in eval-only runs. Surface this so that users
expecting captioning scores are not confused:

- Inline info text on the Evaluate Only checkbox.
- Blockquoted "Heads up" note in the eval-only success message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
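The scoring rule described in the commit message can be sketched as a small helper. This is a hypothetical illustration, not code from `app.py`: the function `collect_llm_scores`, its parameters, and the dictionary layout are assumptions. Only the metric names and the fact that they are 0.0 in Evaluate Only mode come from the commit.

```python
from typing import Dict, Optional

# Metric names taken from the commit message; everything else is illustrative.
LLM_JUDGE_METRICS = ("DVC_llm", "VS_llm", "RC_llm")


def collect_llm_scores(eval_only: bool, judge_scores: Optional[Dict[str, float]]) -> Dict[str, float]:
    """Hypothetical helper: report the LLM-judge metrics for one submission.

    In Evaluate Only mode Step 1 runs with --skip-llm-judge and Step 2 has no
    published leaderboard row to work with, so every LLM-judge metric is 0.0.
    """
    if eval_only or judge_scores is None:
        # No judge output is available: fall back to 0.0 for every metric.
        return {name: 0.0 for name in LLM_JUDGE_METRICS}
    # Published run: pass judge scores through, defaulting any missing metric to 0.0.
    return {name: float(judge_scores.get(name, 0.0)) for name in LLM_JUDGE_METRICS}


# Usage: an eval-only run reports zeros regardless of input.
print(collect_llm_scores(eval_only=True, judge_scores=None))
# {'DVC_llm': 0.0, 'VS_llm': 0.0, 'RC_llm': 0.0}
```

Returning explicit zeros (rather than omitting the keys) matches the observed behavior that the columns are present but read as 0.0.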

Files changed (1)
  1. app.py +6 -1
app.py CHANGED
```diff
@@ -1637,6 +1637,11 @@ To publish, submit again with **"Evaluate Only"** unchecked.
 
 if eval_only:
     success_msg += f"\n### 🏆 Would-Be Ranking\n**Rank**: #{rank} out of {total} models *(if published)*\n"
+    success_msg += (
+        "\n> **Heads up:** LLM-judge metrics (DVC_llm, VS_llm, RC_llm) are **0.0** in Evaluate Only mode "
+        "because Step 2 (LLM Judge) requires a published row to read predictions from. "
+        "Submit normally (Evaluate Only unchecked) when you want those scored.\n"
+    )
     success_msg += "\nNothing has been saved. Submit again with Evaluate Only unchecked to publish."
 else:
     success_msg += f"\n### 🏆 Ranking\n**Rank**: #{rank} out of {total} models\n"
@@ -2452,7 +2457,7 @@ with gr.Blocks(title="MedVidBench Leaderboard", theme=gr.themes.Soft()) as demo:
 eval_only_checkbox = gr.Checkbox(
     label="Evaluate Only (don't publish to the leaderboard)",
     value=False,
-    info="Run evaluation and see your would-be rank without adding a row to the public table. Disabled for ECCV submissions."
+    info="Run evaluation and see your would-be rank without adding a row to the public table. LLM-judge metrics (DVC_llm, VS_llm, RC_llm) cannot be computed in this mode — they require a published row. Disabled for ECCV submissions."
 )
 
 model_name_input = gr.Textbox(
```
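To check the new notice in isolation, the ranking branch from the first hunk can be lifted into a standalone function. This is a sketch under stated assumptions: the name `build_success_msg` is hypothetical (the app builds `success_msg` inline), and `rank` and `total` are assumed to be integers computed earlier in the submission handler. The string literals are copied from the diff.

```python
def build_success_msg(eval_only: bool, rank: int, total: int, success_msg: str = "") -> str:
    """Hypothetical helper mirroring the ranking branch changed in this commit."""
    if eval_only:
        success_msg += f"\n### 🏆 Would-Be Ranking\n**Rank**: #{rank} out of {total} models *(if published)*\n"
        # Added by this commit: warn that LLM-judge metrics are 0.0 in Evaluate Only mode.
        success_msg += (
            "\n> **Heads up:** LLM-judge metrics (DVC_llm, VS_llm, RC_llm) are **0.0** in Evaluate Only mode "
            "because Step 2 (LLM Judge) requires a published row to read predictions from. "
            "Submit normally (Evaluate Only unchecked) when you want those scored.\n"
        )
        success_msg += "\nNothing has been saved. Submit again with Evaluate Only unchecked to publish."
    else:
        # Published submissions get the real ranking and no notice.
        success_msg += f"\n### 🏆 Ranking\n**Rank**: #{rank} out of {total} models\n"
    return success_msg


# Usage: render the eval-only message for a hypothetical rank 3 of 10.
print(build_success_msg(eval_only=True, rank=3, total=10))
```

Placing the notice between the would-be rank and the "Nothing has been saved" line means users read it immediately after seeing their rank, which is where missing captioning scores would first be noticed.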