Space: HuggingAI4Engineering/cadgenbench-leaderboard
Michael Rabinovich committed 9 days ago

Commit 4e86f82 (1 parent: f2f35be)

app+requirements: pin Gradio 5 + gradio_leaderboard, auto-refresh now works

Step 6 (E) chunk 5, take 2. The Gradio 6.14 gr.Dataframe component
silently swallows row updates after the first render: server-side
load_leaderboard fires every N seconds (logged: "Loaded 11 rows
from Hub" on each tick) but the rendered DOM never picks the new
rows up. Confirmed with a local row-growing fixture (load_leaderboard
monkey-patched to return N+1 rows on each call); the server fired
10 times, the DOM stuck on the first call's row count.
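
A row-growing fixture of the kind described above could look roughly like the sketch below. It is illustrative only: `make_growing_loader` is a hypothetical helper, the `leaderboard` object is a stand-in namespace rather than the real module, and rows are plain dicts instead of the pandas DataFrame the real load_leaderboard returns.

```python
import itertools
import types
from unittest import mock

# Hypothetical stand-in for the real `leaderboard` module so the sketch
# runs on its own; the real load_leaderboard returns a pandas DataFrame.
leaderboard = types.SimpleNamespace(load_leaderboard=lambda: [])


def make_growing_loader(start_rows: int = 1):
    """Build a loader whose result gains one row on every call."""
    counter = itertools.count(start_rows)

    def fake_load_leaderboard():
        n = next(counter)
        # Plain dicts keep the sketch dependency-free; wrapping the list
        # in pandas.DataFrame(...) would match the real return type.
        return [
            {"submission_name": f"fixture-{i}", "submitter_name": "local"}
            for i in range(n)
        ]

    return fake_load_leaderboard


# Monkey-patch the loader for the duration of the block, then call it
# three times the way successive refresh ticks would.
with mock.patch.object(leaderboard, "load_leaderboard", make_growing_loader(5)):
    row_counts = [len(leaderboard.load_leaderboard()) for _ in range(3)]
```

Because app.py appears to import load_leaderboard by name, a real patch would likely need to target the name where the app looks it up, not only the defining module.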

Tried in order, all failed against the local fixture:
- gr.Dataframe(every=10, value=callable)
- gr.Timer(10).tick(outputs=gr.Dataframe)
- manual Refresh button click
- @gr.render(inputs=[gr.State, gr.State], key=f"...-{tick}")
- @gr.render(inputs=[hidden gr.Textbox], key=f"...-{tick}")
- streaming generator on app.load (first yield delivered; every
  subsequent yield ignored client-side)

What actually works: pin to Gradio 5 (latest 5.50.0) and use the
gradio_leaderboard.Leaderboard custom component, which has its own
update path and is the component every shipping HF leaderboard
(open-llm-leaderboard, DABstep, bigcodebench) actually uses.
gradio_leaderboard pins gradio<6.0,>=4.0 so there is no Gradio 6
build today; revisit when one ships AND the underlying Dataframe-
update bug is fixed upstream.
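
One way to make the gradio<6.0,>=4.0 constraint visible at startup is a small guard like the sketch below. Both function names are hypothetical and nothing like this is part of the commit; the check looks only at the major version, which is sufficient for this particular range.

```python
from importlib import metadata


def gradio_version_supported(version: str) -> bool:
    """Return True if `version` falls inside gradio>=4.0,<6.0.

    Only the major number is inspected, which is enough for this range.
    """
    major = int(version.split(".")[0])
    return 4 <= major < 6


def installed_gradio_supported() -> bool:
    """Check the installed gradio, treating a missing install as unsupported."""
    try:
        return gradio_version_supported(metadata.version("gradio"))
    except metadata.PackageNotFoundError:
        return False
```

With the versions named in this commit, "5.50.0" passes the check and "6.14.0" does not.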

Local verification: row-growing fixture goes 5 -> 10 in the DOM
across 50s (one new row per 10s Timer tick), screenshot confirmed.

Bonus: Leaderboard component ships a free search box across
submission_name + submitter_name; nothing else changes
behaviourally (still our load_leaderboard producing a pandas
DataFrame, still our status/score formatters, still our
auto-refresh cadence + manual Refresh button).

Outstanding follow-up: file a minimal-repro gradio issue against
gr.Dataframe in 6.14 so this pin gets revisited when a fix lands.
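
A minimal repro for that issue might start from something like the sketch below, which wires a growing data source to gr.Dataframe through gr.Timer, one of the failing combinations listed above. The names `growing_rows` and `build_repro` and the two column headers are made up for illustration, and the sketch has not been run against Gradio 6.14 here, so treat it as a starting point, not a confirmed reproduction.

```python
import itertools

_calls = itertools.count(1)


def growing_rows():
    """Return one more row on each call, as a list of lists."""
    n = next(_calls)
    return [[f"row-{i}", i] for i in range(n)]


def build_repro():
    """Assemble the app; gradio is imported lazily so the data helper
    above can be exercised without gradio installed."""
    import gradio as gr

    with gr.Blocks() as demo:
        table = gr.Dataframe(
            value=growing_rows(),
            headers=["name", "index"],
            interactive=False,
        )
        # Every 10 seconds the server returns a longer table; the reported
        # bug is that the rendered table keeps its initial row count.
        timer = gr.Timer(10)
        timer.tick(fn=growing_rows, outputs=table)
    return demo


# To run the repro locally: build_repro().launch()
```

Keeping the data source free of Hub access and pandas narrows the report to the component's update path.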

Files changed (2)

app.py +11 -36
requirements.txt +11 -1

app.py CHANGED

@@ -6,9 +6,9 @@ Read path lives in :mod:`leaderboard`. Submit-tab validation lives in
 from __future__ import annotations
 import logging
-import time
 import gradio as gr
+from gradio_leaderboard import Leaderboard
 from leaderboard import (
     HF_DATA_REPO,
@@ -50,43 +50,13 @@ with gr.Blocks(title="CADGenBench Leaderboard") as app:
     )
     with gr.Tab("Leaderboard"):
-        # Gradio 6's gr.Dataframe identity-checks its component id and
-        # short-circuits the diff when the new value's shape + column
-        # structure look unchanged, even though the row data differs.
-        # That swallows updates from every= on the Dataframe AND from
-        # gr.Timer().tick(outputs=df_view) AND from a manual refresh
-        # button click. The fix is @gr.render with a `key=` that
-        # changes on every tick: Gradio tears down and rebuilds the
-        # component subtree in place, picking up the fresh value.
-        #
-        # The fetch happens once per tick in `_refresh_table` (server
-        # side) and the result rides on gr.State to all subscribers,
-        # so N concurrent viewers don't cause N HTTPS GETs per tick.
-        table_state = gr.State(value=load_leaderboard())
-        tick_state = gr.State(value=0)
-        @gr.render(inputs=[table_state, tick_state])
-        def render_leaderboard(df, t: int) -> None:
-            gr.Dataframe(
-                value=df,
-                interactive=False,
-                wrap=True,
-                label="Results (sorted by aggregate CAD score)",
-                key=f"leaderboard-df-{t}",
-            )
-        def _refresh_table():
-            # ms-resolution so rapid clicks always increment the key.
-            return load_leaderboard(), int(time.time() * 1000)
-        auto_refresh_timer = gr.Timer(10)
-        auto_refresh_timer.tick(
-            fn=_refresh_table, outputs=[table_state, tick_state],
+        df_view = Leaderboard(
+            value=load_leaderboard(),
+            search_columns=["submission_name", "submitter_name"],
+            label="Results (sorted by aggregate CAD score)",
         )
         refresh_btn = gr.Button("Refresh", size="sm")
-        refresh_btn.click(
-            fn=_refresh_table, outputs=[table_state, tick_state],
-        )
+        refresh_btn.click(fn=load_leaderboard, outputs=df_view)
     with gr.Tab("Submit"):
         gr.Markdown(
@@ -131,6 +101,11 @@ to publish the resulting row on the public leaderboard.
     with gr.Tab("About"):
         gr.Markdown(ABOUT_MD)
+    # gradio_leaderboard.Leaderboard handles its own update path
+    # cleanly; bind a Timer to push a fresh dataframe every 10 seconds.
+    auto_refresh_timer = gr.Timer(10)
+    auto_refresh_timer.tick(fn=load_leaderboard, outputs=df_view)
 if __name__ == "__main__":
     app.launch(theme=gr.themes.Soft())

requirements.txt CHANGED

@@ -1,4 +1,14 @@
-gradio==6.14.0
+# Pinned to Gradio 5.x because gr.Dataframe in Gradio 6.14 silently
+# drops all updates after the first render: server-side load_leaderboard
+# runs (and yields under demo.load), but the rendered DOM never picks
+# up the new rows. Reproduced locally with a row-growing fixture, then
+# verified to work cleanly with gradio_leaderboard.Leaderboard on
+# Gradio 5. The reference HF leaderboards (open-llm-leaderboard,
+# DABstep, bigcodebench) all run on this stack. Revisit once a
+# Gradio 6-compatible gradio_leaderboard ships AND the underlying
+# Dataframe-update bug is fixed upstream.
+gradio==5.50.0
+gradio-leaderboard==0.0.14
 pandas>=2.0
 huggingface_hub>=0.27.0
 datasets>=3.0