Update miniapp_leaderboard.py
miniapp_leaderboard.py CHANGED (+24 -9)
@@ -162,13 +162,21 @@ def submit(model_name, model_family, zip_file, profile: gr.OAuthProfile):
 with gr.Blocks(title=f"{APP_NAME} leaderboard") as demo:
     gr.Markdown(f"# {APP_NAME} Leaderboard")
     gr.Markdown("""
-##
+## Data
 
 MiniAppBench is the first comprehensive benchmark designed to evaluate principle-driven, interactive application generation. Unlike prior benchmarks that emphasize static UI layouts or isolated algorithmic code snippets, MiniAppBench targets **MiniApps**—HTML-based applications that require both faithful visual rendering and non-trivial interaction logic.
 
 The dataset is split into two subsets: **validation (100 instances)** and **test (400 instances)**, and can be accessed at **[MiniAppBench dataset](https://huggingface.co/datasets/MiniAppBench/Dataset)**. The **validation** set includes publicly available **evaluation references** to support reproducible experiments, while the **test** set keeps the references hidden to enable unbiased evaluation.
 """)
 
+    gr.Markdown(
+        """
+## Leaderboard
+
+All results shown on this leaderboard are evaluated on the **test split** of MiniAppBench.
+""",
+    )
+
     leaderboard = gr.Dataframe(
         value=pd.DataFrame(columns=COLUMNS),  # do not access the Hub at startup
         interactive=False,
@@ -185,16 +193,23 @@ with gr.Blocks(title=f"{APP_NAME} leaderboard") as demo:
 
     gr.Markdown(
         """
-
-
-
-
-
-
-
-
+**Submission requirements**
+- Please **sign in with Hugging Face** before submitting.
+- **One submission per user per day (UTC)**.
+- Upload a **.zip** file only.
+- The `.zip` must contain the HTML outputs for the **test set queries**.
+- Each file should be named using the query index: `<index>.html` (e.g., `1.html`, `2.html`, ...).
+- We may contact you via email for verification and request additional materials. Please be prepared to provide:
+  - **Model access** (one of the following):
+    - Preferred: an **inference API endpoint** we can use to reproduce the results.
+    - Alternatively: **model checkpoints (ckpts)** plus clear **deployment / inference instructions** (environment, dependencies, and how to run).
+  - **A related paper**, if available (e.g., an **arXiv link** or a PDF).
+- After you submit, we will update the results within **3 days**.
+""",
     )
 
+
+
     model_name = gr.Textbox(label="Model name", placeholder="e.g. MyModel v1")
     model_family = gr.Textbox(label="Model family", placeholder="e.g. Llama / Qwen / InternLM ...")
     zip_file = gr.File(label="Upload zip (.zip only)", file_types=[".zip"])
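The submission requirements added in this commit describe the archive layout only in prose. The sketch below shows one way a submitter might package and sanity-check a `.zip` before uploading. It is not the Space's own validation code: the helper names `build_submission_zip` and `check_submission_zip` are hypothetical, and it assumes the files sit at the root of the archive and that query indices are plain integers, neither of which the commit states.

```python
import re
import zipfile
from pathlib import Path

# Assumption: "<index>.html" means a plain integer index, e.g. "1.html".
HTML_NAME = re.compile(r"^\d+\.html$")


def build_submission_zip(html_dir, zip_path):
    """Pack every `<index>.html` file found in html_dir into a flat zip.

    Files that do not match the naming pattern are skipped.
    Returns the archived file names, ordered by numeric index.
    """
    html_dir = Path(html_dir)
    names = sorted(
        (p.name for p in html_dir.iterdir() if p.is_file() and HTML_NAME.match(p.name)),
        key=lambda name: int(name.split(".")[0]),
    )
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            # arcname keeps the entry at the archive root (an assumption here).
            zf.write(html_dir / name, arcname=name)
    return names


def check_submission_zip(zip_path, expected_indices):
    """Compare archive entries against the expected query indices.

    Returns (missing, unexpected) as two sorted lists of file names.
    """
    with zipfile.ZipFile(zip_path) as zf:
        present = {info.filename for info in zf.infolist() if not info.is_dir()}
    expected = {f"{i}.html" for i in expected_indices}
    return sorted(expected - present), sorted(present - expected)
```

A submitter could call `build_submission_zip("outputs", "submission.zip")` and then `check_submission_zip("submission.zip", indices)`, where `indices` comes from the test split itself. The commit gives `1.html` and `2.html` as examples but does not say which index range the 400 test queries use.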