Upload app.py with huggingface_hub
app.py
@@ -341,51 +341,57 @@ with gr.Blocks(
         outputs=[omni_table, image_table, video_table, audio_table],
     )
 
-
+    # ---------- Overall definition (bottom) ----------
+    gr.HTML(
         """
         <div class="overall-definition">
-
         <h3>📊 Overall Score Definition</h3>
 
         <p>
-
-
-
+        To facilitate clearer and more consistent comparison across models, we introduce an
+        <b>Overall</b> score for each leaderboard track. The aggregation strategy is tailored
+        to the evaluation protocol of each task category:
         </p>
 
         <p><b>1. OmniLLM / MLLM</b><br>
-
+        The <b>Overall</b> score is computed as the arithmetic mean of all reported task-specific scores.
         </p>
 
         <p><b>2. Image Generation</b><br>
-
-
-
+        The evaluation involves metrics defined on different numerical scales.
+        <b>WIScore</b> is used for image generation, while <b>VIEScore</b> (averaged over three dimensions)
+        is used for image editing.
         </p>
 
-        <p>
-
+        <p style="margin-bottom: 6px;">
+        The <b>Overall</b> score is defined as:
         </p>
+        </div>
+        """
+    )
 
-
-
-
-
-
+    gr.Markdown(
+        r"""
+        \[
+        \text{Overall}=\frac{(\text{WIScore}\times 10)+\left(\frac{\sum \text{VIEScore}}{3}\right)}{2}
+        \]
+        """
+    )
 
+    gr.HTML(
+        """
+        <div class="overall-definition" style="margin-top: -24px;">
         <p>
-
-
+        This normalization-based formulation ensures a balanced contribution from both image generation
+        and image editing performance.
         </p>
 
         <p><b>3. Video Generation</b><br>
-
-
+        The <b>Overall</b> score is calculated as the arithmetic mean of all evaluated dimensions,
+        including imaging quality, aesthetics, motion, and temporal consistency.
         </p>
-
         </div>
-        """
-        unsafe_allow_html=True,
+        """
     )
 
 demo.launch()
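
Note on the aggregation rules described in the new HTML text: the sketch below shows one way the three Overall scores could be computed. It is an illustration only and is not part of app.py. The function names `overall_mean` and `overall_image` are hypothetical, and the scale assumption (WIScore roughly in 0 to 1, each VIEScore dimension roughly in 0 to 10) is inferred from the `\times 10` factor in the formula rather than stated in the commit.

```python
from statistics import fmean
from typing import Sequence


def overall_mean(scores: Sequence[float]) -> float:
    """Arithmetic mean of the reported scores.

    Matches the description for the OmniLLM / MLLM and Video Generation
    tracks. Hypothetical helper, not taken from app.py.
    """
    if not scores:
        raise ValueError("at least one score is required")
    return fmean(scores)


def overall_image(wiscore: float, viescore_dims: Sequence[float]) -> float:
    """Overall = ((WIScore * 10) + (sum(VIEScore) / 3)) / 2.

    Assumes WIScore is on a 0-1 scale and each of the three VIEScore
    dimensions is on a 0-10 scale (an inference from the x10 factor).
    """
    if len(viescore_dims) != 3:
        raise ValueError("expected exactly three VIEScore dimensions")
    generation = wiscore * 10          # rescale WIScore to the VIEScore range
    editing = sum(viescore_dims) / 3   # average the three VIEScore dimensions
    return (generation + editing) / 2


if __name__ == "__main__":
    print(overall_image(0.5, [6.0, 7.0, 8.0]))
    print(overall_mean([60.0, 70.0, 80.0]))
```

With a WIScore of 0.5 and VIEScore dimensions of 6, 7 and 8, the generation term is 5 and the editing term is 7, so the image Overall score is 6. Rescaling before averaging is what keeps either metric from dominating the result.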