Upload app.py with huggingface_hub
app.py
CHANGED
@@ -342,28 +342,28 @@ with gr.Blocks(
     )
 
     gr.Markdown(
-
-
+        r"""
+        ### 📊 Overall Score Definition
 
-
+        To facilitate clearer and more consistent comparison across models, we introduce an **Overall** score for each leaderboard track.
 
-
-
+        **1. OmniLLM / MLLM**
+        The **Overall** score is computed as the arithmetic mean of all reported task-specific scores.
 
-
-
-
+        **2. Image Generation**
+        The evaluation involves metrics defined on different numerical scales. **WIScore** is used for image generation, while **VIEScore** (averaged over three dimensions) is used for image editing.
+        The **Overall** score is defined as:
 
-
-
-
+        $$
+        \text{Overall}=\frac{(\text{WIScore}\times 10)+\left(\frac{\sum \text{VIEScore}}{3}\right)}{2}
+        $$
 
-
+        This normalization-based formulation ensures a balanced contribution from both image generation and image editing performance.
 
-
-
-
-
+        **3. Video Generation**
+        The **Overall** score is calculated as the arithmetic mean of all evaluated dimensions, including imaging quality, aesthetics, motion, and temporal consistency.
+        """
+    )
 
 demo.launch()
 
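The scoring rules described in the added Markdown can be sketched in Python as below. This is only an illustration, not code from `app.py`: the function names `mean_overall` and `image_overall` are hypothetical, and the sketch assumes WIScore is reported on a scale one tenth that of VIEScore (which is what the `× 10` factor in the formula suggests; the diff does not state the exact ranges).

```python
from typing import Mapping, Sequence


def mean_overall(scores: Mapping[str, float]) -> float:
    """Arithmetic mean of all reported scores.

    Matches the rule given for the OmniLLM / MLLM and Video Generation tracks.
    The mapping keys (task or dimension names) are hypothetical labels.
    """
    values = list(scores.values())
    if not values:
        raise ValueError("at least one score is required")
    return sum(values) / len(values)


def image_overall(wiscore: float, viescore_dims: Sequence[float]) -> float:
    """Overall = ((WIScore * 10) + (sum(VIEScore) / 3)) / 2.

    Assumption: VIEScore is given as exactly three per-dimension values,
    as stated in the Markdown ("averaged over three dimensions").
    """
    if len(viescore_dims) != 3:
        raise ValueError("VIEScore is expected to have three dimensions")
    vie_mean = sum(viescore_dims) / 3
    return (wiscore * 10 + vie_mean) / 2


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    print(mean_overall({"task_a": 60.0, "task_b": 80.0}))
    print(image_overall(0.5, [6.0, 7.0, 8.0]))
```

With the made-up inputs above, the image track works out as ((0.5 × 10) + (21 / 3)) / 2 = (5 + 7) / 2 = 6, so both metrics contribute equally once WIScore has been rescaled.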