zhenwu0831 committed
Commit · 6eb9e63
1 Parent(s): 3bc94bc
v26
app.py
CHANGED
@@ -575,7 +575,7 @@ with gr.Blocks(title="Leaderboard QA Judge", theme=gr.themes.Soft()) as app:
 # Assignment 2 Public Leaderboard
 
 We compute multiple metrics:
-- **Standard metrics:** Answer Recall, F1
+- **Standard metrics:** Answer Recall, F1, and ROUGE-1/2/L (reported as an average)
 - **LLM-as-judge:** rubric-based score (1–5)
 
 **Total score** is the uniform mean of the available normalized metrics (0–1).
@@ -592,16 +592,18 @@ We compute multiple metrics:
 ```
 
 **Important:** Your submission must include answers for ALL questions in the dataset. The number of answers must exactly match the number of questions in the gold dataset.
+
+**Please don't refresh or redirect the page during evaluation. It may take sometime to finish.**
         """
     )
 
     with gr.Tabs():
         with gr.Tab("📤 Submit"):
-            file_input = gr.File(label="Upload submission
+            file_input = gr.File(label="Upload submission in json", file_types=[".json"])
             submit_btn = gr.Button("Submit & Evaluate", variant="primary")
             status = gr.Textbox(label="Result", lines=10, interactive=False)
 
-            gr.Markdown("### Sample submission
+            gr.Markdown("### Sample submission")
             sample = gr.Textbox(value=sample_submission_text(), lines=6)
 
         with gr.Tab("🏅 Leaderboard"):
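The Markdown text touched by this commit describes two rules: the total score is the uniform mean of the available normalized metrics (0–1), and the number of submitted answers must match the number of gold questions. The scoring code itself is not part of this diff, so the following is only a minimal sketch of how those rules might look. The function names, the metric keys, and the min-max mapping of the 1–5 judge score onto 0–1 are all assumptions, not the actual `app.py` implementation.

```python
from statistics import fmean
from typing import Mapping, Optional, Sequence


def normalize_judge_score(score: float, low: float = 1.0, high: float = 5.0) -> float:
    """Map a rubric score in [low, high] onto [0, 1].

    Assumption: plain min-max scaling; the real app may normalize differently.
    """
    if not low <= score <= high:
        raise ValueError(f"judge score {score} is outside [{low}, {high}]")
    return (score - low) / (high - low)


def total_score(metrics: Mapping[str, Optional[float]]) -> float:
    """Uniform mean of the metrics that are available (not None).

    Every value is expected to be normalized to [0, 1] already.
    """
    available = [value for value in metrics.values() if value is not None]
    if not available:
        raise ValueError("no metrics available to average")
    return fmean(available)


def check_answer_count(answers: Sequence[object], questions: Sequence[object]) -> None:
    """Reject a submission whose answer count differs from the gold question count."""
    if len(answers) != len(questions):
        raise ValueError(
            f"expected {len(questions)} answers, got {len(answers)}"
        )


if __name__ == "__main__":
    # Hypothetical metric values, for illustration only.
    example = {
        "answer_recall": 0.5,
        "f1": 0.5,
        "rouge_avg": None,  # treated as unavailable and skipped
        "llm_judge": normalize_judge_score(5),
    }
    print(round(total_score(example), 4))
```

Skipping `None` entries is one reading of "available" metrics: a metric that could not be computed (for example, if the judge call failed) leaves the mean over the remaining ones rather than counting as zero.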