---
title: TemporalBench Leaderboard
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: apache-2.0
short_description: Read-only TemporalBench leaderboard for offline results.
sdk_version: 5.49.1
tags:
  - leaderboard
---

# TemporalBench Leaderboard

This Space is a read-only visualization and validation layer for **offline** TemporalBench results.
It does not execute agents, call LLM APIs, or accept API keys.

## Configuration

- Set the local results file path via `TEMPORALBENCH_RESULTS_PATH`.
  Default is `data/results.json`.
- Submissions are stored in `data/submissions/` for manual review (override with `TEMPORALBENCH_SUBMISSIONS_PATH`).
- Update descriptive text in `src/about.py`.
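
The two environment variables above could be resolved roughly as follows. This is a minimal sketch, not the Space's actual code: `resolve_paths` is a hypothetical helper, and treating an empty variable as unset is an assumption.

```python
import os
from pathlib import Path

# Defaults as documented in this README.
DEFAULT_RESULTS_PATH = "data/results.json"
DEFAULT_SUBMISSIONS_PATH = "data/submissions/"


def resolve_paths(env=None):
    """Return (results_path, submissions_dir), falling back to the defaults.

    `env` is any mapping of variable names to values; it defaults to
    os.environ. Assumption: an empty value is treated the same as unset.
    """
    env = os.environ if env is None else env
    results = Path(env.get("TEMPORALBENCH_RESULTS_PATH") or DEFAULT_RESULTS_PATH)
    submissions = Path(
        env.get("TEMPORALBENCH_SUBMISSIONS_PATH") or DEFAULT_SUBMISSIONS_PATH
    )
    return results, submissions
```

Passing the mapping explicitly, for example `resolve_paths({"TEMPORALBENCH_RESULTS_PATH": "my_results.csv"})`, keeps the lookup easy to test without touching the real environment.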

## Results File Format

Results must be a JSON list or CSV table, where each record is one agent configuration.
Each record uses the fields below; the notes that follow identify which are optional:

```json
{
  "model_name": "string",
  "agent_name": "string",
  "agent_type": "string",
  "base_model": "string",
  "T1_acc": 0.0,
  "T2_acc": 0.0,
  "T3_acc": 0.0,
  "T4_acc": 0.0,
  "T2_sMAPE": 0.0,
  "T2_MAE": 0.0,
  "T4_sMAPE": 0.0,
  "T4_MAE": 0.0,
  "FreshRetailNet_T2_sMAPE": 0.0,
  "FreshRetailNet_T2_MAE": 0.0,
  "MIMIC_T2_OW_sMAPE": 0.0,
  "MIMIC_T2_OW_RMSSE": 0.0
}
```

Notes:
- `T2_sMAPE`, `T2_MAE`, `T4_sMAPE`, `T4_MAE` are optional (forecasting metrics).
- Dataset-level columns are optional and displayed if present.
- For MIMIC forecasting, only `OW_sMAPE` and `OW_RMSSE` are expected.
- Any additional numeric columns are treated as optional domain metrics and will be shown.
- Records must have a consistent schema and numeric metric values.
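
A results file can be checked against these rules before it is placed at the configured path. The sketch below is illustrative only and does not reproduce `src/leaderboard/load_results.py`. The names `read_records` and `validate_records` are hypothetical. It also rests on three assumptions: that the four identity fields and `T1_acc` through `T4_acc` are the required set, that numeric strings are accepted so CSV rows pass, and that every non-identity column must be numeric.

```python
import csv
import json
import math
from pathlib import Path

# Assumption: these are the required fields; the authoritative definitions
# live in src/leaderboard/schema.py.
IDENTITY_FIELDS = ("model_name", "agent_name", "agent_type", "base_model")
REQUIRED_METRICS = ("T1_acc", "T2_acc", "T3_acc", "T4_acc")


def read_records(path):
    """Load records from a CSV table (by suffix) or a JSON list."""
    path = Path(path)
    if path.suffix.lower() == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError("results JSON must be a list of records")
    return data


def validate_records(records):
    """Return a list of error strings; an empty list means no problems found."""
    if not records:
        return ["no records found"]
    errors = []
    expected_keys = set(records[0])
    for index, record in enumerate(records):
        # Consistent schema: every record carries the same set of columns.
        if set(record) != expected_keys:
            errors.append(f"record {index}: schema differs from record 0")
        for field in IDENTITY_FIELDS + REQUIRED_METRICS:
            if field not in record:
                errors.append(f"record {index}: missing '{field}'")
        # Every non-identity column is treated as a metric and must be numeric.
        for key, value in record.items():
            if key in IDENTITY_FIELDS:
                continue
            try:
                number = float(value)  # CSV values arrive as strings
            except (TypeError, ValueError):
                errors.append(f"record {index}: '{key}' is not numeric")
                continue
            if not math.isfinite(number):
                errors.append(f"record {index}: '{key}' is not finite")
    return errors
```

Calling `validate_records(read_records("data/results.json"))` returns the collected problems, so a submitter can fix them all in one pass instead of stopping at the first failure.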

## Project Structure

- `app.py`: Gradio UI + leaderboard rendering
- `src/leaderboard/load_results.py`: Load + validate results
- `src/leaderboard/schema.py`: Identity/metric field definitions
- `src/about.py`: Text and descriptions
- `src/display/css_html_js.py`: Custom styling