title: TemporalBench Leaderboard
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: apache-2.0
short_description: Read-only TemporalBench leaderboard for offline results.
sdk_version: 5.49.1
tags:
- leaderboard
TemporalBench Leaderboard
This Space is a read-only visualization and validation layer for offline TemporalBench results.
It does not execute agents, call LLM APIs, or accept API keys.
Configuration
- Set the local results file path via TEMPORALBENCH_RESULTS_PATH. Default is data/results.json.
- Submissions are stored in data/submissions/ for manual review (override with TEMPORALBENCH_SUBMISSIONS_PATH).
- Update descriptive text in src/about.py.
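As a rough illustration of how these two environment variables and their defaults might be resolved, here is a minimal sketch. The function name resolve_paths and the constant names are hypothetical and are not taken from the Space's code; only the variable names and default paths come from the list above.

```python
import os
from pathlib import Path

# Defaults as documented above; the constant names are hypothetical.
DEFAULT_RESULTS_PATH = "data/results.json"
DEFAULT_SUBMISSIONS_PATH = "data/submissions/"


def resolve_paths(env=None):
    """Return (results_path, submissions_path), falling back to the defaults.

    `env` is any mapping of environment variables; os.environ is used when
    it is omitted. An unset or empty variable falls back to the default.
    """
    env = os.environ if env is None else env
    results = Path(env.get("TEMPORALBENCH_RESULTS_PATH") or DEFAULT_RESULTS_PATH)
    submissions = Path(
        env.get("TEMPORALBENCH_SUBMISSIONS_PATH") or DEFAULT_SUBMISSIONS_PATH
    )
    return results, submissions
```

Passing the environment as a mapping keeps the helper easy to test without touching the real process environment.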
Results File Format
Results must be a JSON list or CSV table, where each record is one agent configuration.
Fields per record (the forecasting and dataset-level metrics are optional, see Notes):
{
"model_name": "string",
"agent_name": "string",
"agent_type": "string",
"base_model": "string",
"T1_acc": 0.0,
"T2_acc": 0.0,
"T3_acc": 0.0,
"T4_acc": 0.0,
"T2_sMAPE": 0.0,
"T2_MAE": 0.0,
"T4_sMAPE": 0.0,
"T4_MAE": 0.0,
"FreshRetailNet_T2_sMAPE": 0.0,
"FreshRetailNet_T2_MAE": 0.0,
"MIMIC_T2_OW_sMAPE": 0.0,
"MIMIC_T2_OW_RMSSE": 0.0
}
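Since results may arrive as either a JSON list or a CSV table, a loader has to normalize both into the same list-of-records shape. The sketch below is one possible way to do that with the standard library. It is not the code in src/leaderboard/load_results.py; the function name load_results_sketch is hypothetical, and treating every non-identity column as a float is an assumption.

```python
import csv
import json
from pathlib import Path

# Identity fields from the record layout above; everything else is
# assumed to be a numeric metric (an assumption of this sketch).
IDENTITY_FIELDS = {"model_name", "agent_name", "agent_type", "base_model"}


def load_results_sketch(path):
    """Load a JSON list or a CSV table into a list of record dicts."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        data = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(data, list):
            raise ValueError("JSON results must be a list of records")
        return data
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as fh:
            rows = list(csv.DictReader(fh))
        # CSV cells are strings, so metric columns are converted to float.
        # float() raises ValueError on empty or non-numeric cells.
        return [
            {k: (v if k in IDENTITY_FIELDS else float(v)) for k, v in row.items()}
            for row in rows
        ]
    raise ValueError(f"unsupported results format: {suffix!r}")
```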
Notes:
- T2_sMAPE, T2_MAE, T4_sMAPE, T4_MAE are optional (forecasting metrics).
- Dataset-level columns are optional and displayed if present.
- For MIMIC forecasting, only OW_sMAPE and OW_RMSSE are expected.
- Any additional numeric columns are treated as optional domain metrics and will be shown.
- Records must have a consistent schema and numeric metric values.
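The consistency and numeric-value rules can be checked before a file is handed to the leaderboard. The following is a minimal sketch of such a check, not the Space's actual validator. The function name validate_records is hypothetical, and two things are assumptions: that the required set is the four identity fields plus T1_acc through T4_acc, and that every non-identity column must be numeric.

```python
import math
from numbers import Real

IDENTITY_FIELDS = ("model_name", "agent_name", "agent_type", "base_model")
# Assumed required metrics; the forecasting metrics are documented as optional.
REQUIRED_METRICS = ("T1_acc", "T2_acc", "T3_acc", "T4_acc")


def validate_records(records):
    """Return a list of problem descriptions; an empty list means no issues."""
    if not records:
        return ["no records found"]
    problems = []
    expected_keys = set(records[0])
    for i, rec in enumerate(records):
        for field in IDENTITY_FIELDS + REQUIRED_METRICS:
            if field not in rec:
                problems.append(f"record {i}: missing field {field!r}")
        # "Consistent schema" is read here as: same keys as the first record.
        if set(rec) != expected_keys:
            problems.append(f"record {i}: schema differs from record 0")
        for key, value in rec.items():
            if key in IDENTITY_FIELDS:
                if not isinstance(value, str):
                    problems.append(f"record {i}: {key!r} is not a string")
                continue
            # bool is a subclass of int, so it is rejected explicitly.
            if isinstance(value, bool) or not isinstance(value, Real) or math.isnan(value):
                problems.append(f"record {i}: {key!r} is not numeric")
    return problems
```

Collecting all problems instead of raising on the first one makes it easier to fix a results file in a single pass.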
Project Structure
- app.py: Gradio UI + leaderboard rendering
- src/leaderboard/load_results.py: Load + validate results
- src/leaderboard/schema.py: Identity/metric field definitions
- src/about.py: Text and descriptions
- src/display/css_html_js.py: Custom styling