metadata
title: TemporalBench Leaderboard
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: apache-2.0
short_description: Read-only TemporalBench leaderboard for offline results.
sdk_version: 5.49.1
tags:
  - leaderboard

TemporalBench Leaderboard

This Space is a read-only visualization and validation layer for offline TemporalBench results.
It does not execute agents, call LLM APIs, or accept API keys.

Configuration

  • Set the local results file path via TEMPORALBENCH_RESULTS_PATH.
    Default is data/results.json.
  • Submissions are stored in data/submissions/ for manual review (override with TEMPORALBENCH_SUBMISSIONS_PATH).
  • Update descriptive text in src/about.py.
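
As a rough illustration of the path settings above, the sketch below resolves both locations with their documented defaults. It assumes the two names are read as environment variables; `resolve_paths` and the constant names are hypothetical and are not taken from this Space's code.

```python
import os
from pathlib import Path

# Defaults as documented in the Configuration list.
DEFAULT_RESULTS_PATH = "data/results.json"
DEFAULT_SUBMISSIONS_PATH = "data/submissions/"


def resolve_paths(env=None):
    """Return (results_path, submissions_dir), honoring overrides when set.

    Assumption: the overrides are environment variables. `env` can be any
    mapping, which makes the function easy to exercise without touching
    the real environment.
    """
    env = os.environ if env is None else env
    # An unset or empty variable falls back to the documented default.
    results = Path(env.get("TEMPORALBENCH_RESULTS_PATH") or DEFAULT_RESULTS_PATH)
    submissions = Path(
        env.get("TEMPORALBENCH_SUBMISSIONS_PATH") or DEFAULT_SUBMISSIONS_PATH
    )
    return results, submissions


if __name__ == "__main__":
    results_path, submissions_dir = resolve_paths()
    print(f"results file:    {results_path}")
    print(f"submissions dir: {submissions_dir}")
```

Passing a plain dict, for example `resolve_paths({"TEMPORALBENCH_RESULTS_PATH": "/tmp/run.csv"})`, overrides only the results file and leaves the submissions directory at its default.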

Results File Format

Results must be a JSON list or CSV table, where each record is one agent configuration.
Fields per record (the Notes below list which ones are optional):

{
  "model_name": "string",
  "agent_name": "string",
  "agent_type": "string",
  "base_model": "string",
  "T1_acc": 0.0,
  "T2_acc": 0.0,
  "T3_acc": 0.0,
  "T4_acc": 0.0,
  "T2_sMAPE": 0.0,
  "T2_MAE": 0.0,
  "T4_sMAPE": 0.0,
  "T4_MAE": 0.0,
  "FreshRetailNet_T2_sMAPE": 0.0,
  "FreshRetailNet_T2_MAE": 0.0,
  "MIMIC_T2_OW_sMAPE": 0.0,
  "MIMIC_T2_OW_RMSSE": 0.0
}

Notes:

  • T2_sMAPE, T2_MAE, T4_sMAPE, T4_MAE are optional (forecasting metrics).
  • Dataset-level columns are optional and displayed if present.
  • For MIMIC forecasting, only the OW_sMAPE and OW_RMSSE metrics are expected (for example, MIMIC_T2_OW_sMAPE and MIMIC_T2_OW_RMSSE).
  • Any additional numeric columns are treated as optional domain metrics and will be shown.
  • Records must have a consistent schema and numeric metric values.
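
The format rules above can be checked with a short script before a file is placed in the Space. This is a minimal sketch, not the logic in src/leaderboard/load_results.py: the function names are hypothetical, and it assumes that the four identity fields and T1_acc through T4_acc are the required set, that "consistent schema" means every record has the same keys, and that every non-identity field must parse as a number.

```python
import csv
import json
from numbers import Real
from pathlib import Path

# Assumption: identity fields are strings, everything else is a metric.
IDENTITY_FIELDS = ("model_name", "agent_name", "agent_type", "base_model")
# Assumption: the accuracy columns are required; other metrics are optional.
REQUIRED_METRICS = ("T1_acc", "T2_acc", "T3_acc", "T4_acc")


def load_records(path):
    """Load a JSON list or a CSV table into a list of dict records."""
    path = Path(path)
    if path.suffix.lower() == ".csv":
        with path.open(newline="", encoding="utf-8") as fh:
            return [dict(row) for row in csv.DictReader(fh)]
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, list):
        raise ValueError("results JSON must be a list of records")
    return data


def _to_number(value):
    """Convert a metric value to float; raise ValueError if not numeric."""
    if isinstance(value, bool):
        raise ValueError("booleans are not metric values")
    if isinstance(value, Real):
        return float(value)
    return float(str(value).strip())  # CSV cells arrive as strings


def validate_records(records):
    """Return a list of human-readable problems (empty if none found)."""
    errors = []
    expected_keys = None
    for i, rec in enumerate(records):
        missing = [f for f in IDENTITY_FIELDS + REQUIRED_METRICS if f not in rec]
        if missing:
            errors.append(f"record {i}: missing {missing}")
        if expected_keys is None:
            expected_keys = set(rec)
        elif set(rec) != expected_keys:
            errors.append(f"record {i}: schema differs from record 0")
        for key, value in rec.items():
            if key in IDENTITY_FIELDS:
                continue
            try:
                _to_number(value)
            except (TypeError, ValueError):
                errors.append(f"record {i}: {key} is not numeric ({value!r})")
    return errors
```

A typical call would be `validate_records(load_records("data/results.json"))`; an empty list means no problems were found under these assumptions.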

Project Structure

  • app.py: Gradio UI + leaderboard rendering
  • src/leaderboard/load_results.py: Load + validate results
  • src/leaderboard/schema.py: Identity/metric field definitions
  • src/about.py: Text and descriptions
  • src/display/css_html_js.py: Custom styling