	---
	title: TemporalBench Leaderboard
	emoji: 🥇
	colorFrom: green
	colorTo: indigo
	sdk: gradio
	app_file: app.py
	pinned: true
	license: apache-2.0
	short_description: Read-only TemporalBench leaderboard for offline results.
	sdk_version: 5.49.1
	tags:
	- leaderboard
	---

	# TemporalBench Leaderboard

	This Space is a read-only visualization and validation layer for offline TemporalBench results.
	It does not execute agents, call LLM APIs, or accept API keys.

	## Configuration

	- Set the local results file path via `TEMPORALBENCH_RESULTS_PATH` (default: `data/results.json`).
	- Submissions are stored in `data/submissions/` for manual review (override with `TEMPORALBENCH_SUBMISSIONS_PATH`).
	- Update descriptive text in `src/about.py`.
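
The path settings above could be resolved roughly as follows. This is a minimal sketch, not the Space's actual code: it assumes both settings are read as environment variables, and `resolve_paths` is a hypothetical helper name introduced only for illustration.

```python
import os
from pathlib import Path

# Defaults taken from the Configuration list above.
DEFAULT_RESULTS_PATH = "data/results.json"
DEFAULT_SUBMISSIONS_PATH = "data/submissions/"


def resolve_paths(env=None):
    """Return (results_path, submissions_path), honoring overrides.

    Assumption: the two settings are environment variables. `env` can be
    any mapping, which keeps the function easy to test.
    """
    env = os.environ if env is None else env
    # An unset or empty variable falls back to the documented default.
    results = Path(env.get("TEMPORALBENCH_RESULTS_PATH") or DEFAULT_RESULTS_PATH)
    submissions = Path(
        env.get("TEMPORALBENCH_SUBMISSIONS_PATH") or DEFAULT_SUBMISSIONS_PATH
    )
    return results, submissions


if __name__ == "__main__":
    results_path, submissions_path = resolve_paths()
    print(f"results: {results_path}")
    print(f"submissions: {submissions_path}")
```

Running this with, for example, `TEMPORALBENCH_RESULTS_PATH=my_results.csv` set in the shell would point the first path at that file and leave the submissions directory at its default.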

	## Results File Format

	Results must be a JSON list or a CSV table in which each record describes one agent configuration.
	The example below shows the expected fields; the notes after it mark which ones are optional:

	```json
	{
	  "model_name": "string",
	  "agent_name": "string",
	  "agent_type": "string",
	  "base_model": "string",
	  "T1_acc": 0.0,
	  "T2_acc": 0.0,
	  "T3_acc": 0.0,
	  "T4_acc": 0.0,
	  "T2_sMAPE": 0.0,
	  "T2_MAE": 0.0,
	  "T4_sMAPE": 0.0,
	  "T4_MAE": 0.0,
	  "FreshRetailNet_T2_sMAPE": 0.0,
	  "FreshRetailNet_T2_MAE": 0.0,
	  "MIMIC_T2_OW_sMAPE": 0.0,
	  "MIMIC_T2_OW_RMSSE": 0.0
	}
	```

	Notes:
	- `T2_sMAPE`, `T2_MAE`, `T4_sMAPE`, `T4_MAE` are optional (forecasting metrics).
	- Dataset-level columns are optional and displayed if present.
	- For MIMIC forecasting, only the `OW_sMAPE` and `OW_RMSSE` metrics are expected (as in `MIMIC_T2_OW_sMAPE` and `MIMIC_T2_OW_RMSSE` above).
	- Any additional numeric columns are treated as optional domain metrics and will be shown.
	- Records must have a consistent schema and numeric metric values.
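
To check a results file against these rules before uploading it, something like the sketch below may help. It is not the code in `src/leaderboard/load_results.py`. It assumes the four identity fields and the four `T*_acc` columns are the required ones (everything the notes do not mark as optional), and `load_records` and `validate_records` are hypothetical names.

```python
import csv
import json
import math
from pathlib import Path

# Assumed split between identity and required metric fields (see lead-in).
IDENTITY_FIELDS = ("model_name", "agent_name", "agent_type", "base_model")
REQUIRED_METRICS = ("T1_acc", "T2_acc", "T3_acc", "T4_acc")


def load_records(path):
    """Load records from a .csv file, or from a JSON list otherwise."""
    path = Path(path)
    if path.suffix.lower() == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            return [dict(row) for row in csv.DictReader(f)]
    with path.open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("results JSON must be a list of records")
    return data


def validate_records(records):
    """Return a list of human-readable problems (empty if none found)."""
    errors = []
    expected_keys = None
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            errors.append(f"record {i}: not an object")
            continue
        # Consistent schema: every record has the same keys as the first.
        if expected_keys is None:
            expected_keys = set(rec)
        elif set(rec) != expected_keys:
            errors.append(f"record {i}: schema differs from the first record")
        for name in IDENTITY_FIELDS + REQUIRED_METRICS:
            if name not in rec:
                errors.append(f"record {i}: missing required field {name!r}")
        # Every non-identity column is treated as a metric and must be numeric.
        for key, value in rec.items():
            if key in IDENTITY_FIELDS:
                continue
            try:
                number = float(value)  # CSV values arrive as strings
            except (TypeError, ValueError):
                errors.append(f"record {i}: {key!r} is not numeric")
                continue
            if not math.isfinite(number):
                errors.append(f"record {i}: {key!r} is not finite")
    return errors
```

A call such as `validate_records(load_records("data/results.json"))` returns an empty list when no problems are found. Note that this sketch flags an empty CSV cell as non-numeric; the real loader may treat missing optional metrics differently.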

	## Project Structure

	- `app.py`: Gradio UI + leaderboard rendering
	- `src/leaderboard/load_results.py`: Load + validate results
	- `src/leaderboard/schema.py`: Identity/metric field definitions
	- `src/about.py`: Text and descriptions
	- `src/display/css_html_js.py`: Custom styling