Spaces: vLAR / PhysInOne-Leaderboard

metadata

title: PhysInOne Leaderboard
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: apache-2.0
short_description: Duplicate this leaderboard to initialize your own!
sdk_version: 5.43.1
tags:
  - leaderboard

PhysInOne Benchmark Leaderboard

This repository implements a three-part, multi-task automatic evaluation system built on Hugging Face:

Frontend Space (this repository, public): provides the submission form and leaderboard.
Worker Space (the same git repository, but with app_file: worker.py, private): runs asynchronous backend evaluation.
Private Dataset: stores task manifests, ground truth, best historical results, and execution logs.

See DESIGN.md for the full design and TODO list.

Key Design Choices

Users do not upload ZIP files directly to the Space. Instead, they submit a user_dataset and a filename that together point to a file in their own Hugging Face dataset.
The frontend immediately returns an "accepted" response. A separate Worker Space then pulls the data asynchronously, evaluates it, and writes results back to the private dataset.
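
The accept-then-evaluate flow described above can be sketched as follows. This is a minimal illustration, not the repository's code: the in-memory `PENDING` dict stands in for the private dataset, and the names `Submission`, `accept_submission`, and `poll_once`, as well as the record fields, are hypothetical.

```python
import json
import time
import uuid
from dataclasses import dataclass, asdict

# Hypothetical in-memory stand-in for the pending queue in the private dataset.
PENDING = {}


@dataclass
class Submission:
    submission_id: str
    task: str
    user_dataset: str  # repo id of the user's dataset, "<owner>/<name>"
    filename: str      # file inside that dataset
    submitted_at: float
    status: str = "pending"


def accept_submission(task, user_dataset, filename):
    """Frontend side: store a pointer to the data and return immediately."""
    if user_dataset.count("/") != 1 or not filename:
        return {"status": "rejected",
                "reason": "expected '<owner>/<name>' and a filename"}
    sub = Submission(uuid.uuid4().hex, task, user_dataset, filename, time.time())
    PENDING[sub.submission_id] = json.dumps(asdict(sub))
    return {"status": "accepted", "submission_id": sub.submission_id}


def poll_once(evaluate):
    """Worker side: drain pending records and attach metrics to each one."""
    results = []
    for sub_id in list(PENDING):
        record = json.loads(PENDING.pop(sub_id))
        record["metrics"] = evaluate(record)  # evaluation is injected here
        record["status"] = "done"
        results.append(record)
    return results
```

Because the frontend only stores a pointer, it never handles the submitted file itself; fetching and evaluating it is left entirely to the worker.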

Required Configuration Before Deployment

| Space | Secrets |
| --- | --- |
| Frontend | `HF_TOKEN` (write access to the private dataset) |
| Worker | `HF_TOKEN` (write access to the private dataset + read access to user datasets) |
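
Space secrets are exposed to the running app as environment variables, so either Space could read its token roughly as below. This is a sketch only: the helper name `load_token` is hypothetical, and whether `src/envs.py` performs this kind of check is an assumption.

```python
import os


def load_token(env=os.environ):
    """Return HF_TOKEN from the environment, failing fast if it is missing."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it as a Space secret")
    return token
```

Failing at startup makes a missing secret visible in the Space logs rather than surfacing later as a failed dataset write.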

Adding a New Task (for collaborators)

Copy src/tasks/_template/ to src/tasks/<your_task>/.
Implement validate(sandbox_dir) and evaluate(sandbox_dir, gt) -> {metric: float}.
Register the task in src/tasks/__init__.py.
Upload the corresponding GT data into the ground_truth/ directory of the private dataset.
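
As an illustration of the steps above, a task module might look like the following sketch. Only the `validate(sandbox_dir)` and `evaluate(sandbox_dir, gt)` signatures come from this README; the `predictions.json` layout, the `mae` metric, the list-shaped `gt`, and the `TASKS` registry are hypothetical, and the real registration mechanism may differ.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical registry; the real one in src/tasks/ may look different.
TASKS = {}


def validate(sandbox_dir):
    """Raise ValueError if the submission does not have the expected layout."""
    path = Path(sandbox_dir) / "predictions.json"  # assumed file name
    if not path.is_file():
        raise ValueError("missing predictions.json")
    preds = json.loads(path.read_text())
    if not isinstance(preds, list) or not all(
        isinstance(p, (int, float)) for p in preds
    ):
        raise ValueError("predictions.json must be a list of numbers")


def evaluate(sandbox_dir, gt):
    """Return {metric: float}; here, mean absolute error against gt."""
    preds = json.loads((Path(sandbox_dir) / "predictions.json").read_text())
    if not gt or len(preds) != len(gt):
        raise ValueError("prediction count does not match ground truth")
    mae = sum(abs(p - g) for p, g in zip(preds, gt)) / len(gt)
    return {"mae": float(mae)}


TASKS["demo_task"] = (validate, evaluate)

# Usage: build a throwaway sandbox and run the registered task.
with tempfile.TemporaryDirectory() as sandbox:
    (Path(sandbox) / "predictions.json").write_text(json.dumps([1.0, 2.0]))
    check, score = TASKS["demo_task"]
    check(sandbox)
    demo_metrics = score(sandbox, gt=[1.0, 4.0])
```

Keeping `validate` separate from `evaluate` lets a worker reject malformed submissions before loading any ground truth.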

Code Structure

app.py: Frontend Space entry point (submission form + leaderboard + queue).
worker.py: Worker Space entry point (polling + evaluation).
src/envs.py: centralized configuration.
src/storage/hub.py: private dataset I/O (CAS + listing + logs).
src/submission/frontend.py: writes pending submissions.
src/worker/: polling + single-task evaluation.
src/tasks/: task plugin layer.
src/leaderboard/read_evals.py and src/populate.py: leaderboard assembly.
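
To make the leaderboard assembly step concrete, the sketch below keeps each user's best result per task and sorts the rows. It is not the code in `src/leaderboard/read_evals.py` or `src/populate.py`: the record fields (`user`, `task`, `metrics`), the function name `best_rows`, and the default assumption that higher scores are better are all hypothetical.

```python
def best_rows(results, metric, higher_is_better=True):
    """Keep the best record per (user, task) and return sorted rows."""
    best = {}
    for record in results:
        key = (record["user"], record["task"])
        score = record["metrics"][metric]
        current = best.get(key)
        if current is None:
            best[key] = score
        elif higher_is_better and score > current:
            best[key] = score
        elif not higher_is_better and score < current:
            best[key] = score
    rows = [
        {"user": user, "task": task, metric: score}
        for (user, task), score in best.items()
    ]
    # Best score first, in whichever direction counts as "best".
    rows.sort(key=lambda row: row[metric], reverse=higher_is_better)
    return rows
```

For an error metric, pass `higher_is_better=False` so that lower values rank first.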