metadata
title: PhysInOne Leaderboard
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: apache-2.0
short_description: Duplicate this leaderboard to initialize your own!
sdk_version: 5.43.1
tags:
- leaderboard
PhysInOne Benchmark Leaderboard
This repository implements a three-part, multi-task automatic evaluation system built on Hugging Face:
- Frontend Space (this repository, public): provides the submission form and leaderboard.
- Worker Space (the same git repository, but with app_file: worker.py, private): runs asynchronous backend evaluation.
- Private Dataset: stores task manifests, ground truth, best historical results, and execution logs.
See DESIGN.md for the full design and TODO list.
Key Design Choices
- Users do not upload ZIP files directly to the Space. Instead, they submit user_dataset + filename, pointing to their own Hugging Face dataset.
- The frontend immediately returns an "accepted" response. A separate Worker Space then pulls the data asynchronously, evaluates it, and writes results back to the private dataset.
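The accept-then-evaluate flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's implementation: the record fields, the "pending" and "done" status values, and the function names accept_submission and process_next are all hypothetical, and an in-memory list stands in for the private dataset.

```python
import uuid
from datetime import datetime, timezone
from typing import Callable, Optional

# Hypothetical in-memory stand-in for the submission queue in the private dataset.
QUEUE: list = []


def accept_submission(user_dataset: str, filename: str, task: str) -> dict:
    """Frontend side: store a pointer to the user's file and return at once."""
    if not user_dataset or not filename:
        raise ValueError("user_dataset and filename are required")
    record = {
        "id": uuid.uuid4().hex,
        "task": task,
        "user_dataset": user_dataset,  # the user's own Hugging Face dataset
        "filename": filename,          # file inside that dataset
        "status": "pending",           # assumed status value
        "submitted_at": datetime.now(timezone.utc).isoformat(),
    }
    QUEUE.append(record)
    # No evaluation happens here; the caller only learns the request was queued.
    return {"id": record["id"], "status": "accepted"}


def process_next(evaluate: Callable[[dict], dict]) -> Optional[dict]:
    """Worker side: evaluate the oldest pending record and store its metrics."""
    for record in QUEUE:
        if record["status"] == "pending":
            record["metrics"] = evaluate(record)
            record["status"] = "done"
            return record
    return None  # nothing left to evaluate
```

In a real Worker, the evaluate step would first have to fetch the submitted file, for example with hf_hub_download(repo_id=..., filename=..., repo_type="dataset", token=...) from huggingface_hub. Whether the repository does it that way is not stated here.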
Required Configuration Before Deployment
| Space | Secrets |
|---|---|
| Frontend | HF_TOKEN (write access to the private dataset) |
| Worker | HF_TOKEN (write access to the private dataset + read access to user datasets) |
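Space secrets are exposed to the running app as environment variables, so both Spaces can read the token the same way. The helper below is a minimal sketch of a fail-fast lookup; the function name load_token is hypothetical and is not taken from src/envs.py.

```python
import os


def load_token(env=None) -> str:
    """Return HF_TOKEN from the environment, failing early if it is missing."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it as a Space secret")
    return token
```

Failing at startup makes a missing secret visible in the Space logs immediately, rather than on the first dataset read or write.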
Adding a New Task (for collaborators)
- Copy src/tasks/_template/ to src/tasks/<your_task>/.
- Implement validate(sandbox_dir) and evaluate(sandbox_dir, gt) -> {metric: float}.
- Register the task in src/tasks/__init__.py.
- Upload the corresponding GT data into the ground_truth/ directory of the private dataset.
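As a rough illustration of the steps above, a task module might look like the sketch below. Only the two function signatures come from this README. The predictions.json file name, the choice to signal invalid submissions by raising ValueError, the mean-absolute-error metric, and the TASKS registry are all assumptions made for the example; the actual conventions live in src/tasks/_template/.

```python
import json
from pathlib import Path

# Hypothetical file name; the real submission layout is defined per task.
PRED_FILE = "predictions.json"


def validate(sandbox_dir):
    """Raise ValueError if the extracted submission is not usable."""
    path = Path(sandbox_dir) / PRED_FILE
    if not path.is_file():
        raise ValueError(f"missing {PRED_FILE}")
    preds = json.loads(path.read_text())
    if not isinstance(preds, dict) or not preds:
        raise ValueError("predictions must be a non-empty JSON object")


def evaluate(sandbox_dir, gt):
    """Return {metric: float}; here, mean absolute error over shared keys."""
    preds = json.loads((Path(sandbox_dir) / PRED_FILE).read_text())
    keys = sorted(set(preds) & set(gt))
    if not keys:
        raise ValueError("no overlap between predictions and ground truth")
    mae = sum(abs(float(preds[k]) - float(gt[k])) for k in keys) / len(keys)
    return {"mae": mae}


# Hypothetical registry, standing in for the registration step.
TASKS = {"example_task": {"validate": validate, "evaluate": evaluate}}
```

Keeping validate separate from evaluate lets the Worker reject a malformed submission with a clear message before any metric code runs.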
Code Structure
- app.py: Frontend Space entry point (submission form + leaderboard + queue).
- worker.py: Worker Space entry point (polling + evaluation).
- src/envs.py: centralized configuration.
- src/storage/hub.py: private dataset I/O (CAS + listing + logs).
- src/submission/frontend.py: writes pending submissions.
- src/worker/: polling + single-task evaluation.
- src/tasks/: task plugin layer.
- src/leaderboard/read_evals.py and src/populate.py: leaderboard assembly.