---
title: PhysInOne Leaderboard
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: apache-2.0
short_description: Duplicate this leaderboard to initialize your own!
sdk_version: 5.43.1
tags:
  - leaderboard
---
# PhysInOne Benchmark Leaderboard

This repository implements a Hugging Face-based automatic evaluation system for multiple tasks, made up of **three parts**:

1. **Frontend Space** (this repository, public): provides the submission form and the leaderboard.
2. **Worker Space** (the same git repository, but with `app_file: worker.py`; private): runs backend evaluation asynchronously.
3. **Private Dataset**: stores task manifests, ground truth, best historical results, and execution logs.

See [DESIGN.md](DESIGN.md) for the full design and TODO list.
## Key Design Choices

- Users **do not upload ZIP files directly** to the Space. Instead, they submit a `user_dataset` and `filename` pair that points to a file in their own Hugging Face dataset.
- The frontend immediately returns an "accepted" response. A separate **Worker Space** then pulls the data asynchronously, evaluates it, and writes the results back to the private dataset.
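
The accept-then-evaluate flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/submission/frontend.py` or `src/worker/loop.py`: the `Submission` record, its field names, the status values, and the in-memory `store` dictionary (standing in for the private dataset) are all assumptions made for the example.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Submission:
    """Hypothetical pending-submission record; all field names are assumptions."""
    user_dataset: str                 # the submitter's own dataset repo
    filename: str                     # file inside that dataset
    status: str = "pending"           # pending -> done | failed
    metrics: Dict[str, float] = field(default_factory=dict)
    error: Optional[str] = None


def accept_submission(store: Dict[str, Submission], user_dataset: str, filename: str) -> dict:
    """Frontend side: store a pointer to the user's file and return immediately."""
    if not user_dataset.strip() or not filename.strip():
        return {"status": "rejected", "reason": "user_dataset and filename are required"}
    submission_id = uuid.uuid4().hex
    store[submission_id] = Submission(user_dataset.strip(), filename.strip())
    return {"status": "accepted", "id": submission_id}


def worker_step(
    store: Dict[str, Submission],
    evaluate: Callable[[Submission], Dict[str, float]],
) -> Optional[str]:
    """Worker side: evaluate at most one pending submission and record the outcome."""
    for submission_id, sub in store.items():
        if sub.status != "pending":
            continue
        try:
            sub.metrics = evaluate(sub)
            sub.status = "done"
        except Exception as exc:  # a failed evaluation must not stop the worker
            sub.status = "failed"
            sub.error = str(exc)
        return submission_id
    return None  # nothing to do on this polling round
```

In this sketch the frontend never touches the submitted file itself; it only records where the file lives, which is what lets it respond before any evaluation has run.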
## Required Configuration Before Deployment

| Space | Secrets |
|---|---|
| Frontend | `HF_TOKEN` (write access to the private dataset) |
| Worker | `HF_TOKEN` (write access to the private dataset + read access to user datasets) |
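
Space secrets are exposed to the running app as environment variables, so a startup check along these lines can catch a missing token early. This is only a sketch: the helper name `load_hf_token` and its error message are hypothetical and are not taken from `src/envs.py`.

```python
import os
from typing import Mapping, Optional


def load_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the HF_TOKEN secret, failing fast if it is missing (hypothetical helper)."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it as a Space secret before deploying.")
    return token
```

The check only confirms that a token is present. Whether that token actually has the scopes listed in the table can only be verified against the Hub.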
## Adding a New Task (for collaborators)

1. Copy [src/tasks/_template/](src/tasks/_template/__init__.py) to `src/tasks/<your_task>/`.
2. Implement `validate(sandbox_dir)` and `evaluate(sandbox_dir, gt) -> {metric: float}`.
3. Register the task in [src/tasks/__init__.py](src/tasks/__init__.py).
4. Upload the corresponding ground-truth (GT) data to the `ground_truth/` directory of the private dataset.
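
As an illustration of step 2, a task module might look roughly like the sketch below. Only the two function names and the `{metric: float}` return shape come from this README. Everything else is an assumption made for the example: that the submission unpacks to a `predictions.json` file mapping sample IDs to numbers, that `gt` arrives as a dictionary with the same keys, that `validate` signals problems by raising `ValueError`, and that the metric is mean absolute error. Check the template in `src/tasks/_template/` for the actual contract.

```python
import json
from pathlib import Path
from typing import Dict

PREDICTIONS_FILE = "predictions.json"  # hypothetical file name, not from the template


def _load_predictions(sandbox_dir) -> Dict[str, float]:
    """Read the (assumed) predictions file from the unpacked submission."""
    with (Path(sandbox_dir) / PREDICTIONS_FILE).open(encoding="utf-8") as f:
        return json.load(f)


def validate(sandbox_dir) -> None:
    """Raise ValueError if the submission is malformed (assumed contract)."""
    if not (Path(sandbox_dir) / PREDICTIONS_FILE).is_file():
        raise ValueError(f"missing {PREDICTIONS_FILE}")
    try:
        preds = _load_predictions(sandbox_dir)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{PREDICTIONS_FILE} is not valid JSON") from exc
    if not isinstance(preds, dict) or not preds:
        raise ValueError("predictions must be a non-empty JSON object")
    for key, value in preds.items():
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"prediction for {key!r} is not a number")


def evaluate(sandbox_dir, gt: Dict[str, float]) -> Dict[str, float]:
    """Return {metric: float}; here a single mean-absolute-error metric."""
    if not gt:
        raise ValueError("ground truth is empty")
    preds = _load_predictions(sandbox_dir)
    missing = sorted(set(gt) - set(preds))
    if missing:
        raise ValueError(f"missing predictions for {len(missing)} sample(s)")
    mae = sum(abs(preds[k] - gt[k]) for k in gt) / len(gt)
    return {"mae": float(mae)}
```

Keeping format checks in `validate` and scoring in `evaluate` means a malformed submission can be rejected with a clear message before any metric is computed.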
## Code Structure

- [app.py](app.py): Frontend Space entry point (submission form + leaderboard + queue).
- [worker.py](worker.py): Worker Space entry point (polling + evaluation).
- [src/envs.py](src/envs.py): centralized configuration.
- [src/storage/hub.py](src/storage/hub.py): private dataset I/O (CAS + listing + logs).
- [src/submission/frontend.py](src/submission/frontend.py): writes pending submissions.
- [src/worker/](src/worker/loop.py): polling + single-task evaluation.
- [src/tasks/](src/tasks/): task plugin layer.
- [src/leaderboard/read_evals.py](src/leaderboard/read_evals.py) and [src/populate.py](src/populate.py): leaderboard assembly.