--- title: PhysInOne Leaderboard emoji: 🥇 colorFrom: green colorTo: indigo sdk: gradio app_file: app.py pinned: true license: apache-2.0 short_description: Duplicate this leaderboard to initialize your own! sdk_version: 5.43.1 tags: - leaderboard --- # PhysInOne Benchmark Leaderboard This repository is a Hugging Face based **three-part** multi-task automatic evaluation system: 1. **Frontend Space** (this repository, public): provides the submission form and leaderboard. 2. **Worker Space** (the same git repository, but with `app_file: worker.py`, private): runs asynchronous backend evaluation. 3. **Private Dataset**: stores task manifests, ground truth, best historical results, and execution logs. See [DESIGN.md](DESIGN.md) for the full design and TODO list. ## Key Design Choices - Users **do not upload ZIP files directly** to the Space. Instead, they submit `user_dataset + filename`, pointing to their own Hugging Face dataset. - The frontend immediately returns an "accepted" response. A separate **Worker Space** then pulls the data asynchronously, evaluates it, and writes results back to the private dataset. ## Required Configuration Before Deployment | Space | Secrets | |---|---| | Frontend | `HF_TOKEN` (write access to the private dataset) | | Worker | `HF_TOKEN` (write access to the private dataset + read access to user datasets) | ## Adding a New Task (for collaborators) 1. Copy [src/tasks/_template/](src/tasks/_template/__init__.py) to `src/tasks//`. 2. Implement `validate(sandbox_dir)` and `evaluate(sandbox_dir, gt) -> {metric: float}`. 3. Register the task in [src/tasks/__init__.py](src/tasks/__init__.py). 4. Upload the corresponding GT data into the `ground_truth/` directory of the private dataset. ## Code Structure - [app.py](app.py): Frontend Space entry point (submission form + leaderboard + queue). - [worker.py](worker.py): Worker Space entry point (polling + evaluation). - [src/envs.py](src/envs.py): centralized configuration. - [src/storage/hub.py](src/storage/hub.py): private dataset I/O (CAS + listing + logs). - [src/submission/frontend.py](src/submission/frontend.py): writes pending submissions. - [src/worker/](src/worker/loop.py): polling + single-task evaluation. - [src/tasks/](src/tasks/): task plugin layer. - [src/leaderboard/read_evals.py](src/leaderboard/read_evals.py) and [src/populate.py](src/populate.py): leaderboard assembly.