---
title: PhysInOne Leaderboard
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: apache-2.0
short_description: Duplicate this leaderboard to initialize your own!
sdk_version: 5.43.1
tags:
- leaderboard
---
# PhysInOne Benchmark Leaderboard
This repository is a Hugging Face-based, **three-part**, multi-task automatic evaluation system:
1. **Frontend Space** (this repository, public): provides the submission form and leaderboard.
2. **Worker Space** (private; deployed from the same git repository but with `app_file: worker.py`): runs asynchronous backend evaluation.
3. **Private Dataset**: stores task manifests, ground truth, best historical results, and execution logs.
See [DESIGN.md](DESIGN.md) for the full design and TODO list.
## Key Design Choices
- Users **do not upload ZIP files directly** to the Space. Instead, they submit a `user_dataset` and a `filename` that point to the file in their own Hugging Face dataset.
- The frontend immediately returns an "accepted" response. A separate **Worker Space** then pulls the data asynchronously, evaluates it, and writes results back to the private dataset.
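The accept-then-evaluate split can be sketched as below. This is a minimal illustration under stated assumptions, not the code in `src/submission/frontend.py` or `src/worker/`. The `PendingSubmission` record, `accept_submission`, `claim_next`, the in-memory `queue` (standing in for the private dataset), and the `.zip` check are all hypothetical. The frontend does only cheap checks and returns at once, and the worker later claims the oldest pending entry.

```python
import re
import uuid
from dataclasses import dataclass, field

# Hypothetical record the frontend might write per submission;
# the real schema in src/submission/frontend.py may differ.
@dataclass
class PendingSubmission:
    task: str
    user_dataset: str  # Hub dataset id, e.g. "alice/physinone-preds"
    filename: str      # path of the submission file inside that dataset
    submission_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: str = "pending"

# Hub repo ids have the form "<namespace>/<name>".
_REPO_ID = re.compile(r"^[\w.-]+/[\w.-]+$")

def accept_submission(queue, task, user_dataset, filename):
    """Frontend: cheap checks only, enqueue, return at once (no download)."""
    if not _REPO_ID.match(user_dataset):
        raise ValueError(f"expected '<namespace>/<name>', got {user_dataset!r}")
    if not filename.endswith(".zip"):  # assumption: submissions are ZIP archives
        raise ValueError("filename must name a .zip file in the dataset")
    sub = PendingSubmission(task, user_dataset, filename)
    queue.append(sub)
    return {"status": "accepted", "submission_id": sub.submission_id}

def claim_next(queue):
    """Worker: mark the oldest pending submission as running and return it."""
    for sub in queue:
        if sub.status == "pending":
            sub.status = "running"
            return sub
    return None
```

Keeping the frontend free of downloads means a slow or huge user dataset can only delay the worker, never the submission form.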
## Required Configuration Before Deployment
| Space | Secrets |
|---|---|
| Frontend | `HF_TOKEN` (write access to the private dataset) |
| Worker | `HF_TOKEN` (write access to the private dataset + read access to user datasets) |
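Space secrets are exposed to the running app as environment variables, so a configuration module such as `src/envs.py` can read `HF_TOKEN` at startup. The sketch below is an assumption about how that might look. The `load_hf_token` name is hypothetical, and it only checks that the token is present, not that it carries the read/write scopes listed above.

```python
import os
from typing import Mapping, Optional

def load_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return HF_TOKEN from the environment, failing fast if it is missing."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it as a secret in this Space's settings"
        )
    return token
```

Failing at startup surfaces a missing secret immediately instead of at the first write to the private dataset.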
## Adding a New Task (for collaborators)
1. Copy [src/tasks/_template/](src/tasks/_template/__init__.py) to `src/tasks/<your_task>/`.
2. Implement `validate(sandbox_dir)` and `evaluate(sandbox_dir, gt) -> {metric: float}`.
3. Register the task in [src/tasks/__init__.py](src/tasks/__init__.py).
4. Upload the corresponding GT data into the `ground_truth/` directory of the private dataset.
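For orientation, here is a hypothetical task that follows the `validate` / `evaluate` contract from step 2. The `predictions.json` file name, the `{sample_id: number}` format, the MAE and coverage metrics, and the `TASKS` registry shape are all illustrative assumptions. Follow `src/tasks/_template/` and `src/tasks/__init__.py` for the real conventions.

```python
import json
from pathlib import Path

# Hypothetical example task: the file name and the {sample_id: value}
# format are assumptions, not the template's actual contract.
PRED_FILE = "predictions.json"

def validate(sandbox_dir):
    """Reject malformed submissions before any GT is touched."""
    path = Path(sandbox_dir) / PRED_FILE
    if not path.is_file():
        raise ValueError(f"missing {PRED_FILE}")
    preds = json.loads(path.read_text())
    if not isinstance(preds, dict) or not all(
        isinstance(v, (int, float)) for v in preds.values()
    ):
        raise ValueError(f"{PRED_FILE} must map sample ids to numbers")

def evaluate(sandbox_dir, gt):
    """Return {metric: float}: MAE over predicted ids plus GT coverage."""
    preds = json.loads((Path(sandbox_dir) / PRED_FILE).read_text())
    hits = [k for k in gt if k in preds]
    coverage = len(hits) / len(gt) if gt else 0.0
    mae = (sum(abs(preds[k] - gt[k]) for k in hits) / len(hits)
           if hits else float("inf"))
    return {"mae": float(mae), "coverage": float(coverage)}

# Hypothetical registry shape; match whatever src/tasks/__init__.py uses.
TASKS = {"example_task": {"validate": validate, "evaluate": evaluate}}
```

Reporting coverage next to MAE keeps a submission that predicts only easy samples from looking better than it is.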
## Code Structure
- [app.py](app.py): Frontend Space entry point (submission form + leaderboard + queue).
- [worker.py](worker.py): Worker Space entry point (polling + evaluation).
- [src/envs.py](src/envs.py): centralized configuration.
- [src/storage/hub.py](src/storage/hub.py): private dataset I/O (CAS + listing + logs).
- [src/submission/frontend.py](src/submission/frontend.py): writes pending submissions.
- [src/worker/](src/worker/loop.py): polling + single-task evaluation.
- [src/tasks/](src/tasks/): task plugin layer.
- [src/leaderboard/read_evals.py](src/leaderboard/read_evals.py) and [src/populate.py](src/populate.py): leaderboard assembly.
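`src/storage/hub.py` is listed as providing CAS. Assuming this means compare-and-swap (the README does not say), the pattern for updating best historical results is: read the value together with a version, write only if that version is unchanged, and otherwise re-read and retry. The sketch below shows the pattern against a hypothetical in-memory `VersionedStore`. `update_best`, the `"task/metric"` key format, and the retry limit are illustrative only. A Hub-backed version would need an equivalent conditional write.

```python
import threading

class ConflictError(Exception):
    """Raised when the stored version changed since it was read."""

class VersionedStore:
    """In-memory stand-in for a results file guarded by a version number."""

    def __init__(self):
        self._data = {}
        self._version = 0
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return dict(self._data), self._version

    def write_if(self, expected_version, new_data):
        """Compare-and-swap: write only if nobody wrote since our read."""
        with self._lock:
            if self._version != expected_version:
                raise ConflictError(
                    f"expected v{expected_version}, found v{self._version}"
                )
            self._data = dict(new_data)
            self._version += 1

def update_best(store, task, metric, value, higher_is_better=True, max_retries=5):
    """Store value as the best result if it beats the current one; retry on conflict."""
    key = f"{task}/{metric}"
    for _ in range(max_retries):
        best, version = store.read()
        current = best.get(key)
        improved = current is None or (
            value > current if higher_is_better else value < current
        )
        if not improved:
            return False
        best[key] = value
        try:
            store.write_if(version, best)
            return True
        except ConflictError:
            continue  # another writer won; re-read and re-compare
    raise RuntimeError(f"gave up after {max_retries} conflicting writes")
```

Re-comparing after every conflict matters: a concurrent writer may already have stored a better score, in which case this update should be dropped rather than overwrite it.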