---
title: PhysInOne Leaderboard
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: apache-2.0
short_description: Duplicate this leaderboard to initialize your own!
sdk_version: 5.43.1
tags:
- leaderboard
---

# PhysInOne Benchmark Leaderboard

This repository implements a three-part, multi-task automatic evaluation system built on Hugging Face:

1. Frontend Space (this repository, public): provides the submission form and the leaderboard.
2. Worker Space (the same git repository, deployed with `app_file: worker.py`, private): runs evaluations asynchronously in the background.
3. Private Dataset: stores task manifests, ground truth, the best historical results, and execution logs.

See [DESIGN.md](DESIGN.md) for the full design and TODO list.

## Key Design Choices

- Users do not upload ZIP files directly to the Space. Instead, they submit a `user_dataset` and a `filename` that point to a file in their own Hugging Face dataset.
- The frontend immediately returns an "accepted" response. A separate Worker Space then pulls the data asynchronously, evaluates it, and writes the results back to the private dataset.
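
A minimal sketch of this hand-off, assuming a hypothetical record schema and an in-memory list standing in for the private dataset. `make_pending_submission`, `submit` and `fetch_submission` are illustrative names, not the functions in `src/submission/frontend.py` or `src/worker/`. The worker-side download uses `huggingface_hub.hf_hub_download` with `repo_type="dataset"`.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import List, Optional


def make_pending_submission(user_dataset: str, filename: str, task: str) -> dict:
    """Build a pending-submission record (hypothetical schema)."""
    if user_dataset.count("/") != 1:
        # Hub repo IDs have the form "<owner>/<name>".
        raise ValueError("user_dataset must look like 'owner/name'")
    if not filename:
        raise ValueError("filename must not be empty")
    return {
        "id": uuid.uuid4().hex,
        "task": task,
        "user_dataset": user_dataset,
        "filename": filename,
        "status": "pending",
        "submitted_at": datetime.now(timezone.utc).isoformat(),
    }


def submit(user_dataset: str, filename: str, task: str, queue: List[str]) -> str:
    """Frontend handler: enqueue a pending record and return without evaluating."""
    record = make_pending_submission(user_dataset, filename, task)
    # Stand-in for writing the record into the private dataset (hypothetical).
    queue.append(json.dumps(record))
    return f"accepted: {record['id']}"


def fetch_submission(record: dict, token: Optional[str] = None) -> str:
    """Worker side: download the referenced file and return its local path."""
    # Imported lazily so the frontend part runs without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=record["user_dataset"],
        filename=record["filename"],
        repo_type="dataset",
        token=token,
    )
```

Because `submit` only validates the reference and records it, the frontend stays responsive no matter how large the user's file is; all downloading and scoring cost is paid by the worker.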

## Required Configuration Before Deployment

| Space | Secrets |
|---|---|
| Frontend | `HF_TOKEN` (write access to the private dataset) |
| Worker | `HF_TOKEN` (write access to the private dataset and read access to user datasets) |
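
Space secrets are exposed to the running app as environment variables, so both entry points can read the token the same way. The following is a minimal fail-fast sketch in the spirit of the centralized configuration in `src/envs.py`, but not taken from it; `load_hf_token` and `ConfigError` are hypothetical names.

```python
import os
from typing import Mapping, Optional


class ConfigError(RuntimeError):
    """Raised when a required secret is missing (hypothetical helper)."""


def load_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Read HF_TOKEN from the environment and fail fast if it is absent or blank."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise ConfigError("HF_TOKEN is not set; add it as a secret of this Space")
    return token
```

Checking at startup surfaces a missing or empty secret immediately, instead of at the first write to the private dataset.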

## Adding a New Task (for collaborators)

1. Copy [src/tasks/_template/](src/tasks/_template/__init__.py) to `src/tasks/<your_task>/`.
2. Implement `validate(sandbox_dir)` and `evaluate(sandbox_dir, gt) -> {metric: float}`.
3. Register the task in [src/tasks/__init__.py](src/tasks/__init__.py).
4. Upload the corresponding ground-truth (GT) data to the `ground_truth/` directory of the private dataset.
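
The steps above might produce a task module like the following. This is a minimal sketch that assumes a hypothetical submission layout (a `predictions.json` file mapping sample IDs to labels), dict-shaped ground truth, and accuracy as the metric. `PRED_FILE`, the `coverage` metric and the `TASKS` registry shape are illustrative, not the actual contents of `src/tasks/_template/` or `src/tasks/__init__.py`.

```python
import json
from pathlib import Path
from typing import Callable, Dict

# Hypothetical layout: the unpacked submission contains predictions.json,
# a JSON object mapping sample IDs to predicted labels.
PRED_FILE = "predictions.json"


def validate(sandbox_dir: str) -> None:
    """Raise ValueError if the unpacked submission is malformed."""
    path = Path(sandbox_dir) / PRED_FILE
    if not path.is_file():
        raise ValueError(f"missing {PRED_FILE}")
    preds = json.loads(path.read_text())
    if not isinstance(preds, dict):
        raise ValueError(f"{PRED_FILE} must contain a JSON object")


def evaluate(sandbox_dir: str, gt: Dict[str, str]) -> Dict[str, float]:
    """Score predictions against ground truth and return {metric_name: value}."""
    preds = json.loads((Path(sandbox_dir) / PRED_FILE).read_text())
    n = len(gt)
    # Missing predictions count as wrong, so partial submissions are penalized.
    correct = sum(1 for key, label in gt.items() if preds.get(key) == label)
    covered = sum(1 for key in gt if key in preds)
    return {
        "accuracy": correct / n if n else 0.0,
        "coverage": covered / n if n else 0.0,
    }


# Hypothetical registry entry, standing in for the one in src/tasks/__init__.py.
TASKS: Dict[str, Dict[str, Callable]] = {
    "my_task": {"validate": validate, "evaluate": evaluate},
}
```

Keeping `validate` separate from `evaluate` lets the worker reject a malformed archive with a clear error before it loads any ground truth.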

## Code Structure

- [app.py](app.py): Frontend Space entry point (submission form + leaderboard + queue).
- [worker.py](worker.py): Worker Space entry point (polling + evaluation).
- [src/envs.py](src/envs.py): centralized configuration.
- [src/storage/hub.py](src/storage/hub.py): private dataset I/O (CAS + listing + logs).
- [src/submission/frontend.py](src/submission/frontend.py): writes pending submissions.
- [src/worker/](src/worker/loop.py): polling + single-task evaluation.
- [src/tasks/](src/tasks/): task plugin layer.
- [src/leaderboard/read_evals.py](src/leaderboard/read_evals.py) and [src/populate.py](src/populate.py): leaderboard assembly.
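
The worker's poll-and-evaluate cycle can be sketched as follows. This is a minimal sketch that assumes in-memory dicts stand in for the records in the private dataset and that tasks are registered as `{"validate": ..., "evaluate": ...}` entries. `claim_next`, `run_one`, `poll`, the `(task, user_dataset)` keying of best results and the `primary` metric are all hypothetical; the real `src/worker/loop.py` and `src/storage/hub.py` may differ, in particular in how a claim is made safe against concurrent writers.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical in-memory stand-ins for records kept in the private dataset.
Submissions = Dict[str, dict]       # submission id -> record
Best = Dict[Tuple[str, str], dict]  # (task, user_dataset) -> best scores so far


def claim_next(submissions: Submissions) -> Optional[dict]:
    """Move the oldest pending submission to 'running' and return it."""
    pending = [s for s in submissions.values() if s["status"] == "pending"]
    if not pending:
        return None
    # ISO-8601 timestamps in the same format sort chronologically as strings.
    sub = min(pending, key=lambda s: s["submitted_at"])
    sub["status"] = "running"
    return sub


def run_one(sub: dict, tasks: dict, load_sandbox: Callable[[dict], str],
            load_gt: Callable[[str], object], best: Best, primary: str) -> None:
    """Validate and evaluate one submission, then update the best-result table."""
    try:
        task = tasks[sub["task"]]
        sandbox_dir = load_sandbox(sub)  # e.g. download and unpack the user's file
        task["validate"](sandbox_dir)
        scores = task["evaluate"](sandbox_dir, load_gt(sub["task"]))
    except Exception as exc:  # record the failure on the submission instead of crashing
        sub.update(status="failed", error=str(exc))
        return
    sub.update(status="done", scores=scores)
    key = (sub["task"], sub["user_dataset"])
    if key not in best or scores[primary] > best[key][primary]:
        best[key] = scores


def poll(submissions: Submissions, tasks: dict, load_sandbox, load_gt, best: Best,
         primary: str = "accuracy", interval_s: float = 30.0,
         max_iterations: Optional[int] = None) -> None:
    """Handle one submission per iteration; sleep when nothing is pending."""
    i = 0
    while max_iterations is None or i < max_iterations:
        sub = claim_next(submissions)
        if sub is None:
            time.sleep(interval_s)
        else:
            run_one(sub, tasks, load_sandbox, load_gt, best, primary)
        i += 1
```

Recording failures on the submission rather than raising them keeps one bad archive from stopping the loop, and the failure message can be surfaced later in the frontend's queue view.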