metadata
title: PhysInOne Leaderboard
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: apache-2.0
short_description: Duplicate this leaderboard to initialize your own!
sdk_version: 5.43.1
tags:
  - leaderboard

PhysInOne Benchmark Leaderboard

This repository implements an automatic multi-task evaluation system hosted on Hugging Face. It has three parts:

  1. Frontend Space (this repository, public): provides the submission form and leaderboard.
  2. Worker Space (the same git repository, but with app_file: worker.py, private): runs asynchronous backend evaluation.
  3. Private Dataset: stores task manifests, ground truth, best historical results, and execution logs.

See DESIGN.md for the full design and TODO list.

Key Design Choices

  • Users do not upload ZIP files directly to the Space. Instead, they submit user_dataset + filename, pointing to their own Hugging Face dataset.
  • The frontend immediately returns an "accepted" response. A separate Worker Space then pulls the data asynchronously, evaluates it, and writes results back to the private dataset.
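The accept-then-evaluate flow described above can be sketched in a few lines. This is a minimal illustration, not the code in this repository: the `Submission` record, the `task` field, the `submit` and `worker_step` helpers, and the in-memory queue are all hypothetical stand-ins. In the real system the private dataset plays the role of the shared store between the two Spaces.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from queue import Queue
from typing import Callable, Optional
import uuid


@dataclass
class Submission:
    """One pending evaluation request (hypothetical record layout)."""
    user_dataset: str  # pointer to the user's own dataset, e.g. "alice/my-results"
    filename: str      # file inside that dataset, e.g. "submission.zip"
    task: str          # assumed field: which task the submission targets
    submission_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    submitted_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    status: str = "accepted"


# In-memory stand-in for the private dataset shared by frontend and worker.
PENDING: Queue = Queue()


def submit(user_dataset: str, filename: str, task: str) -> dict:
    """Frontend side: store the pointer and return immediately, no evaluation."""
    parts = user_dataset.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError("user_dataset must look like '<owner>/<name>'")
    if not filename:
        raise ValueError("filename must not be empty")
    sub = Submission(user_dataset, filename, task)
    PENDING.put(sub)
    return {"submission_id": sub.submission_id, "status": sub.status}


def worker_step(evaluate: Callable[[Submission], dict]) -> Optional[dict]:
    """Worker side: take one pending submission, evaluate it, report the result."""
    if PENDING.empty():
        return None
    sub = PENDING.get()
    metrics = evaluate(sub)  # the real worker would first pull the user's file
    sub.status = "done"
    return {
        "submission_id": sub.submission_id,
        "status": sub.status,
        "metrics": metrics,
    }
```

Because the frontend only records a pointer, a slow or failing evaluation never blocks the submission form; the worker can pick up the request whenever it runs.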

Required Configuration Before Deployment

Space Secrets
  • Frontend Space: HF_TOKEN with write access to the private dataset.
  • Worker Space: HF_TOKEN with write access to the private dataset and read access to user datasets.
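Space secrets are exposed to the running app as environment variables, so either Space can read its token at startup. The helper below is a minimal sketch; `load_hf_token` is a hypothetical name and is not taken from this repository.

```python
import os
from typing import Mapping


def load_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the HF_TOKEN secret, failing early if it was not configured."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it as a Space secret")
    return token
```

Failing at startup makes a missing secret visible in the Space logs right away, rather than on the first submission.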

Adding a New Task (for collaborators)

  1. Copy src/tasks/_template/ to src/tasks/<your_task>/.
  2. Implement validate(sandbox_dir) and evaluate(sandbox_dir, gt) -> {metric: float}.
  3. Register the task in src/tasks/__init__.py.
  4. Upload the corresponding GT data into the ground_truth/ directory of the private dataset.
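As a rough illustration of step 2, a toy task module might look like the sketch below. Only the two function signatures come from the steps above. Everything else is assumed for the example: the `predictions.json` filename, the flat `{item_id: number}` layout of both predictions and ground truth, the convention that `validate` raises `ValueError` on a bad submission, the `mae` metric, and the `TASKS` registry dictionary.

```python
import json
from pathlib import Path

PRED_FILE = "predictions.json"  # hypothetical submission filename


def _load_predictions(sandbox_dir) -> dict:
    """Read the submitted predictions from the unpacked sandbox directory."""
    return json.loads((Path(sandbox_dir) / PRED_FILE).read_text())


def validate(sandbox_dir) -> None:
    """Raise ValueError if the submission is missing or malformed (assumed contract)."""
    if not (Path(sandbox_dir) / PRED_FILE).is_file():
        raise ValueError(f"{PRED_FILE} is missing")
    preds = _load_predictions(sandbox_dir)
    if not isinstance(preds, dict) or not preds:
        raise ValueError("predictions must be a non-empty JSON object")
    for key, value in preds.items():
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"prediction for {key!r} is not a number")


def evaluate(sandbox_dir, gt) -> dict:
    """Return {metric: float}; here, mean absolute error against the ground truth."""
    preds = _load_predictions(sandbox_dir)
    missing = sorted(set(gt) - set(preds))
    if missing:
        raise ValueError(f"missing predictions for: {missing}")
    mae = sum(abs(preds[k] - gt[k]) for k in gt) / len(gt)
    return {"mae": float(mae)}


# Hypothetical registry entry; the actual registration mechanism may differ.
TASKS = {"toy_task": {"validate": validate, "evaluate": evaluate}}
```

Keeping `validate` separate from `evaluate` lets the worker reject malformed submissions cheaply before it loads any ground truth.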

Code Structure