---
title: BizGenEval Leaderboard
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
pinned: true
license: mit
short_description: Official BizGenEval leaderboard on Hugging Face.
tags:
  - leaderboard
---
# BizGenEval Leaderboard

This repository hosts the Hugging Face leaderboard for **BizGenEval**, the benchmark introduced in *BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation*.
Primary project resources:

- Project page: https://aka.ms/BizGenEval
- GitHub: https://github.com/microsoft/BizGenEval
- Dataset: https://huggingface.co/datasets/microsoft/BizGenEval
The codebase supports:

- **LOCAL_DEV mode** (no HF permission required): reads/writes local namespaced paths under `eval-queue/` and `eval-results/`.
- **HF mode** (with permission): syncs datasets from the Hub and uploads queue requests.
## 1) Local development quick start (no HF permission)

**Step 1. Create and activate a virtualenv**

```bash
cd /Users/clarencestark/code/BizGenEval-Leaderboard
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
**Step 2. Bootstrap local demo data**

```bash
python3 scripts/bootstrap_local_dev.py
```
This will create:

- `eval-queue/bizgeneval/requests/microsoft/Phi-4o-mini_eval_request_False_float16_Original.json`
- `eval-results/bizgeneval/results/microsoft/Phi-4o-mini/summary.json`
**Step 3. Launch in local mode**

```bash
export LOCAL_DEV=1
python3 app.py
```
In LOCAL_DEV mode:

- `snapshot_download` is skipped.
- Model-card/tokenizer checks are skipped during submission.
- New submissions are written to the local `eval-queue/bizgeneval/requests/` only (no upload).
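The mode switch described above boils down to an environment-variable check. A minimal sketch (the helper name `is_local_dev` and exact skip points are hypothetical; the real gate lives in `src/envs.py` / `app.py`):

```python
import os

def is_local_dev() -> bool:
    # Hypothetical helper: LOCAL_DEV=1/true/on enables local mode.
    return os.getenv("LOCAL_DEV", "").strip().lower() in {"1", "true", "on"}

if is_local_dev():
    # Local mode: skip snapshot_download and Hub uploads entirely,
    # reading and writing only the local namespaced paths.
    print("LOCAL_DEV: using local eval-queue/ and eval-results/ paths")
```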
## 2) Supported result file formats
The leaderboard parser currently supports two formats:
### A) BizGenEval summary format (recommended)
Put a `summary.json` under `eval-results/bizgeneval/results/<org>/<model>/summary.json`.
Example:

```json
{
  "model_name": "microsoft/Phi-4o-mini",
  "model_sha": "main",
  "by_domain": {
    "slides": {"error_score": 0.8125},
    "webpage": {"error_score": 0.845},
    "poster": {"error_score": 0.7875},
    "chart": {"error_score": 0.8025},
    "scientific_figure": {"error_score": 0.77}
  },
  "by_dimension": {
    "layout": {"error_score": 0.835},
    "attribute": {"error_score": 0.805},
    "text": {"error_score": 0.79},
    "knowledge": {"error_score": 0.775}
  }
}
```
`error_score` may be on either a 0–1 or a 0–100 scale; both are accepted and normalized to a 0–100 scale for display.
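The normalization rule can be sketched as follows (an illustration only; the actual parser is `src/leaderboard/read_evals.py` and may handle edge cases differently):

```python
def normalize_error_score(value: float) -> float:
    """Normalize an error_score to the displayed 0-100 scale.

    Hypothetical sketch: values at or below 1.0 are treated as
    fractions and scaled up; larger values are assumed to already
    be on the 0-100 scale and pass through unchanged.
    """
    return value * 100.0 if value <= 1.0 else value
```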
### B) Legacy template format
Legacy config/results JSON is still accepted for compatibility.
## 3) Queue file format

Queue entries are JSON files in `eval-queue/bizgeneval/requests/<org>/*.json`.
A typical file contains:
modelrevisionprecisionweight_typestatus(PENDING,RUNNING,FINISHED*)- metadata (
license,params,likes, ...)
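For illustration, a queue entry with these fields might look like the following (all values are hypothetical, not a real submission):

```json
{
  "model": "microsoft/Phi-4o-mini",
  "revision": "main",
  "precision": "float16",
  "weight_type": "Original",
  "status": "PENDING",
  "license": "mit",
  "params": 3.8,
  "likes": 0
}
```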
## 4) Config knobs

Main config file: `src/envs.py`

- `LOCAL_DEV` (env): `1`/`true`/`on` to enable local mode
- `HF_OWNER` (env, optional): owner fallback
- `PROJECT_NAMESPACE` (env, optional): defaults to `bizgeneval`
- `HF_SPACE_REPO` (env, optional)
- `HF_QUEUE_REPO` (env, optional)
- `HF_RESULTS_REPO` (env, optional)
- `HF_TOKEN` (env): required only for Hub sync/upload
Default repo names are:

- Space: `microsoft/BizGenEval-Leaderboard`
- Queue dataset: `demo-leaderboard-backend/requests`
- Results dataset: `demo-leaderboard-backend/results`
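A hedged sketch of how these knobs and defaults might be read (illustrative only; the authoritative definitions are in `src/envs.py` and may differ):

```python
import os

# Illustrative mirror of the knobs and defaults documented above.
LOCAL_DEV = os.getenv("LOCAL_DEV", "").strip().lower() in {"1", "true", "on"}
HF_OWNER = os.getenv("HF_OWNER")  # optional owner fallback
PROJECT_NAMESPACE = os.getenv("PROJECT_NAMESPACE", "bizgeneval")
HF_SPACE_REPO = os.getenv("HF_SPACE_REPO", "microsoft/BizGenEval-Leaderboard")
HF_QUEUE_REPO = os.getenv("HF_QUEUE_REPO", "demo-leaderboard-backend/requests")
HF_RESULTS_REPO = os.getenv("HF_RESULTS_REPO", "demo-leaderboard-backend/results")
HF_TOKEN = os.getenv("HF_TOKEN")  # required only for Hub sync/upload
```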
## 5) Key code locations

- Columns and UI display fields: `src/display/utils.py`
- Result parser: `src/leaderboard/read_evals.py`
- DataFrame build logic: `src/populate.py`
- Submission validation/upload behavior: `src/submission/submit.py`
- Task definitions and page text: `src/about.py`