### How to Submit
1. Fork this repository.
2. Create a new branch for your submission.
3. Add your submission folder under `submissions/____/`.
4. Open a Pull Request with the new submission folder.
### Submission Directory Requirements
Each submission directory must contain the metadata and predictions for one
model/input configuration pair:
```text
____/
  metadata.yaml
  predictions.jsonl
  generation_config.json  # optional, recommended
  artifacts/              # optional logs or prompt notes
```
Use URL-safe directory names. Replace spaces, slashes, and special characters
with hyphens; keep `input_config` as `TEXT` or `VISUAL`.
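
The naming rule above can be sketched as a small helper. This is only an illustration: the function name `to_url_safe` and the exact set of characters treated as safe (ASCII letters, digits, `.`, `_`, and `-`) are assumptions, not the validator's definition.

```python
import re

# Hypothetical helper: which characters count as "safe" is an assumption.
_UNSAFE_RUN = re.compile(r"[^A-Za-z0-9._-]+")


def to_url_safe(name: str) -> str:
    # Collapse each run of unsafe characters into a single hyphen,
    # then drop hyphens left at either end.
    return _UNSAFE_RUN.sub("-", name.strip()).strip("-")


print(to_url_safe("My Model/v2 (beta)"))  # My-Model-v2-beta
```

Collapsing runs of unsafe characters keeps names short; a name that is already safe, such as `gpt-4o`, passes through unchanged.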
### `metadata.yaml`
```yaml
model_name: "My Model"
organization: "My Org"
model_url: https://... # optional work link: paper, GitHub, model card, etc.
date: "2026-06-17" # model release date, not submission date
split: test
input_config: TEXT # TEXT or VISUAL
```
### `predictions.jsonl`
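Each line must be a single JSON object. The example is pretty-printed for readability; in the file, each object must occupy exactly one line: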
```json
{
  "id": "paper-id",
  "part_idx": 1,
  "question": "question text",
  "category": "category",
  "gen_answer": "model answer"
}
```
`part_idx` is the question index in the current paper's `qa_pairs` list (`1` for the first item). `category` must match the corresponding item in `test.json`.
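
A minimal sketch of producing these lines, assuming `test.json` is a list of papers that each carry an `id` and a `qa_pairs` list whose items have `question` and `category` keys. That layout, the helper name `build_prediction_lines`, and the `answer_fn` callback are assumptions for illustration; check them against the actual file.

```python
import json


def build_prediction_lines(papers, answer_fn):
    # `papers` mirrors an ASSUMED test.json layout:
    # [{"id": ..., "qa_pairs": [{"question": ..., "category": ...}, ...]}, ...]
    lines = []
    for paper in papers:
        # part_idx is 1-based: the first QA pair of each paper gets 1.
        for part_idx, qa in enumerate(paper["qa_pairs"], start=1):
            record = {
                "id": paper["id"],
                "part_idx": part_idx,
                "question": qa["question"],
                "category": qa["category"],
                # gen_answer must be a string.
                "gen_answer": str(answer_fn(paper, qa)),
            }
            # json.dumps escapes newlines, so each record stays on one line.
            lines.append(json.dumps(record))
    return lines


# Toy data standing in for test.json (not real benchmark content).
papers = [{"id": "paper-id", "qa_pairs": [
    {"question": "question text", "category": "category"},
]}]
lines = build_prediction_lines(papers, lambda paper, qa: "model answer")
print(lines[0])
```

Writing each element of `lines` to `predictions.jsonl`, one per line, gives the expected file. Copying `question` and `category` straight from the benchmark item avoids mismatches introduced by retyping them.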
### Validation Rules
Your submission will be validated before evaluation. To pass:
- `metadata.yaml` must include `model_name`, `organization`, `date`, `split`,
and `input_config`.
- `model_url` is optional.
- `date` is the model release date, not the submission date.
- `split` must be `test`.
- `input_config` must be `TEXT` or `VISUAL`.
- `predictions.jsonl` must contain exactly one line for every QA item in
`test.json`.
- `part_idx` is the question index in the current paper's `qa_pairs` list
(`1` for the first item).
- `id`, `part_idx`, `question`, and `category` must exactly match the benchmark
item.
- `gen_answer` must be a string.
- For `Claim_Verification` items, `gen_answer` must be exactly the string `True` or `False`.
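
The checks above can be approximated locally before opening a PR. The sketch below is not the official validator: the function names are hypothetical, it works on already-parsed dictionaries, it assumes `Claim_Verification` is a value of `category`, and it does not cover the one-line-per-item check.

```python
REQUIRED_METADATA = ("model_name", "organization", "date", "split", "input_config")
MATCH_FIELDS = ("id", "part_idx", "question", "category")


def check_metadata(meta):
    # Return a list of problems; an empty list means these checks passed.
    problems = [f"missing field: {key}" for key in REQUIRED_METADATA if key not in meta]
    if meta.get("split") != "test":
        problems.append("split must be test")
    if meta.get("input_config") not in ("TEXT", "VISUAL"):
        problems.append("input_config must be TEXT or VISUAL")
    return problems


def check_prediction(pred, item):
    # `item` is the benchmark entry that this prediction line answers.
    problems = [f"{key} does not match" for key in MATCH_FIELDS if pred.get(key) != item.get(key)]
    answer = pred.get("gen_answer")
    if not isinstance(answer, str):
        problems.append("gen_answer must be a string")
    # ASSUMPTION: Claim_Verification is a value of the `category` field.
    elif item.get("category") == "Claim_Verification" and answer not in ("True", "False"):
        problems.append("gen_answer must be exactly True or False")
    return problems
```

Returning a list of problems instead of raising on the first one lets a submitter fix everything in a single pass.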
### Submission Process
1. Open PR: add your folder under
`submissions/____/`.
2. Fix issues: if validation fails, update the PR with corrected files.
3. Review: once validation passes, a maintainer reviews the submission.
4. Evaluate: maintainers run the official evaluator in a controlled local
environment.
5. Import: accepted aggregate results are imported to the leaderboard.
"""


def _parse_markdown_link(value):
    """Return (name, url) if MODEL_LINK_RE matches the text, else (text, "")."""
    text = str(value).strip()
    match = MODEL_LINK_RE.match(text)
    if match:
        return match.group("name"), match.group("url")
    return text, ""


def _read_csv_leaderboard():
    """Load the CSV leaderboard and rank rows by Informativeness, highest first."""
    df = pd.read_csv(LEADERBOARD_CSV_PATH)
    # Accept the short column name "Info" as an alias for "Informativeness".
    if "Info" in df.columns and "Informativeness" not in df.columns:
        df = df.rename(columns={"Info": "Informativeness"})
    # Split each "Model" cell into a display name and a URL.
    names = []
    urls = []
    for value in df.get("Model", []):
        name, url = _parse_markdown_link(value)
        names.append(name)
        urls.append(url)
    if "Model" in df.columns:
        df["Model"] = names
        df["url"] = urls
    # Non-numeric cells become NaN so that they sort last.
    for col in NUMERIC_COLUMNS:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors="coerce")
    df = df.sort_values("Informativeness", ascending=False, na_position="last").reset_index(drop=True)
    df.insert(0, "Rank", range(1, len(df) + 1))
    return df


def _read_json_leaderboard():
    """Load the JSON leaderboard, flatten all seasons, and rank by Informativeness."""
    with LEADERBOARD_JSON_PATH.open("r", encoding="utf-8") as f:
        data = json.load(f)
    rows = []
    for season in data.get("seasons", {}).values():
        for row in season.get("models", []):
            rows.append({
                "Model": row.get("name", ""),
                "url": row.get("url", ""),
                "Organization": row.get("org", ""),
                "Input Config": str(row.get("modality", "")).upper(),
                "Conciseness": row.get("conciseness", 0),
                "Correctness": row.get("correctness", 0),
                "Completeness": row.get("completeness", 0),
                # Fall back through alternative keys when a score is missing.
                "F1-like": row.get("f1_like", row.get("informativeness", 0)),
                "Informativeness": row.get("informativeness", row.get("info", row.get("overall", 0))),
                "Date": row.get("date", ""),
            })
    df = pd.DataFrame(rows)
    if df.empty:
        return pd.DataFrame(columns=DISPLAY_COLUMNS + ["url"])
    df = df.sort_values("Informativeness", ascending=False, na_position="last").reset_index(drop=True)
    df.insert(0, "Rank", range(1, len(df) + 1))
    return df
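

def load_leaderboard_table():
    """Prefer the CSV leaderboard; fall back to the JSON one."""
    if LEADERBOARD_CSV_PATH.exists():
        try:
            return _read_csv_leaderboard()
        except Exception:
            # The CSV is unreadable or lacks expected columns: use JSON instead.
            pass
    return _read_json_leaderboard()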


def _format_cell(value, column):
    if pd.isna(value):
        return ""
    if column in NUMERIC_COLUMNS:
        return f"{float(value):.2f}"
    return html.escape(str(value))


def _render_input_config(value):
    config = str(value).upper()
    if config == "TEXT":
        return '