import gradio as gr
import pandas as pd

from src.leaderboard.read_evals import get_leaderboard_df, get_tasks, get_raw_data
from src.display.visualization import create_radar_chart, create_group_bar_chart, create_aup_curve_chart
from src.display.css_html_js import custom_css, sort_table_js, get_foundation_class

CITATION_HTML = """
📌 If you find this Leaderboard useful for your research, please star our GitHub repo and cite our work:
@article{preprint'25:d3llm,
  author  = {Yu-Yang Qian and Junda Su and Lanxiang Hu and Peiyuan Zhang and Zhijie Deng and Peng Zhao and Hao Zhang},
  title   = {d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation},
  journal = {ArXiv preprint},
  volume  = {to appear},
  note    = {\\url{https://github.com/hao-ai-lab/d3LLM} [Accessed: 2025-12-11]},
  year    = {2025}
}
| Rank | Method | Type | Foundation Model | {task_headers}Avg AUP |
|---|
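The header row above is a template whose `{task_headers}` placeholder and separator row are filled in at runtime from the task list. A minimal sketch of that fill step (the function name and template handling are assumptions, not the repo's actual code):

```python
# Hypothetical sketch: build the leaderboard's markdown table header from a
# task list. The template string mirrors the one above.
TEMPLATE = "| Rank | Method | Type | Foundation Model | {task_headers}Avg AUP |"

def build_table_header(tasks):
    """Return the markdown header row and its matching separator row."""
    task_headers = "".join(f"{t} | " for t in tasks)
    header = TEMPLATE.format(task_headers=task_headers)
    # One "---" cell per column: 4 fixed columns + one per task + Avg AUP.
    separator = "|" + "---|" * (5 + len(tasks))
    return header, separator

header, sep = build_table_header(["GSM8K-CoT", "MATH"])
```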
We welcome contributions to the dLLM Leaderboard! To submit your method's results:

1. Follow the evaluation protocol in the d3LLM repository.
2. Refer to the eval_scripts folder for benchmark evaluation scripts, and the AUP_leaderboard folder for AUP calculation utilities.
3. Add your results to the appropriate YAML file, following this format:
_meta:
  YourMethod:
    type: dLLM  # or AR
    foundation: YourFoundation
    link: https://link/to/your/method
TaskName:
  YourMethod:
    - [rho_1, accuracy_1]  # (parallelism, accuracy) pairs
    - [rho_2, accuracy_2]
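Once loaded (e.g. with `yaml.safe_load`), a file in this format can be sanity-checked before submission. A minimal sketch, where the method and task names are placeholders and the specific validation rules are assumptions:

```python
# Hypothetical validation of an already-parsed results file. `data` mirrors
# the YAML structure above after loading; names are illustrative only.
data = {
    "_meta": {
        "YourMethod": {
            "type": "dLLM",
            "foundation": "YourFoundation",
            "link": "https://link/to/your/method",
        },
    },
    "GSM8K-CoT": {
        "YourMethod": [[2.0, 0.81], [4.0, 0.78]],
    },
}

def validate(data):
    """Check every task entry refers to a method declared in _meta."""
    meta = data["_meta"]
    for task, methods in data.items():
        if task == "_meta":
            continue  # metadata block, not a task
        for method, pairs in methods.items():
            assert method in meta, f"{method} missing from _meta"
            for rho, acc in pairs:  # (parallelism, accuracy) pairs
                assert rho > 0 and 0.0 <= acc <= 1.0
    return True

validate(data)
```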
Questions? Open an issue on GitHub.
This leaderboard evaluates Diffusion Large Language Models (dLLMs) using the AUP (Accuracy Under Parallelism) metric.
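The official AUP definition lives in the repo's AUP_leaderboard folder; one plausible reading is the area under the accuracy-vs-parallelism curve traced by the (rho, accuracy) pairs. A hedged trapezoidal-rule sketch of that reading, not the exact formula:

```python
# Illustrative area-under-curve over (parallelism rho, accuracy) pairs.
# This is NOT the official AUP formula, just a trapezoidal sketch of the idea.
def area_under_parallelism(pairs):
    """Trapezoidal area under accuracy as a function of parallelism rho."""
    pts = sorted(pairs)  # order points by increasing rho
    area = 0.0
    for (r0, a0), (r1, a1) in zip(pts, pts[1:]):
        area += 0.5 * (a0 + a1) * (r1 - r0)  # trapezoid on each segment
    return area

area_under_parallelism([(1.0, 0.9), (2.0, 0.8), (4.0, 0.6)])
```

Methods that keep accuracy high at large rho sweep out more area, which matches the leaderboard's goal of rewarding accuracy retained under parallel decoding.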
Benchmarks: GSM8K-CoT, MATH, HumanEval, MBPP, Long-GSM8K
GitHub Code Repo: https://github.com/hao-ai-lab/d3LLM
Blog: https://hao-ai-lab.github.io/blogs/text-diffusion/