<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="description" content="PaperBanana: Automating Academic Illustration for AI Scientists">
<meta name="keywords"
content="PaperBanana, Academic Illustration, Diagram Generation, Statistical Plots, Multi-Agent Framework, Gemini">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>PaperBanana: Automating Academic Illustration for AI Scientists</title>
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">
<link rel="stylesheet" href="./static/css/bulma.min.css">
<link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
<link rel="stylesheet" href="./static/css/bulma-slider.min.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.5.1/css/all.min.css">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
<link rel="stylesheet" href="./static/css/index.css">
<link rel="icon" href="./static/images/logo.ico">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script src="./static/js/bulma-carousel.min.js"></script>
<script src="./static/js/bulma-slider.min.js"></script>
</head>
<body>
<section class="hero">
<div class="hero-body">
<div class="container is-max-widescreen">
<div class="columns is-centered">
<div class="column has-text-centered">
<h1 class="subtitle is-3 publication-title">
<img src="static/images/logo.jpg"
style="width:1.5em;vertical-align: middle; border-radius: 50%;">
<b>PaperBanana:</b> Automating Academic Illustration for AI Scientists
</h1>
<div class="is-size-5 publication-authors">
<span class="author-block">
<a href="https://scholar.google.com/citations?user=oD2HPaYAAAAJ&hl=en"
target="_blank">Dawei Zhu</a><sup>1,2*</sup>,
</span>
<span class="author-block">
<a href="https://scholar.google.com/citations?user=s6h8L_UAAAAJ&hl=en&oi=ao"
target="_blank">Rui Meng</a><sup>2</sup>,
</span>
<span class="author-block">
<a href="https://scholar.google.com/citations?user=dNHNpxoAAAAJ&hl=en&oi=ao"
target="_blank">Yale Song</a><sup>2</sup>,
</span>
<span class="author-block">
<a href="https://scholar.google.com/citations?user=J_RHFhUAAAAJ&hl=en&oi=ao"
target="_blank">Xiyu Wei</a><sup>1</sup>,
</span>
<span class="author-block">
<a href="https://scholar.google.com/citations?user=RvBDhSwAAAAJ&hl=en&oi=ao"
target="_blank">Sujian Li</a><sup>1</sup>,
</span>
<span class="author-block">
<a href="https://scholar.google.com/citations?user=ahSpJOAAAAAJ&hl=en&oi=ao"
target="_blank">Tomas Pfister</a><sup>2</sup>,
</span>
<span class="author-block">
<a href="https://scholar.google.com/citations?user=kiFd6A8AAAAJ&hl=en&oi=ao"
target="_blank">Jinsung Yoon</a><sup>2</sup>
</span>
</div>
<br>
<div class="is-size-5 publication-authors">
<span class="author-block"><sup>1</sup>Peking University</span> <br>
<span class="author-block"><sup>2</sup>Google Cloud AI Research</span>
</div>
<br>
<div class="is-size-5 thanks">
<span class="author-block">Corresponding author(s):</span>
<span class="author-block"><a href="mailto:dwzhu@pku.edu.cn">dwzhu@pku.edu.cn</a>,</span>
<span class="author-block"><a
href="mailto:lisujian@pku.edu.cn">lisujian@pku.edu.cn</a>,</span>
<span class="author-block"><a
href="mailto:jinsungyoon@google.com">jinsungyoon@google.com</a></span>
</div>
<div class="column has-text-centered">
<div class="publication-links">
<span class="link-block">
<a href="https://arxiv.org/abs/2601.23265" target="_blank"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="ai ai-arxiv"></i>
</span>
<span>arXiv</span>
</a>
</span>
<!-- Code Link. -->
<span class="link-block">
<a href="https://github.com/dwzhu-pku/PaperBanana" target="_blank"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fab fa-github"></i>
</span>
<span>Code</span>
</a>
</span>
<!-- Twitter Link. -->
<span class="link-block">
<a href="https://x.com/dwzhu128/status/2018405593976103010"
class="external-link button is-normal is-rounded is-dark">
<span class="icon has-text-white">
<i class="fa-brands fa-x-twitter"></i>
</span>
<span>Twitter</span>
</a>
</span>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<section class="hero teaser">
<div class="container is-max-desktop">
<div class="content has-text-centered">
<img src="static/images/teaser_figure.jpg" alt="PaperBanana Workflow" width="95%" />
<p class="is-size-6">
Examples of methodology diagrams and statistical plots generated by <b>PaperBanana</b>, which show
the potential of automating the generation of academic illustrations.
</p>
</div>
</div>
</section>
<section class="section hero is-light">
<div class="container is-max-desktop">
<!-- Abstract -->
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<!-- <h2 class="title is-3">Abstract</h2> -->
<div class="content has-text-justified">
<p>
Despite rapid advances in autonomous AI scientists powered by language models, generating
publication-ready illustrations remains a labor-intensive bottleneck in the research
workflow.
To lift this burden, we introduce <b>PaperBanana</b>, an agentic framework for automated
generation of publication-ready academic illustrations.
Powered by state-of-the-art VLMs and image generation models,
PaperBanana orchestrates specialized agents to retrieve references, plan content and style,
render images, and iteratively refine via self-critique.
To rigorously evaluate our framework, we introduce <b>PaperBananaBench</b>, comprising 292
test cases for methodology diagrams curated from NeurIPS 2025 publications, covering diverse
research domains and illustration styles.
Comprehensive experiments demonstrate that PaperBanana consistently outperforms leading
baselines in faithfulness, conciseness, readability, and aesthetics.
We further show that our method effectively extends to the generation of high-quality
statistical plots.
Collectively,
PaperBanana paves the way for the automated generation of publication-ready illustrations.
</p>
</div>
</div>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-3">Method Overview</h2>
<div class="content has-text-justified">
<p>
We propose PaperBanana, a reference-driven agentic framework for automated academic
illustration. As illustrated in the diagram (<b>generated by <img
src="static/images/logo.jpg"
style="width:1.0em;vertical-align: middle; border-radius: 50%;"></b>) below,
PaperBanana
orchestrates a collaborative
team of five specialized agents—Retriever, Planner, Stylist, Visualizer, and Critic—to
transform raw scientific content into publication-quality diagrams and plots.
</p>
</div>
<img id="model" width="100%" src="static/images/method_diagram.png">
<br>
<div class="content has-text-justified">
<ul>
<li><strong>Retriever Agent</strong>: Identifies relevant reference examples to guide
downstream agents.</li>
<li><strong>Planner Agent</strong>: Acts as the cognitive core, translating context into
detailed textual descriptions.</li>
<li><strong>Stylist Agent</strong>: Ensures adherence to academic aesthetic standards by
synthesizing guidelines from references.</li>
<li><strong>Visualizer Agent</strong>: Transforms textual descriptions into visual output or
executable code.</li>
<li><strong>Critic Agent</strong>: Inspects generated images/plots against the source to
provide feedback for refinement.</li>
</ul>
</div>
</div>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-3">Benchmark Construction</h2>
<div class="content has-text-justified">
<p>
The lack of benchmarks hinders rigorous evaluation of automated diagram generation.
We address this with <b>PaperBananaBench</b>, a dedicated benchmark curated from NeurIPS
2025 methodology diagrams,
capturing the sophisticated aesthetics and diverse logical compositions of modern AI papers.
The construction pipeline ensures high quality through: (1) Collection & Parsing, (2)
Filtering, (3) Categorization, and (4) Human Curation. The final dataset comprises 584
valid samples, partitioned into 292 test and 292 reference cases.
</p>
</div>
<img src="static/images/plot_bench_stat.jpg" width="60%">
<p class="is-size-7 has-text-centered">
[Plot generated by <img src="static/images/logo.jpg"
style="width:1.0em;vertical-align: middle; border-radius: 50%;"> from raw data] Statistics
of the test set
of PaperBananaBench (totaling 292 samples). The average length of source context / figure
caption is 3,020.1 / 70.4 words.
</p>
</div>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-3">Experimental Results</h2>
<!-- <h3 class="title is-4">Main Results</h3> -->
<div class="content has-text-justified">
<p>
We evaluate PaperBanana on <b>PaperBananaBench</b>, assessing performance across
faithfulness, conciseness, readability, and aesthetics. Our method consistently
outperforms leading baselines across all four evaluation dimensions.
</p>
</div>
<img src="static/images/main_results.png" alt="Main Results of PaperBanana" width="100%">
<br>
<br>
<br>
<div class="content has-text-justified">
<p>
We further show that our method extends seamlessly to the generation of high-quality
statistical plots.
The plot below was itself <b>generated by <img src="static/images/logo.jpg"
style="width:1.0em;vertical-align: middle; border-radius: 50%;"></b> from our raw
data.
</p>
</div>
<img src="static/images/plot_vanilla_vs_ours_v2.jpg" alt="Statistical Plots Comparison" width="80%">
</div>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-3">Two Advanced Applications</h2>
<div class="content has-text-justified">
<p>
<b>1. Enhancing Aesthetics of Human-Drawn Diagrams.</b> We explore using our summarized
aesthetic guidelines to elevate the aesthetic quality of
existing human-drawn diagrams. Below is an example:
</p>
</div>
<img src="static/images/discussion_enhance_style.jpg" width="80%">
<br>
<div class="content has-text-justified">
<p>
<b>2. Coding vs Image Generation for Visualizing Statistical Plots.</b> We explore using
image generation models for statistical plot generation and compare them with code-based
approaches. The results below reveal distinct trade-offs: image generation excels in
presentation but underperforms in content fidelity. [The plot below was itself generated by
<img src="static/images/logo.jpg"
style="width:1.0em;vertical-align: middle; border-radius: 50%;">, from our raw data]
</p>
</div>
<img src="static/images/plot_code_vs_img.jpg" alt="Code vs Image Generation" width="80%">
</div>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<div class="columns is-centered has-text-centered">
<div class="column is-full">
<h2 class="title is-3">Case Study</h2>
<div class="content has-text-justified">
</div>
<div id="results-carousel" class="carousel results-carousel" data-slides-to-scroll="1">
<div class="item">
<div class="content has-text-justified is-size-6" style="padding: 1rem;">
<p>
<b>Case Study of Diagram Generation.</b> Given the same source context and caption,
the vanilla Nano-Banana-Pro often produces diagrams with outdated color tones and
overly verbose content. In contrast, our PaperBanana generates results that are more
concise and aesthetically pleasing, while maintaining faithfulness to the source
context.
</p>
</div>
<img src="static/images/case_diagram.png" width="100%">
</div>
<div class="item">
<div class="content has-text-justified is-size-6" style="padding: 1rem;">
<p>
<b>Enhancing Aesthetics.</b> Additional cases for enhancing the aesthetics of
human-drawn diagrams with our auto-summarized style guidelines. The polished
diagrams demonstrate significant stylistic improvements in color schemes,
typography, graphical elements, etc.
</p>
</div>
<img src="static/images/case_style_polish.png" width="100%">
</div>
<div class="item">
<div class="content has-text-justified is-size-6" style="padding: 1rem;">
<p>
<b>Visualizing Statistical Plots.</b> Case study for visualizing statistical plots
with code and image generation. The image generation model produces more visually
appealing plots but incurs more faithfulness errors, such as numerical hallucination
or element repetition.
</p>
</div>
<img src="static/images/case_plot_img_vs_code.png" width="100%">
</div>
<div class="item">
<div class="content has-text-justified is-size-6" style="padding: 1rem;">
<p>
<b>Failure Cases.</b> The primary failure mode involves connection errors, such as
redundant connections and mismatched source-target nodes. Our preliminary analysis
reveals that the critic model often fails to identify these connectivity issues,
suggesting these errors may originate from the foundation model's inherent
perception limitations.
</p>
</div>
<img src="static/images/case_failure.png" width="100%">
</div>
</div>
</div>
</div>
</div>
</section>
<section class="section" id="BibTeX">
<div class="container is-max-desktop content">
<h2 class="title">BibTeX</h2>
<pre><code>
</code></pre>
</div>
</section>
<footer class="footer">
<div class="container">
<div class="content has-text-centered">
<a class="icon-link" target="_blank" href="https://arxiv.org/abs/2601.23265">
<i class="fas fa-file-pdf"></i>
</a>
<a class="icon-link external-link" href="https://github.com/dwzhu-pku/PaperBanana" target="_blank">
<i class="fab fa-github"></i>
</a>
</div>
<div class="columns is-centered">
<div class="column is-8">
<div class="content">
<p>
This website is licensed under a <a rel="license" target="_blank"
href="http://creativecommons.org/licenses/by-sa/4.0/">Creative
Commons Attribution-ShareAlike 4.0 International License</a>.
</p>
<p>
This means you are free to borrow the <a target="_blank"
href="https://github.com/nerfies/nerfies.github.io">source code</a> of this website,
we just ask that you link back to this page in the footer.
Please remember to remove the analytics code included in the header of the website which
you do not want on your website.
</p>
</div>
</div>
</div>
</div>
</footer>
</body>
</html>