<!DOCTYPE html>
<html>

<head>
    <meta charset="utf-8">
    <meta name="description" content="PaperBanana: Automating Academic Illustration for AI Scientists">
    <meta name="keywords"
        content="PaperBanana, Academic Illustration, Diagram Generation, Statistical Plots, Multi-Agent Framework, Gemini">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>PaperBanana: Automating Academic Illustration for AI Scientists</title>

    <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">

    <link rel="stylesheet" href="./static/css/bulma.min.css">
    <link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
    <link rel="stylesheet" href="./static/css/bulma-slider.min.css">
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.5.1/css/all.min.css">
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
    <link rel="stylesheet" href="./static/css/index.css">

    <link rel="icon" href="./static/images/logo.ico">

    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
    <script src="./static/js/bulma-carousel.min.js"></script>
    <script src="./static/js/bulma-slider.min.js"></script>
</head>

<body>

    <section class="hero">
        <div class="hero-body">
            <div class="container is-max-widescreen">
                <div class="columns is-centered">
                    <div class="column has-text-centered">
                        <h1 class="subtitle is-3 publication-title">
                            <img src="static/images/logo.jpg"
                                style="width:1.5em;vertical-align: middle; border-radius: 50%;">
                            <b>PaperBanana:</b> Automating Academic Illustration for AI Scientists
                        </h1>
                        <div class="is-size-5 publication-authors">
                            <span class="author-block">
                                <a href="https://scholar.google.com/citations?user=oD2HPaYAAAAJ&hl=en"
                                    target="_blank">Dawei Zhu</a><sup>1,2*</sup>,
                            </span>
                            <span class="author-block">
                                <a href="https://scholar.google.com/citations?user=s6h8L_UAAAAJ&hl=en&oi=ao"
                                    target="_blank">Rui Meng</a><sup>2</sup>,
                            </span>
                            <span class="author-block">
                                <a href="https://scholar.google.com/citations?user=dNHNpxoAAAAJ&hl=en&oi=ao"
                                    target="_blank">Yale Song</a><sup>2</sup>,
                            </span>
                            <span class="author-block">
                                <a href="https://scholar.google.com/citations?user=J_RHFhUAAAAJ&hl=en&oi=ao"
                                    target="_blank">Xiyu Wei</a><sup>1</sup>,
                            </span>
                            <span class="author-block">
                                <a href="https://scholar.google.com/citations?user=RvBDhSwAAAAJ&hl=en&oi=ao"
                                    target="_blank">Sujian Li</a><sup>1</sup>,
                            </span>
                            <span class="author-block">
                                <a href="https://scholar.google.com/citations?user=ahSpJOAAAAAJ&hl=en&oi=ao"
                                    target="_blank">Tomas Pfister</a><sup>2</sup>,
                            </span>
                            <span class="author-block">
                                <a href="https://scholar.google.com/citations?user=kiFd6A8AAAAJ&hl=en&oi=ao"
                                    target="_blank">Jinsung Yoon</a><sup>2</sup>
                            </span>
                        </div>

                        <br>

                        <div class="is-size-5 publication-authors">
                            <span class="author-block"><sup>1</sup>Peking University</span>&nbsp;&nbsp;&nbsp; <br>
                            <span class="author-block"><sup>2</sup>Google Cloud AI Research</span>
                        </div>

                        <br>

                        <div class="is-size-5 thanks">
                            <span class="author-block">Corresponding author(s):</span>
                            <span class="author-block"><a href="mailto:dwzhu@pku.edu.cn">dwzhu@pku.edu.cn</a>,</span>
                            <span class="author-block"><a
                                    href="mailto:lisujian@pku.edu.cn">lisujian@pku.edu.cn</a>,</span>
                            <span class="author-block"><a
                                    href="mailto:jinsungyoon@google.com">jinsungyoon@google.com</a></span>
                        </div>

                        <div class="column has-text-centered">
                            <div class="publication-links">
                                <span class="link-block">
                                    <!-- Replace with your actual arXiv link -->
                                    <a href="https://arxiv.org/abs/2601.23265" target="_blank"
                                        class="external-link button is-normal is-rounded is-dark">
                                        <span class="icon">
                                            <i class="ai ai-arxiv"></i>
                                        </span>
                                        <span>arXiv</span>
                                    </a>
                                </span>
                                <!-- Code Link. -->
                                <span class="link-block">
                                    <a href="https://github.com/dwzhu-pku/PaperBanana" target="_blank"
                                        class="external-link button is-normal is-rounded is-dark">
                                        <span class="icon">
                                            <i class="fab fa-github"></i>
                                        </span>
                                        <span>Code</span>
                                    </a>
                                </span>
                                <!-- Twitter Link. -->
                                <span class="link-block">
                                    <a href="https://x.com/dwzhu128/status/2018405593976103010" target="_blank"
                                        class="external-link button is-normal is-rounded is-dark">
                                        <span class="icon has-text-white">
                                            <i class="fa-brands fa-x-twitter"></i>
                                        </span>
                                        <span>Twitter</span>
                                    </a>
                                </span>
                            </div>

                        </div>
                    </div>
                </div>
            </div>
        </div>
    </section>

    <section class="hero teaser">
        <div class="container is-max-desktop">
            <div class="content has-text-centered">
                <img src="static/images/teaser_figure.jpg" alt="PaperBanana Workflow" width="95%" />
                <p class="is-size-6">
                    Examples of methodology diagrams and statistical plots generated by <b>PaperBanana</b>, which show
                    the potential of automating the generation of academic illustrations.
                </p>
            </div>
        </div>
    </section>


    <section class="section hero is-light">
        <div class="container is-max-desktop">
            <!-- Abstract -->
            <div class="columns is-centered has-text-centered">
                <div class="column is-four-fifths">
                    <!-- <h2 class="title is-3">Abstract</h2> -->
                    <div class="content has-text-justified">
                        <p>
                            Despite rapid advances in autonomous AI scientists powered by language models, generating
                            publication-ready illustrations remains a labor-intensive bottleneck in the research
                            workflow.
                            To lift this burden, we introduce <b>PaperBanana</b>, an agentic framework for automated
                            generation of publication-ready academic illustrations.
                            Powered by state-of-the-art VLMs and image generation models,
                            PaperBanana orchestrates specialized agents to retrieve references, plan content and style,
                            render images, and iteratively refine via self-critique.
                            To rigorously evaluate our framework, we introduce <b>PaperBananaBench</b>, comprising 292
                            test cases for methodology diagrams curated from NeurIPS 2025 publications, covering diverse
                            research domains and illustration styles.
                            Comprehensive experiments demonstrate that PaperBanana consistently outperforms leading
                            baselines in faithfulness, conciseness, readability, and aesthetics.
                            We further show that our method effectively extends to the generation of high-quality
                            statistical plots.
                            Collectively,
                            PaperBanana paves the way for the automated generation of publication-ready illustrations.
                        </p>
                    </div>
                </div>
            </div>

        </div>
    </section>

    <section class="section">
        <div class="container is-max-desktop">
            <div class="columns is-centered has-text-centered">
                <div class="column is-four-fifths">
                    <h2 class="title is-3">Method Overview</h2>
                    <div class="content has-text-justified">
                        <p>
                            We propose PaperBanana, a reference-driven agentic framework for automated academic
                            illustration. As illustrated in the diagram (<b>generated by <img
                                    src="static/images/logo.jpg"
                                    style="width:1.0em;vertical-align: middle; border-radius: 50%;"></b>) below,
                            PaperBanana
                            orchestrates a collaborative
                            team of five specialized agents—Retriever, Planner, Stylist, Visualizer, and Critic—to
                            transform raw scientific content into publication-quality diagrams and plots.
                        </p>
                    </div>
                    <img id="model" width="100%" src="static/images/method_diagram.png">

                    <br>

                    <div class="content has-text-justified">
                        <ul>
                            <li><strong>Retriever Agent</strong>: Identifies relevant reference examples to guide
                                downstream agents.</li>
                            <li><strong>Planner Agent</strong>: Acts as the cognitive core, translating context into
                                detailed textual descriptions.</li>
                            <li><strong>Stylist Agent</strong>: Ensures adherence to academic aesthetic standards by
                                synthesizing guidelines from references.</li>
                            <li><strong>Visualizer Agent</strong>: Transforms textual descriptions into visual output or
                                executable code.</li>
                            <li><strong>Critic Agent</strong>: Inspects generated images/plots against the source to
                                provide feedback for refinement.</li>
                        </ul>
                    </div>
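                    <div class="content has-text-justified">
                        <p>
                            The agent roles listed above can be read as a simple control flow. The following is a
                            minimal sketch, not the released implementation: the function name
                            <code>run_pipeline</code>, the <code>Draft</code> structure, the agent call signatures,
                            and the <code>max_rounds</code> limit are all assumptions made for illustration. Each
                            agent is passed in as a plain callable so the sketch stays independent of any specific
                            model API.
                        </p>
<pre><code>
```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Draft:
    """Intermediate state passed between agents (hypothetical structure)."""
    description: str
    image: str = ""
    feedback: List[str] = field(default_factory=list)


def run_pipeline(
    source: str,
    caption: str,
    retrieve: Callable,
    plan: Callable,
    stylize: Callable,
    visualize: Callable,
    critique: Callable,
    max_rounds: int = 3,  # assumed refinement budget, not taken from the page
) -> Draft:
    """Chain Retriever, Planner, Stylist, Visualizer, and Critic in order."""
    # Retriever: pick reference examples that guide the later agents.
    references = retrieve(source, caption)
    # Planner: turn the source context into a detailed textual description.
    description = plan(source, caption, references)
    # Stylist: revise the description to follow style guidelines from references.
    description = stylize(description, references)

    draft = Draft(description=description)
    for _ in range(max_rounds):
        # Visualizer: render the description (an image, or code for a plot).
        draft.image = visualize(draft.description)
        # Critic: compare the output with the source; empty feedback means accept.
        feedback = critique(draft.image, source, caption)
        if not feedback:
            break
        draft.feedback.append(feedback)
        draft.description = draft.description + "\nRevision note: " + feedback
    return draft
```
</code></pre>
                        <p>
                            Passing the agents as callables keeps the loop easy to test with stubs. In this sketch the
                            Critic ends the loop by returning an empty string; a real system would likely use a richer
                            acceptance signal.
                        </p>
                    </div>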
                </div>
            </div>
        </div>
    </section>

    <section class="section">
        <div class="container is-max-desktop">
            <div class="columns is-centered has-text-centered">
                <div class="column is-four-fifths">
                    <h2 class="title is-3">Benchmark Construction</h2>
                    <div class="content has-text-justified">
                        <p>
                            The lack of benchmarks hinders rigorous evaluation of automated diagram generation.
                            We address this with <b>PaperBananaBench</b>, a dedicated benchmark curated from NeurIPS
                            2025 methodology diagrams,
                            capturing the sophisticated aesthetics and diverse logical compositions of modern AI papers.
                            The construction pipeline ensures quality through four stages: (1) Collection &amp; Parsing, (2)
                            Filtering, (3) Categorization, and (4) Human Curation. The final dataset comprises 584
                            valid samples, partitioned into 292 test and 292 reference cases.
                        </p>
                    </div>
                    <img src="static/images/plot_bench_stat.jpg" width="60%">
                    <p class="is-size-7 has-text-centered">
                        [Plot generated by <img src="static/images/logo.jpg"
                            style="width:1.0em;vertical-align: middle; border-radius: 50%;"> from raw data] Statistics
                        of the test set
                        of PaperBananaBench (totaling 292 samples). The average length of source context / figure
                        caption is 3,020.1 / 70.4 words.
                    </p>
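                    <div class="content has-text-justified">
                        <p>
                            The even partition into test and reference cases described above can be sketched as
                            follows. This is only an illustration: the page does not say how samples were assigned to
                            each half, so the seeded random shuffle, the function name
                            <code>split_benchmark</code>, and the integer stand-ins for samples are assumptions.
                        </p>
<pre><code>
```python
import random


def split_benchmark(samples, seed=0):
    """Shuffle the samples and split them into equal test and reference halves."""
    if len(samples) % 2 != 0:
        raise ValueError("expected an even number of samples")
    shuffled = list(samples)
    # A seeded generator keeps the (assumed) random assignment reproducible.
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Integers stand in for the 584 valid samples.
test_set, reference_set = split_benchmark(range(584))
print(len(test_set), len(reference_set))  # 292 292
```
</code></pre>
                    </div>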
                </div>
            </div>
        </div>
    </section>




    <section class="section">
        <div class="container is-max-desktop">
            <div class="columns is-centered has-text-centered">
                <div class="column is-four-fifths">
                    <h2 class="title is-3">Experimental Results</h2>


                    <!-- <h3 class="title is-4">Main Results</h3> -->
                    <div class="content has-text-justified">
                        <p>
                            We evaluate PaperBanana on <b>PaperBananaBench</b>, assessing performance across
                            faithfulness, conciseness, readability, and aesthetics. Our method consistently
                            outperforms leading baselines across all four evaluation dimensions.
                        </p>
                    </div>
                    <img src="static/images/main_results.png" alt="Main Results of PaperBanana" width="100%">
                    <br>
                    <br>
                    <br>
                    <div class="content has-text-justified">
                        <p>
                            Our method also extends seamlessly to the generation of high-quality
                            statistical plots.
                            The plot below was itself <b>generated by <img src="static/images/logo.jpg"
                                    style="width:1.0em;vertical-align: middle; border-radius: 50%;"></b> from our raw
                            data.
                        </p>
                    </div>
                    <img src="static/images/plot_vanilla_vs_ours_v2.jpg" alt="Statistical Plots Comparison" width="80%">
                </div>
            </div>
        </div>
    </section>

    <section class="section">
        <div class="container is-max-desktop">
            <div class="columns is-centered has-text-centered">
                <div class="column is-four-fifths">
                    <h2 class="title is-3">Two Advanced Applications</h2>


                    <div class="content has-text-justified">
                        <p>
                            <b>1. Enhancing Aesthetics of Human-Drawn Diagrams.</b> We explore using our summarized
                            aesthetic guidelines to elevate the aesthetic quality of
                            existing human-drawn diagrams. Below is an example:
                        </p>
                    </div>
                    <img src="static/images/discussion_enhance_style.jpg" width="80%">
                    <br>

                    <div class="content has-text-justified">
                        <p>
                            <b>2. Coding vs. Image Generation for Visualizing Statistical Plots.</b> We explore using
                            image generation models to produce statistical plots and compare them with code-based
                            approaches. The results below reveal distinct trade-offs: image generation excels in
                            presentation but underperforms in content fidelity. [The plot below was itself generated by
                            <img src="static/images/logo.jpg"
                                style="width:1.0em;vertical-align: middle; border-radius: 50%;">, from our raw data]
                        </p>
                    </div>
                    <img src="static/images/plot_code_vs_img.jpg" alt="Code vs Image Generation" width="80%">
                </div>
            </div>
        </div>
    </section>


    <section class="section">
        <div class="container is-max-desktop">
            <div class="columns is-centered has-text-centered">
                <div class="column is-full">
                    <h2 class="title is-3">Case Study</h2>
                    <div class="content has-text-justified">
                    </div>
                    <div id="results-carousel" class="carousel results-carousel" data-slides-to-scroll="1">
                        <div class="item">
                            <div class="content has-text-justified is-size-6" style="padding: 1rem;">
                                <p>
                                    <b>Case Study of Diagram Generation.</b> Given the same source context and caption,
                                    the vanilla Nano-Banana-Pro often produces diagrams with outdated color tones and
                                    overly verbose content. In contrast, our PaperBanana generates results that are more
                                    concise and aesthetically pleasing, while maintaining faithfulness to the source
                                    context.
                                </p>
                            </div>
                            <img src="static/images/case_diagram.png" width="100%">
                        </div>
                        <div class="item">
                            <div class="content has-text-justified is-size-6" style="padding: 1rem;">
                                <p>
                                    <b>Enhancing Aesthetics.</b> Additional cases for enhancing the aesthetics of
                                    human-drawn diagrams with our auto-summarized style guidelines. The polished
                                    diagrams demonstrate significant stylistic improvements in color schemes,
                                    typography, graphical elements, etc.
                                </p>
                            </div>
                            <img src="static/images/case_style_polish.png" width="100%">
                        </div>
                        <div class="item">
                            <div class="content has-text-justified is-size-6" style="padding: 1rem;">
                                <p>
                                    <b>Visualizing Statistical Plots.</b> Case study comparing code-based and
                                    image-generation approaches to statistical plots. The image generation model
                                    produces more visually appealing plots but incurs more faithfulness errors, such as
                                    numerical hallucination or element repetition.
                                </p>
                            </div>
                            <img src="static/images/case_plot_img_vs_code.png" width="100%">
                        </div>
                        <div class="item">
                            <div class="content has-text-justified is-size-6" style="padding: 1rem;">
                                <p>
                                    <b>Failure Cases.</b> The primary failure mode involves connection errors, such as
                                    redundant connections and mismatched source-target nodes. Our preliminary analysis
                                    reveals that the critic model often fails to identify these connectivity issues,
                                    suggesting these errors may originate from the foundation model's inherent
                                    perception limitations.
                                </p>
                            </div>
                            <img src="static/images/case_failure.png" width="100%">
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </section>



    <section class="section" id="BibTeX">
        <div class="container is-max-desktop content">
            <h2 class="title">BibTeX</h2>
            <pre><code>
            </code></pre>
        </div>
    </section>


    <footer class="footer">
        <div class="container">
            <div class="content has-text-centered">
                <!-- Replace with your paper's link -->
                <a class="icon-link" target="_blank" href="https://arxiv.org/abs/2601.23265">
                    <i class="fas fa-file-pdf"></i>
                </a>
                <!-- Replace with your code's link -->
                <a class="icon-link external-link" href="https://github.com/dwzhu-pku/PaperBanana" target="_blank">
                    <i class="fab fa-github"></i>
                </a>
            </div>
            <div class="columns is-centered">
                <div class="column is-8">
                    <div class="content">
                        <p>
                            This website is licensed under a <a rel="license" target="_blank"
                                href="http://creativecommons.org/licenses/by-sa/4.0/">Creative
                                Commons Attribution-ShareAlike 4.0 International License</a>.
                        </p>
                        <p>
                            This means you are free to borrow the <a target="_blank"
                                href="https://github.com/nerfies/nerfies.github.io">source code</a> of this website;
                            we just ask that you link back to this page in the footer.
                            Please remember to remove any analytics code in the header that
                            you do not want on your own website.
                        </p>
                    </div>
                </div>
            </div>
        </div>
    </footer>


</body>

</html>