	<!DOCTYPE html>
	<html>

	<head>
	<meta charset="utf-8">
	<meta name="description" content="PaperBanana: Automating Academic Illustration for AI Scientists">
	<meta name="keywords"
	content="PaperBanana, Academic Illustration, Diagram Generation, Statistical Plots, Multi-Agent Framework, Gemini">
	<meta name="viewport" content="width=device-width, initial-scale=1">
	<title>PaperBanana: Automating Academic Illustration for AI Scientists</title>

	<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">

	<link rel="stylesheet" href="./static/css/bulma.min.css">
	<link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
	<link rel="stylesheet" href="./static/css/bulma-slider.min.css">
	<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.5.1/css/all.min.css">
	<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
	<link rel="stylesheet" href="./static/css/index.css">

	<link rel="icon" href="./static/images/logo.ico">

	<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
	<script src="./static/js/bulma-carousel.min.js"></script>
	<script src="./static/js/bulma-slider.min.js"></script>
	</head>

	<body>

	<section class="hero">
	<div class="hero-body">
	<div class="container is-max-widescreen">
	<div class="columns is-centered">
	<div class="column has-text-centered">
	<h1 class="subtitle is-3 publication-title">
	<img src="static/images/logo.jpg"
	style="width:1.5em;vertical-align: middle; border-radius: 50%;">
	<b>PaperBanana:</b> Automating Academic Illustration for AI Scientists
	</h1>
	<div class="is-size-5 publication-authors">
	<span class="author-block">
	<a href="https://scholar.google.com/citations?user=oD2HPaYAAAAJ&hl=en"
	target="_blank">Dawei Zhu</a><sup>1,2*</sup>,
	</span>
	<span class="author-block">
	<a href="https://scholar.google.com/citations?user=s6h8L_UAAAAJ&hl=en&oi=ao"
	target="_blank">Rui Meng</a><sup>2</sup>,
	</span>
	<span class="author-block">
	<a href="https://scholar.google.com/citations?user=dNHNpxoAAAAJ&hl=en&oi=ao"
	target="_blank">Yale Song</a><sup>2</sup>,
	</span>
	<span class="author-block">
	<a href="https://scholar.google.com/citations?user=J_RHFhUAAAAJ&hl=en&oi=ao"
	target="_blank">Xiyu Wei</a><sup>1</sup>,
	</span>
	<span class="author-block">
	<a href="https://scholar.google.com/citations?user=RvBDhSwAAAAJ&hl=en&oi=ao"
	target="_blank">Sujian Li</a><sup>1</sup>,
	</span>
	<span class="author-block">
	<a href="https://scholar.google.com/citations?user=ahSpJOAAAAAJ&hl=en&oi=ao"
	target="_blank">Tomas Pfister</a><sup>2</sup>,
	</span>
	<span class="author-block">
	<a href="https://scholar.google.com/citations?user=kiFd6A8AAAAJ&hl=en&oi=ao"
	target="_blank">Jinsung Yoon</a><sup>2</sup>
	</span>
	</div>

	<br>

	<div class="is-size-5 publication-authors">
	<span class="author-block"><sup>1</sup>Peking University</span>    <br>
	<span class="author-block"><sup>2</sup>Google Cloud AI Research</span>
	</div>

	<br>

	<div class="is-size-5 thanks">
	<span class="author-block">Corresponding author(s):</span>
	<span class="author-block"><a href="mailto:dwzhu@pku.edu.cn">dwzhu@pku.edu.cn</a>,</span>
	<span class="author-block"><a
	href="mailto:lisujian@pku.edu.cn">lisujian@pku.edu.cn</a>,</span>
	<span class="author-block"><a
	href="mailto:jinsungyoon@google.com">jinsungyoon@google.com</a></span>
	</div>

	<div class="column has-text-centered">
	<div class="publication-links">
	<span class="link-block">
	<a href="https://arxiv.org/abs/2601.23265" target="_blank"
	class="external-link button is-normal is-rounded is-dark">
	<span class="icon">
	<i class="ai ai-arxiv"></i>
	</span>
	<span>arXiv</span>
	</a>
	</span>
	<!-- Code Link. -->
	<span class="link-block">
	<a href="https://github.com/dwzhu-pku/PaperBanana" target="_blank"
	class="external-link button is-normal is-rounded is-dark">
	<span class="icon">
	<i class="fab fa-github"></i>
	</span>
	<span>Code</span>
	</a>
	</span>
	<!-- Twitter Link. -->
	<span class="link-block">
	<a href="https://x.com/dwzhu128/status/2018405593976103010" target="_blank"
	class="external-link button is-normal is-rounded is-dark">
	<span class="icon has-text-white">
	<i class="fa-brands fa-x-twitter"></i>
	</span>
	<span>Twitter</span>
	</a>
	</span>
	</div>

	</div>
	</div>
	</div>
	</div>
	</div>
	</section>

	<section class="hero teaser">
	<div class="container is-max-desktop">
	<div class="content has-text-centered">
	<img src="static/images/teaser_figure.jpg" alt="PaperBanana Workflow" width="95%" />
	<p class="is-size-6">
	Examples of methodology diagrams and statistical plots generated by <b>PaperBanana</b>, which show
	the potential of automating the generation of academic illustrations.
	</p>
	</div>
	</div>
	</section>


	<section class="section hero is-light">
	<div class="container is-max-desktop">
	<!-- Abstract -->
	<div class="columns is-centered has-text-centered">
	<div class="column is-four-fifths">
	<!-- <h2 class="title is-3">Abstract</h2> -->
	<div class="content has-text-justified">
	<p>
	Despite rapid advances in autonomous AI scientists powered by language models, generating
	publication-ready illustrations remains a labor-intensive bottleneck in the research
	workflow.
	To lift this burden, we introduce <b>PaperBanana</b>, an agentic framework for automated
	generation of publication-ready academic illustrations.
	Powered by state-of-the-art VLMs and image generation models,
	PaperBanana orchestrates specialized agents to retrieve references, plan content and style,
	render images, and iteratively refine via self-critique.
	To rigorously evaluate our framework, we introduce <b>PaperBananaBench</b>, comprising 292
	test cases for methodology diagrams curated from NeurIPS 2025 publications, covering diverse
	research domains and illustration styles.
	Comprehensive experiments demonstrate that PaperBanana consistently outperforms leading
	baselines in faithfulness, conciseness, readability, and aesthetics.
	We further show that our method effectively extends to the generation of high-quality
	statistical plots.
	Collectively,
	PaperBanana paves the way for the automated generation of publication-ready illustrations.
	</p>
	</div>
	</div>
	</div>

	</div>
	</section>

	<section class="section">
	<div class="container is-max-desktop">
	<div class="columns is-centered has-text-centered">
	<div class="column is-four-fifths">
	<h2 class="title is-3">Method Overview</h2>
	<div class="content has-text-justified">
	<p>
	We propose PaperBanana, a reference-driven agentic framework for automated academic
	illustration. As illustrated in the diagram (<b>generated by <img
	src="static/images/logo.jpg"
	style="width:1.0em;vertical-align: middle; border-radius: 50%;"></b>) below,
	PaperBanana
	orchestrates a collaborative
	team of five specialized agents—Retriever, Planner, Stylist, Visualizer, and Critic—to
	transform raw scientific content into publication-quality diagrams and plots.
	</p>
	</div>
	<img id="model" width="100%" src="static/images/method_diagram.png">

	<br>

	<div class="content has-text-justified">
	<ul>
	<li><strong>Retriever Agent</strong>: Identifies relevant reference examples to guide
	downstream agents.</li>
	<li><strong>Planner Agent</strong>: Acts as the cognitive core, translating context into
	detailed textual descriptions.</li>
	<li><strong>Stylist Agent</strong>: Ensures adherence to academic aesthetic standards by
	synthesizing guidelines from references.</li>
	<li><strong>Visualizer Agent</strong>: Transforms textual descriptions into visual output or
	executable code.</li>
	<li><strong>Critic Agent</strong>: Inspects generated images/plots against the source to
	provide feedback for refinement.</li>
	</ul>
	</div>
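	<div class="content has-text-justified">
	<p>
	The division of labor listed above can be sketched as a simple control loop. The Python
	below is a minimal illustration only, not the released implementation: the
	<code>Draft</code> class, the <code>run_pipeline</code> and <code>demo</code> functions,
	the agent call signatures, the <code>max_rounds</code> limit, and the way Critic feedback
	is appended to the description are all assumptions made for this sketch. The stub agents
	stand in for real VLM and image generation calls.
	</p>
	<pre><code>
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Draft:
    """Working state passed between the hypothetical agent stubs."""
    description: str                      # textual plan for the figure
    image: str = ""                       # stand-in for a rendered image or plot code
    feedback: List[str] = field(default_factory=list)  # all Critic issues so far


def run_pipeline(context: str, caption: str,
                 retrieve: Callable, plan: Callable, stylize: Callable,
                 visualize: Callable, critique: Callable,
                 max_rounds: int = 3) -> Draft:
    """Retrieve, plan, and style once, then alternate Visualizer and Critic."""
    references = retrieve(context, caption)
    description = stylize(plan(context, caption, references), references)
    draft = Draft(description=description)
    for _ in range(max_rounds):
        draft.image = visualize(draft.description)
        issues = critique(draft.image, context, caption)
        if not issues:                    # Critic is satisfied: stop refining
            break
        draft.feedback.extend(issues)
        # Assumption: feedback is folded back into the description as plain text.
        draft.description += " | fix: " + "; ".join(issues)
    return draft


def demo() -> Draft:
    """Run the loop with toy agents; the Critic objects once, then accepts."""
    critic_calls = []

    def retrieve(context, caption):
        return ["ref-1", "ref-2"]

    def plan(context, caption, references):
        return f"Diagram for '{caption}' guided by {len(references)} references"

    def stylize(description, references):
        return description + " [styled]"

    def visualize(description):
        return f"IMAGE({description})"

    def critique(image, context, caption):
        critic_calls.append(image)
        return ["arrow A to B missing"] if len(critic_calls) == 1 else []

    return run_pipeline("source text", "Method overview",
                        retrieve, plan, stylize, visualize, critique)
</code></pre>
	<p>
	Calling <code>demo()</code> triggers two Visualizer passes: the toy Critic rejects the
	first render once, and the second render is produced from a description that carries the
	Critic's note. Capping the loop with <code>max_rounds</code> is one plausible way to keep
	refinement bounded when the Critic never fully approves a draft.
	</p>
	</div>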
	</div>
	</div>
	</div>
	</section>

	<section class="section">
	<div class="container is-max-desktop">
	<div class="columns is-centered has-text-centered">
	<div class="column is-four-fifths">
	<h2 class="title is-3">Benchmark Construction</h2>
	<div class="content has-text-justified">
	<p>
	The lack of benchmarks hinders rigorous evaluation of automated diagram generation.
	We address this with <b>PaperBananaBench</b>, a dedicated benchmark curated from NeurIPS
	2025 methodology diagrams,
	capturing the sophisticated aesthetics and diverse logical compositions of modern AI papers.
	The construction pipeline ensures high quality through: (1) Collection &amp; Parsing, (2)
	Filtering, (3) Categorization, and (4) Human Curation. The final dataset comprises 584
	valid samples, partitioned into 292 test and 292 reference cases.
	</p>
	</div>
	<img src="static/images/plot_bench_stat.jpg" width="60%">
	<p class="is-size-7 has-text-centered">
	[Plot generated by <img src="static/images/logo.jpg"
	style="width:1.0em;vertical-align: middle; border-radius: 50%;"> from raw data] Statistics
	of the test set
	of PaperBananaBench (totaling 292 samples). The average length of source context / figure
	caption is 3,020.1 / 70.4 words.
	</p>
	</div>
	</div>
	</div>
	</section>




	<section class="section">
	<div class="container is-max-desktop">
	<div class="columns is-centered has-text-centered">
	<div class="column is-four-fifths">
	<h2 class="title is-3">Experimental Results</h2>


	<!-- <h3 class="title is-4">Main Results</h3> -->
	<div class="content has-text-justified">
	<p>
	We evaluate PaperBanana on <b>PaperBananaBench</b>, assessing performance across
	faithfulness, conciseness, readability, and aesthetics. Our method consistently
	outperforms leading baselines across all four evaluation dimensions.
	</p>
	</div>
	<img src="static/images/main_results.png" alt="Main Results of PaperBanana" width="100%">
	<br>
	<br>
	<br>
	<div class="content has-text-justified">
	<p>
	We further show that our method extends seamlessly to the generation of high-quality
	statistical plots.
	The plot below was itself <b>generated by <img src="static/images/logo.jpg"
	style="width:1.0em;vertical-align: middle; border-radius: 50%;"></b> from our raw
	data.
	</p>
	</div>
	<img src="static/images/plot_vanilla_vs_ours_v2.jpg" alt="Statistical Plots Comparison" width="80%">
	</div>
	</div>
	</div>
	</section>

	<section class="section">
	<div class="container is-max-desktop">
	<div class="columns is-centered has-text-centered">
	<div class="column is-four-fifths">
	<h2 class="title is-3">Two Advanced Applications</h2>


	<div class="content has-text-justified">
	<p>
	<b>1. Enhancing Aesthetics of Human-Drawn Diagrams.</b> We explore using our summarized
	aesthetic guidelines to elevate the aesthetic quality of
	existing human-drawn diagrams. Below is an example:
	</p>
	</div>
	<img src="static/images/discussion_enhance_style.jpg" width="80%">
	<br>

	<div class="content has-text-justified">
	<p>
	<b>2. Coding vs. Image Generation for Visualizing Statistical Plots.</b> We explore using
	image generation models to generate statistical plots and compare them with code-based
	approaches. The results below reveal distinct trade-offs: image generation excels in
	presentation but underperforms in content fidelity. [The plot below was itself generated by
	<img src="static/images/logo.jpg"
	style="width:1.0em;vertical-align: middle; border-radius: 50%;">, from our raw data]
	</p>
	</div>
	<img src="static/images/plot_code_vs_img.jpg" alt="Code vs Image Generation" width="80%">
	</div>
	</div>
	</div>
	</section>


	<section class="section">
	<div class="container is-max-desktop">
	<div class="columns is-centered has-text-centered">
	<div class="column is-full">
	<h2 class="title is-3">Case Study</h2>
	<div class="content has-text-justified">
	</div>
	<div id="results-carousel" class="carousel results-carousel" data-slides-to-scroll="1">
	<div class="item">
	<div class="content has-text-justified is-size-6" style="padding: 1rem;">
	<p>
	<b>Case Study of Diagram Generation.</b> Given the same source context and caption,
	the vanilla Nano-Banana-Pro often produces diagrams with outdated color tones and
	overly verbose content. In contrast, our PaperBanana generates results that are more
	concise and aesthetically pleasing, while maintaining faithfulness to the source
	context.
	</p>
	</div>
	<img src="static/images/case_diagram.png" width="100%">
	</div>
	<div class="item">
	<div class="content has-text-justified is-size-6" style="padding: 1rem;">
	<p>
	<b>Enhancing Aesthetics.</b> Additional cases for enhancing the aesthetics of
	human-drawn diagrams with our auto-summarized style guidelines. The polished
	diagrams demonstrate significant stylistic improvements in color schemes,
	typography, graphical elements, etc.
	</p>
	</div>
	<img src="static/images/case_style_polish.png" width="100%">
	</div>
	<div class="item">
	<div class="content has-text-justified is-size-6" style="padding: 1rem;">
	<p>
	<b>Visualizing Statistical Plots.</b> Case study comparing code-based and image-based
	generation of statistical plots. The image generation model produces more visually
	appealing plots but incurs more faithfulness errors, such as numerical hallucination or
	element repetition.
	</p>
	</div>
	<img src="static/images/case_plot_img_vs_code.png" width="100%">
	</div>
	<div class="item">
	<div class="content has-text-justified is-size-6" style="padding: 1rem;">
	<p>
	<b>Failure Cases.</b> The primary failure mode involves connection errors, such as
	redundant connections and mismatched source-target nodes. Our preliminary analysis
	reveals that the critic model often fails to identify these connectivity issues,
	suggesting these errors may originate from the foundation model's inherent
	perception limitations.
	</p>
	</div>
	<img src="static/images/case_failure.png" width="100%">
	</div>
	</div>
	</div>
	</div>
	</div>
	</section>



	<section class="section" id="BibTeX">
	<div class="container is-max-desktop content">
	<h2 class="title">BibTeX</h2>
	<pre><code>
	</code></pre>
	</div>
	</section>


	<footer class="footer">
	<div class="container">
	<div class="content has-text-centered">
	<a class="icon-link" target="_blank" href="https://arxiv.org/abs/2601.23265">
	<i class="fas fa-file-pdf"></i>
	</a>
	<a class="icon-link" href="https://github.com/dwzhu-pku/PaperBanana" target="_blank">
	<i class="fab fa-github"></i>
	</a>
	</div>
	<div class="columns is-centered">
	<div class="column is-8">
	<div class="content">
	<p>
	This website is licensed under a <a rel="license" target="_blank"
	href="http://creativecommons.org/licenses/by-sa/4.0/">Creative
	Commons Attribution-ShareAlike 4.0 International License</a>.
	</p>
	<p>
	This means you are free to borrow the <a target="_blank"
	href="https://github.com/nerfies/nerfies.github.io">source code</a> of this website;
	we just ask that you link back to this page in the footer.
	Please remember to remove any analytics code in the header of the website that
	you do not want on your own site.
	</p>
	</div>
	</div>
	</div>
	</div>
	</footer>


	</body>

	</html>