
	% Unofficial University of Cambridge Poster Template
	% https://github.com/andiac/gemini-cam
	% a fork of https://github.com/anishathalye/gemini
	% also refer to https://github.com/k4rtik/uchicago-poster

	\documentclass[final]{beamer}

	% ====================
	% Packages
	% ====================

	\usepackage[T1]{fontenc}
	\usepackage{lmodern}
	\usepackage[size=custom,width=120,height=72,scale=1.0]{beamerposter}
	\usetheme{gemini}
	\usecolortheme{cam}
	\usepackage{graphicx}
	\usepackage{booktabs}
	\usepackage[numbers]{natbib}
	\usepackage{tikz}
	\usepackage{pgfplots}
	\pgfplotsset{compat=1.14}
	\usepackage{anyfontsize}

	\definecolor{nipspurple}{RGB}{94,46,145}
	\setbeamercolor{headline}{bg=white, fg=black}
	\setbeamercolor{block title}{bg=nipspurple, fg=white}
	\addtobeamertemplate{block begin}{
	\setlength{\textpaddingtop}{0.2em}%
	\setlength{\textpaddingbottom}{0.2em}%
	}{}
	% ====================
	% Lengths
	% ====================

	% If you have N columns, choose \sepwidth and \colwidth such that
	% (N+1)\sepwidth + N\colwidth = \paperwidth
	\newlength{\sepwidth}
	\newlength{\colwidth}
	\setlength{\sepwidth}{0.025\paperwidth}
	\setlength{\colwidth}{0.3\paperwidth}
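	% Worked check of the rule above for this three-column layout (N = 3):
	%   (3+1)(0.025\paperwidth) + 3(0.3\paperwidth)
	%     = 0.1\paperwidth + 0.9\paperwidth = \paperwidth
	% A hedged sketch, not part of the original template: for a different
	% column count, \colwidth could be derived from \sepwidth instead of being
	% hard-coded. Assuming N = 4 (the body would then also need a fourth column):
	%   \setlength{\colwidth}{\dimexpr(\paperwidth-5\sepwidth)/4\relax}
	% This relies on the e-TeX \dimexpr primitive, available in current engines.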

	\newcommand{\separatorcolumn}{\begin{column}{\sepwidth}\end{column}}

	% ====================
	% Title
	% ====================

	\title{Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers}

	\author{Wei Pang\textsuperscript{1}, Kevin Qinghong Lin\textsuperscript{2}, Xiangru Jian\textsuperscript{1}, Xi He\textsuperscript{1}, Philip Torr\textsuperscript{3}}

	\institute[shortinst]{\textsuperscript{1}University of Waterloo; \textsuperscript{2}National University of Singapore; \textsuperscript{3}University of Oxford}

	% ====================
	% Footer (optional)
	% ====================

	\footercontent{
	\href{https://paper2poster.github.io/}{https://paper2poster.github.io/} \hfill
	Generated by Paper2Poster \hfill
	}
	% (can be left out to remove footer)

	% ====================
	% Logo (optional)
	% ====================

	% use this to include logos on the left and/or right side of the header:
	\logoright{\includegraphics[height=5cm]{logos/right_logo.png}}
	\logoleft{\includegraphics[height=4cm]{logos/left_logo.png}}

	% ====================
	% Body
	% ====================


	% --- injected font tweaks ---
	\setbeamerfont{title}{size=\huge}
	\setbeamerfont{author}{size=\Large}
	\setbeamerfont{institute}{size=\large}
	\setbeamerfont{block title}{size=\Large}
	\setbeamerfont{block body}{size=\large}
	\begin{document}

	% Refer to https://github.com/k4rtik/uchicago-poster
	% logo: https://www.cam.ac.uk/brand-resources/about-the-logo/logo-downloads
	\addtobeamertemplate{headline}{}
	{
	\begin{tikzpicture}[remember picture,overlay]
	\node [anchor=north west, inner sep=3cm] at ([xshift=0.0cm,yshift=1.0cm]current page.north west) {};
	\end{tikzpicture}
	}
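	% A hedged sketch, not part of the original file: the overlay node above
	% carries no image, so it draws nothing. If a logo is wanted at the top-left
	% corner, the node body could hold one (logos/extra_logo.png is a
	% hypothetical path used only for illustration):
	%   \node [anchor=north west, inner sep=3cm]
	%     at ([xshift=0.0cm,yshift=1.0cm]current page.north west)
	%     {\includegraphics[height=5cm]{logos/extra_logo.png}};
	% With [remember picture,overlay], a second compile run is typically needed
	% before the node lands at the correct page position.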

	\begin{frame}[t]
	\begin{columns}[t]
	\separatorcolumn
	\begin{column}{\colwidth}
	\begin{block}{Why Posters Are Hard}
	We tackle \textbf{single-page multimodal compression}: dense papers must become legible posters under \textcolor{red}{tight spatial constraints}. Pure LLM or VLM approaches \textbf{struggle with layout}, losing \textit{reading order} and \textbf{overflow control}. We show that \textcolor{blue}{visual-in-the-loop} planning is key to \textbf{clarity}, \textbf{balance}, and \textbf{engagement}.

	\begin{figure}
	\centering
	\includegraphics[width=0.80\linewidth]{figures/paper-picture-1.png}
	\end{figure}

	\end{block}

	\begin{block}{Benchmark \& Task}
	We introduce \textbf{Paper2Poster} and the task: generate a \textbf{single-page}, well-balanced poster that faithfully conveys core ideas. The protocol measures \textit{what matters}: \textbf{visual alignment}, \textbf{text fluency}, \textbf{holistic quality}, and knowledge transfer via \textcolor{blue}{PaperQuiz}. Our setup \textbf{standardizes evaluation} for automated poster generation.
	\end{block}

	\begin{block}{Curated Diverse Dataset}
	The dataset spans \textcolor{blue}{100} paper--poster pairs from NeurIPS, ICML, and ICLR. Papers average \textcolor{blue}{22.6} pages and \textcolor{blue}{20K+} tokens; posters average \textcolor{blue}{1.4K} tokens. This amounts to \textbf{14.4$\times$} text compression and \textbf{2.6$\times$} figure reduction. Topics include CV (\textcolor{blue}{19\%}), NLP (\textcolor{blue}{17\%}), and RL (\textcolor{blue}{10\%}), supporting \textbf{robust} evaluation across areas.

	\begin{figure}
	\centering
	\includegraphics[width=0.80\linewidth]{figures/paper-picture-6.png}
	\end{figure}

	\end{block}

	\end{column}
	\separatorcolumn
	\begin{column}{\colwidth}
	\begin{block}{Four-Pronged Evaluation}
	Our \textbf{four-pronged} suite tests end-to-end quality: Visual Quality via \textcolor{blue}{AltCLIP} similarity and \textbf{figure relevance}; Textual Coherence via \textcolor{blue}{PPL} (Llama-2-7B); VLM-as-Judge across \textbf{6 criteria}; and \textcolor{blue}{PaperQuiz} with length-aware penalties rewarding \textbf{dense, readable} designs.

	\begin{figure}
	\centering
	\includegraphics[width=0.80\linewidth]{figures/paper-picture-7.png}
	\end{figure}

	\end{block}

	\begin{block}{PosterAgent Pipeline}
	PosterAgent is \textbf{top-down, visual-in-the-loop}. The \textit{Parser} builds a semantic asset library; the \textit{Planner} aligns text--visual pairs and uses \textcolor{blue}{binary-tree} layouts to preserve \textbf{reading order}. The \textit{Painter-Commenter} renders panels, applies \textcolor{blue}{zoom-in} VLM feedback, and fixes \textbf{overflow} and \textbf{alignment}, yielding concise, coherent posters.

	\begin{figure}
	\centering
	\includegraphics[width=0.80\linewidth]{figures/paper-picture-8.png}
	\end{figure}

	\end{block}

	\begin{block}{Main Results}
	Across metrics, \textbf{PosterAgent} variants beat multi-agent baselines. We attain \textcolor{blue}{state-of-the-art figure relevance} and near-\textbf{human} visual similarity. GPT-4o pixel posters look good but show \textcolor{red}{noisy text} and high \textcolor{red}{PPL}. VLM-as-Judge scores place PosterAgent-4o at \textcolor{blue}{3.72} overall, approaching ground-truth (GT) posters.

	\begin{figure}
	\centering
	\includegraphics[width=0.80\linewidth]{figures/paper-table-1.png}
	\end{figure}

	\end{block}

	\end{column}
	\separatorcolumn
	\begin{column}{\colwidth}
	\begin{block}{PaperQuiz Insights}
	\textcolor{blue}{PaperQuiz} tracks human judgment and rewards \textbf{informative brevity}. With penalties, GT posters lead; \textbf{PosterAgent} tops automated methods. Open-source \textcolor{blue}{Qwen-2.5} stacks stay \textbf{competitive}. Stronger reader VLMs exploit \textbf{structured layouts}, outperforming blog-like or \textcolor{red}{text-garbling} image generations.

	\begin{figure}
	\centering
	\includegraphics[width=0.80\linewidth]{figures/paper-picture-9.png}
	\end{figure}

	\end{block}

	\begin{block}{Efficient, Open, Scalable}
	Our pipeline cuts token usage by \textcolor{blue}{60--87\%}. PosterAgent-4o uses \textcolor{blue}{101K} tokens (\textcolor{blue}{\$0.55}); PosterAgent-Qwen uses \textcolor{blue}{47.6K} (\textcolor{blue}{\$0.0045}). Runtime is $\approx$ \textcolor{blue}{4.5 min}. \textcolor{red}{Bottleneck}: sequential panel refinement. \textbf{Future work}: panel parallelism, external knowledge, and human-in-the-loop feedback to boost \textbf{engagement}.

	\begin{figure}
	\centering
	\includegraphics[width=0.80\linewidth]{figures/paper-table-8.png}
	\end{figure}

	\end{block}

	\end{column}
	\separatorcolumn
	\end{columns}
	\end{frame}

	\end{document}