% Unofficial University of Cambridge Poster Template
% https://github.com/andiac/gemini-cam
% a fork of https://github.com/anishathalye/gemini
% also refer to https://github.com/k4rtik/uchicago-poster
\documentclass[final]{beamer}
% ====================
% Packages
% ====================
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage[size=custom,width=120,height=72,scale=1.0]{beamerposter}
\usetheme{gemini}
\usecolortheme{cam}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage[numbers]{natbib}
\usepackage{tikz}
\usepackage{pgfplots}
\pgfplotsset{compat=1.14}
\usepackage{anyfontsize}
\definecolor{nipspurple}{RGB}{94,46,145}
\setbeamercolor{headline}{bg=white, fg=black}
\setbeamercolor{block title}{bg=nipspurple, fg=white}
\addtobeamertemplate{block begin}{
  \setlength{\textpaddingtop}{0.2em}%
  \setlength{\textpaddingbottom}{0.2em}%
}{}
% ====================
% Lengths
% ====================
% If you have N columns, choose \sepwidth and \colwidth such that
% (N+1)*\sepwidth + N*\colwidth = \paperwidth
\newlength{\sepwidth}
\newlength{\colwidth}
\setlength{\sepwidth}{0.025\paperwidth}
\setlength{\colwidth}{0.3\paperwidth}
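% Check for the N = 3 columns used below:
% (3+1)*0.025 + 3*0.3 = 0.1 + 0.9 = 1.0, so columns and separators fill \paperwidth exactly.
% A sketch (an assumption, not part of the original template) that derives \colwidth
% from \sepwidth, so editing one length keeps the sum exact; it yields the same 0.3\paperwidth:
% \setlength{\colwidth}{\dimexpr(\paperwidth-4\sepwidth)/3\relax}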
\newcommand{\separatorcolumn}{\begin{column}{\sepwidth}\end{column}}
% ====================
% Title
% ====================
\title{Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers}
\author{Wei Pang\textsuperscript{1}, Kevin Qinghong Lin\textsuperscript{2}, Xiangru Jian\textsuperscript{1}, Xi He\textsuperscript{1}, Philip Torr\textsuperscript{3}}
\institute[shortinst]{\textsuperscript{1}University of Waterloo \quad \textsuperscript{2}National University of Singapore \quad \textsuperscript{3}University of Oxford}
% ====================
% Footer (optional)
% ====================
\footercontent{
  \href{https://paper2poster.github.io/}{https://paper2poster.github.io/} \hfill
  Generated by Paper2Poster \hfill
}
% (can be left out to remove footer)
% ====================
% Logo (optional)
% ====================
% use this to include logos on the left and/or right side of the header:
\logoright{\includegraphics[height=5cm]{logos/right_logo.png}}
\logoleft{\includegraphics[height=4cm]{logos/left_logo.png}}
% ====================
% Body
% ====================
% --- injected font tweaks ---
\setbeamerfont{title}{size=\huge}
\setbeamerfont{author}{size=\Large}
\setbeamerfont{institute}{size=\large}
\setbeamerfont{block title}{size=\Large}
\setbeamerfont{block body}{size=\large}
\begin{document}
% Refer to https://github.com/k4rtik/uchicago-poster
% logo: https://www.cam.ac.uk/brand-resources/about-the-logo/logo-downloads
\addtobeamertemplate{headline}{}
{
  \begin{tikzpicture}[remember picture,overlay]
    \node [anchor=north west, inner sep=3cm] at ([xshift=0.0cm,yshift=1.0cm]current page.north west) {};
  \end{tikzpicture}
}
\begin{frame}[t]
\begin{columns}[t]
\separatorcolumn
\begin{column}{\colwidth}
  \begin{block}{Why Posters Are Hard}
    We tackle \textbf{single-page multimodal compression}: dense papers must become legible posters under \textcolor{red}{tight spatial constraints}. Purely LLM- or VLM-based approaches \textbf{struggle with layout}, neglecting \textit{reading order} and \textbf{overflow control}. We show that \textcolor{blue}{visual-in-the-loop} planning is key to \textbf{clarity}, \textbf{balance}, and \textbf{engagement}.
    \begin{figure}
      \centering
      \includegraphics[width=0.80\linewidth]{figures/paper-picture-1.png}
    \end{figure}
  \end{block}
  \begin{block}{Benchmark \& Task}
    We introduce the \textbf{Paper2Poster} benchmark for the task of generating a \textbf{single-page}, well-balanced poster that faithfully conveys a paper's core ideas. The evaluation protocol measures \textit{what matters}: \textbf{visual alignment}, \textbf{text fluency}, \textbf{holistic quality}, and knowledge transfer via \textcolor{blue}{PaperQuiz}. This setup \textbf{standardizes evaluation} for automated poster generation.
  \end{block}
  \begin{block}{Curated Diverse Dataset}
    The dataset spans \textcolor{blue}{100} paper--poster pairs (NeurIPS, ICML, ICLR). Papers average \textcolor{blue}{22.6} pages and \textcolor{blue}{20K+} tokens; posters average \textcolor{blue}{1.4K} tokens. We observe \textbf{14.4$\times$} text compression and a \textbf{2.6$\times$} reduction in figures. Topics include CV (\textcolor{blue}{19\%}), NLP (\textcolor{blue}{17\%}), and RL (\textcolor{blue}{10\%}), supporting \textbf{robust} evaluation across domains.
    \begin{figure}
      \centering
      \includegraphics[width=0.80\linewidth]{figures/paper-picture-6.png}
    \end{figure}
  \end{block}
\end{column}
\separatorcolumn
\begin{column}{\colwidth}
  \begin{block}{Four-Pronged Evaluation}
    Our \textbf{four-pronged} suite tests end-to-end quality: Visual Quality via \textcolor{blue}{AltCLIP} similarity and \textbf{figure relevance}; Textual Coherence via perplexity (\textcolor{blue}{PPL}) under Llama-2-7B; Holistic Quality via VLM-as-Judge across \textbf{6 criteria}; and \textcolor{blue}{PaperQuiz}, whose length-aware penalties reward \textbf{dense, readable} designs.
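    Assuming the standard definition, the PPL of a poster's text tokens $x_1,\dots,x_N$ under the scoring model $p_\theta$ (here Llama-2-7B) is
    \[
      \mathrm{PPL}(x) = \exp\Bigl(-\frac{1}{N}\sum_{i=1}^{N}\log p_\theta(x_i \mid x_{<i})\Bigr),
    \]
    so lower values indicate more fluent, predictable text.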
    \begin{figure}
      \centering
      \includegraphics[width=0.80\linewidth]{figures/paper-picture-7.png}
    \end{figure}
  \end{block}
  \begin{block}{PosterAgent Pipeline}
    PosterAgent is a \textbf{top-down, visual-in-the-loop} pipeline. The \textit{Parser} builds a semantic asset library; the \textit{Planner} aligns text--visual pairs and uses \textcolor{blue}{binary-tree} layouts to preserve \textbf{reading order}; the \textit{Painter-Commenter} renders panels, applies \textcolor{blue}{zoom-in} VLM feedback, and fixes \textbf{overflow} and \textbf{alignment}, yielding concise, coherent posters.
    \begin{figure}
      \centering
      \includegraphics[width=0.80\linewidth]{figures/paper-picture-8.png}
    \end{figure}
  \end{block}
  \begin{block}{Main Results}
    Across metrics, \textbf{PosterAgent} variants beat multi-agent baselines. We attain \textcolor{blue}{leading figure relevance} and near-\textbf{human} visual similarity. Posters generated directly as pixels by GPT-4o look appealing but contain \textcolor{red}{noisy text} and have high \textcolor{red}{PPL}. On VLM-as-Judge, PosterAgent-4o scores \textcolor{blue}{3.72} overall, approaching ground-truth (GT) posters.
    \begin{figure}
      \centering
      \includegraphics[width=0.80\linewidth]{figures/paper-table-1.png}
    \end{figure}
  \end{block}
\end{column}
\separatorcolumn
\begin{column}{\colwidth}
  \begin{block}{PaperQuiz Insights}
    \textcolor{blue}{PaperQuiz} tracks human judgment and rewards \textbf{informative brevity}. With length penalties applied, GT posters lead and \textbf{PosterAgent} tops the automated methods. Open-source \textcolor{blue}{Qwen-2.5} stacks remain \textbf{competitive}. Stronger reader VLMs benefit more from \textbf{structured layouts}, which outperform blog-like outputs and image generations with \textcolor{red}{garbled text}.
    \begin{figure}
      \centering
      \includegraphics[width=0.80\linewidth]{figures/paper-picture-9.png}
    \end{figure}
  \end{block}
  \begin{block}{Efficient, Open, Scalable}
    Our pipeline cuts token usage by \textcolor{blue}{60--87\%}. PosterAgent-4o uses \textcolor{blue}{101K} tokens (\textcolor{blue}{\$0.55}); PosterAgent-Qwen uses \textcolor{blue}{47.6K} tokens (\textcolor{blue}{\$0.0045}). Runtime is about \textcolor{blue}{4.5 min}. The main \textcolor{red}{bottleneck} is sequential panel refinement; \textbf{future} work on parallelism, external knowledge, and human-in-the-loop editing could further boost \textbf{engagement}.
    \begin{figure}
      \centering
      \includegraphics[width=0.80\linewidth]{figures/paper-table-8.png}
    \end{figure}
  \end{block}
\end{column}
\separatorcolumn
\end{columns}
\end{frame}
\end{document}