# 🎓Paper2Poster: Multimodal Poster Automation from Scientific Papers
<p align="center">
<a href="https://arxiv.org/abs/2505.21497" target="_blank"><img src="https://img.shields.io/badge/arXiv-2505.21497-red"></a>
<a href="https://paper2poster.github.io/" target="_blank"><img src="https://img.shields.io/badge/Project-Page-brightgreen"></a>
<a href="https://huggingface.co/datasets/Paper2Poster/Paper2Poster" target="_blank"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-orange"></a>
<a href="https://huggingface.co/papers/2505.21497" target="_blank"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Daily Papers-red"></a>
<a href="https://x.com/_akhaliq/status/1927721150584390129" target="_blank"><img alt="X (formerly Twitter) URL" src="https://img.shields.io/twitter/url?url=https%3A%2F%2Fx.com%2F_akhaliq%2Fstatus%2F1927721150584390129"></a>
</p>
We address two questions: **how to create a poster from a paper** and **how to evaluate a poster**.
![Overview](./assets/overall.png)
## 🔥 Update
- [x] [2025.10.13] Added automatic **logo support** for conferences and institutions, **YAML-based style customization**, and a new default theme.
- [x] [2025.9.18] Paper2Poster has been accepted to **NeurIPS 2025 Dataset and Benchmark Track**.
- [x] [2025.9.3] We now support generating per-section content in **parallel** for faster generation; simply specify `--max_workers`.
- [x] [2025.5.27] We released the [arXiv paper](https://arxiv.org/abs/2505.21497), [code](https://github.com/Paper2Poster/Paper2Poster), and [dataset](https://huggingface.co/datasets/Paper2Poster/Paper2Poster).
<!--## 📚 Introduction-->
**PosterAgent** is a top-down, visual-in-the-loop multi-agent system that turns `paper.pdf` into an **editable** `poster.pptx`.
![PosterAgent Overview](./assets/posteragent.png)
<!--A Top-down, visual-in-the-loop, efficient multi-agent pipeline, which includes (a) Parser distills the paper into a structured asset library; the (b) Planner aligns text–visual pairs into a binary‐tree layout that preserves reading order and spatial balance; and the (c) Painter-Commentor loop refines each panel by executing rendering code and using VLM feedback to eliminate overflow and ensure alignment.-->
<!--![Paper2Poster Overview](./assets/paperquiz.png)-->
<!--**Paper2Poster:** A benchmark for paper to poster generation, paired with human generated poster, with a comprehensive evaluation suite, including metrics like **Visual Quality**, **Textual Coherence**, **VLM-as-Judge** and **PaperQuiz**. Notably, PaperQuiz is a novel evaluation which assume A Good poster should convey core paper content visually.-->
## 📋 Table of Contents
<!--- [📚 Introduction](#-introduction)-->
- [🛠️ Installation](#-installation)
- [🚀 Quick Start](#-quick-start)
- [🔮 Evaluation](#-evaluation)
---
## 🛠️ Installation
Paper2Poster supports both local deployment (via [vLLM](https://docs.vllm.ai/en/v0.6.6/getting_started/installation.html)) and API-based access (e.g., GPT-4o).
**Python Environment**
```bash
pip install -r requirements.txt
```
**Install Libreoffice**
```bash
sudo apt install libreoffice
```
Or, if you do **not** have sudo access, download the `soffice` executable directly from https://www.libreoffice.org/download/download-libreoffice/ and add its directory to your `$PATH`.
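For example, a minimal sketch assuming the archive was extracted to `$HOME/libreoffice` (the path is illustrative):
```bash
# Illustrative install location; adjust to wherever you extracted LibreOffice.
export PATH="$HOME/libreoffice/program:$PATH"
soffice --version  # verify the executable is found
```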
**Install poppler**
```bash
conda install -c conda-forge poppler
```
**API Key**
Create a `.env` file in the project root and add your OpenAI API key:
```bash
OPENAI_API_KEY=<your_openai_api_key>
```
**Optional: Google Search API (for logo search)**
To use Google Custom Search for more reliable logo search, add these to your `.env` file:
```bash
GOOGLE_SEARCH_API_KEY=<your_google_search_api_key>
GOOGLE_SEARCH_ENGINE_ID=<your_search_engine_id>
```
---
## 🚀 Quick Start
Create a folder named `{paper_name}` under `{dataset_dir}`, and place your paper inside it as a PDF file named `paper.pdf`.
```
📁 {dataset_dir}/
└── 📁 {paper_name}/
└── 📄 paper.pdf
```
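For example, a minimal sketch with illustrative values for `dataset_dir` and `paper_name`:
```bash
# Illustrative values; substitute your own.
dataset_dir="data"
paper_name="my_paper"
mkdir -p "${dataset_dir}/${paper_name}"
cp /path/to/your/paper.pdf "${dataset_dir}/${paper_name}/paper.pdf"
```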
To use open-source models, first deploy them with [vLLM](https://docs.vllm.ai/en/v0.6.6/getting_started/installation.html), making sure the port is correctly specified in the `get_agent_config()` function in [`utils/wei_utils.py`](utils/wei_utils.py).
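As a sketch, one way to serve `Qwen-2.5-7B-Instruct` behind vLLM's OpenAI-compatible server (the port is illustrative and must match `get_agent_config()`):
```bash
# The port is illustrative; make sure it matches get_agent_config().
python -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen2.5-7B-Instruct \
--port 8000
```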
- [High Performance] Generate a poster with `GPT-4o`:
```bash
# model_name_t selects the LLM; model_name_v selects the VLM.
python -m PosterAgent.new_pipeline \
--poster_path="${dataset_dir}/${paper_name}/paper.pdf" \
--model_name_t="4o" \
--model_name_v="4o" \
--poster_width_inches=48 \
--poster_height_inches=36
```
- [Economic] Generate a poster with `Qwen-2.5-7B-Instruct` and `GPT-4o`:
```bash
# model_name_t selects the LLM; model_name_v selects the VLM.
python -m PosterAgent.new_pipeline \
--poster_path="${dataset_dir}/${paper_name}/paper.pdf" \
--model_name_t="vllm_qwen" \
--model_name_v="4o" \
--poster_width_inches=48 \
--poster_height_inches=36 \
--no_blank_detection  # disable blank detection
```
- [Local] Generate a poster with `Qwen-2.5-7B-Instruct`:
```bash
# model_name_t selects the LLM; model_name_v selects the VLM.
python -m PosterAgent.new_pipeline \
--poster_path="${dataset_dir}/${paper_name}/paper.pdf" \
--model_name_t="vllm_qwen" \
--model_name_v="vllm_qwen_vl" \
--poster_width_inches=48 \
--poster_height_inches=36
```
PosterAgent **supports flexible combinations of LLMs and VLMs**; feel free to try other options, or customize your own settings in `get_agent_config()` in [`utils/wei_utils.py`](utils/wei_utils.py).
### Adding Logos to Posters
You can automatically add institutional and conference logos to your posters:
```bash
python -m PosterAgent.new_pipeline \
--poster_path="${dataset_dir}/${paper_name}/paper.pdf" \
--model_name_t="4o" \
--model_name_v="4o" \
--poster_width_inches=48 \
--poster_height_inches=36 \
--conference_venue="NeurIPS" # Automatically searches for conference logo
```
**Logo Search Strategy:**
1. **Local search**: First checks the provided logo store (`logo_store/institutes/` and `logo_store/conferences/`)
2. **Web search**: If not found locally, performs an online search (see the example below)
   - By default, uses DuckDuckGo (no API key required)
   - For more reliable results, pass `--use_google_search` (requires `GOOGLE_SEARCH_API_KEY` and `GOOGLE_SEARCH_ENGINE_ID` in `.env`)
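For example, a sketch combining the options above to force Google-based logo search (flag values illustrative):
```bash
python -m PosterAgent.new_pipeline \
--poster_path="${dataset_dir}/${paper_name}/paper.pdf" \
--model_name_t="4o" \
--model_name_v="4o" \
--conference_venue="NeurIPS" \
--use_google_search
```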
You can also specify custom logo paths to skip auto-detection:
```bash
--institution_logo_path="path/to/institution_logo.png" \
--conference_logo_path="path/to/conference_logo.png"
```
### YAML Style Customization
Customize poster appearance via YAML configuration files:
- **Global defaults**: `config/poster.yaml` (applies to all posters)
- **Per-poster override**: Place a `poster.yaml` next to your `paper.pdf` for custom styling (see the sketch below)
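For example, a minimal sketch that seeds a per-poster override from the global defaults (paths illustrative):
```bash
# Copy the global defaults next to the paper, then edit the copy as needed.
cp config/poster.yaml "${dataset_dir}/${paper_name}/poster.yaml"
```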
## 🔮 Evaluation
Download the Paper2Poster evaluation dataset via:
```bash
python -m PosterAgent.create_dataset
```
During evaluation, papers are stored under a directory named `Paper2Poster-data`.
To evaluate a generated poster with **PaperQuiz**:
```bash
python -m Paper2Poster-eval.eval_poster_pipeline \
--paper_name="${paper_name}" \
--poster_method="${model_t}_${model_v}_generated_posters" \
--metric=qa # PaperQuiz
```
To evaluate a generated poster with **VLM-as-Judge**:
```bash
python -m Paper2Poster-eval.eval_poster_pipeline \
--paper_name="${paper_name}" \
--poster_method="${model_t}_${model_v}_generated_posters" \
--metric=judge # VLM-as-Judge
```
To evaluate a generated poster with other statistical metrics (e.g., visual similarity and PPL):
```bash
python -m Paper2Poster-eval.eval_poster_pipeline \
--paper_name="${paper_name}" \
--poster_method="${model_t}_${model_v}_generated_posters" \
--metric=stats # statistical measures
```
If you want to create a PaperQuiz for your own paper:
```bash
python -m Paper2Poster-eval.create_paper_questions \
--paper_folder="Paper2Poster-data/${paper_name}"
```
## ❤ Acknowledgement
We extend our gratitude to [🐫CAMEL](https://github.com/camel-ai/camel), [🦉OWL](https://github.com/camel-ai/owl), [Docling](https://github.com/docling-project/docling), [PPTAgent](https://github.com/icip-cas/PPTAgent) for providing their codebases.
## 📖 Citation
Please cite our paper if you find this project helpful.
```bibtex
@misc{paper2poster,
  title={Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers},
  author={Wei Pang and Kevin Qinghong Lin and Xiangru Jian and Xi He and Philip Torr},
  year={2025},
  eprint={2505.21497},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2505.21497},
}
```