DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation
Abstract
DeepPresenter is an agentic framework for presentation generation that adaptively plans and refines slide artifacts through environment-grounded reflection, achieving state-of-the-art performance with reduced computational costs.
Presentation generation requires deep content research, coherent visual design, and iterative refinement based on observation. However, existing presentation agents often rely on predefined workflows and fixed templates. To address this, we present DeepPresenter, an agentic framework that adapts to diverse user intents, enables effective feedback-driven refinement, and generalizes beyond a scripted pipeline. Specifically, DeepPresenter autonomously plans, renders, and revises intermediate slide artifacts, supporting long-horizon refinement grounded in environmental observations. Furthermore, rather than relying on self-reflection over internal signals (e.g., reasoning traces), our environment-grounded reflection conditions the generation process on perceptual artifact states (e.g., rendered slides), enabling the system to identify and correct presentation-specific issues during execution. Results on an evaluation set covering diverse presentation-generation scenarios show that DeepPresenter achieves state-of-the-art performance, and a fine-tuned 9B model remains highly competitive at substantially lower cost. Our project is available at: https://github.com/icip-cas/PPTAgent
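The plan-render-revise cycle described in the abstract can be sketched as a simple loop. This is a minimal illustrative sketch, not the paper's actual implementation: all function names, the `Slide` dataclass, and the toy "overflow" observation are assumptions invented here to show how revision can be conditioned on a rendered artifact state rather than on internal reasoning traces alone.

```python
# Hypothetical sketch of an environment-grounded reflection loop.
# Everything here (Slide, plan, render, revise, generate) is an
# illustrative assumption, not DeepPresenter's real API.

from dataclasses import dataclass, field

@dataclass
class Slide:
    content: str

def plan(intent: str) -> list[Slide]:
    # Plan slide contents from the user's intent (stubbed).
    return [Slide(content=f"{intent}: point {i}") for i in range(3)]

def render(slide: Slide) -> dict:
    # Render the slide into a perceptual artifact state.
    # A real system would rasterize the slide and inspect the image;
    # here we fake an observation that flags overly long content.
    return {"overflow": len(slide.content) > 40}

def revise(slide: Slide, observation: dict) -> Slide:
    # Condition the revision on the observed artifact state,
    # not on the model's internal signals.
    if observation["overflow"]:
        slide.content = slide.content[:40]
    return slide

def generate(intent: str, max_rounds: int = 3) -> list[Slide]:
    slides = plan(intent)
    for _ in range(max_rounds):
        observations = [render(s) for s in slides]
        if not any(o["overflow"] for o in observations):
            break  # environment reports no remaining issues
        slides = [revise(s, o) for s, o in zip(slides, observations)]
    return slides
```

The key design point the sketch mirrors is that the stopping condition and the revision both depend on observations of the rendered artifact, which is what lets issues invisible in the text plan (e.g., overflow in a rendered slide) be caught during execution.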
Community
Open Sourced at: https://huggingface.co/collections/ICIP/deeppresenter
Project: https://github.com/icip-cas/PPTAgent
The following similar papers were recommended by the Semantic Scholar API:
- Code2World: A GUI World Model via Renderable Code Generation (2026)
- TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents (2026)
- SceneReVis: A Self-Reflective Vision-Grounded Framework for 3D Indoor Scene Synthesis via Multi-turn RL (2026)
- Code2Worlds: Empowering Coding LLMs for 4D World Generation (2026)
- MUSE: A Multi-agent Framework for Unconstrained Story Envisioning via Closed-Loop Cognitive Orchestration (2026)
- ANCHOR: Branch-Point Data Generation for GUI Agents (2026)
- RAVEL: Reasoning Agents for Validating and Evaluating LLM Text Synthesis (2026)
Models citing this paper: 2
Datasets citing this paper: 1