arxiv:2506.18023

PP-DocBee2: Improved Baselines with Efficient Data for Multimodal Document Understanding

Published on Jun 22

Abstract

PP-DocBee2, an advanced multimodal document understanding model, improves performance and reduces inference latency through enhanced synthetic data quality, improved visual feature fusion, and optimized inference methods.

AI-generated summary

This report introduces PP-DocBee2, an advanced version of PP-DocBee designed to enhance multimodal document understanding. Built on a large multimodal model architecture, PP-DocBee2 addresses the limitations of its predecessor through key technological improvements: enhanced synthetic data quality, an improved visual feature fusion strategy, and optimized inference methodologies. These enhancements yield an 11.4% performance boost on internal benchmarks for Chinese business documents and reduce inference latency by 73.0% compared to the vanilla version. A key innovation of our work is a data quality optimization strategy for multimodal document tasks. By employing a large-scale multimodal pre-trained model to evaluate data, we apply a novel statistical criterion to filter outliers, ensuring high-quality training data. Inspired by insights into underutilized intermediate features in multimodal models, we enhance the ViT's representational capacity by decomposing it into layers and applying a novel feature fusion strategy to improve complex reasoning. The source code and pre-trained model are available at https://github.com/PaddlePaddle/PaddleMIX.
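
The abstract does not spell out the statistical criterion used to filter outliers, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each training sample has already been given a scalar score by some evaluator model, and it swaps in a plain mean and standard deviation rule as the criterion. The function name `filter_outliers`, the threshold `k`, and the example scores are all hypothetical.

```python
from statistics import mean, pstdev


def filter_outliers(samples, scores, k=2.0):
    """Keep samples whose score lies within k standard deviations of the mean.

    Assumption: `scores` are per-sample quality scores produced by an
    evaluator model. The k-sigma rule is a stand-in criterion, not the
    one described in the report.
    """
    if len(samples) != len(scores):
        raise ValueError("samples and scores must have the same length")
    if not scores:
        return []
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation
    if sigma == 0:
        # All scores are identical, so nothing can be called an outlier.
        return list(samples)
    return [s for s, x in zip(samples, scores) if abs(x - mu) <= k * sigma]


# Hypothetical usage: sample "f" has a score far from the rest.
samples = ["a", "b", "c", "d", "e", "f"]
scores = [1.0, 1.1, 0.9, 1.0, 1.05, 9.0]
kept = filter_outliers(samples, scores, k=2.0)
```

A rule based on the median and interquartile range would be less sensitive to the outliers themselves. The mean-based rule is used here only because it is the shortest to state.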
