ByteDance/Dolphin

Image-Text-to-Text · vision-encoder-decoder · document-parsing · document-understanding · document-intelligence · layout-analysis · table-extraction · vision-language-model

HaoFeng2025 committed on May 20, 2025

Commit 494a470 · verified · 1 Parent(s): c7a35db

Update README.md

Files changed (1)

README.md +4 -4

@@ -23,11 +23,11 @@ library_name: transformers
 <a href="https://github.com/bytedance/Dolphin"><img src="https://img.shields.io/badge/Code-Github-blue"></a>
+<!--
 <div align="center">
   <img src="https://cdn.wandeer.world/null/dolphin_demo.gif" width="800">
 </div>
+ -->
 ## Model Description
@@ -40,9 +40,9 @@ Document image parsing is challenging due to its complexly intertwined elements
 1. **🔍 Stage 1**: Comprehensive page-level layout analysis by generating element sequence in natural reading order
 2. **🧩 Stage 2**: Efficient parallel parsing of document elements using heterogeneous anchors and task-specific prompts
-<div align="center">
+<!-- <div align="center">
  <img src="https://cdn.wandeer.world/null/dolphin_framework.png" width="680">
-</div>
+</div> -->
 Dolphin achieves promising performance across diverse page-level and element-level parsing tasks while ensuring superior efficiency through its lightweight architecture and parallel parsing mechanism.