Update README.md
README.md
CHANGED
@@ -46,7 +46,6 @@ Dolphin-v2 follows a document-type-aware two-stage paradigm:
 ### Stage 1: Joint Classification and Layout Analysis
 - **Document Type Classification**: Distinguishes between digital-born and photographed documents
 - **Layout Analysis**: Generates element sequences in reading order with 21 supported categories
-- **Precise Localization**: Absolute coordinate system for pixel-level accuracy
 
 ### Stage 2: Hybrid Content Parsing
 - **Photographed Documents**: Holistic page-level parsing to handle distortions
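Reviewer note: the routing implied by the two-stage description can be sketched as below. This is only an illustration, not Dolphin's actual API. The function name `parse_page` and both callbacks are hypothetical, and the element-wise parallel branch for digital documents is an assumption based on the README's "Element-wise parsing" and "Efficient parallel processing for digital documents" bullets.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def parse_page(
    doc_type: str,
    elements: List[str],
    parse_whole_page: Callable[[], str],
    parse_element: Callable[[str], str],
) -> List[str]:
    """Route Stage 2 parsing on the Stage 1 document type (hypothetical sketch)."""
    if doc_type == "photographed":
        # Photographed pages are parsed holistically to cope with distortions.
        return [parse_whole_page()]
    # Assumed behaviour for digital-born pages: parse each element independently,
    # in parallel, keeping the reading order produced by Stage 1.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(parse_element, elements))
```

In a real pipeline the two callbacks would wrap model calls; here they are placeholders so the control flow can be read on its own.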
@@ -63,7 +62,6 @@ Built on **Qwen2.5-VL-3B** backbone with:
 ## Performance
 
 Dolphin-v2 achieves superior performance on comprehensive benchmarks:
-
 **OmniDocBench (v1.5):**
 - Overall Score: **89.45** (+14.78 over original Dolphin)
 - Text Recognition: **0.054** Edit Distance
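Reviewer note: for readers unfamiliar with the "Edit Distance" figure, here is a minimal sketch of a normalized edit distance, where lower is better. It assumes the metric is Levenshtein distance divided by the length of the longer string. The benchmark's exact normalization and text preprocessing are not stated in this diff, so treat the function names and the formula as assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: str, ref: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string (assumed normalization)."""
    longest = max(len(pred), len(ref))
    return 0.0 if longest == 0 else levenshtein(pred, ref) / longest
```

Under this assumed definition, a score of 0.054 would mean roughly 5 character edits per 100 characters of the longer text.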
@@ -73,7 +71,6 @@ Dolphin-v2 achieves superior performance on comprehensive benchmarks:
 
 
 ## 🎯 Supported Element Types
-
 Dolphin-v2 supports 21 document element categories:
 
 | Element Type | Description |
@@ -94,39 +91,6 @@ Dolphin-v2 supports 21 document element categories:
 | `watermark` | Watermarks |
 | `anno` | Annotations |
 
-## 💻 Usage
-Please refer to our [GitHub repository](https://github.com/bytedance/Dolphin) for detailed usage:
-- Page-wise parsing for complete document images
-- Element-wise parsing for specific regions
-- Examples for digital and photographed documents
-
-## 🔧 Training Details
-
-- **Backbone**: Qwen2.5-VL-3B
-- **Training Data**:
-  - 200K photographed documents with realistic distortions
-  - 200K code images (C++, Python, Go, JavaScript)
-  - 200K catalog images with hierarchical structures
-- **Optimizer**: AdamW (lr=8e-5, weight decay=0)
-- **Training**: 10 epochs on 40 A100 GPUs
-- **Max Sequence Length**: 131,072 tokens
-
-## Benchmarks
-
-We evaluate on two complementary benchmarks:
-- **OmniDocBench**: Diverse document types (academic papers, textbooks, slides, reports)
-- **RealDoc-160**: Real-world photographed documents with authentic distortions
-
-## Key Features
-
-✅ Handles both digital and photographed documents seamlessly
-✅ 21 element categories with fine-grained detection
-✅ Precise LaTeX formula recognition
-✅ Code block parsing with indentation preservation
-✅ Robust to distortions, lighting variations, and perspective changes
-✅ Efficient parallel processing for digital documents
-✅ Lightweight 3B parameter model
-
 
 ## Citation
 ```bibtex