Update README.md
README.md
CHANGED
@@ -71,12 +71,12 @@ In the following table, we provide all models [🤗 link] of the TokenOCR series
 | TokenOCR-4096-English | feature dimension is 4096; support interactive with English texts.|
 | TokenOCR-4096-Chinese | feature dimension is 4096; support interactive with Chinese texts. |
 | TokenOCR-2048-Bilingual | feature dimension is 4096; support interactive with English and Chinese texts. |
-| TokenOCR-4096-English-seg | On `TokenOCR-4096-English`, background noise is filtered out. |
+| TokenOCR-4096-English-seg | On `TokenOCR-4096-English`, background noise is filtered out. You can use prompt ' ' to get a highlight background. |
 
 ### Quick Start
 
 > \[!Warning\]
-> 🚨 Note: In our experience, the
+> 🚨 Note: In our experience, the `TokenOCR-2048-Bilingual` series is better suited for building MLLMs than the `-seg` version.
 
 ```python
 import torch
@@ -101,6 +101,10 @@ outputs = model(pixel_values)
 
 ## TokenVL
 
+we employ the TokenOCR as the visual foundation model and further develop an MLLM, named TokenVL, tailored for document understanding.
+Following the previous training paradigm, TokenVL also includes two stages:
+**Stage 1: LLM-guided Token Alignment Training for text parsing tasks.**
+**Stage 2: Supervised Instruction Tuning for VQA tasks.**
 
 ## Model Architecture
 