TongkunGuan/TokenFD

TongkunGuan committed on Feb 21, 2025

Commit 7fd4955 · verified · 1 Parent(s): f20a11b

Update README.md

Files changed (1)

README.md +6 -2

@@ -71,12 +71,12 @@ In the following table, we provide all models [🤗 link] of the TokenOCR series
 | TokenOCR-4096-English | feature dimension is 4096; support interactive with English texts.|
 |  TokenOCR-4096-Chinese  |  feature dimension is 4096; support interactive with Chinese texts.  |
 |  TokenOCR-2048-Bilingual  |  feature dimension is 4096; support interactive with English and Chinese texts. |
-| TokenOCR-4096-English-seg |  On `TokenOCR-4096-English`, background noise is filtered out. |
+| TokenOCR-4096-English-seg |  On `TokenOCR-4096-English`, background noise is filtered out. You can use prompt ' ' to get a highlight background. |
 ### Quick Start
 > \[!Warning\]
-> 🚨 Note: In our experience, the InternViT V2.5 series is better suited for building MLLMs than traditional computer vision tasks.
+> 🚨 Note: In our experience, the `TokenOCR-2048-Bilingual` series is better suited for building MLLMs than the `-seg` version.
 ```python
 import torch
@@ -101,6 +101,10 @@ outputs = model(pixel_values)
 ## TokenVL
+we employ the TokenOCR as the visual foundation model and further develop an MLLM, named TokenVL, tailored for document understanding.
+Following the previous training paradigm, TokenVL also includes two stages:
+**Stage 1: LLM-guided Token Alignment Training for text parsing tasks.**
+**Stage 2: Supervised Instruction Tuning for VQA tasks.**
 ## Model Architecture