TongkunGuan committed on
Commit 7fd4955 · verified · 1 Parent(s): f20a11b

Update README.md

Files changed (1)
  1. README.md +6 -2
README.md CHANGED
@@ -71,12 +71,12 @@ In the following table, we provide all models [🤗 link] of the TokenOCR series
  | TokenOCR-4096-English | feature dimension is 4096; support interactive with English texts.|
  | TokenOCR-4096-Chinese | feature dimension is 4096; support interactive with Chinese texts. |
  | TokenOCR-2048-Bilingual | feature dimension is 4096; support interactive with English and Chinese texts. |
- | TokenOCR-4096-English-seg | On `TokenOCR-4096-English`, background noise is filtered out. |
+ | TokenOCR-4096-English-seg | On `TokenOCR-4096-English`, background noise is filtered out. You can use prompt ' ' to get a highlight background. |
 
  ### Quick Start
 
  > \[!Warning\]
- > 🚨 Note: In our experience, the InternViT V2.5 series is better suited for building MLLMs than traditional computer vision tasks.
+ > 🚨 Note: In our experience, the `TokenOCR-2048-Bilingual` series is better suited for building MLLMs than the `-seg` version.
 
  ```python
  import torch
@@ -101,6 +101,10 @@ outputs = model(pixel_values)
 
  ## TokenVL
 
+ we employ the TokenOCR as the visual foundation model and further develop an MLLM, named TokenVL, tailored for document understanding.
+ Following the previous training paradigm, TokenVL also includes two stages:
+ **Stage 1: LLM-guided Token Alignment Training for text parsing tasks.**
+ **Stage 2: Supervised Instruction Tuning for VQA tasks.**
 
  ## Model Architecture
 