TongkunGuan committed on
Commit
fc92c4b
·
verified ·
1 Parent(s): 7956f56

Update README.md

Files changed (1)
  1. README.md +11 -0
README.md CHANGED
@@ -18,8 +18,12 @@ base_model_relation: finetune
18
  <img width="500" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/64006c09330a45b03605bba3/zJsd2hqd3EevgXo6fNgC-.png">
19
  </div>
20
 
 
 
21
  # Introduction
22
 
 
 
23
  We are excited to announce the release of **`TokenOCR`**, the first token-level visual foundation model specifically tailored for text-image-related tasks,
24
  designed to support a variety of traditional downstream applications. To facilitate the pretraining of TokenOCR,
25
  we also devise a high-quality data production pipeline that constructs the first token-level image text dataset,
@@ -27,8 +31,12 @@ we also devise a high-quality data production pipeline that constructs the first
27
  Furthermore, leveraging this foundation with exceptional image-as-text capability,
28
  we seamlessly replace previous VFMs with TokenOCR to construct a document-level MLLM, **`TokenVL`**, for VQA-based document understanding tasks.
29
 
 
 
30
  # Token Family
31
 
 
 
32
  <!-- ## TokenIT -->
33
  <h2 style="color: #4CAF50;">TokenIT</h2>
34
 
@@ -146,9 +154,12 @@ Please refer to our technical report for more details.
146
 
147
  <!-- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/650d4a36cbd0c7d550d3b41b/IbLZ0CxCxDkTaHAMe7M0Q.png)
148
  -->
 
 
149
  <!-- ## TokenVL -->
150
  <h2 style="color: #4CAF50;">TokenVL</h2>
151
 
 
152
  we employ the TokenOCR as the visual foundation model and further develop an MLLM, named TokenVL, tailored for document understanding.
153
  Following the previous training paradigm, TokenVL also includes two stages:
154
 
 
18
  <img width="500" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/64006c09330a45b03605bba3/zJsd2hqd3EevgXo6fNgC-.png">
19
  </div>
20
 
21
+ <center>
22
+
23
  # Introduction
24
 
25
+ </center>
26
+
27
  We are excited to announce the release of **`TokenOCR`**, the first token-level visual foundation model specifically tailored for text-image-related tasks,
28
  designed to support a variety of traditional downstream applications. To facilitate the pretraining of TokenOCR,
29
  we also devise a high-quality data production pipeline that constructs the first token-level image text dataset,
 
31
  Furthermore, leveraging this foundation with exceptional image-as-text capability,
32
  we seamlessly replace previous VFMs with TokenOCR to construct a document-level MLLM, **`TokenVL`**, for VQA-based document understanding tasks.
33
 
34
+ <center>
35
+
36
  # Token Family
37
 
38
+ </center>
39
+
40
  <!-- ## TokenIT -->
41
  <h2 style="color: #4CAF50;">TokenIT</h2>
42
 
 
154
 
155
  <!-- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/650d4a36cbd0c7d550d3b41b/IbLZ0CxCxDkTaHAMe7M0Q.png)
156
  -->
157
+
158
+ <center>
159
  <!-- ## TokenVL -->
160
  <h2 style="color: #4CAF50;">TokenVL</h2>
161
 
162
+ </center>
163
  we employ the TokenOCR as the visual foundation model and further develop an MLLM, named TokenVL, tailored for document understanding.
164
  Following the previous training paradigm, TokenVL also includes two stages:
165