<img width="500" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/64006c09330a45b03605bba3/zJsd2hqd3EevgXo6fNgC-.png">
</div>

<center>

# Introduction

</center>

We are excited to announce the release of **`TokenOCR`**, the first token-level visual foundation model specifically tailored for text-image-related tasks and designed to support a variety of traditional downstream applications. To facilitate the pretraining of TokenOCR, we also devise a high-quality data production pipeline that constructs the first token-level image-text dataset, **`TokenIT`**.

Furthermore, leveraging this foundation with its exceptional image-as-text capability, we seamlessly replace previous VFMs with TokenOCR to construct a document-level MLLM, **`TokenVL`**, for VQA-based document understanding tasks.
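The token-level, image-as-text idea can be sketched concretely. The snippet below is a conceptual illustration only — random arrays stand in for real model outputs, and none of these names come from the actual TokenOCR API: each text token owns a binary mask over image patches, its visual embedding is the mask-averaged patch feature, and token-level alignment is scored by cosine similarity against the text-token embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for model outputs (not the TokenOCR API):
# 16 image patches with 8-dim features, and 3 text tokens with 8-dim embeddings.
patch_feats = rng.normal(size=(16, 8))
text_embeds = rng.normal(size=(3, 8))

# One binary mask per token, marking which image patches render that token.
masks = np.zeros((3, 16), dtype=bool)
masks[0, :5] = True
masks[1, 5:9] = True
masks[2, 9:] = True

def token_visual_embeddings(patches, masks):
    """Mask-average patch features into one visual embedding per text token."""
    counts = masks.sum(axis=1, keepdims=True)        # patches covered by each token
    return (masks.astype(float) @ patches) / counts  # shape (tokens, dim)

def cosine(a, b):
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

vis = token_visual_embeddings(patch_feats, masks)  # (3, 8) per-token visual embeddings
sims = cosine(vis, text_embeds)                    # (3, 3) visual-vs-text token scores
print(sims.shape)  # prints (3, 3)
```

In this toy setup, a token's visual embedding is just the mean of the patch features inside its mask; a real token-level VFM would learn both the features and the alignment end to end.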

<center>

# Token Family

</center>

<!-- ## TokenIT -->
<h2 style="color: #4CAF50;">TokenIT</h2>

Please refer to our technical report for more details.

<center>

<!-- ## TokenVL -->
<h2 style="color: #4CAF50;">TokenVL</h2>

</center>

We employ TokenOCR as the visual foundation model and further develop an MLLM, named TokenVL, tailored for document understanding.
Following the previous training paradigm, TokenVL also includes two stages: