TongkunGuan/TokenFD

TongkunGuan committed on Feb 21, 2025

Commit 47e6ad9 (verified) · 1 Parent(s): eb1624a

Update README.md

Files changed (1)

README.md +13 -4

README.md CHANGED

@@ -5,15 +5,13 @@ base_model: TokenOCR
 base_model_relation: finetune
 ---
-# TokenOCR
 [\[📂 GitHub\]](https://github.com/Token-family/TokenOCR)    [\[📖 Paper\]]() [\[🆕 Blog\]]()    [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL)    [\[🚀 Quick Start\]](#quick-start)
 <div align="center">
   <img width="500" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/64006c09330a45b03605bba3/zJsd2hqd3EevgXo6fNgC-.png">
 </div>
-## Introduction
+# Introduction
 We are excited to announce the release of `TokenOCR`, the first token-level visual foundation model specifically tailored for text-image-related tasks,
 designed to support a variety of traditional downstream applications. To facilitate the pretraining of TokenOCR,
@@ -24,10 +22,21 @@ we seamlessly replace previous VFMs with TokenOCR to construct a document-level
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/o9_FX5D8_NOS1gfnebp5s.png)
-## Token Family
+# Token Family
 ## TokenIT
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/650d4a36cbd0c7d550d3b41b/WcQwU3-xjyT5Vm-pZhACo.png)
+| VFM                | Granularity | Dataset  | #Image | #Pairs |
+|:-------------------|:------------|:---------|:------:|:------:|
+| [CLIP](https://github.com/openai/CLIP) | image-level | WIT400M  | 400M   | 0.4B   |
+| [DINO](https://github.com/facebookresearch/dino) | image-level | ImageNet | 14M    | -      |
+| [SAM](https://github.com/facebookresearch/SAM)  | pixel-level | SA1B     | 11M    | 1.1B   |
+| **TokenOCR**           | token-level | **TokenIT**  | **20M**    | **1.8B**   |
 ## TokenOCR
 In the following table, we provide an overview of the InternViT 2.5 series.