Update README.md
README.md
CHANGED
|
@@ -5,15 +5,13 @@ base_model: TokenOCR
|
|
| 5 |
base_model_relation: finetune
|
| 6 |
---
|
| 7 |
|
| 8 |
-
# TokenOCR
|
| 9 |
-
|
| 10 |
[\[GitHub\]](https://github.com/Token-family/TokenOCR) [\[Paper\]]() [\[Blog\]]() [\[HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[Quick Start\]](#quick-start)
|
| 11 |
|
| 12 |
<div align="center">
|
| 13 |
<img width="500" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/64006c09330a45b03605bba3/zJsd2hqd3EevgXo6fNgC-.png">
|
| 14 |
</div>
|
| 15 |
|
| 16 |
-
|
| 17 |
|
| 18 |
We are excited to announce the release of `TokenOCR`, the first token-level visual foundation model specifically tailored for text-image-related tasks,
|
| 19 |
designed to support a variety of traditional downstream applications. To facilitate the pretraining of TokenOCR,
|
|
@@ -24,10 +22,21 @@ we seamlessly replace previous VFMs with TokenOCR to construct a document-level
|
|
| 24 |
|
| 25 |

|
| 26 |
|
| 27 |
-
|
| 28 |
|
| 29 |
## TokenIT
|
| 30 |
|
| 31 |
## TokenOCR
|
| 32 |
|
| 33 |
In the following table, we provide an overview of the InternViT 2.5 series.
|
|
|
|
| 5 |
base_model_relation: finetune
|
| 6 |
---
|
| 7 |
|
| 8 |
[\[GitHub\]](https://github.com/Token-family/TokenOCR) [\[Paper\]]() [\[Blog\]]() [\[HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[Quick Start\]](#quick-start)
|
| 9 |
|
| 10 |
<div align="center">
|
| 11 |
<img width="500" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/64006c09330a45b03605bba3/zJsd2hqd3EevgXo6fNgC-.png">
|
| 12 |
</div>
|
| 13 |
|
| 14 |
+
# Introduction
|
| 15 |
|
| 16 |
We are excited to announce the release of `TokenOCR`, the first token-level visual foundation model specifically tailored for text-image-related tasks,
|
| 17 |
designed to support a variety of traditional downstream applications. To facilitate the pretraining of TokenOCR,
|
|
|
|
| 22 |
|
| 23 |

|
| 24 |
|
| 25 |
+
# Token Family
|
| 26 |
|
| 27 |
## TokenIT
|
| 28 |
|
| 29 |
+

|
| 30 |
+
|
| 31 |
+
|
| 32 |
+
| VFM | Granularity | Dataset | #Image | #Pairs |
|
| 33 |
+
|:-------------------|:------------|:---------|:------:|:------:|
|
| 34 |
+
| [CLIP](https://github.com/openai/CLIP) | image-level | WIT400M | 400M | 0.4B |
|
| 35 |
+
| [DINO](https://github.com/facebookresearch/dino) | image-level | ImageNet | 14M | - |
|
| 36 |
+
| [SAM](https://github.com/facebookresearch/SAM) | pixel-level | SA1B | 11M | 1.1B |
|
| 37 |
+
| **TokenOCR** | token-level | **TokenIT** | **20M** | **1.8B** |
|
| 38 |
+
|
| 39 |
+
|
| 40 |
## TokenOCR
|
| 41 |
|
| 42 |
In the following table, we provide an overview of the InternViT 2.5 series.
|