TongkunGuan committed on
Commit 47e6ad9 · verified · 1 parent: eb1624a

Update README.md

Files changed (1): README.md (+13 −4)

README.md CHANGED
@@ -5,15 +5,13 @@ base_model: TokenOCR
 base_model_relation: finetune
 ---
 
-# TokenOCR
-
 [\[📂 GitHub\]](https://github.com/Token-family/TokenOCR) [\[📖 Paper\]]() [\[🆕 Blog\]]() [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[🚀 Quick Start\]](#quick-start)
 
 <div align="center">
 <img width="500" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/64006c09330a45b03605bba3/zJsd2hqd3EevgXo6fNgC-.png">
 </div>
 
-## Introduction
+# Introduction
 
 We are excited to announce the release of `TokenOCR`, the first token-level visual foundation model specifically tailored for text-image-related tasks,
 designed to support a variety of traditional downstream applications. To facilitate the pretraining of TokenOCR,
@@ -24,10 +22,21 @@ we seamlessly replace previous VFMs with TokenOCR to construct a document-level
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/o9_FX5D8_NOS1gfnebp5s.png)
 
-## Token Family
+# Token Family
 
 ## TokenIT
 
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/650d4a36cbd0c7d550d3b41b/WcQwU3-xjyT5Vm-pZhACo.png)
+
+
+| VFM | Granularity | Dataset | #Images | #Pairs |
+|:-------------------|:------------|:---------|:------:|:------:|
+| [CLIP](https://github.com/openai/CLIP) | image-level | WIT400M | 400M | 0.4B |
+| [DINO](https://github.com/facebookresearch/dino) | image-level | ImageNet | 14M | - |
+| [SAM](https://github.com/facebookresearch/SAM) | pixel-level | SA1B | 11M | 1.1B |
+| **TokenOCR** | token-level | **TokenIT** | **20M** | **1.8B** |
+
+
 ## TokenOCR
 
 In the following table, we provide an overview of the InternViT 2.5 series.
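The table this commit adds contrasts granularities: image-level (one embedding per image), pixel-level (one per pixel), and TokenOCR's token-level (one per text token, paired with a mask in TokenIT). As a purely illustrative sketch of what a token-mask pair enables, and not TokenOCR's actual implementation, the snippet below mask-pools a dense feature map under a token's binary mask to obtain a single embedding for that token; the `mask_pool` helper and the toy shapes are hypothetical.

```python
import numpy as np

def mask_pool(feature_map: np.ndarray, token_mask: np.ndarray) -> np.ndarray:
    """Average the feature vectors that fall inside a token's binary mask.

    feature_map: (H, W, C) dense features from a visual backbone.
    token_mask:  (H, W) binary mask marking the pixels of one text token.
    Returns one (C,) embedding for the token.
    """
    assert feature_map.shape[:2] == token_mask.shape
    weights = token_mask.astype(np.float64)
    denom = weights.sum()
    if denom == 0:
        # Empty mask: no pixels belong to the token.
        return np.zeros(feature_map.shape[-1])
    # Weighted mean over the spatial positions covered by the mask.
    return (feature_map * weights[..., None]).sum(axis=(0, 1)) / denom

# Toy example: a 4x4 feature map with 2 channels; the token occupies
# the top-left 2x2 region, where the features are [1.0, 2.0].
fmap = np.zeros((4, 4, 2))
fmap[:2, :2] = [1.0, 2.0]
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
emb = mask_pool(fmap, mask)  # token embedding: [1., 2.]
```

With a dataset of token-mask pairs such as TokenIT, one such embedding can be produced per text token rather than one per image, which is the granularity difference the table summarizes.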