Update README.md
README.md
CHANGED
|
@@ -20,13 +20,19 @@ we also devise a high-quality data production pipeline that constructs the first
|
|
| 20 |
Furthermore, leveraging this foundation with exceptional image-as-text capability,
|
| 21 |
we seamlessly replace previous VFMs with TokenOCR to construct a document-level MLLM, `TokenVL`, for VQA-based document understanding tasks.
|
| 22 |
|
| 23 |
-

|
| 24 |
-
|
| 25 |
# Token Family
|
| 26 |
|
| 27 |
## TokenIT
|
| 28 |
|
| 29 |
-
|
| 30 |
|
| 31 |
|
| 32 |
| VFM | Granularity | Dataset | #Image | #Pairs |
|
|
|
|
| 20 |
Furthermore, leveraging this foundation with exceptional image-as-text capability,
|
| 21 |
we seamlessly replace previous VFMs with TokenOCR to construct a document-level MLLM, `TokenVL`, for VQA-based document understanding tasks.
|
| 22 |
|
| 23 |
# Token Family
|
| 24 |
|
| 25 |
## TokenIT
|
| 26 |
|
| 27 |
+
<div align="center">
|
| 28 |
+
<img width="500" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/650d4a36cbd0c7d550d3b41b/WcQwU3-xjyT5Vm-pZhACo.png">
|
| 29 |
+
</div>
|
| 30 |
+
|
| 31 |
+
<!--  -->
|
| 32 |
+
An overview of the self-constructed token-level TokenIT dataset, comprising 20 million images and 1.8 billion
|
| 33 |
+
text-mask pairs. (a) provides a detailed description of each sample, including the raw image, a mask, and a JSON file that
|
| 34 |
+
records BPE token information. We also report (b) the data distribution, (c) the number of selected BPE tokens, and (d) a
|
| 35 |
+
word cloud map highlighting the top 100 BPE tokens.
|
| 36 |
|
| 37 |
|
| 38 |
| VFM | Granularity | Dataset | #Image | #Pairs |
|