Transformers
Safetensors
DIVEdoc
docvqa
distillation
VLM
document-understanding
OCR-free
JayRay5 committed on
Commit b253a08 · verified · 1 Parent(s): d0172a2

Update README.md

Files changed (1)
  1. README.md +7 -3
README.md CHANGED
@@ -1,6 +1,8 @@
 ---
 library_name: transformers
-tags: []
+license: mit
+datasets:
+- lmms-lab/DocVQA
 ---
 
 # Model Card for Model ID
@@ -9,8 +11,10 @@ tags: []
 
 
 
-## Model Details
-
+## Model Description
+DIVE-Doc is an end-to-end trade-off between LVLMs and lightweight architectures in the context of DocVQA.
+It is built by distilling the [SigLIP-400m](https://arxiv.org/abs/2303.15343) visual encoder of [PaliGEMMA]() into a small hierarchical [Swin transformer](), while reusing the original [GEMMA]() decoder.
+This allowed DIVE-Doc to keep competitive performance with its teacher while reducing the visual encoder's parameters to one-fifth.
 ### Model Description
 
 <!-- Provide a longer summary of what this model is. -->
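
The added description says the visual encoder is distilled into a smaller Swin transformer, but it does not state which training objective was used. As a rough illustration only, one common choice for feature distillation is to minimize the mean squared error between the student's and the teacher's patch embeddings. The sketch below assumes both encoders emit the same number of patches with the same embedding width; `feature_distillation_loss` is a hypothetical helper written for this note, not part of the DIVE-Doc code.

```python
from typing import Sequence


def feature_distillation_loss(
    student: Sequence[Sequence[float]],
    teacher: Sequence[Sequence[float]],
) -> float:
    """Mean squared error between student and teacher patch embeddings.

    Assumption (not from the model card): the student is trained to
    reproduce the teacher's per-patch features, and both outputs have
    identical shape (num_patches x embedding_dim).
    """
    if len(student) != len(teacher):
        raise ValueError("student and teacher must have the same number of patches")

    total = 0.0
    count = 0
    for s_vec, t_vec in zip(student, teacher):
        if len(s_vec) != len(t_vec):
            raise ValueError("embedding widths differ between student and teacher")
        # Accumulate the squared error over every embedding dimension.
        for s, t in zip(s_vec, t_vec):
            total += (s - t) ** 2
            count += 1

    if count == 0:
        raise ValueError("cannot compute a loss over empty features")
    return total / count
```

In a real training loop the teacher features would be computed without gradients and only the student would be updated. If the two encoders produced different embedding widths, a learned projection on the student side would typically be needed before comparing them.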