JayRay5/DIVE-Doc-ARD-LRes

document-understanding

JayRay5 committed on Jul 16, 2025

Commit b253a08 · verified · 1 Parent(s): d0172a2

Update README.md

Files changed (1)

README.md +7 -3

@@ -1,6 +1,8 @@
 ---
 library_name: transformers
-tags: []
 ---
 # Model Card for Model ID
@@ -9,8 +11,10 @@ tags: []
-## Model Details
 ### Model Description
 <!-- Provide a longer summary of what this model is. -->

 ---
 library_name: transformers
+license: mit
+datasets:
+- lmms-lab/DocVQA
 ---
 # Model Card for Model ID
+## Model Description
+DIVE-Doc is an end-to-end model that offers a trade-off between large vision-language models (LVLMs) and lightweight architectures for DocVQA.
+It is built by distilling the [SigLIP-400m](https://arxiv.org/abs/2303.15343) visual encoder of [PaliGEMMA]() into a small hierarchical [Swin transformer](), while reusing the original [GEMMA]() decoder.
+This lets DIVE-Doc remain competitive with its teacher while reducing the visual encoder's parameter count to one-fifth.
 ### Model Description
 <!-- Provide a longer summary of what this model is. -->
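
The model description added in this commit says the visual encoder is distilled from a teacher into a smaller student, but it does not state the training objective. As an illustration only, the sketch below assumes a common choice: a mean squared error between teacher and student patch features. The function name `feature_distillation_loss` and the toy feature values are hypothetical and are not taken from the DIVE-Doc code base.

```python
from typing import List

Vector = List[float]


def feature_distillation_loss(student: List[Vector], teacher: List[Vector]) -> float:
    """Mean squared error between student and teacher patch features.

    ASSUMPTION: MSE on patch features is one common distillation objective;
    the model card does not say which loss DIVE-Doc actually uses.
    """
    if len(student) != len(teacher):
        raise ValueError("student and teacher must have the same number of patches")

    total = 0.0
    count = 0
    for s_vec, t_vec in zip(student, teacher):
        if len(s_vec) != len(t_vec):
            raise ValueError("feature dimensions must match")
        for s, t in zip(s_vec, t_vec):
            total += (s - t) ** 2
            count += 1

    if count == 0:
        raise ValueError("no features given")
    return total / count


# Toy data (hypothetical): 2 patches with 3 features each.
teacher_features = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.5]]
student_features = [[1.0, 1.0, 2.0], [0.5, 0.5, 1.5]]

loss = feature_distillation_loss(student_features, teacher_features)
print(f"distillation loss: {loss:.4f}")
```

In a real training setup the two feature maps would come from the frozen teacher encoder and the trainable student encoder on the same document image, and the student would usually need a projection so that its feature shape matches the teacher's.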