Tags: Transformers · Safetensors · DIVEdoc · docvqa · distillation · VLM · document-understanding · OCR-free
JayRay5 committed · Commit 6519e2d · verified · 1 Parent(s): 6d6d406

Update README.md

Files changed (1): README.md +1 -1
README.md CHANGED
@@ -19,7 +19,7 @@ Where the first category has both a lightweight visual encoder and a language de
  DIVE-Doc contains a small visual encoder in combination with a large decoder in order to balance model size and performance.
  It is built by distilling the [SigLIP-400m](https://arxiv.org/abs/2303.15343) visual encoder of [PaliGEMMA]() into a small hierarchical [Swin transformer]() initialized with the weights of [Donut](), while reusing the original [GEMMA]() decoder.
  This allows DIVE-Doc to keep competitive performance with its teacher while reducing the visual encoder's parameters to one-fifth.
- Moreover, the model is finetuned using LoRA adapters (in this repo, adapters have been merged into the base model using [unload_and_merge]()).
+ Moreover, the model is finetuned using LoRA adapters, which have been merged into the base model using [unload_and_merge]().
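The distillation step the README describes (compressing the SigLIP-400m encoder into a smaller Swin student) can be sketched with a minimal feature-distillation objective. This is a generic illustration, not DIVE-Doc's training code: the MSE loss, the feature shapes, and the projection head mapping student features to the teacher's dimension are all assumptions.

```python
import numpy as np

# Illustrative feature-distillation sketch (NOT DIVE-Doc's actual code):
# the student encoder is trained so its output features match the frozen
# teacher's features; a projection head (assumed here) bridges dimensions.
rng = np.random.default_rng(1)
n_patches, d_teacher, d_student = 4, 16, 8

teacher_feats = rng.normal(size=(n_patches, d_teacher))  # frozen teacher output
student_feats = rng.normal(size=(n_patches, d_student))  # trainable student output
proj = rng.normal(size=(d_student, d_teacher))           # assumed projection head

projected = student_feats @ proj                 # map student dim -> teacher dim
loss = np.mean((projected - teacher_feats) ** 2)  # MSE distillation objective
```

Minimizing this loss over the student (and projection) weights is what lets a much smaller encoder approximate the teacher's representation.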
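The merged-adapter note in the diff can be illustrated by the arithmetic behind folding a LoRA adapter into a base weight: the low-rank update scale · B·A is added to the frozen weight once, so inference needs no adapter path. The dimensions, scaling convention, and variable names below are illustrative assumptions, not values from this repo.

```python
import numpy as np

# Hedged sketch of the LoRA-merge arithmetic (illustrative, not repo code).
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 6, 2, 4
W = rng.normal(size=(d_out, d_in))   # frozen base weight
A = rng.normal(size=(r, d_in))       # LoRA down-projection
B = rng.normal(size=(d_out, r))      # LoRA up-projection
scale = alpha / r                    # common LoRA scaling convention

x = rng.normal(size=(d_in,))
y_adapter = W @ x + scale * (B @ (A @ x))  # base path + adapter path
W_merged = W + scale * (B @ A)             # the one-time merge step
y_merged = W_merged @ x                    # single matmul after merging

assert np.allclose(y_adapter, y_merged)    # merging preserves the output
```

Because the merge is exact (by distributivity of matrix multiplication), the merged checkpoint behaves identically to base-plus-adapter while keeping the original model's shape.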