Where the first category has both a lightweight visual encoder and a language decoder.
DIVE-Doc pairs a small visual encoder with a large decoder to balance model size and performance.
It is built by distilling the [SigLIP-400m](https://arxiv.org/abs/2303.15343) visual encoder of [PaliGEMMA]() into a small hierarchical [Swin transformer]() initialized with the weights of [Donut](), while reusing the original [GEMMA]() decoder.
This allows DIVE-Doc to remain competitive with its teacher while reducing the visual encoder's parameters to one-fifth.
Moreover, the model is finetuned using LoRA adapters, which have been merged into the base model using [merge_and_unload]().
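The distillation step above trains the small Swin student so its output features match those of the frozen SigLIP teacher. A toy NumPy sketch of that objective (all dimensions, the random features, and the projection head are illustrative assumptions, not DIVE-Doc's actual configuration):

```python
import numpy as np

# Toy feature-distillation objective: project the student's features into the
# teacher's feature space and penalize the mean squared difference.
# Shapes are illustrative only.
rng = np.random.default_rng(1)
n_patches, d_teacher, d_student = 16, 32, 8

teacher_feats = rng.standard_normal((n_patches, d_teacher))  # frozen teacher targets
student_feats = rng.standard_normal((n_patches, d_student))  # trainable student output
proj = rng.standard_normal((d_student, d_teacher))           # learned projection head

# Distillation loss: MSE between projected student features and teacher features.
diff = student_feats @ proj - teacher_feats
mse = (diff ** 2).mean()
assert np.isfinite(mse) and mse >= 0.0
```

In practice the teacher stays frozen and gradients flow only into the student encoder (and projection head); the loss shown is the standard mean-squared-error form of feature distillation.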
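Merging LoRA adapters amounts to folding the trained low-rank update back into the frozen weight, W' = W + (alpha/r)·B·A, so inference needs no adapter branch. A minimal NumPy sketch of that identity (rank, scaling, and shapes are illustrative assumptions):

```python
import numpy as np

# Minimal illustration of merging a LoRA adapter into a base weight matrix.
# Rank r, scaling alpha, and all shapes are illustrative, not DIVE-Doc's settings.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 4

W = rng.standard_normal((d_out, d_in))   # frozen base weight
A = rng.standard_normal((r, d_in))       # trained low-rank factor (down-projection)
B = rng.standard_normal((d_out, r))      # trained low-rank factor (up-projection)

# "Merge": fold the scaled low-rank update into the base weight once.
W_merged = W + (alpha / r) * (B @ A)

x = rng.standard_normal(d_in)
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))  # base + adapter branch at inference
y_merged = W_merged @ x                          # single matmul after merging
assert np.allclose(y_adapter, y_merged)
```

This is the operation PEFT's `merge_and_unload()` performs layer by layer before the adapter wrappers are discarded.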