Text Generation · Transformers · Safetensors · DIVEdoc · docvqa · distillation · VLM · document-understanding · OCR-free · custom_code
JayRay5 committed
Commit 6a47945 · verified · 1 Parent(s): 9bedf80

Update README.md

Files changed (1): README.md +19 -4
README.md CHANGED
@@ -12,10 +12,10 @@ spaces:
 DIVE-Doc is a VLM architecture built as a trade-off between end-to-end lightweight architectures and LVLMs for the DocVQA task.
 Without relying on external tools such as OCR, it processes the inputs in an end-to-end way.
 It takes an image document and a question as input and returns an answer. <br>
-- **Repository:** [GitHub](https://github.com/JayRay5/DIVE-Doc)
 - **Paper (Spotlight/Best Paper Award VisionDocs@ICCV2025):** <br>
 [DIVE-Doc: Downscaling foundational Image Visual Encoder into hierarchical architecture for DocVQA](https://openaccess.thecvf.com/content/ICCV2025W/VisionDocs/html/Bencharef_DIVE-Doc_Downscaling_foundational_Image_Visual_Encoder_into_hierarchical_architecture_for_ICCVW_2025_paper.html)
-
+- **Repository:** [GitHub](https://github.com/JayRay5/DIVE-Doc)
+- **Demo:** [Space](https://huggingface.co/spaces/JayRay5/DIVE-Doc-docvqa) <br>
 
 ## 2 Model Summary
 DIVE-Doc is built as a trade-off between end-to-end lightweight architectures and LVLMs.
@@ -29,7 +29,21 @@ Trained on the [DocVQA dataset](https://openaccess.thecvf.com/content/WACV2021/h
 
 ## 3 Quick Start
 
-### Installation
+### Direct Use
+
+#### From Hugging Face Space
+
+[Click here](https://huggingface.co/spaces/JayRay5/DIVE-Doc-docvqa)
+
+#### From the Transformers library
+```python
+from transformers import AutoModelForCausalLM
+
+AutoModelForCausalLM.from_pretrained("JayRay5/DIVE-Doc-FRD", trust_remote_code=True)
+```
+
+### Use from the GitHub repository
+#### Installation
 ```bash
 git clone https://github.com/JayRay5/DIVE-Doc.git
 cd DIVE-Doc
@@ -37,7 +51,7 @@ conda create -n dive-doc-env python=3.11.5
 conda activate dive-doc-env
 pip install -r requirements.txt
 ```
-### Inference example using the model repository and gradio
+#### Inference example using the model repository and gradio
 In app.py, modify the path variable to "JayRay5/DIVE-Doc-FRD":
 ```python
 if __name__ == "__main__":
@@ -49,6 +63,7 @@ Then run:
 python app.py
 ```
 This will start a [gradio](https://www.gradio.app/) web interface where you can use the model.
+
 ## Notification
 
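The one-line loading snippet added in this commit can be expanded into a small, self-contained sketch. Only the `AutoModelForCausalLM.from_pretrained(..., trust_remote_code=True)` call and the `JayRay5/DIVE-Doc-FRD` repository id come from the README; the helper name and the guard around the download are illustrative:

```python
from transformers import AutoModelForCausalLM

REPO_ID = "JayRay5/DIVE-Doc-FRD"  # model repository named in the README


def load_dive_doc(repo_id: str = REPO_ID):
    """Download and instantiate DIVE-Doc.

    trust_remote_code=True is required because the repository ships
    custom modeling code (the `custom_code` tag on the model card).
    """
    return AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)


if __name__ == "__main__":
    # Triggers a weight download on first run; guarded so importing
    # this module does not fetch anything.
    model = load_dive_doc()
    print(model.__class__.__name__)
```

For a ready-made interactive loop, the gradio `app.py` route described above does the equivalent loading internally.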