Update README.md

DIVE-Doc is a VLM architecture built as a trade-off between end-to-end lightweight architectures and LVLMs for the DocVQA task.
Without relying on external tools such as OCR, it processes its inputs end to end: it takes a document image and a question as input and returns an answer. <br>

- **Paper (Spotlight/Best Paper Award, VisionDocs@ICCV2025):** <br>
  [DIVE-Doc: Downscaling foundational Image Visual Encoder into hierarchical architecture for DocVQA](https://openaccess.thecvf.com/content/ICCV2025W/VisionDocs/html/Bencharef_DIVE-Doc_Downscaling_foundational_Image_Visual_Encoder_into_hierarchical_architecture_for_ICCVW_2025_paper.html)
- **Repository:** [GitHub](https://github.com/JayRay5/DIVE-Doc)
- **Demo:** [Space](https://huggingface.co/spaces/JayRay5/DIVE-Doc-docvqa) <br>

## 2 Model Summary

DIVE-Doc is built as a trade-off between end-to-end lightweight architectures and LVLMs.
It is trained on the DocVQA dataset.

## 3 Quick Start

### Direct Use

#### From Hugging Face Space

[Click here](https://huggingface.co/spaces/JayRay5/DIVE-Doc-docvqa) to try the model directly in your browser.

#### From the Transformers library

```python
from transformers import AutoModelForCausalLM

# Custom model code is loaded from the Hub, so trust_remote_code is required.
model = AutoModelForCausalLM.from_pretrained("JayRay5/DIVE-Doc-FRD", trust_remote_code=True)
```
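
If the Hub repository also ships a preprocessor, a full question-answer round trip might look like the sketch below. This is a minimal, hypothetical example: the use of `AutoProcessor`, the call signature, and the generate-and-decode flow are assumptions rather than the documented API, so check the files in the `JayRay5/DIVE-Doc-FRD` repository for the exact interface.

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Assumption: the Hub repo provides a processor for image/text inputs
# alongside the custom model code.
model = AutoModelForCausalLM.from_pretrained("JayRay5/DIVE-Doc-FRD", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("JayRay5/DIVE-Doc-FRD", trust_remote_code=True)

image = Image.open("document.png")  # any document image (hypothetical file)
question = "What is the invoice number?"

# Encode the image/question pair, generate, and decode the answer.
inputs = processor(images=image, text=question, return_tensors="pt")
output_ids = model.generate(**inputs)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```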

### Use from the GitHub repository

#### Installation

```bash
git clone https://github.com/JayRay5/DIVE-Doc.git
cd DIVE-Doc
conda create -n dive-doc-env python=3.11.5
conda activate dive-doc-env
pip install -r requirements.txt
```
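
If the install succeeds, a one-line sanity check such as `python -c "import transformers; print(transformers.__version__)"` should run without errors; this assumes, as the loading snippet above suggests, that `requirements.txt` includes the `transformers` package.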

#### Inference example using the model repository and gradio

In app.py, set the path variable to "JayRay5/DIVE-Doc-FRD":

```python
if __name__ == "__main__":
    path = "JayRay5/DIVE-Doc-FRD"
    ...
```

Then run:

```bash
python app.py
```

This will start a [gradio](https://www.gradio.app/) web interface where you can use the model.
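By default, gradio serves the interface locally (typically at http://127.0.0.1:7860). If app.py calls `launch()`, passing `share=True` creates a temporary public link; both are standard gradio behaviors rather than options specific to this repository.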

## Notification