JayRay5/DIVE-Doc-ARD-LRes

@@ -5,12 +5,6 @@ datasets:
 - lmms-lab/DocVQA
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
 ## 1 Introduction
 DIVE-Doc is a VLM architecture built as a trade-off between end-to-end lightweight architectures and LVLMs for the DocVQA task.
 Without relying on external tools such as OCR, it processes the inputs in an end-to-end way.
@@ -31,22 +25,33 @@ Moreover, the model is finetuned using LoRA adapters (in this repo, adapters hav
 ## Quick Start
-<!-- Provide the basic links for the model. -->
-- **Repository:** [GitHub](https://github.com/JayRay5/DIVE-Doc)
-- **Paper [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 ### Direct Use
 This model is designed to answer a question from a single-page image document.
-[More Information Needed]
-### Downstream Use [optional]
 This model can be finetuned on other DocVQA datasets, such as [InfoGraphVQA](), to improve its performance on infographic documents, or []() to specialize it for degraded or historical documents.
@@ -56,55 +61,8 @@ This model can be finetuned on other DocVQA dataset such as [InfoGraphVQA]() to
 This model may not perform well on degraded or infographic documents because it was finetuned mostly on industrial documents.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Implementation Details
-### Training Data
-This model has been trained using the [DocVQA dataset]().
-[More Information Needed]
-### Evaluation
-<!-- Met. -->
-#### Metrics
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Results
-<!-- docvqa performance -->
-<!-- inference time -->
-<!--
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed] -->
 ## Citation [optional]

 - lmms-lab/DocVQA
 ---
 ## 1 Introduction
 DIVE-Doc is a VLM architecture built as a trade-off between end-to-end lightweight architectures and LVLMs for the DocVQA task.
 Without relying on external tools such as OCR, it processes the inputs in an end-to-end way.
 ## Quick Start
+### Installation
+```bash
+git clone https://github.com/JayRay5/DIVE-Doc.git
+cd DIVE-Doc
+conda create -n dive-doc-env python=3.11.5
+conda activate dive-doc-env
+pip install -r requirements.txt
+```
+### Inference example using the model repository and Gradio
+In `app.py`, set the path variable to "JayRay5/DIVE-Doc-ARD-LRes":
+```bash
+```
+Then run:
+```bash
+python app.py
+```
+This starts a [Gradio]() web interface where you can query the model with a document image and a question.
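
For readers unfamiliar with Gradio, the following is a minimal sketch of how a document question-answering function could be exposed as a web interface. It is not the repository's `app.py`: the names `MODEL_ID`, `answer_question`, and `build_demo` are hypothetical, and the model call is replaced by a placeholder, since the actual loading code lives in the DIVE-Doc repository.

```python
# Hypothetical constant; the actual variable name used in app.py may differ.
MODEL_ID = "JayRay5/DIVE-Doc-ARD-LRes"


def answer_question(image, question):
    """Stand-in for the model call (assumption: the real app runs DIVE-Doc here)."""
    if image is None or not question or not question.strip():
        return "Please provide both a document image and a question."
    # Placeholder output; a real implementation would run the model on (image, question).
    return f"[model answer for: {question.strip()}]"


def build_demo():
    """Wrap answer_question in a simple Gradio interface."""
    import gradio as gr  # imported lazily so answer_question works without Gradio installed

    return gr.Interface(
        fn=answer_question,
        inputs=[
            gr.Image(type="pil", label="Document page"),
            gr.Textbox(label="Question"),
        ],
        outputs=gr.Textbox(label="Answer"),
        title=f"DocVQA demo sketch ({MODEL_ID})",
    )


# To serve the interface locally, one would call:
# build_demo().launch()
```

Keeping the answering function separate from the interface makes it possible to check the input handling without starting a web server.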
+## Notification
 ### Direct Use
 This model is designed to answer a question from a single-page image document.
+### Downstream Use
 This model can be finetuned on other DocVQA datasets, such as [InfoGraphVQA](), to improve its performance on infographic documents, or []() to specialize it for degraded or historical documents.
 This model may not perform well on degraded or infographic documents because it was finetuned mostly on industrial documents.
 ## Citation [optional]