JayRay5/DIVE-Doc-FRD

JayRay5 committed on Jan 7

Commit cb71bf4 · verified · 1 Parent(s): a5223ff

Update README.md

Files changed (1)

README.md +29 -2


@@ -6,6 +6,12 @@ datasets:
 - pixparse/docvqa-single-page-questions
 spaces:
 - JayRay5/DIVE-Doc-docvqa
+tags:
+- docvqa
+- distillation
+- VLM
+- document-understanding
+- OCR-free
 ---
 ## 1 Introduction
@@ -37,9 +43,30 @@ Trained on the [DocVQA dataset](https://openaccess.thecvf.com/content/WACV2021/h
 #### From the Transformers library
 ```bash
-from transformers import AutoModelForCausalLM
-AutoModelForCausalLM.from_pretrained("JayRay5/DIVE-Doc-FRD",trust_remote_code=True)
+from transformers import AutoProcessor, AutoModelForCausalLM
+from PIL import Image
+import torch
+processor = AutoProcessor.from_pretrained("JayRay5/DIVE-Doc-FRD", trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained("JayRay5/DIVE-Doc-FRD", trust_remote_code=True)
+image = Image.open("your_image_document_path/image_document.png").convert("RGB")
+question_example = "What is the name of the author?"
+inputs = (
+            processor(text=question_example, images=image, return_tensors="pt", padding=True)
+            .to(model.device)
+            .to(model.dtype)
+        )
+input_length = inputs["input_ids"].shape[-1]
+with torch.inference_mode():
+        output_ids = model.generate(**inputs, max_new_tokens=100, do_sample=False)
+generated_ids = output_ids[0][input_length:]
+answer = processor.decode(generated_ids, skip_special_tokens=True)
+print(answer)
 ```
 #### From the GitHub repository
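
The added snippet decodes only `output_ids[0][input_length:]`, which suggests that `generate` returns the prompt tokens followed by the newly generated ones. Below is a minimal sketch of that slicing step using plain Python lists, so it runs without downloading the model. The helper name `strip_prompt` and the token ids are hypothetical and appear neither in the README nor in the Transformers API.

```python
def strip_prompt(output_ids, input_length):
    """Return only the token ids that follow the prompt.

    Assumes output_ids holds the prompt ids first, then the continuation,
    as the README snippet's slicing implies.
    """
    return output_ids[input_length:]


# Hypothetical token ids, for illustration only.
prompt_ids = [101, 7592, 2088]
output_ids = prompt_ids + [42, 43, 2]

new_ids = strip_prompt(output_ids, len(prompt_ids))
print(new_ids)
```

Without this slice, the decoded string would also contain the question text, which is likely why the snippet records `input_length` before calling `generate`.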