DocLayLLM_zero_shot / README.md
metadata
license: cc-by-nc-sa-4.0
pipeline_tag: image-text-to-text
library_name: transformers

DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding

This model is presented in the paper "DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding". DocLayLLM targets text-rich document understanding: it integrates visual patch tokens and 2D positional tokens into LLMs, improving their document comprehension and their perception of OCR information.
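To make the idea of 2D positional information more concrete, the sketch below pairs each OCR word with a normalized bounding box. It is an illustration only, not DocLayLLM's actual preprocessing: the `OcrWord` type, the helper names, and the 0 to 1000 coordinate grid (a convention borrowed from LayoutLM-style models) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


@dataclass
class OcrWord:
    """Hypothetical container for one OCR result: text plus its pixel box."""
    text: str
    box: Box


def normalize_box(box: Box, page_width: int, page_height: int, scale: int = 1000) -> Box:
    """Map pixel coordinates onto a page-independent 0..scale grid.

    Assumption: a LayoutLM-style 0-1000 grid; DocLayLLM may use another scheme.
    """
    x0, y0, x1, y1 = box
    return (
        x0 * scale // page_width,
        y0 * scale // page_height,
        x1 * scale // page_width,
        y1 * scale // page_height,
    )


def build_layout_sequence(
    words: List[OcrWord], page_width: int, page_height: int
) -> List[Tuple[str, Box]]:
    """Pair each OCR word with its normalized 2D position."""
    return [(w.text, normalize_box(w.box, page_width, page_height)) for w in words]


# Example page of 1000 x 2000 pixels with two OCR words.
words = [
    OcrWord("Invoice", (50, 40, 250, 80)),
    OcrWord("Total:", (50, 900, 150, 940)),
]
sequence = build_layout_sequence(words, page_width=1000, page_height=2000)
print(sequence)
```

In a layout-aware model, each normalized box would typically be turned into positional tokens or embeddings that accompany the word's text tokens, so the LLM sees both what a word says and where it sits on the page.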

How to Use

A more complete usage example will be added when available. For now, a basic example:

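from transformers import pipeline

# "your_model_id" is a placeholder: replace it with the actual Hugging Face model ID.
pipe = pipeline("text-generation", model="your_model_id")

# Run generation on a plain-text prompt.
result = pipe("Your input text here.")

# The text-generation pipeline returns a list of dicts with a "generated_text" key.
print(result)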

Replace "your_model_id" with the actual Hugging Face model ID.