---
license: cc-by-nc-sa-4.0
pipeline_tag: image-text-to-text
library_name: transformers
---

# DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding

This model is presented in the paper [DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding](https://huggingface.co/papers/2408.15045). DocLayLLM is designed for text-rich document understanding. It integrates visual patch tokens and 2D positional tokens into LLMs to improve their document comprehension and their perception of OCR information.

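
To make the idea of 2D positional information more concrete, the following is a minimal sketch of how OCR word boxes could be turned into page-size-independent integer coordinates. It is not DocLayLLM's actual implementation: the helper name `normalize_box`, the sample OCR data, and the choice of a 0 to 1000 grid (a convention used by layout-aware models such as LayoutLM) are all assumptions made for illustration. How DocLayLLM itself encodes positions is described in the paper and may differ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def normalize_box(box: Box, page_width: float, page_height: float,
                  scale: int = 1000) -> List[int]:
    """Map a pixel-space box to integer coordinates in [0, scale].

    Hypothetical helper for illustration; not part of DocLayLLM or transformers.
    """
    x0, y0, x1, y1 = box
    fractions = [
        x0 / page_width,
        y0 / page_height,
        x1 / page_width,
        y1 / page_height,
    ]
    # Clamp so that OCR boxes slightly outside the page stay in range.
    return [min(scale, max(0, round(f * scale))) for f in fractions]


# Made-up OCR output: (word, box) pairs on an 800 x 1000 pixel page.
ocr_words = [
    ("Invoice", (80.0, 50.0, 240.0, 90.0)),
    ("Total:", (80.0, 900.0, 160.0, 940.0)),
]

# Pair each word with its normalized layout coordinates.
layout = [(word, normalize_box(box, 800, 1000)) for word, box in ocr_words]
for word, box in layout:
    print(word, box)
```

Normalizing by page size means the same relative position on a page yields the same coordinates regardless of scan resolution, which is one common way to expose layout to a model alongside the OCR text.
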
## How to Use

A more complete usage example will be added when available. For now, here is a basic text-only example:

```python
from transformers import pipeline

# Placeholder: replace "your_model_id" with the actual Hugging Face model ID.
pipe = pipeline("text-generation", model="your_model_id")
result = pipe("Your input text here.")
print(result)  # a list of dicts, each with a "generated_text" key
```

Replace `"your_model_id"` with the actual Hugging Face model ID.