---
license: cc-by-nc-sa-4.0
pipeline_tag: image-text-to-text
library_name: transformers
---
# DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding
This model is presented in the paper [DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding](https://huggingface.co/papers/2408.15045). DocLayLLM is designed for text-rich document understanding, integrating visual patch tokens and 2D positional tokens into LLMs to enhance their document comprehension and OCR information perception.
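As a rough illustration of what 2D positional information for OCR text can look like, the sketch below rescales pixel bounding boxes to a fixed 0 to 1000 grid, a convention used by layout-aware models such as LayoutLM. Whether DocLayLLM uses this scale or this box format is an assumption; `normalize_box`, the sample words, and the page size are hypothetical and are not taken from the paper or its code.

```python
from typing import List, Tuple

# (x0, y0, x1, y1) in pixels; this box format is an assumption for illustration.
Box = Tuple[int, int, int, int]


def normalize_box(box: Box, page_width: int, page_height: int, scale: int = 1000) -> Box:
    """Rescale a pixel bounding box to a page-size-independent grid."""
    x0, y0, x1, y1 = box
    return (
        int(scale * x0 / page_width),
        int(scale * y0 / page_height),
        int(scale * x1 / page_width),
        int(scale * y1 / page_height),
    )


# Hypothetical OCR output for a 1000 x 2000 pixel page.
words: List[str] = ["Invoice", "Total:"]
boxes: List[Box] = [(50, 40, 250, 80), (50, 900, 200, 940)]

normalized = [normalize_box(b, page_width=1000, page_height=2000) for b in boxes]
for word, box in zip(words, normalized):
    print(word, box)
```

Normalizing to a fixed grid keeps the coordinates comparable across pages of different resolutions, which is why layout-aware models commonly apply a step like this before embedding positions.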
## How to Use
A more complete usage example will be added when available. Until then, the basic example below loads a model through the `transformers` pipeline API:
```python
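from transformers import pipeline

# "your_model_id" is a placeholder; substitute the model's Hugging Face ID.
pipe = pipeline("text-generation", model="your_model_id")

# By default the pipeline returns a list of dicts with a "generated_text" key.
result = pipe("Your input text here.")
print(result)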
```
Replace `"your_model_id"` with the actual Hugging Face model ID.
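By default, a `text-generation` pipeline called with a single prompt returns a list of dictionaries, each holding the output under the `generated_text` key. The sketch below shows one way to pull the strings out of that structure. The helper name `extract_generated_text` and the sample result are hypothetical, and the sample stands in for a real pipeline call so the snippet runs without downloading a model.

```python
from typing import Dict, List


def extract_generated_text(result: List[Dict[str, str]]) -> List[str]:
    """Collect the generated strings from a text-generation pipeline result."""
    return [item["generated_text"] for item in result]


# Stand-in for `pipe("Your input text here.")`; the content is made up.
sample_result = [{"generated_text": "Your input text here. And a continuation."}]

texts = extract_generated_text(sample_result)
print(texts[0])
```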