---
license: apache-2.0
language:
- en
metrics:
- accuracy
pipeline_tag: image-text-to-text
library_name: transformers
---

<h3>PyPE: Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding</h3>

For more details, please refer to Github: [PyPE](https://github.com/SakuraTroyChen/PyPE).