---
license: apache-2.0
language:
- en
metrics:
- accuracy
pipeline_tag: image-text-to-text
library_name: transformers
---

PyPE: Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding

For more details, please refer to the GitHub repository: [PyPE](https://github.com/SakuraTroyChen/PyPE).