| license: apache-2.0 | |
| language: | |
| - en | |
| metrics: | |
| - accuracy | |
| pipeline_tag: image-text-to-text | |
| library_name: transformers | |
| <h3>PyPE: Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding</h3> | |
| For more details, please refer to Github: [PyPE](https://github.com/SakuraTroyChen/PyPE). |