---
license: apache-2.0
language:
- en
metrics:
- accuracy
pipeline_tag: image-text-to-text
library_name: transformers
---

<h3>PyPE: Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding</h3>

For more details, please refer to the GitHub repository: [PyPE](https://github.com/SakuraTroyChen/PyPE).