---
license: mit
---
This repository is the official model zoo for **Let ViT Speak: Generative Language-Image Pre-training**.
## Currently released models
1. Models from fixed low-resolution pretraining:
   - GenLIP-L16-224
   - GenLIP-So16-224
   - GenLIP-g16-224
2. NaViT (native-resolution) models:
   - GenLIP-L16-NaViT
   - GenLIP-So16-NaViT
   - GenLIP-g16-NaViT
We use the SigLIP image preprocessor for the fixed low-resolution models (`*-224`) and a Qwen2-VL style image preprocessor for the NaViT models (`*-NaViT`).
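To illustrate how the two preprocessing schemes differ in the number of visual tokens they produce, here is a minimal, stdlib-only sketch. It is not the GenLIP preprocessing code. It assumes that the `16` in the model names is the ViT patch size, that the NaViT resize follows the logic of Qwen2-VL's `smart_resize` (keep the aspect ratio, snap each side to a multiple of a factor, clamp the total pixel count) with the factor set to the patch size and no token merging, and that the `min_pixels` / `max_pixels` bounds are hypothetical placeholders. For real inputs, use the preprocessors shipped in the GenLIP codebase.

```python
import math

# Assumption: the "16" in GenLIP-L16 / So16 / g16 is the ViT patch size.
PATCH_SIZE = 16


def fixed_res_num_patches(image_size: int = 224, patch_size: int = PATCH_SIZE) -> int:
    """Patch count for the fixed low-resolution (*-224) models: a square grid."""
    side = image_size // patch_size
    return side * side


def smart_resize(height: int, width: int, factor: int = PATCH_SIZE,
                 min_pixels: int = 224 * 224,
                 max_pixels: int = 1024 * 1024) -> tuple[int, int]:
    """Sketch of a Qwen2-VL style resize: keep the aspect ratio, snap each side
    to a multiple of `factor`, and keep height * width within
    [min_pixels, max_pixels]. The default bounds are hypothetical."""
    if height < 1 or width < 1:
        raise ValueError("image must be non-empty")
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    if h_bar * w_bar > max_pixels:
        # Too many pixels: shrink uniformly, rounding each side down.
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = max(factor, math.floor(height / beta / factor) * factor)
        w_bar = max(factor, math.floor(width / beta / factor) * factor)
    elif h_bar * w_bar < min_pixels:
        # Too few pixels: enlarge uniformly, rounding each side up.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


def navit_num_patches(height: int, width: int, patch_size: int = PATCH_SIZE,
                      **kwargs) -> int:
    """Patch count for a NaViT model after the resize sketched above."""
    h, w = smart_resize(height, width, factor=patch_size, **kwargs)
    return (h // patch_size) * (w // patch_size)


if __name__ == "__main__":
    print(fixed_res_num_patches())       # 224 / 16 = 14 -> 14 * 14 patches
    print(smart_resize(480, 640))        # already within bounds, kept as-is
    print(navit_num_patches(480, 640))   # 30 * 40 patches
```

Under these assumptions, every `*-224` input becomes a fixed 14 × 14 grid, while a NaViT model keeps the image's aspect ratio and its token count grows with resolution up to the `max_pixels` cap.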
Pretraining and implementation details are available in our codebase: [GenLIP](https://github.com/YanFangCS/GenLIP).