---
license: apache-2.0
datasets:
- BLIP3o/BLIP3o-Pretrain-Long-Caption
- BLIP3o/BLIP3o-Pretrain-Short-Caption
- BLIP3o/BLIP3o-Pretrain-JourneyDB
base_model:
- OpenGVLab/InternVL3-1B
---
This repository contains the **autoencoder** models presented in the paper "UniLIP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing".

UniLIP proposes a unified, CLIP-based encoder that captures both rich semantics and fine-grained image details. Through a **two-stage training scheme with self-distillation** for reconstruction, we enable CLIP to reconstruct images with high fidelity **without compromising its original understanding abilities**. Building on this unified representation, UniLIP performs well across understanding, generation, and editing tasks.
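
As a rough illustration of the self-distillation idea, and not the authors' actual training code, the sketch below combines a pixel reconstruction term with a feature-matching term that keeps the tuned encoder's features close to those of the frozen original CLIP encoder. The function names, the use of mean squared error for both terms, and the `distill_weight` parameter are assumptions made for this example; the paper and the GitHub repository define the real objective.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def reconstruction_with_distillation_loss(
    reconstructed: Sequence[float],
    target: Sequence[float],
    student_features: Sequence[float],
    teacher_features: Sequence[float],
    distill_weight: float = 1.0,  # hypothetical weighting, not from the paper
) -> float:
    """Illustrative loss: reconstruction error plus a self-distillation penalty.

    student_features: features from the encoder being tuned for reconstruction.
    teacher_features: features from a frozen copy of the original encoder.
    """
    recon_term = mse(reconstructed, target)
    distill_term = mse(student_features, teacher_features)
    return recon_term + distill_weight * distill_term
```

In this sketch, raising `distill_weight` favors preserving the original features (and hence understanding ability), while lowering it favors reconstruction quality.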

For more details, please refer to the original paper and the GitHub repository:

Paper: https://www.arxiv.org/abs/2507.23278

GitHub: https://github.com/nnnth/UniLIP