---
license: apache-2.0
datasets:
- BLIP3o/BLIP3o-Pretrain-Long-Caption
- BLIP3o/BLIP3o-Pretrain-Short-Caption
- BLIP3o/BLIP3o-Pretrain-JourneyDB
base_model:
- OpenGVLab/InternVL3-1B
---
This repository contains the **autoencoder** models presented in the paper UniLIP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing.

UniLIP proposes a unified, CLIP-based encoder that captures both rich semantics and fine-grained image details. Through a **two-stage training scheme with self-distillation** for reconstruction, we enable CLIP to achieve excellent reconstruction results **without compromising its original understanding abilities**. Leveraging this unified representation, UniLIP excels across understanding, generation, and editing tasks.
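
The idea of combining a reconstruction objective with self-distillation can be illustrated with a small, dependency-free sketch. This is not the UniLIP training code: the function names, the use of mean squared error for both terms, and the `distill_weight` parameter are all assumptions made for illustration. It only shows how a reconstruction term can be paired with a term that keeps the tuned encoder's features close to those of the original, frozen encoder.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def combined_loss(pixels, reconstruction, tuned_feats, frozen_feats,
                  distill_weight=1.0):
    """Hypothetical objective: reconstruction error plus a distillation penalty.

    pixels / reconstruction: flattened target image and decoder output.
    tuned_feats / frozen_feats: features from the encoder being trained and
    from a frozen copy of the original encoder (the "teacher").
    distill_weight: assumed scalar that balances the two terms.
    """
    recon_term = mse(reconstruction, pixels)
    distill_term = mse(tuned_feats, frozen_feats)
    return recon_term + distill_weight * distill_term


# Perfect reconstruction and unchanged features give zero loss.
print(combined_loss([0.0, 1.0], [0.0, 1.0], [1.0, 2.0], [1.0, 2.0]))
```

In this sketch, a larger `distill_weight` penalizes drift from the original features more strongly, at the cost of less freedom to improve reconstruction.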

For more details, please refer to the original paper and the GitHub repository:

Paper: https://www.arxiv.org/abs/2507.23278

GitHub: https://github.com/nnnth/UniLIP