Add pipeline tag and update model card

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +18 -6
README.md CHANGED
@@ -1,18 +1,30 @@
 ---
-license: apache-2.0
+base_model:
+- OpenGVLab/InternVL3-1B
 datasets:
 - BLIP3o/BLIP3o-Pretrain-Long-Caption
 - BLIP3o/BLIP3o-Pretrain-Short-Caption
 - BLIP3o/BLIP3o-Pretrain-JourneyDB
-base_model:
-- OpenGVLab/InternVL3-1B
+license: apache-2.0
+pipeline_tag: any-to-any
 ---
-This repository contains the model (**autoencoders**) presented in the paper UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing.
+
+This repository contains the model (**autoencoders**) presented in the paper [UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing](https://huggingface.co/papers/2507.23278).
 
 UniLIP proposes a unified, CLIP-based encoder featuring both rich semantics and fine-grained image details. Through a **two-stage and self-distillation training** for reconstruction, we empower CLIP to achieve excellent reconstruction results **without compromising its original understanding abilities**. Leveraging this powerful unified representation, UniLIP excels across understanding, generation, and editing tasks.
 
 For more details, please refer to the original paper and the GitHub repository:
 
-Paper: https://www.arxiv.org/abs/2507.23278
+- **Paper**: [https://arxiv.org/abs/2507.23278](https://arxiv.org/abs/2507.23278)
+- **GitHub**: [https://github.com/nnnth/UniLIP](https://github.com/nnnth/UniLIP)
+
+## Citation
 
-GitHub: https://github.com/nnnth/UniLIP
+```bibtex
+@article{tang2025unilip,
+  title={UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing},
+  author={Tang, Hao and Xie, Chenwei and Bao, Xiaoyi and Weng, Tingyu and Li, Pandeng and Zheng, Yun and Wang, Liwei},
+  journal={arXiv preprint arXiv:2507.23278},
+  year={2025}
+}
+```