Add pipeline tag and update model card

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +18 -6
README.md CHANGED
@@ -1,18 +1,30 @@
 ---
-license: apache-2.0
+base_model:
+- OpenGVLab/InternVL3-1B
 datasets:
 - BLIP3o/BLIP3o-Pretrain-Long-Caption
 - BLIP3o/BLIP3o-Pretrain-Short-Caption
 - BLIP3o/BLIP3o-Pretrain-JourneyDB
-base_model:
-- OpenGVLab/InternVL3-1B
+license: apache-2.0
+pipeline_tag: any-to-any
 ---
-This repository contains the model (**autoencoders**) presented in the paper UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing.
+
+This repository contains the model (**autoencoders**) presented in the paper [UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing](https://huggingface.co/papers/2507.23278).
 
 UniLIP proposes a unified, CLIP-based encoder featuring both rich semantics and fine-grained image details. Through a **two-stage and self-distillation training** for reconstruction, we empower CLIP to achieve excellent reconstruction results **without compromising its original understanding abilities**. Leveraging this powerful unified representation, UniLIP excels across understanding, generation, and editing tasks.
 
 For more details, please refer to the original paper and the GitHub repository:
 
-Paper: https://www.arxiv.org/abs/2507.23278
+- **Paper**: [https://arxiv.org/abs/2507.23278](https://arxiv.org/abs/2507.23278)
+- **GitHub**: [https://github.com/nnnth/UniLIP](https://github.com/nnnth/UniLIP)
+
+## Citation
 
-GitHub: https://github.com/nnnth/UniLIP
+```bibtex
+@article{tang2025unilip,
+  title={UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing},
+  author={Tang, Hao and Xie, Chenwei and Bao, Xiaoyi and Weng, Tingyu and Li, Pandeng and Zheng, Yun and Wang, Liwei},
+  journal={arXiv preprint arXiv:2507.23278},
+  year={2025}
+}
+```