Citaman/VeCLIP

@@ -1,3 +1,8 @@
 # VeCLIP: Improving CLIP Training via Visual-enriched Captions
 * A novel CLIP training scheme that achieves the SoTA performance on zero-shot ImageNet classification and COCO image text retrieval using limited visual-enriched captions.* [[Paper](https://arxiv.org/abs/2310.07699)]
@@ -6,7 +11,7 @@
 <p align="center">
-    <img src="figs/veclip_diagram.jpg" width="100%"></a> <br>
     Diagram of VeCap.
 </p>
@@ -248,4 +253,4 @@ If you find VeCLIP useful, please cite using this BibTeX:
 ## Acknowledgement
 - [axlearn](https://github.com/apple/axlearn): the codebase we use to train the models.
-- [huggingface transformers](https://huggingface.co/docs/transformers/en/index): Transformers provides APIs to load our trained models.

+---
+license: apache-2.0
+language:
+- en
+---
 # VeCLIP: Improving CLIP Training via Visual-enriched Captions
 * A novel CLIP training scheme that achieves the SoTA performance on zero-shot ImageNet classification and COCO image text retrieval using limited visual-enriched captions.* [[Paper](https://arxiv.org/abs/2310.07699)]
 <p align="center">
+    <img src="veclip_diagram.jpg" width="100%"></a> <br>
     Diagram of VeCap.
 </p>
 ## Acknowledgement
 - [axlearn](https://github.com/apple/axlearn): the codebase we use to train the models.
+- [huggingface transformers](https://huggingface.co/docs/transformers/en/index): Transformers provides APIs to load our trained models.