cooper_robot committed d659e57 (parent: 979f6ca): Add release note for v1.1.0

Files changed:
- README.md (+30 -0)
- resource/LongCLIP.png (+3 -0)
README.md (ADDED):
---
library_name: pytorch
---

![LongCLIP](resource/LongCLIP.png)

LongCLIP extends the CLIP vision–language framework to support significantly longer text inputs, enabling richer contextual understanding while preserving strong image–text alignment.

Original paper: [Long-CLIP: Unlocking Long-Text Capability in CLIP, Zhang et al., 2024](https://arxiv.org/abs/2403.15378)

# LongCLIP-B16

This model uses the LongCLIP B/16 variant, which is based on a ViT-Base backbone with 16×16 image patches and enhanced long-text encoding capacity. It is well suited for vision–language applications such as image retrieval, zero-shot classification, and multimodal reasoning where long textual prompts or descriptions are important.
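
As a quick orientation before the device-specific binaries below, here is a minimal zero-shot caption-ranking sketch following the usage shown in the reference repository (`longclip.load`, `longclip.tokenize`); the checkpoint path, image path, and captions are placeholders, and this runs the original PyTorch weights rather than the compiled device models.

```python
# Zero-shot caption ranking with the reference implementation
# (run from a checkout of https://github.com/beichenzbc/Long-CLIP).
import torch
from PIL import Image
from model import longclip

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder path: point this at the downloaded longclip-B.pt weights.
model, preprocess = longclip.load("./checkpoints/longclip-B.pt", device=device)

# Long, detailed captions are exactly what LongCLIP is built for.
captions = [
    "A man in a blue jacket crosses a rain-soaked street at dusk while a red car waits at the traffic light.",
    "A close-up photograph of a golden retriever puppy sleeping on a striped blanket near a window.",
]
text = longclip.tokenize(captions).to(device)
image = preprocess(Image.open("demo.png")).unsqueeze(0).to(device)  # placeholder image

with torch.no_grad():
    logits_per_image, _ = model(image, text)  # cosine-similarity logits
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print(probs)  # one probability per caption, summing to 1
```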

Model Configuration:
- Reference implementation: [LongCLIP-B16](https://github.com/beichenzbc/Long-CLIP)
- Original weights: [LongCLIP-B16](https://huggingface.co/BeichenZhang/LongCLIP-B/blob/main/longclip-B.pt)
- Input resolution: 3×224×224 (C×H×W)
- Supported Cooper versions:
  - Cooper SDK: 2.5.2
  - Cooper Foundry: 2.2
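
The Cooper SDK / Foundry compilation flow itself is not covered by this card. As one assumed preparation step, embedded toolchains commonly start from an exported static graph; the sketch below exports the image encoder to ONNX at the fixed 3×224×224 resolution listed above. The wrapper class, file name, and the use of ONNX at all are illustrative assumptions, not the documented Cooper workflow.

```python
# Illustrative only: exporting the image encoder to a static ONNX graph.
# The actual Cooper SDK / Foundry model import format may differ.
import torch
from model import longclip  # Long-CLIP reference repository

model, _ = longclip.load("./checkpoints/longclip-B.pt", device="cpu")
model.eval()

class ImageEncoder(torch.nn.Module):
    """Thin wrapper so the exported graph takes only the pixel tensor."""
    def __init__(self, clip_model):
        super().__init__()
        self.clip_model = clip_model

    def forward(self, pixels):
        return self.clip_model.encode_image(pixels)

dummy = torch.randn(1, 3, 224, 224)  # matches the 3x224x224 input above
torch.onnx.export(
    ImageEncoder(model),
    dummy,
    "longclip_b16_image_encoder.onnx",  # hypothetical output name
    input_names=["image"],
    output_names=["embedding"],
    opset_version=17,
)
```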

| Model | Device | Model Link |
| :-----: | :-----: | :-----: |
| LongCLIP-B16 Image Encoder | N1-655 | [Model_Link](https://huggingface.co/Ambarella/LongCLIP/blob/main/n1-655_longclip_base_patch16_image_encoder.bin) |
| LongCLIP-B16 Text Encoder | N1-655 | [Model_Link](https://huggingface.co/Ambarella/LongCLIP/blob/main/n1-655_longclip_base_patch16_text_encoder.bin) |
| LongCLIP-B16 Image Encoder | CV72 | [Model_Link](https://huggingface.co/Ambarella/LongCLIP/blob/main/cv72_longclip_base_patch16_image_encoder.bin) |
| LongCLIP-B16 Text Encoder | CV72 | [Model_Link](https://huggingface.co/Ambarella/LongCLIP/blob/main/cv72_longclip_base_patch16_text_encoder.bin) |
| LongCLIP-B16 Image Encoder | CV75 | [Model_Link](https://huggingface.co/Ambarella/LongCLIP/blob/main/cv75_longclip_base_patch16_image_encoder.bin) |
| LongCLIP-B16 Text Encoder | CV75 | [Model_Link](https://huggingface.co/Ambarella/LongCLIP/blob/main/cv75_longclip_base_patch16_text_encoder.bin) |
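
Because the image and text encoders ship as separate binaries per device, the final similarity step runs on the host once both embeddings are available. A minimal, framework-agnostic sketch of that combination follows; the embedding size of 512 matches ViT-B/16 CLIP, the logit scale of 100 mirrors CLIP's learned temperature, and all names here are illustrative rather than part of any Cooper API.

```python
import numpy as np

def rank_captions(image_embedding: np.ndarray, text_embeddings: np.ndarray) -> np.ndarray:
    """Softmax-normalized similarity of one image embedding against N caption embeddings."""
    # L2-normalize so the dot product equals cosine similarity.
    img = image_embedding / np.linalg.norm(image_embedding)
    txt = text_embeddings / np.linalg.norm(text_embeddings, axis=-1, keepdims=True)
    logits = 100.0 * (txt @ img)  # 100.0 mirrors CLIP's logit scale
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

# Example with random stand-in embeddings (real ones come from the two encoders).
scores = rank_captions(np.random.randn(512), np.random.randn(4, 512))
print(scores)  # four probabilities summing to 1
```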

resource/LongCLIP.png (ADDED, stored with Git LFS)