cooper_robot committed d659e57 (parent: 979f6ca): Add release note for v1.1.0

Files changed:
- README.md (+30 -0)
- resource/LongCLIP.png (+3 -0)
README.md (ADDED):
---
library_name: pytorch
---

![LongCLIP](resource/LongCLIP.png)

LongCLIP extends the CLIP vision–language framework to support significantly longer text inputs, enabling richer contextual understanding while preserving strong image–text alignment.

Original paper: [Long-CLIP: Unlocking Long-Text Capability in CLIP, Zhang et al., 2024](https://arxiv.org/abs/2403.15378)

# LongCLIP-B16

This model uses the LongCLIP B/16 variant, which is based on a ViT-Base backbone with 16×16 image patches and enhanced long-text encoding capacity. It is well suited for vision–language applications such as image retrieval, zero-shot classification, and multimodal reasoning where long textual prompts or descriptions are important.
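
As a quick orientation before the device-specific binaries below, here is a minimal zero-shot caption-ranking sketch following the usage shown in the reference repository (`longclip.load`, `longclip.tokenize`); the checkpoint path, image path, and captions are placeholders, and this runs the original PyTorch weights rather than the compiled device models.

```python
# Zero-shot caption ranking with the reference implementation
# (run from a checkout of https://github.com/beichenzbc/Long-CLIP).
import torch
from PIL import Image
from model import longclip

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder path: point this at the downloaded longclip-B.pt weights.
model, preprocess = longclip.load("./checkpoints/longclip-B.pt", device=device)

# Long, detailed captions are exactly what LongCLIP is built for.
captions = [
    "A man in a blue jacket crosses a rain-soaked street at dusk while a red car waits at the traffic light.",
    "A close-up photograph of a golden retriever puppy sleeping on a striped blanket near a window.",
]
text = longclip.tokenize(captions).to(device)
image = preprocess(Image.open("demo.png")).unsqueeze(0).to(device)  # placeholder image

with torch.no_grad():
    logits_per_image, _ = model(image, text)  # cosine-similarity logits
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print(probs)  # one probability per caption, summing to 1
```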

Model Configuration:
- Reference implementation: [LongCLIP-B16](https://github.com/beichenzbc/Long-CLIP)
- Original weights: [LongCLIP-B16](https://huggingface.co/BeichenZhang/LongCLIP-B/blob/main/longclip-B.pt)
- Input resolution: 3×224×224 (C×H×W)
- Supported Cooper versions:
  - Cooper SDK: 2.5.2
  - Cooper Foundry: 2.2
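
The Cooper SDK / Foundry compilation flow itself is not covered by this card. As one assumed preparation step, embedded toolchains commonly start from an exported static graph; the sketch below exports the image encoder to ONNX at the fixed 3×224×224 resolution listed above. The wrapper class, file name, and the use of ONNX at all are illustrative assumptions, not the documented Cooper workflow.

```python
# Illustrative only: exporting the image encoder to a static ONNX graph.
# The actual Cooper SDK / Foundry model import format may differ.
import torch
from model import longclip  # Long-CLIP reference repository

model, _ = longclip.load("./checkpoints/longclip-B.pt", device="cpu")
model.eval()

class ImageEncoder(torch.nn.Module):
    """Thin wrapper so the exported graph takes only the pixel tensor."""
    def __init__(self, clip_model):
        super().__init__()
        self.clip_model = clip_model

    def forward(self, pixels):
        return self.clip_model.encode_image(pixels)

dummy = torch.randn(1, 3, 224, 224)  # matches the 3x224x224 input above
torch.onnx.export(
    ImageEncoder(model),
    dummy,
    "longclip_b16_image_encoder.onnx",  # hypothetical output name
    input_names=["image"],
    output_names=["embedding"],
    opset_version=17,
)
```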

| Model | Device | Model Link |
| :-----: | :-----: | :-----: |
| LongCLIP-B16 Image Encoder | N1-655 | [Model_Link](https://huggingface.co/Ambarella/LongCLIP/blob/main/n1-655_longclip_base_patch16_image_encoder.bin) |
| LongCLIP-B16 Text Encoder | N1-655 | [Model_Link](https://huggingface.co/Ambarella/LongCLIP/blob/main/n1-655_longclip_base_patch16_text_encoder.bin) |
| LongCLIP-B16 Image Encoder | CV72 | [Model_Link](https://huggingface.co/Ambarella/LongCLIP/blob/main/cv72_longclip_base_patch16_image_encoder.bin) |
| LongCLIP-B16 Text Encoder | CV72 | [Model_Link](https://huggingface.co/Ambarella/LongCLIP/blob/main/cv72_longclip_base_patch16_text_encoder.bin) |
| LongCLIP-B16 Image Encoder | CV75 | [Model_Link](https://huggingface.co/Ambarella/LongCLIP/blob/main/cv75_longclip_base_patch16_image_encoder.bin) |
| LongCLIP-B16 Text Encoder | CV75 | [Model_Link](https://huggingface.co/Ambarella/LongCLIP/blob/main/cv75_longclip_base_patch16_text_encoder.bin) |
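
Because the image and text encoders ship as separate binaries per device, the final similarity step runs on the host once both embeddings are available. A minimal, framework-agnostic sketch of that combination follows; the embedding size of 512 matches ViT-B/16 CLIP, the logit scale of 100 mirrors CLIP's learned temperature, and all names here are illustrative rather than part of any Cooper API.

```python
import numpy as np

def rank_captions(image_embedding: np.ndarray, text_embeddings: np.ndarray) -> np.ndarray:
    """Softmax-normalized similarity of one image embedding against N caption embeddings."""
    # L2-normalize so the dot product equals cosine similarity.
    img = image_embedding / np.linalg.norm(image_embedding)
    txt = text_embeddings / np.linalg.norm(text_embeddings, axis=-1, keepdims=True)
    logits = 100.0 * (txt @ img)  # 100.0 mirrors CLIP's logit scale
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

# Example with random stand-in embeddings (real ones come from the two encoders).
scores = rank_captions(np.random.randn(512), np.random.randn(4, 512))
print(scores)  # four probabilities summing to 1
```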

resource/LongCLIP.png (ADDED, stored with Git LFS)