cooper_robot committed
Commit 2daff4c · 1 Parent(s): baada4a

Add release note for v1.1.0

Files changed (2)
  1. README.md +33 -0
  2. resource/OWLViT.png +3 -0
README.md ADDED
@@ -0,0 +1,33 @@
+ ---
+ library_name: pytorch
+ ---
+
+ ![owlvit_logo](resource/OWLViT.png)
+
+ OWL-ViT extends CLIP-based vision–language models to perform open-vocabulary object detection by aligning image regions with textual descriptions, enabling zero-shot detection without task-specific training.
+
+ Original paper: [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230)
+
+ # OWLv1 CLIP ViT-B/32
+
+ This model uses OWL-ViT v1 with a CLIP ViT-B/32 Transformer as the image encoder and a masked self-attention Transformer as the text encoder, leveraging CLIP's vision–language alignment to detect objects specified by arbitrary text queries. It is well suited for applications such as open-vocabulary detection, image search, and real-time visual understanding across diverse domains.
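The zero-shot detection flow described above can be sketched with the original `google/owlvit-base-patch32` checkpoint via the `transformers` library (the device-specific `.bin` files listed below are compiled for Cooper hardware and are not loadable this way); the image URL and score threshold here are illustrative choices, not part of this release:

```python
import requests
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

# Load the original (non-Cooper-compiled) checkpoint referenced in this card.
processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

# Example image and arbitrary text queries (no task-specific training needed).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = [["a photo of a cat", "a photo of a dog"]]

inputs = processor(text=texts, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw outputs to boxes in the original image's coordinate frame.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_object_detection(
    outputs, threshold=0.1, target_sizes=target_sizes
)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(f"{texts[0][label]}: {score:.2f} at {[round(v, 1) for v in box.tolist()]}")
```

Because the text queries are plain strings, the same loaded model can detect new object categories at runtime simply by changing `texts`.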
+
+ Model configuration:
+ - Reference implementation: [OWLv1 CLIP ViT-B/32](https://github.com/google-research/scenic/tree/main/scenic/projects/owl_vit)
+ - Original weights: [owlvit-base-patch32](https://huggingface.co/google/owlvit-base-patch32/blob/main/pytorch_model.bin)
+ - Input resolution: 3x768x768
+ - Supported Cooper versions:
+   - Cooper SDK: [2.5.2]
+   - Cooper Foundry: [2.2]
+
+ | Model | Device | Model Link |
+ | :-----: | :-----: | :-----: |
+ | OWLv1 CLIP ViT-B/32 Image Encoder | N1-655 | [Model_Link](https://huggingface.co/Ambarella/OWLViT/blob/main/n1-655_owlvit_v1_base_patch32_image_encoder.bin) |
+ | OWLv1 CLIP ViT-B/32 Text Encoder | N1-655 | [Model_Link](https://huggingface.co/Ambarella/OWLViT/blob/main/n1-655_owlvit_v1_base_patch32_text_encoder.bin) |
+ | OWLv1 CLIP ViT-B/32 Predictor | N1-655 | [Model_Link](https://huggingface.co/Ambarella/OWLViT/blob/main/n1-655_owlvit_v1_base_patch32_predictor.bin) |
+ | OWLv1 CLIP ViT-B/32 Image Encoder | CV72 | [Model_Link](https://huggingface.co/Ambarella/OWLViT/blob/main/cv72_owlvit_v1_base_patch32_image_encoder.bin) |
+ | OWLv1 CLIP ViT-B/32 Text Encoder | CV72 | [Model_Link](https://huggingface.co/Ambarella/OWLViT/blob/main/cv72_owlvit_v1_base_patch32_text_encoder.bin) |
+ | OWLv1 CLIP ViT-B/32 Predictor | CV72 | [Model_Link](https://huggingface.co/Ambarella/OWLViT/blob/main/cv72_owlvit_v1_base_patch32_predictor.bin) |
+ | OWLv1 CLIP ViT-B/32 Image Encoder | CV75 | [Model_Link](https://huggingface.co/Ambarella/OWLViT/blob/main/cv75_owlvit_v1_base_patch32_image_encoder.bin) |
+ | OWLv1 CLIP ViT-B/32 Text Encoder | CV75 | [Model_Link](https://huggingface.co/Ambarella/OWLViT/blob/main/cv75_owlvit_v1_base_patch32_text_encoder.bin) |
+ | OWLv1 CLIP ViT-B/32 Predictor | CV75 | [Model_Link](https://huggingface.co/Ambarella/OWLViT/blob/main/cv75_owlvit_v1_base_patch32_predictor.bin) |
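The table above ships each model as three separate binaries per device (image encoder, text encoder, predictor). A plausible motivation for that split is that text queries change rarely, so their embeddings can be computed once and cached while only the image encoder and predictor run per frame. The sketch below is purely hypothetical: the function bodies are NumPy stand-ins for the compiled binaries (only the shapes, taken from the 768x768 input, patch size 32, and ViT-B/32's 512-d embedding, reflect the real model), not Cooper APIs:

```python
import numpy as np

EMBED_DIM = 512  # CLIP ViT-B/32 embedding width

def encode_text(queries):
    """Stand-in for the text-encoder binary: one embedding per query."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(queries), EMBED_DIM)).astype(np.float32)

def encode_image(frame):
    """Stand-in for the image-encoder binary: per-patch features."""
    num_patches = (768 // 32) ** 2  # 576 patches at 768x768 with patch size 32
    rng = np.random.default_rng(frame)
    return rng.standard_normal((num_patches, EMBED_DIM)).astype(np.float32)

def predict(patch_feats, text_embeds):
    """Stand-in for the predictor binary: per-patch, per-query scores."""
    # Cosine similarity between patch features and cached text embeddings.
    p = patch_feats / np.linalg.norm(patch_feats, axis=-1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
    return p @ t.T  # shape: (num_patches, num_queries)

text_embeds = encode_text(["cat", "dog"])  # run the text encoder once
for frame in range(3):                     # per-frame loop reuses the cache
    logits = predict(encode_image(frame), text_embeds)
    print(logits.shape)  # (576, 2)
```

Under this pattern, changing the query set only re-runs the lightweight text encoder; the per-frame cost stays fixed.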
resource/OWLViT.png ADDED

Git LFS Details

  • SHA256: e8fd4cbaef3bceaee6d3ba21c5e1ef263bcc513de55046e5bc03af515e4b5c81
  • Pointer size: 132 Bytes
  • Size of remote file: 1.62 MB