Commit d0e7527 by Gabriele
Parent(s): 6dd2600
Add navigation table linking all variants and DPT heads
README.md (changed)
@@ -14,12 +14,12 @@ pipeline_tag: zero-shot-image-classification
 
 TIPSv2 (Text-Image Pre-training with Spatial awareness) is a family of contrastive vision-language models that produce spatially rich image features aligned with text embeddings. This is the Base variant with 86M vision params and 110M text params.
 
-| Variant | Vision params | Text params | Embed dim |
-|---------|--------------|-------------|-----------|
-| B/14 | 86M | 110M | 768 |
-| L/14 | 303M | 184M | 1024 |
-| SO400m/14 | 412M | 448M | 1152 |
-| g/14 | 1.1B | 389M | 1536 |
+| Variant | Vision params | Text params | Embed dim | DPT Heads |
+|---------|--------------|-------------|-----------|-----------|
+| [B/14](https://huggingface.co/google/tipsv2-b14) | 86M | 110M | 768 | [B/14-dpt](https://huggingface.co/google/tipsv2-b14-dpt) |
+| [L/14](https://huggingface.co/google/tipsv2-l14) | 303M | 184M | 1024 | [L/14-dpt](https://huggingface.co/google/tipsv2-l14-dpt) |
+| [SO400m/14](https://huggingface.co/google/tipsv2-so400m14) | 412M | 448M | 1152 | [SO400m/14-dpt](https://huggingface.co/google/tipsv2-so400m14-dpt) |
+| [g/14](https://huggingface.co/google/tipsv2-g14) | 1.1B | 389M | 1536 | [g/14-dpt](https://huggingface.co/google/tipsv2-g14-dpt) |
 
 ## Usage
 
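The new table follows a regular naming pattern. Each variant's repository is `google/tipsv2-` plus the variant name lowercased with the slash removed, and its DPT head uses the same name with a `-dpt` suffix. The minimal Python sketch below regenerates the table rows from that pattern. It assumes the rule holds for every variant (it is inferred from the four links in this commit, not from any documented convention), and the helper names (`repo_slug`, `table_row`, `navigation_table`) are hypothetical. The author may well have written the table by hand.

```python
# Hypothetical helper: rebuild the README navigation table from the variant
# data in this commit. The repo-name rule (lowercase the display name, drop
# "/") is inferred from the four links in the diff, not from an official spec.
from dataclasses import dataclass

HF_BASE = "https://huggingface.co/google"


@dataclass(frozen=True)
class Variant:
    name: str           # display name, e.g. "B/14"
    vision_params: str  # kept as strings to match the README ("86M", "1.1B")
    text_params: str
    embed_dim: int


VARIANTS = [
    Variant("B/14", "86M", "110M", 768),
    Variant("L/14", "303M", "184M", 1024),
    Variant("SO400m/14", "412M", "448M", 1152),
    Variant("g/14", "1.1B", "389M", 1536),
]


def repo_slug(name: str) -> str:
    """Map a display name such as "SO400m/14" to "tipsv2-so400m14"."""
    return "tipsv2-" + name.lower().replace("/", "")


def table_row(v: Variant) -> str:
    """Render one Markdown row linking the model repo and its DPT-head repo."""
    slug = repo_slug(v.name)
    model_link = f"[{v.name}]({HF_BASE}/{slug})"
    dpt_link = f"[{v.name}-dpt]({HF_BASE}/{slug}-dpt)"
    return f"| {model_link} | {v.vision_params} | {v.text_params} | {v.embed_dim} | {dpt_link} |"


def navigation_table(variants: list[Variant]) -> str:
    """Header, separator, then one row per variant, joined with newlines."""
    header = "| Variant | Vision params | Text params | Embed dim | DPT Heads |"
    rule = "|---------|--------------|-------------|-----------|-----------|"
    return "\n".join([header, rule, *(table_row(v) for v in variants)])


if __name__ == "__main__":
    print(navigation_table(VARIANTS))
```

Generating the rows from a single list keeps the model links and the DPT-head links from drifting apart if a variant is later added or renamed.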