---
license: apache-2.0
language:
- en
pipeline_tag: zero-shot-image-classification
tags:
- point-cloud
- contrastive-learning
- multi-modal
- clip
datasets:
- Ximeng0831/CTP-Dataset
---
# CTP: Contrastive Tensor Pre-training
[Paper](https://arxiv.org/abs/2603.07874)
[Model](https://huggingface.co/Ximeng0831/CTP)
[Dataset](https://huggingface.co/datasets/Ximeng0831/CTP-Dataset)
[Code](https://github.com/TAMU-CVRL/CTP)
This repository contains the model checkpoints for **CTP (Contrastive Tensor Pre-training)**. While [CLIP](https://arxiv.org/abs/2103.00020) focuses on aligning two modalities (Image and Text), CTP introduces a unified framework to align **multiple modalities** (Image, Text, and Point Cloud) simultaneously using tensor-based alignment.
## Repository Structure
The checkpoints are organized by experiment configuration. We use the following naming conventions:
- **`all`**: Pre-training of all three encoders (CLIP ViT, CLIP Text, and PointNet++).
- **`pc`**: Only the PointNet++ (Point Cloud) backbone is trained; Image and Text encoders remain frozen.
- **`nm`**: "No Masking" variant (ablation study).
### Checkpoint Variations
| Folder Name | Method Description | Alignment Strategy |
| :--- | :--- | :--- |
| `192_l2_tensor_all` | **Default** | L2 Similarity Tensor |
| `192_l2_tensor_nm_all` | Default (No Masking) | L2 Similarity Tensor |
| `192_l2_tensor_pc` | Frozen Image/Text | L2 Similarity Tensor |
| `192_cos_tensor_all` | Cosine Variant | Cosine Similarity Tensor |
| `192_cos_matrix_all` | Pairwise Matrix | 3× Pairwise Similarity Matrices |
| `192_cos_matrix_pc` | Pairwise (Frozen) | 3× Pairwise Similarity Matrices |
| `192_cos_matrix_IP_pc`| Image-Point Only | 1× Similarity Matrix (I-P) |
## Download the Checkpoints
You can download pretrained checkpoints using the `huggingface_hub` library:
```python
from huggingface_hub import hf_hub_download
# Available: ["192_l2_tensor_all", "192_l2_tensor_nm_all", "192_cos_tensor_all", "192_cos_matrix_all", "192_l2_tensor_pc", "192_cos_matrix_pc", "192_cos_matrix_IP_pc"]
config_name = "192_l2_tensor_all"
checkpoint_path = hf_hub_download(
    repo_id="Ximeng0831/CTP",
    subfolder=config_name,
    filename="ckpt_epoch9.pt",
    # local_dir="checkpoints"
)
```
Source code: https://github.com/TAMU-CVRL/CTP
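Once downloaded, the checkpoint can be loaded with PyTorch. The helper below is a minimal sketch: the exact layout of `ckpt_epoch9.pt` (a plain state dict vs. a wrapper dict with a `"model"` key) is an assumption, so inspect the keys of the file you downloaded to confirm.

```python
import torch

def load_ctp_checkpoint(path: str):
    """Load a CTP checkpoint on CPU and return its state dict.

    Assumption: the .pt file is either a plain state_dict or a
    wrapper dict with a "model" key -- check the actual keys of
    the downloaded checkpoint before relying on this.
    """
    ckpt = torch.load(path, map_location="cpu")
    # Unwrap a common {"model": state_dict, ...} layout if present.
    if isinstance(ckpt, dict) and "model" in ckpt:
        return ckpt["model"]
    return ckpt
```

Loading on CPU via `map_location="cpu"` avoids GPU requirements at inspection time; move tensors to a device after constructing the model.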
## Training Configurations
Detailed configuration files (YAML) for each experiment are available in the [Official GitHub Repository](https://github.com/TAMU-CVRL/CTP/tree/main/configs).
* **`all`:** Trained for **10 epochs** with a total batch size of **384** on **two NVIDIA A100 (40 GB)** GPUs.
* **`pc`:** Trained for **20 epochs** with a batch size of **192** on a **single NVIDIA RTX 4090** GPU.
> **Note:** For specific hyperparameter settings such as learning rate schedules and weight decay, please refer to the corresponding `.yaml` files in the link above. |