| ---
|
| license: apache-2.0
|
| language:
|
| - en
|
| pipeline_tag: zero-shot-image-classification
|
| tags:
|
| - point-cloud
|
| - contrastive-learning
|
| - multi-modal
|
| - clip
|
| datasets:
|
| - Ximeng0831/CTP-Dataset
|
| ---
|
# CTP: Contrastive Tensor Pre-training

[Paper](https://arxiv.org/abs/2603.07874) | [Model](https://huggingface.co/Ximeng0831/CTP) | [Dataset](https://huggingface.co/datasets/Ximeng0831/CTP-Dataset) | [Code](https://github.com/TAMU-CVRL/CTP)
|
|
This repository contains the model checkpoints for **CTP (Contrastive Tensor Pre-training)**. While [CLIP](https://arxiv.org/abs/2103.00020) focuses on aligning two modalities (Image and Text), CTP introduces a unified framework to align **multiple modalities** (Image, Text, and Point Cloud) simultaneously using tensor-based alignment.
|
|
## Repository Structure
|
|
The checkpoints are organized by experiment configuration. We use the following naming conventions:
- **`all`**: All three encoders (CLIP ViT, CLIP Text, and PointNet++) are pre-trained.
- **`pc`**: Only the PointNet++ (Point Cloud) backbone is trained; the Image and Text encoders remain frozen.
- **`nm`**: "No Masking" variant (ablation study).
|
|
### Checkpoint Variations

| Folder Name | Method Description | Alignment Strategy |
| :--- | :--- | :--- |
| `192_l2_tensor_all` | **Default** | L2 Similarity Tensor |
| `192_l2_tensor_nm_all` | Default (No Masking) | L2 Similarity Tensor |
| `192_l2_tensor_pc` | Frozen Image/Text | L2 Similarity Tensor |
| `192_cos_tensor_all` | Cosine Variant | Cosine Similarity Tensor |
| `192_cos_matrix_all` | Pairwise Matrix | 3× Pairwise Similarity Matrices |
| `192_cos_matrix_pc` | Pairwise (Frozen) | 3× Pairwise Similarity Matrices |
| `192_cos_matrix_IP_pc` | Image-Point Only | 1× Similarity Matrix (I-P) |
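The difference between the tensor and pairwise strategies can be sketched with toy embeddings. This is a hypothetical illustration of the two alignment shapes, not the repository's actual loss code; all names (`N`, `D`, `img`, `txt`, `pc`) are made up, and the `l2_tensor` variants would use L2 distances in place of the cosine products shown here:

```python
import numpy as np

# Toy illustration of the alignment strategies above (not CTP's actual code).
# N samples per batch, D-dimensional unit-normalized embeddings per modality.
N, D = 4, 8
rng = np.random.default_rng(0)

def normed(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

img = normed(rng.standard_normal((N, D)))  # image embeddings
txt = normed(rng.standard_normal((N, D)))  # text embeddings
pc = normed(rng.standard_normal((N, D)))   # point-cloud embeddings

# Pairwise strategy ("cos_matrix"): three separate N x N cosine matrices.
sim_it = img @ txt.T  # image-text
sim_ip = img @ pc.T   # image-point
sim_tp = txt @ pc.T   # text-point

# Tensor strategy ("cos_tensor"): one N x N x N third-order tensor that
# scores every (image_i, text_j, point_k) triple jointly.
sim_tensor = np.einsum('id,jd,kd->ijk', img, txt, pc)

print(sim_it.shape, sim_tensor.shape)  # (4, 4) (4, 4, 4)
```

The pairwise setup supervises each modality pair independently, while the tensor scores all triples at once, which is the tensor-based alignment referred to above.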
|
|
## Download the Checkpoints


You can download the pretrained checkpoints using the `huggingface_hub` library:
|
|
```python
from huggingface_hub import hf_hub_download

# Available configurations:
# ["192_l2_tensor_all", "192_l2_tensor_nm_all", "192_cos_tensor_all",
#  "192_cos_matrix_all", "192_l2_tensor_pc", "192_cos_matrix_pc",
#  "192_cos_matrix_IP_pc"]
config_name = "192_l2_tensor_all"

checkpoint_path = hf_hub_download(
    repo_id="Ximeng0831/CTP",
    subfolder=config_name,
    filename="ckpt_epoch9.pt",
    # local_dir="checkpoints"  # optional: download into a local directory
)
```
| Source code: https://github.com/TAMU-CVRL/CTP |
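Once downloaded, the checkpoint can presumably be loaded with `torch.load`, since the `.pt` extension suggests a standard PyTorch checkpoint; the key layout below is only a guess, so consult the loading code in the GitHub repository for the actual structure. The sketch writes a dummy file so it runs end to end without the download:

```python
import os
import torch

# Stand-in checkpoint so the pattern is runnable without downloading.
# The real key layout of ckpt_epoch9.pt may differ (assumption).
demo_path = "demo_ckpt.pt"
torch.save({"epoch": 9, "model": {"weight": torch.zeros(2)}}, demo_path)

# Load on CPU and inspect the top-level keys.
state = torch.load(demo_path, map_location="cpu")
print(sorted(state.keys()))  # ['epoch', 'model']
os.remove(demo_path)
```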
|
|
## Training Configurations


Detailed configuration files (YAML) for each experiment are available in the [Official GitHub Repository](https://github.com/TAMU-CVRL/CTP/tree/main/configs).
|
|
* **`all`:** Training is performed for **10 epochs** with a total batch size of **384**. These models are trained using **two NVIDIA A100 (40G)** GPUs.
* **`pc`:** Training is conducted for **20 epochs** with a batch size of **192**. These models are trained on a **single NVIDIA RTX 4090** GPU.
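As a quick consistency check on the numbers above, both settings work out to 192 samples per GPU, which plausibly explains the `192_` prefix in the checkpoint folder names (that reading of the prefix is our inference, not stated by the authors):

```python
# Per-GPU batch size derived from the training settings stated above.
configs = {
    "all": {"total_batch": 384, "gpus": 2},  # two A100 (40G)
    "pc": {"total_batch": 192, "gpus": 1},   # single RTX 4090
}
for name, c in configs.items():
    per_gpu = c["total_batch"] // c["gpus"]
    print(f"{name}: {per_gpu} samples per GPU")  # 192 in both cases
```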
|
|
> **Note:** For specific hyperparameter settings such as learning rate schedules and weight decay, please refer to the corresponding `.yaml` files in the link above.