---
license: apache-2.0
language:
- en
pipeline_tag: zero-shot-image-classification
tags:
- point-cloud
- contrastive-learning
- multi-modal
- clip
datasets:
- Ximeng0831/CTP-Dataset
---

# CTP: Contrastive Tensor Pre-training

[![arXiv](https://img.shields.io/badge/arXiv-2603.07874-b31b1b.svg)](https://arxiv.org/abs/2603.07874) [![Hugging Face Model](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-FFD21E)](https://huggingface.co/Ximeng0831/CTP) [![Hugging Face Dataset](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-blue)](https://huggingface.co/datasets/Ximeng0831/CTP-Dataset) [![GitHub](https://img.shields.io/badge/GitHub-CTP-lightgrey?logo=github)](https://github.com/TAMU-CVRL/CTP)

This repository contains the model checkpoints for **CTP (Contrastive Tensor Pre-training)**. While [CLIP](https://arxiv.org/abs/2103.00020) aligns two modalities (image and text), CTP introduces a unified framework that aligns **multiple modalities** (image, text, and point cloud) simultaneously using tensor-based alignment.

## Repository Structure

The checkpoints are organized by experiment configuration, using the following naming conventions:

- **`all`**: All three encoders (CLIP ViT, CLIP text, and PointNet++) are pre-trained.
- **`pc`**: Only the PointNet++ (point cloud) backbone is trained; the image and text encoders remain frozen.
- **`nm`**: "No Masking" variant (ablation study).
### Checkpoint Variations

| Folder Name | Method Description | Alignment Strategy |
| :--- | :--- | :--- |
| `192_l2_tensor_all` | **Default** | L2 Similarity Tensor |
| `192_l2_tensor_nm_all` | Default (No Masking) | L2 Similarity Tensor |
| `192_l2_tensor_pc` | Frozen Image/Text | L2 Similarity Tensor |
| `192_cos_tensor_all` | Cosine Variant | Cosine Similarity Tensor |
| `192_cos_matrix_all` | Pairwise Matrix | 3× Pairwise Similarity Matrices |
| `192_cos_matrix_pc` | Pairwise (Frozen) | 3× Pairwise Similarity Matrices |
| `192_cos_matrix_IP_pc` | Image-Point Only | 1× Similarity Matrix (I-P) |

## Download the Checkpoints

You can download the pretrained checkpoints using the `huggingface_hub` library:

```python
from huggingface_hub import hf_hub_download

# Available configurations:
# ["192_l2_tensor_all", "192_l2_tensor_nm_all", "192_cos_tensor_all",
#  "192_cos_matrix_all", "192_l2_tensor_pc", "192_cos_matrix_pc",
#  "192_cos_matrix_IP_pc"]
config_name = "192_l2_tensor_all"

checkpoint_path = hf_hub_download(
    repo_id="Ximeng0831/CTP",
    subfolder=config_name,
    filename="ckpt_epoch9.pt",
    # local_dir="checkpoints",  # optional: download into a local directory
)
```

Source code: https://github.com/TAMU-CVRL/CTP

## Training Configurations

Detailed configuration files (YAML) for each experiment are available in the [official GitHub repository](https://github.com/TAMU-CVRL/CTP/tree/main/configs).

* **`all`:** Trained for **10 epochs** with a total batch size of **384** on **two NVIDIA A100 (40 GB)** GPUs.
* **`pc`:** Trained for **20 epochs** with a batch size of **192** on a **single NVIDIA RTX 4090** GPU.

> **Note:** For specific hyperparameter settings such as learning-rate schedules and weight decay, please refer to the corresponding `.yaml` files linked above.
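Once downloaded, a `.pt` checkpoint can be loaded with `torch.load`. The snippet below is a minimal, self-contained sketch: because the exact key layout of `ckpt_epoch9.pt` is not documented here, it saves a dummy checkpoint dictionary (the `epoch`/`state_dict` keys are assumptions) and reloads it to show the inspection pattern; check the training scripts in the GitHub repository for the real structure.

```python
import torch

# Hypothetical checkpoint layout -- the real keys in ckpt_epoch9.pt may differ.
dummy = {"epoch": 9, "state_dict": {"w": torch.zeros(2, 2)}}
torch.save(dummy, "ckpt_demo.pt")

# Load on CPU and list the top-level keys to see what the file stores
# (model weights, optimizer state, epoch counter, etc.).
state = torch.load("ckpt_demo.pt", map_location="cpu")
print(list(state.keys()))
```

Listing the top-level keys first is a safe way to discover whether the file holds raw weights or a wrapped training-state dictionary before wiring it into a model.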