yujia0913 committed
Commit d69b92a (verified)
Parent: d6ef7e2

Update README.md

Files changed (1): README.md (+3 -2)
README.md CHANGED
@@ -13,14 +13,15 @@ tags:
 This repository contains the model weights for **Concerto**, a novel approach for learning robust spatial representations presented in the paper [Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations](https://huggingface.co/papers/2510.23607).
 
 - **Paper:** [Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations](https://huggingface.co/papers/2510.23607)
-- **Project Page:** [https://pointcept.github.io/Concerto/](https://pointcept.github.io/Concerto/)
+- **Project Page:** [https://pointcept.github.io/Concerto/](https://pointcept.github.io/Concerto)
 - **Codebase:** [https://github.com/Pointcept/Pointcept](https://github.com/Pointcept/Pointcept)
+- **Inference:** [https://github.com/Pointcept/Concerto](https://github.com/Pointcept/Concerto)
 
 ## Abstract
 Humans learn abstract concepts through multisensory synergy, and once formed, such representations can often be recalled from a single modality. Inspired by this principle, we introduce Concerto, a minimalist simulation of human concept learning for spatial cognition, combining 3D intra-modal self-distillation with 2D-3D cross-modal joint embedding. Despite its simplicity, Concerto learns more coherent and informative spatial features, as demonstrated by zero-shot visualizations. It outperforms both standalone SOTA 2D and 3D self-supervised models by 14.2% and 4.8%, respectively, as well as their feature concatenation, in linear probing for 3D scene perception. With full fine-tuning, Concerto sets new SOTA results across multiple scene understanding benchmarks (e.g., 80.7% mIoU on ScanNet). We further present a variant of Concerto tailored for video-lifted point cloud spatial understanding, and a translator that linearly projects Concerto representations into CLIP's language space, enabling open-world perception. These results highlight that Concerto emerges spatial representations with superior fine-grained geometric and semantic consistency.
 
 ## Usage
-For detailed installation, data preparation, training, and testing instructions, please refer to the [official GitHub repository](https://github.com/Pointcept/Pointcept).
+For detailed installation, data preparation, training, and testing instructions, please refer to the [official codebase](https://github.com/Pointcept/Pointcept) and [inference demo](https://github.com/Pointcept/Concerto).
 
 ## Citation
 If you find Concerto or the Pointcept codebase useful in your research, please cite the following papers:
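
For a quick sanity check of the downloaded weights, a minimal sketch with plain PyTorch is shown below. The file name `concerto_large.pth` and the `state_dict` layout are assumptions rather than documented interfaces; the supported loading path lives in the Pointcept codebase and the Concerto inference repo linked above.

```python
# Minimal sketch: inspect the released Concerto checkpoint with plain PyTorch.
# Assumptions (not confirmed by this repo): the file is a standard torch-
# serialized checkpoint named "concerto_large.pth" that may wrap its weights
# in a "state_dict" key. For real inference, follow the linked repositories.
import torch

checkpoint = torch.load("concerto_large.pth", map_location="cpu")  # hypothetical file name

# Fall back to treating the whole object as the state dict if it is not wrapped.
state_dict = checkpoint.get("state_dict", checkpoint)

# Print a few parameter names and shapes to confirm the checkpoint is readable.
for name, tensor in list(state_dict.items())[:5]:
    print(f"{name}: {tuple(tensor.shape)}")
```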
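
The abstract's translator is, conceptually, a single linear map from Concerto features into CLIP's embedding space, scored against CLIP text embeddings for open-vocabulary labeling. The sketch below illustrates that idea with stand-in tensors; the feature widths, names, and scoring step are illustrative assumptions, not the released implementation.

```python
# Hedged sketch of the linear Concerto-to-CLIP "translator" idea: project
# per-point features into CLIP space, then assign each point the label of
# the most similar CLIP text embedding. All dimensions are assumptions.
import torch
import torch.nn.functional as F

concerto_dim, clip_dim = 512, 768                # assumed feature widths
translator = torch.nn.Linear(concerto_dim, clip_dim, bias=False)

point_feats = torch.randn(1000, concerto_dim)    # stand-in Concerto point features
text_embeds = torch.randn(20, clip_dim)          # stand-in CLIP text embeddings

# Cosine similarity in CLIP space: normalize both sides, take the argmax.
proj = F.normalize(translator(point_feats), dim=-1)
text = F.normalize(text_embeds, dim=-1)
labels = (proj @ text.T).argmax(dim=-1)          # per-point open-vocabulary label
print(labels.shape)                              # torch.Size([1000])
```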