Update README.md
README.md
CHANGED
|
@@ -1,10 +1,38 @@
|
|
| 1 |
---
|
| 2 |
license: cc-by-nc-4.0
|
| 3 |
-
tags:
|
| 4 |
-
- pytorch_model_hub_mixin
|
| 5 |
-
- model_hub_mixin
|
| 6 |
---
|
|
|
|
| 7 |
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 1 |
---
|
| 2 |
license: cc-by-nc-4.0
|
| 3 |
---
|
| 4 |
+
# Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D
|
| 5 |
|
| 6 |
+
Official model weights for the `Locate-3D` models and the `3D-JEPA` encoders
|
| 7 |
+
|
| 8 |
+
## Locate 3D
|
| 9 |
+
|
| 10 |
+
Locate 3D is a model for localizing objects in 3D scenes from referring expressions like “the
|
| 11 |
+
small coffee table between the sofa and the lamp.” Locate 3D sets a new state-of-the-art on standard
|
| 12 |
+
referential grounding benchmarks and showcases robust generalization capabilities. Notably, Locate
|
| 13 |
+
3D operates directly on sensor observation streams (posed RGB-D frames), enabling real-world
|
| 14 |
+
deployment on robots and AR devices.
|
| 15 |
+
|
| 16 |
+
## 3D-JEPA
|
| 17 |
+
|
| 18 |
+
3D-JEPA, a novel self-supervised
|
| 19 |
+
learning (SSL) algorithm applicable to sensor point clouds, is key to Locate 3D. It takes as input a 3D point cloud
|
| 20 |
+
featurized using 2D foundation models (CLIP, DINO). Subsequently, masked prediction in latent space
|
| 21 |
+
is employed as a pretext task to aid the self-supervised learning of contextualized point cloud features.
|
| 22 |
+
Once trained, the 3D-JEPA encoder is fine-tuned alongside a language-conditioned decoder to jointly
|
| 23 |
+
predict 3D masks and bounding boxes.
|
| 24 |
+
|
| 25 |
+
## Models
|
| 26 |
+
|
| 27 |
+
- **Locate-3D**: Locate-3D model trained on public referential grounding datasets
|
| 28 |
+
- **Locate-3D+**: Locate-3D model trained on public referential grounding datasets and the newly released Locate 3D Dataset
|
| 29 |
+
- **3D-JEPA**: Pre-trained SSL encoder for 3D understanding
|
| 30 |
+
|
| 31 |
+
|
| 32 |
+
## How to Use
|
| 33 |
+
|
| 34 |
+
For detailed instructions on how to load the encoder and integrate it into your downstream task, please refer to our [GitHub repository](https://github.com/facebookresearch/locate-3d).
|
| 35 |
+
|
| 36 |
+
## License
|
| 37 |
+
|
| 38 |
+
The majority of `locate-3d` is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: Pointcept is licensed under the MIT license.
|