pmcvay committed 3111b8c (verified)
1 parent: 1e3f030

Push model using huggingface_hub.

Files changed (2)
  1. README.md +6 -34
  2. config.json +16 -1
README.md CHANGED
```diff
@@ -1,38 +1,10 @@
 ---
 license: cc-by-nc-4.0
+tags:
+- pytorch_model_hub_mixin
+- model_hub_mixin
 ---
-# Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D
 
-Official model weights for the `Locate-3D` models and the `3D-JEPA` encoders
-
-## Locate 3D
-
-Locate 3D is a model for localizing objects in 3D scenes from referring expressions like “the
-small coffee table between the sofa and the lamp.” Locate 3D sets a new state-of-the-art on standard
-referential grounding benchmarks and showcases robust generalization capabilities. Notably, Locate
-3D operates directly on sensor observation streams (posed RGB-D frames), enabling real-world
-deployment on robots and AR devices.
-
-## 3D-JEPA
-
-3D-JEPA, a novel self-supervised
-learning (SSL) algorithm applicable to sensor point clouds, is key to `Locate 3D`. It takes as input a 3D pointcloud
-featurized using 2D foundation models (CLIP, DINO). Subsequently, masked prediction in latent space
-is employed as a pretext task to aid the self-supervised learning of contextualized pointcloud features.
-Once trained, the 3D-JEPA encoder is finetuned alongside a language-conditioned decoder to jointly
-predict 3D masks and bounding boxes.
-
-## Models
-
-- **Locate-3D**: Locate-3D model trained on public referential grounding datasets
-- **Locate-3D+**: Locate-3D model trained on public referential grounding datasets and the newly released Locate 3D Dataset
-- **3D-JEPA**: Pre-trained SSL encoder for 3D understanding
-
-
-## How to Use
-
-For detailed instructions on how to load the encoder and integrate it into your downstream task, please refer to our [GitHub repository](https://github.com/facebookresearch/locate-3d).
-
-## License
-
-The majority of `locate-3` is licensed under CC-BY-NC, however portions of the project are available under separate license terms: Pointcept is licensed under the MIT license.
+This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
+- Library: [More Information Needed]
+- Docs: [More Information Needed]
```
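The new README text says the weights were pushed with `PyTorchModelHubMixin`, and the config.json diff shows the model's settings stored as plain JSON keys in sorted order. The sketch below is a hypothetical, stdlib-only imitation of that save/load pattern: constructor keyword arguments are written to `config.json`, then read back and passed to the constructor. It is not the huggingface_hub code and handles no weights; `ConfigMixinSketch`, `EncoderSketch`, `save_config`, and `load_config` are illustrative names (the real mixin exposes `save_pretrained`, `from_pretrained`, and `push_to_hub`).

```python
import inspect
import json
import tempfile
from pathlib import Path


class ConfigMixinSketch:
    """Hypothetical stand-in for the config half of a hub mixin.

    Saves the values of the subclass's __init__ parameters to config.json
    and rebuilds an instance from that file. No weights are handled, and
    this is not the huggingface_hub implementation.
    """

    def save_config(self, save_dir):
        # Read each __init__ parameter back from the attribute of the same name
        # (a simplification: assumes every parameter is stored unchanged).
        params = inspect.signature(type(self).__init__).parameters
        config = {name: getattr(self, name) for name in params if name != "self"}
        path = Path(save_dir) / "config.json"
        # Sorted keys reproduce the key order seen in the diff.
        path.write_text(json.dumps(config, sort_keys=True, indent=2))
        return path

    @classmethod
    def load_config(cls, save_dir):
        # Parse config.json and pass its keys back as constructor kwargs.
        config = json.loads((Path(save_dir) / "config.json").read_text())
        return cls(**config)


class EncoderSketch(ConfigMixinSketch):
    """Hypothetical model whose parameter names mirror some config.json keys."""

    def __init__(self, embed_dim=256, input_feat_dim=1536, ptv3_args=None, voxel_size=0.05):
        self.embed_dim = embed_dim
        self.input_feat_dim = input_feat_dim
        # Copy into a fresh dict per instance instead of sharing a mutable default.
        self.ptv3_args = dict(ptv3_args) if ptv3_args is not None else {}
        self.voxel_size = voxel_size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model = EncoderSketch(ptv3_args={"in_channels": 256})
        print(model.save_config(tmp).read_text())
        restored = EncoderSketch.load_config(tmp)
        print(restored.ptv3_args)
```

The round trip only works because every saved key matches a constructor parameter name, which is the same constraint that makes nested dicts such as `ptv3_args` usable as ordinary keyword arguments.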
config.json CHANGED
```diff
@@ -2,7 +2,22 @@
   "embed_dim": 256,
   "input_feat_dim": 1536,
   "num_rgb_harmonic_functions": 16,
-  "ptv3_args": {},
+  "ptv3_args": {
+    "dec_channels": [
+      256,
+      256,
+      512,
+      512
+    ],
+    "enc_channels": [
+      256,
+      256,
+      512,
+      512,
+      1024
+    ],
+    "in_channels": 256
+  },
   "rgb_proj_dim": 256,
   "voxel_size": 0.05
 }
```
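The new `ptv3_args` entry nests a second layer of settings inside config.json. Judging from the key name and the old README's mention of Pointcept, these look like keyword arguments for a Point Transformer V3 backbone, but the commit does not say so; treat that as an assumption. The minimal sketch below reassembles the updated file (line 1 sits outside the hunk and is assumed to be the opening brace) and summarizes the stage layout; `describe_backbone` is a hypothetical helper, not part of any library.

```python
import json

# Updated config.json reassembled from the "+" side of the diff.
# Line 1 is outside the hunk and is assumed to be the opening brace.
CONFIG_TEXT = """{
  "embed_dim": 256,
  "input_feat_dim": 1536,
  "num_rgb_harmonic_functions": 16,
  "ptv3_args": {
    "dec_channels": [256, 256, 512, 512],
    "enc_channels": [256, 256, 512, 512, 1024],
    "in_channels": 256
  },
  "rgb_proj_dim": 256,
  "voxel_size": 0.05
}"""


def describe_backbone(config):
    """Hypothetical helper: summarize the nested ptv3_args block."""
    ptv3 = config.get("ptv3_args", {})
    enc = ptv3.get("enc_channels", [])
    dec = ptv3.get("dec_channels", [])
    return {
        "encoder_stages": len(enc),
        "decoder_stages": len(dec),
        # Channel width of the last encoder stage (the deepest level).
        "deepest_channels": enc[-1] if enc else None,
        "in_channels": ptv3.get("in_channels"),
    }


config = json.loads(CONFIG_TEXT)
summary = describe_backbone(config)
print(summary)

# The old file stored "ptv3_args": {}, which yields an empty summary.
print(describe_backbone({"ptv3_args": {}}))
```

Under that assumption, the encoder lists five channel widths and the decoder four, which is consistent with a U-Net-style backbone whose decoder has one stage fewer than its encoder.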