facebook/3d-jepa

Safetensors


pmcvay committed on Apr 17, 2025

Commit 3111b8c (verified) · Parent: 1e3f030

Push model using huggingface_hub.


Files changed (2)

README.md +6 -34
config.json +16 -1

README.md CHANGED

@@ -1,38 +1,10 @@
 ---
 license: cc-by-nc-4.0
+tags:
+- pytorch_model_hub_mixin
+- model_hub_mixin
 ---
-# Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D
-Official model weights for the `Locate-3D` models and the `3D-JEPA` encoders
-## Locate 3D
-Locate 3D is a model for localizing objects in 3D scenes from referring expressions like “the
-small coffee table between the sofa and the lamp.” Locate 3D sets a new state-of-the-art on standard
-referential grounding benchmarks and showcases robust generalization capabilities. Notably, Locate
-3D operates directly on sensor observation streams (posed RGB-D frames), enabling real-world
-deployment on robots and AR devices.
-## 3D-JEPA
-3D-JEPA, a novel self-supervised
-learning (SSL) algorithm applicable to sensor point clouds, is key to `Locate 3D`. It takes as input a 3D pointcloud
-featurized using 2D foundation models (CLIP, DINO). Subsequently, masked prediction in latent space
-is employed as a pretext task to aid the self-supervised learning of contextualized pointcloud features.
-Once trained, the 3D-JEPA encoder is finetuned alongside a language-conditioned decoder to jointly
-predict 3D masks and bounding boxes.
-## Models
-- **Locate-3D**: Locate-3D model trained on public referential grounding datasets
-- **Locate-3D+**: Locate-3D model trained on public referential grounding datasets and the newly released Locate 3D Dataset
-- **3D-JEPA**: Pre-trained SSL encoder for 3D understanding
-## How to Use
-For detailed instructions on how to load the encoder and integrate it into your downstream task, please refer to our [GitHub repository](https://github.com/facebookresearch/locate-3d).
-## License
-The majority of `locate-3` is licensed under CC-BY-NC, however portions of the project are available under separate license terms: Pointcept is licensed under the MIT license.
+This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
+- Library: [More Information Needed]
+- Docs: [More Information Needed]
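The mixin named in the new README is what links the two changed files. In recent huggingface_hub releases, PyTorchModelHubMixin records the JSON-serializable arguments passed to a model's `__init__`, writes them to `config.json` when the model is saved or pushed, and passes them back to `__init__` on `from_pretrained`. The stdlib-only sketch below imitates just that config round trip, assuming the encoder's constructor takes exactly the keys in this repo's `config.json`. `ConfigRoundTripMixin`, `JepaEncoderStub`, `save_config`, and `from_config` are hypothetical names, not the huggingface_hub API or the locate-3d classes, and weights are not handled at all.

```python
import json
import tempfile
from pathlib import Path


class ConfigRoundTripMixin:
    """Hypothetical stand-in for the config half of PyTorchModelHubMixin.

    Subclasses keep their constructor kwargs in self._hub_config.
    save_config() writes them to config.json with sorted keys; from_config()
    reads the file back and passes the values to __init__ as keyword arguments.
    """

    def save_config(self, save_dir):
        path = Path(save_dir) / "config.json"
        path.write_text(json.dumps(self._hub_config, indent=2, sort_keys=True) + "\n")
        return path

    @classmethod
    def from_config(cls, save_dir):
        config = json.loads((Path(save_dir) / "config.json").read_text())
        return cls(**config)


class JepaEncoderStub(ConfigRoundTripMixin):
    """Hypothetical encoder whose constructor mirrors the keys in config.json."""

    def __init__(self, embed_dim, input_feat_dim, num_rgb_harmonic_functions,
                 ptv3_args, rgb_proj_dim, voxel_size):
        # Record every constructor argument so it can be serialized later.
        self._hub_config = {
            "embed_dim": embed_dim,
            "input_feat_dim": input_feat_dim,
            "num_rgb_harmonic_functions": num_rgb_harmonic_functions,
            "ptv3_args": ptv3_args,
            "rgb_proj_dim": rgb_proj_dim,
            "voxel_size": voxel_size,
        }


original = JepaEncoderStub(
    embed_dim=256,
    input_feat_dim=1536,
    num_rgb_harmonic_functions=16,
    ptv3_args={
        "dec_channels": [256, 256, 512, 512],
        "enc_channels": [256, 256, 512, 512, 1024],
        "in_channels": 256,
    },
    rgb_proj_dim=256,
    voxel_size=0.05,
)

with tempfile.TemporaryDirectory() as tmp:
    written = json.loads(original.save_config(tmp).read_text())
    restored = JepaEncoderStub.from_config(tmp)

print(restored._hub_config == original._hub_config)  # True
```

Under this reading, a populated `ptv3_args` in `config.json` simply means the pushed model was constructed with those arguments, and the alphabetical key order seen in the file is consistent with keys being sorted on write.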

config.json CHANGED

@@ -2,7 +2,22 @@
   "embed_dim": 256,
   "input_feat_dim": 1536,
   "num_rgb_harmonic_functions": 16,
-  "ptv3_args": {},
+  "ptv3_args": {
+    "dec_channels": [
+      256,
+      256,
+      512,
+      512
+    ],
+    "enc_channels": [
+      256,
+      256,
+      512,
+      512,
+      1024
+    ],
+    "in_channels": 256
+  },
   "rgb_proj_dim": 256,
   "voxel_size": 0.05
 }
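Because the hunk starts at line 2 and ends with the unindented closing `}`, it covers the whole of `config.json` apart from the opening `{`. Before this commit `ptv3_args` was `{}`, so presumably whatever defaults the backbone constructor defines applied; afterwards the encoder and decoder widths are explicit. The sketch below parses the new file and runs a sanity check under a stated assumption: that `ptv3_args` is forwarded as keyword arguments to a Pointcept-style PointTransformerV3, which expects one fewer decoder stage than encoder stages. `check_ptv3_args` is a hypothetical helper, not part of locate-3d or Pointcept.

```python
import json

# Full config.json after commit 3111b8c: line 1 is "{", the rest is the hunk.
CONFIG_TEXT = """
{
  "embed_dim": 256,
  "input_feat_dim": 1536,
  "num_rgb_harmonic_functions": 16,
  "ptv3_args": {
    "dec_channels": [256, 256, 512, 512],
    "enc_channels": [256, 256, 512, 512, 1024],
    "in_channels": 256
  },
  "rgb_proj_dim": 256,
  "voxel_size": 0.05
}
"""


def check_ptv3_args(ptv3_args):
    """Return the number of encoder stages, raising if the lists look inconsistent.

    Assumption (Pointcept-style PTv3): each decoder stage undoes one pooling
    step between encoder stages, so len(dec_channels) == len(enc_channels) - 1.
    """
    enc = ptv3_args["enc_channels"]
    dec = ptv3_args["dec_channels"]
    if len(enc) != len(dec) + 1:
        raise ValueError(
            f"expected len(dec_channels) == len(enc_channels) - 1, "
            f"got {len(dec)} and {len(enc)}"
        )
    return len(enc)


config = json.loads(CONFIG_TEXT)
ptv3 = config["ptv3_args"]
print(check_ptv3_args(ptv3))                           # 5
print(ptv3["dec_channels"][0] == config["embed_dim"])  # True
```

The second check is only an observation about these numbers: in a Pointcept-style PTv3 the last decoder stage has width `dec_channels[0]`, and here that width (256) equals `embed_dim`. The commit itself does not say whether this is intended.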