
	---
	license: cc-by-nc-4.0
	---
	# Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D

	Official model weights for the `Locate-3D` models and the `3D-JEPA` encoders.

	## Locate 3D

	Locate 3D is a model for localizing objects in 3D scenes from referring expressions like “the
	small coffee table between the sofa and the lamp.” Locate 3D sets a new state-of-the-art on standard
	referential grounding benchmarks and showcases robust generalization capabilities. Notably, Locate
	3D operates directly on sensor observation streams (posed RGB-D frames), enabling real-world
	deployment on robots and AR devices.
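To make "posed RGB-D frames" concrete, the sketch below lifts one depth image into world-space 3D points. It assumes a pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy` and a 4x4 camera-to-world pose; the function name and data layout are illustrative and are not taken from the Locate 3D codebase, whose preprocessing may differ.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def unproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    cam_to_world: Sequence[Sequence[float]],
) -> List[Point]:
    """Lift a depth image into world-space points (hypothetical helper).

    depth        : rows of metric depth values; values <= 0 are treated as invalid.
    fx, fy, cx, cy: pinhole intrinsics (an assumption, not stated in the model card).
    cam_to_world : 4x4 row-major camera-to-world pose matrix.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip pixels without a valid depth reading
            # Back-project pixel (u, v) into homogeneous camera coordinates.
            cam = ((u - cx) * z / fx, (v - cy) * z / fy, z, 1.0)
            # Apply the pose to move the point into the world frame.
            world = tuple(
                sum(cam_to_world[r][c] * cam[c] for c in range(4)) for r in range(3)
            )
            points.append(world)
    return points
```

Accumulating the points from every frame of a stream gives the scene point cloud that the rest of this card refers to.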

	## 3D-JEPA

	3D-JEPA, a novel self-supervised learning (SSL) algorithm applicable to sensor point clouds, is key
	to Locate 3D. It takes as input a 3D point cloud featurized using 2D foundation models (CLIP, DINO).
	Masked prediction in latent space then serves as a pretext task for learning contextualized
	point cloud features. Once trained, the 3D-JEPA encoder is finetuned alongside a language-conditioned
	decoder to jointly predict 3D masks and bounding boxes.
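As a rough illustration of "masked prediction in latent space", the sketch below scores predicted latent vectors against target latent vectors only at the masked points. The mean squared error, the function name, and the list-of-lists layout are assumptions made for illustration; the model card does not specify the loss or architecture that 3D-JEPA actually uses.

```python
from typing import Sequence


def masked_latent_loss(
    predicted: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
    masked_indices: Sequence[int],
) -> float:
    """Mean squared error between latents, averaged over masked points only.

    predicted      : per-point latents produced from the visible context (hypothetical).
    target         : per-point latents from a target encoder that sees the full scene.
    masked_indices : indices of points hidden from the context encoder.
    """
    if not masked_indices:
        raise ValueError("at least one point must be masked")
    total = 0.0
    for i in masked_indices:
        # Per-point error is averaged over the latent dimensions.
        squared = sum((p - t) ** 2 for p, t in zip(predicted[i], target[i]))
        total += squared / len(target[i])
    # Unmasked points contribute nothing: the task is to infer hidden regions.
    return total / len(masked_indices)
```

The point of comparing latents rather than raw inputs is that the model is rewarded for predicting contextual features of the hidden region, not for reconstructing every input value.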

	## Models

	- Locate-3D: Locate-3D model trained on public referential grounding datasets
	- Locate-3D+: Locate-3D model trained on public referential grounding datasets and the newly released Locate 3D Dataset
	- 3D-JEPA: Pre-trained SSL encoder for 3D understanding


	## How to Use

	For detailed instructions on how to load the encoder and integrate it into your downstream task, please refer to our [GitHub repository](https://github.com/facebookresearch/locate-3d).
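Before following the GitHub instructions, the weight files can be fetched with the `huggingface_hub` client. This is a minimal sketch, assuming the repository id `facebook/locate-3d-plus` (the id of this model page) and that `huggingface_hub` is installed; it makes no assumption about the file names inside the repository, and loading the checkpoint into a model is covered by the GitHub repository rather than here.

```python
REPO_ID = "facebook/locate-3d-plus"  # assumed from this model page


def download_weights(local_dir: str = "checkpoints/locate-3d-plus") -> str:
    """Download all files of the model repository and return the local path.

    The default local_dir is an arbitrary illustrative choice.
    """
    # Imported lazily so this module can be imported without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


if __name__ == "__main__":
    print(download_weights())
```

Note that the weights are released under CC-BY-NC 4.0, so accepting any license prompt on the model page may be required before the download succeeds.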

	## License

	The majority of `locate-3d` is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: Pointcept is licensed under the MIT license.