---
pipeline_tag: image-text-to-text
license: cc-by-nc-4.0
---

# UniVLG: Unifying 2D and 3D Vision-Language Understanding

This repository contains the UniVLG model, as presented in [Unifying 2D and 3D Vision-Language Understanding](https://arxiv.org/abs/2503.10745). UniVLG is a unified architecture for 2D and 3D vision-language understanding.

Project page: https://univlg.github.io

The model uses a custom loading tool (`uvx`). Checkpoints are available on [Hugging Face](https://huggingface.co/katefgroup/UniVLG/tree/main); see the [GitHub repository](https://github.com/facebookresearch/univlg) for code and usage instructions.

## Citation

```bibtex
@article{jain2025unifying,
  title={Unifying 2D and 3D Vision-Language Understanding},
  author={Jain, Ayush and Swerdlow, Alexander and Wang, Yuzhou and Arnaud, Sergio and Martin, Ada and Sax, Alexander and Meier, Franziska and Fragkiadaki, Katerina},
  journal={arXiv preprint arXiv:2503.10745},
  year={2025}
}
```

**License Note:** The majority of UniVLG is licensed under CC-BY-NC; however, portions of the project (specifically Odin and Pointcept) are available under separate MIT license terms. Please refer to the GitHub repository for details.