---
datasets:
- facebook/boxer
license: cc-by-nc-4.0
pipeline_tag: object-detection
tags:
- 3d-object-detection
- open-world-detection
- 3d-vision
---
# Boxer: Robust Lifting of Open-World 2D Bounding Boxes to 3D

[Project Page](https://facebookresearch.github.io/boxer) | [Paper](https://huggingface.co/papers/2604.05212) | [Code](https://github.com/facebookresearch/boxer)

Boxer is an algorithm that estimates static 3D bounding boxes (3DBBs) from 2D open-vocabulary object detections, posed images, and optional depth data. At its core is **BoxerNet**, a transformer-based network that lifts 2D bounding box (2DBB) proposals into 3D. Multi-view fusion and geometric filtering then produce globally consistent, de-duplicated 3DBBs in metric world space.
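Because the same object is typically detected in several posed images, lifting each 2DBB independently tends to yield overlapping 3D proposals that must be merged. The sketch below illustrates only that de-duplication idea, using a generic swapped-in technique: greedy non-maximum suppression over axis-aligned 3D IoU. It is not Boxer's fusion or filtering code. The `(xmin, ymin, zmin, xmax, ymax, zmax)` box format (metres, world frame), the 0.5 IoU threshold, and the names `iou_3d` and `dedup_boxes` are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (xmin, ymin, zmin, xmax, ymax, zmax) in metres, world frame.
Box3D = Tuple[float, float, float, float, float, float]


def volume(b: Box3D) -> float:
    """Volume of an axis-aligned box (zero if degenerate)."""
    return max(0.0, b[3] - b[0]) * max(0.0, b[4] - b[1]) * max(0.0, b[5] - b[2])


def iou_3d(a: Box3D, b: Box3D) -> float:
    """Intersection-over-union of two axis-aligned 3D boxes."""
    inter = 1.0
    for lo, hi in ((0, 3), (1, 4), (2, 5)):
        overlap = min(a[hi], b[hi]) - max(a[lo], b[lo])
        if overlap <= 0.0:
            return 0.0  # disjoint (or merely touching) along this axis
        inter *= overlap
    union = volume(a) + volume(b) - inter
    return inter / union if union > 0.0 else 0.0


def dedup_boxes(boxes: Sequence[Box3D], scores: Sequence[float],
                iou_thresh: float = 0.5) -> List[int]:
    """Greedy 3D non-maximum suppression; returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept, higher-scoring box.
        if all(iou_3d(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0.0, 0.0, 0.0, 1.0, 1.0, 1.0),   # object A, seen in view 1
             (0.1, 0.0, 0.0, 1.1, 1.0, 1.0),   # object A, seen in view 2 (duplicate)
             (3.0, 3.0, 0.0, 4.0, 4.0, 1.0)]   # object B
    print(dedup_boxes(boxes, scores=[0.9, 0.8, 0.7]))  # → [0, 2]
```

The two views of object A overlap with an IoU of about 0.82, so only the higher-scoring box survives, while object B is kept. Real 3DBBs are often oriented (for example, rotated about the gravity axis), in which case a rotated-box IoU would be needed instead of the axis-aligned one used here.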
## Installation

We recommend using [uv](https://docs.astral.sh/uv/) to manage the environment:

```bash
# Create virtual environment
uv venv boxer --python 3.12
source boxer/bin/activate

# Core dependencies for running Boxer
uv pip install 'torch>=2.0' numpy opencv-python tqdm dill
```
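To confirm the environment before running Boxer, a small standard-library check like the one below can report which of the core packages are installed. This is a convenience sketch, not part of the Boxer repository; `check_deps` is a hypothetical helper, and it only checks that each distribution is present, not that `torch` satisfies `>=2.0` or has GPU support.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names exactly as passed to `uv pip install` above.
CORE_DEPS = ("torch", "numpy", "opencv-python", "tqdm", "dill")


def check_deps(names: Iterable[str] = CORE_DEPS) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in check_deps().items():
        print(f"{name:15s} {ver or 'MISSING'}")
```

Because the lookup uses distribution names, an environment that provides OpenCV through a different distribution (for example `opencv-python-headless`) will report `opencv-python` as missing even though `import cv2` works.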
## Usage

After installing the dependencies and downloading the required checkpoints with the scripts provided in the repository, you can run BoxerNet on sample data. For example, to run BoxerNet in headless mode on a sample sequence:

```bash
python run_boxer.py --input nym10_gen1 --max_n=90 --track
```

This estimates 3D bounding boxes and saves the results (CSV and visualization) to the `output/` directory.
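The CSV schema is not documented here, so the following sketch makes no assumptions about column names: it collects every `*.csv` file under `output/` and reads each one with `csv.DictReader`, which takes column names from the header row. `load_boxer_csvs` is a hypothetical helper, and the recursive search is an assumption about how results are laid out inside `output/`; inspect the printed column names before building on them.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_boxer_csvs(output_dir: str = "output") -> Dict[str, List[Dict[str, str]]]:
    """Read every CSV under output_dir (recursively) into row dicts, keyed by file path."""
    results: Dict[str, List[Dict[str, str]]] = {}
    for path in sorted(Path(output_dir).rglob("*.csv")):
        with path.open(newline="") as f:
            # DictReader uses the header row as keys, so no schema is assumed.
            results[str(path)] = list(csv.DictReader(f))
    return results


if __name__ == "__main__":
    for path, rows in load_boxer_csvs().items():
        print(f"{path}: {len(rows)} rows")
        if rows:
            print("  columns:", ", ".join(rows[0].keys()))
```

All values come back as strings; convert numeric columns explicitly once you know which ones they are.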
## Citation

If you find Boxer useful in your research, please consider citing:

```bibtex
@article{boxer2026,
  title={Boxer: Robust Lifting of Open-World 2D Bounding Boxes to 3D},
  author={Daniel DeTone and Tianwei Shen and Fan Zhang and Lingni Ma and Julian Straub and Richard Newcombe and Jakob Engel},
  year={2026},
}
```