---
datasets:
- facebook/boxer
license: cc-by-nc-4.0
pipeline_tag: object-detection
tags:
- 3d-object-detection
- open-world-detection
- 3d-vision
---

# Boxer: Robust Lifting of Open-World 2D Bounding Boxes to 3D

[Project Page](https://facebookresearch.github.io/boxer) | [Paper](https://huggingface.co/papers/2604.05212) | [Code](https://github.com/facebookresearch/boxer)

Boxer estimates static 3D bounding boxes (3DBBs) from 2D open-vocabulary object detections, posed images, and optional depth data. At its core is **BoxerNet**, a transformer-based network that lifts 2D bounding box (2DBB) proposals into 3D. Multi-view fusion and geometric filtering then produce globally consistent, de-duplicated 3DBBs in metric world space.



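The de-duplication idea mentioned above can be illustrated with a minimal sketch. It assumes axis-aligned boxes and a greedy, score-ordered IoU suppression, which is a common simplification and not necessarily what Boxer does: its actual fusion and geometric filtering may differ (for instance, by using oriented boxes). `Box3D`, `iou_3d`, `deduplicate`, and the 0.5 threshold are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box3D:
    """Axis-aligned box in metric world space (hypothetical type, not Boxer's)."""
    min_xyz: tuple[float, float, float]
    max_xyz: tuple[float, float, float]
    score: float

    def volume(self) -> float:
        v = 1.0
        for lo, hi in zip(self.min_xyz, self.max_xyz):
            v *= max(0.0, hi - lo)
        return v


def iou_3d(a: Box3D, b: Box3D) -> float:
    """Intersection-over-union of two axis-aligned 3D boxes."""
    inter = 1.0
    for a_lo, a_hi, b_lo, b_hi in zip(a.min_xyz, a.max_xyz, b.min_xyz, b.max_xyz):
        # Overlap along one axis; zero if the boxes are disjoint on that axis.
        inter *= max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))
    union = a.volume() + b.volume() - inter
    return inter / union if union > 0.0 else 0.0


def deduplicate(boxes: list[Box3D], iou_threshold: float = 0.5) -> list[Box3D]:
    """Greedy suppression: keep the highest-scoring box of each overlapping group."""
    kept: list[Box3D] = []
    for box in sorted(boxes, key=lambda b: b.score, reverse=True):
        if all(iou_3d(box, k) < iou_threshold for k in kept):
            kept.append(box)
    return kept
```

Under these assumptions, two detections of the same object seen from different views collapse to the higher-scoring one, while boxes that barely overlap are both kept.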
## Installation

We recommend using [uv](https://docs.astral.sh/uv/) to manage the environment:

```bash
# Create virtual environment
uv venv boxer --python 3.12
source boxer/bin/activate

# Core dependencies for running Boxer
uv pip install 'torch>=2.0' numpy opencv-python tqdm dill
```

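To confirm that the environment is set up, a quick check along the following lines may help. It is only a sketch: `missing_modules` is a hypothetical helper, and the module list mirrors the packages installed above (`opencv-python` is imported as `cv2`).

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names for the core dependencies listed in the install command.
core = ["torch", "numpy", "cv2", "tqdm", "dill"]
missing = missing_modules(core)
print("All core dependencies found." if not missing else f"Missing: {missing}")
```

Run it inside the activated `boxer` environment; any name it reports as missing needs to be installed before running the scripts.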
## Usage

After installing the dependencies and downloading the required checkpoints with the scripts provided in the repository, you can run BoxerNet on sample data. For example, to run it in headless mode on a sample sequence:

```bash
python run_boxer.py --input nym10_gen1 --max_n=90 --track
```

This estimates 3D bounding boxes and saves the results (CSV and visualization) to the `output/` directory.

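The exact file names and column layout of the CSV results are not documented here, so the following sketch makes no assumptions about them. It simply lists every CSV file found under `output/` with its header and row count, which is a reasonable first step before writing any parsing code. `summarize_csvs` is a hypothetical helper, not part of the Boxer repository.

```python
import csv
from pathlib import Path


def summarize_csvs(output_dir: str = "output") -> dict[str, tuple[list[str], int]]:
    """Map each CSV under output_dir to (column names, number of data rows)."""
    root = Path(output_dir)
    summary = {}
    for path in sorted(root.rglob("*.csv")):
        with path.open(newline="") as f:
            reader = csv.DictReader(f)
            n_rows = sum(1 for _ in reader)  # consumes the file; header is read first
            columns = list(reader.fieldnames or [])
        summary[str(path.relative_to(root))] = (columns, n_rows)
    return summary


for name, (columns, n_rows) in summarize_csvs().items():
    print(f"{name}: {n_rows} rows, columns = {columns}")
```

If `output/` does not exist yet, the loop prints nothing.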
## Citation

If you find Boxer useful in your research, please consider citing:

```bibtex
@article{boxer2026,
  title={Boxer: Robust Lifting of Open-World 2D Bounding Boxes to 3D},
  author={Daniel DeTone and Tianwei Shen and Fan Zhang and Lingni Ma and Julian Straub and Richard Newcombe and Jakob Engel},
  year={2026},
}
```