PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction

PixARMesh is a mesh-native autoregressive framework for single-view 3D scene reconstruction. Rather than reconstructing the scene through intermediate volumetric or implicit representations, PixARMesh models each object instance directly as a native mesh. Object poses and meshes are predicted together in a single, unified autoregressive sequence.
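To make "poses and meshes in one sequence" concrete, the sketch below flattens a toy scene into a single integer token stream. For each object it emits pose tokens followed by mesh tokens. This is an illustration under stated assumptions, not the PixARMesh tokenizer. The vocabulary layout (NUM_BINS, BOS, POSE, MESH, EOS), the 128-bin uniform quantization, the pose parameterization (translation and scale only, with rotation omitted), and the triangle-soup mesh encoding are all hypothetical. The image and scene context that conditions generation is also left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical vocabulary (NOT from the PixARMesh paper or repository):
# ids [0, NUM_BINS) are quantized coordinate bins; special tokens follow.
NUM_BINS = 128
BOS, POSE, MESH, EOS = NUM_BINS, NUM_BINS + 1, NUM_BINS + 2, NUM_BINS + 3

Vec3 = Tuple[float, float, float]


@dataclass
class SceneObject:
    translation: Vec3                      # object center in scene space, assumed in [-1, 1]
    scale: Vec3                            # per-axis extent, assumed in [0, 2]
    faces: List[Tuple[Vec3, Vec3, Vec3]]   # triangles in normalized [-1, 1] object space


def quantize(value: float, lo: float = -1.0, hi: float = 1.0) -> int:
    """Map a float in [lo, hi] to one of NUM_BINS integer bins, clamping out-of-range values."""
    t = (value - lo) / (hi - lo)
    return min(NUM_BINS - 1, max(0, int(t * NUM_BINS)))


def serialize_scene(objects: List[SceneObject]) -> List[int]:
    """Flatten a scene into one stream: for each object, pose tokens then mesh tokens."""
    tokens = [BOS]
    for obj in objects:
        tokens.append(POSE)
        tokens += [quantize(v) for v in obj.translation]
        tokens += [quantize(v, 0.0, 2.0) for v in obj.scale]
        tokens.append(MESH)
        for tri in obj.faces:
            for vertex in tri:
                tokens += [quantize(c) for c in vertex]
    tokens.append(EOS)
    return tokens


# Usage: one object made of a single triangle.
tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
scene = [SceneObject((0.5, -0.5, 0.0), (1.0, 1.0, 1.0), [tri])]
seq = serialize_scene(scene)
```

Because every object's pose precedes its geometry in the same stream, a single next-token model could in principle learn layout and shape jointly. That joint modeling is the property the unified sequence is meant to provide.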

Model Sources

Project Page: https://mlpc-ucsd.github.io/PixARMesh/
Repository: https://github.com/mlpc-ucsd/PixARMesh
Paper: https://arxiv.org/abs/2603.05888

Model Details

PixARMesh jointly predicts object layout and geometry within a unified model, producing coherent meshes in a single forward pass. Building on recent advances in mesh generative models, it augments a point-cloud encoder with pixel-aligned image features and global scene context via cross-attention, enabling accurate spatial reasoning from a single image. Scenes are generated autoregressively over a unified token stream of context, pose, and mesh tokens.
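"Pixel-aligned image features" generally means that each 3D point is projected into the image and the image feature at that location is attached to the point. The sketch below shows one common way to do this in plain Python, under several assumptions that are not taken from PixARMesh: a pinhole camera with intrinsics (fx, fy, cx, cy), points given in camera coordinates, bilinear sampling with border clamping, and fusion by simple concatenation. The cross-attention to global scene context is not shown. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
FeatureMap = List[List[List[float]]]  # indexed [row][col][channel]


def project(point: Point, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-space point (z > 0) to continuous pixel coordinates (u, v)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return fx * x / z + cx, fy * y / z + cy


def bilinear_sample(fmap: FeatureMap, u: float, v: float) -> List[float]:
    """Bilinearly interpolate a feature vector at (u, v), clamping coordinates to the map border."""
    h, w = len(fmap), len(fmap[0])
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    u0, v0 = int(u), int(v)
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    du, dv = u - u0, v - v0
    out = []
    for c in range(len(fmap[0][0])):
        top = (1 - du) * fmap[v0][u0][c] + du * fmap[v0][u1][c]
        bottom = (1 - du) * fmap[v1][u0][c] + du * fmap[v1][u1][c]
        out.append((1 - dv) * top + dv * bottom)
    return out


def pixel_aligned_features(
    points: Sequence[Point],
    point_feats: Sequence[Sequence[float]],
    fmap: FeatureMap,
    intrinsics: Tuple[float, float, float, float],
) -> List[List[float]]:
    """Concatenate each point's own feature with the image feature sampled at its projection."""
    fused = []
    for p, f in zip(points, point_feats):
        u, v = project(p, *intrinsics)
        fused.append(list(f) + bilinear_sample(fmap, u, v))
    return fused


# Usage: a 2x2 single-channel feature map and one point on the optical axis.
fmap = [[[0.0], [1.0]], [[2.0], [3.0]]]
fused = pixel_aligned_features([(0.0, 0.0, 1.0)], [[7.0]], fmap, (1.0, 1.0, 0.5, 0.5))
```

Here the point projects to the center of the 2x2 map, so the sampled value is the average of the four cells (1.5), and the fused feature is `[7.0, 1.5]`. A real encoder would do this batched on GPU tensors (for example with a grid-sampling operator), but the geometry is the same.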

Citation

If you find PixARMesh useful in your research, please consider citing:

@article{zhang2026pixarmesh,
  title={PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction},
  author={Zhang, Xiang and Yoo, Sohyun and Wu, Hongrui and Li, Chuan and Xie, Jianwen and Tu, Zhuowen},
  journal={arXiv preprint arXiv:2603.05888},
  year={2026}
}