Abstract
Fast-SAM3D addresses slow inference in 3D reconstruction by dynamically adapting computation to varying complexity through heterogeneity-aware mechanisms that improve efficiency without sacrificing quality.
SAM3D enables scalable, open-world 3D reconstruction from complex scenes, yet its deployment is hindered by prohibitive inference latency. In this work, we conduct the first systematic investigation into its inference dynamics, revealing that generic acceleration strategies are brittle in this context. We demonstrate that these failures stem from neglecting the pipeline's inherent multi-level heterogeneity: the kinematic distinctiveness between shape and layout, the intrinsic sparsity of texture refinement, and the spectral variance across geometries. To address this, we present Fast-SAM3D, a training-free framework that dynamically aligns computation with instantaneous generation complexity. Our approach integrates three heterogeneity-aware mechanisms: (1) Modality-Aware Step Caching to decouple structural evolution from sensitive layout updates; (2) Joint Spatiotemporal Token Carving to concentrate refinement on high-entropy regions; and (3) Spectral-Aware Token Aggregation to adapt decoding resolution. Extensive experiments demonstrate that Fast-SAM3D delivers up to 2.67times end-to-end speedup with negligible fidelity loss, establishing a new Pareto frontier for efficient single-view 3D generation. Our code is released in https://github.com/wlfeng0509/Fast-SAM3D.
Community
arXivLens breakdown of this paper ๐ https://arxivlens.com/PaperView/Details/fast-sam3d-3dfy-anything-in-images-but-faster-7451-43eb5231
- Executive Summary
- Detailed Breakdown
- Practical Applications
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper
