Abstract
TriSplat is a feed-forward 3D reconstruction network that uses oriented triangle primitives to generate simulation-ready meshes directly from sparse input images in a single forward pass, bypassing expensive post-processing steps.
Sparse-view 3D reconstruction is increasingly addressed with feed-forward splatting networks that predict explicit primitives directly from images. Yet most existing methods remain centered on Gaussian primitives and expose surfaces only indirectly: extracting a usable mesh for downstream simulation, physics reasoning, or embodied interaction still requires expensive post-hoc steps that break the feed-forward promise. This limitation is especially pronounced in pose-free settings, where scene structure and camera parameters must be estimated jointly from sparse observations. We present TriSplat, a feed-forward reconstruction network that represents scenes with oriented triangle primitives and directly exports simulation-ready mesh scenes from a single forward pass. Given input images, the network predicts local 3D point maps, triangle attributes, camera poses, and optional intrinsics. Rather than regressing triangle orientation as an unconstrained latent variable, our approach constructs geometry normals from the predicted point maps, refines them with an image-conditioned normal head, and converts them into stable local frames for triangle parameterization. A mono-normal bootstrap schedule further stabilizes early training, while opacity and blur scheduling progressively sharpens the learned surface representation for direct mesh extraction. Experiments on RealEstate10K and DL3DV show that this representation produces more geometry-faithful reconstructions than Gaussian feed-forward baselines while maintaining competitive novel-view rendering quality. Because the rendering primitives are themselves surface triangles, the output can be directly ingested by physics engines, collision detectors, and standard rendering pipelines without any conversion, making it a practical simulation-ready solution for feed-forward 3D scene reconstruction.
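The step that turns predicted point maps into triangle orientations can be sketched as follows. This sketch assumes a per-pixel point map in an OpenCV-style camera frame (x right, y down, z forward). Normals come from finite differences of neighboring points. The tangent frame uses the branchless orthonormal-basis construction of Duff et al. (2017), a well-known numerically stable choice swapped in here for illustration. The image-conditioned normal refinement head is not modeled. The helper names `point_map_normals`, `tangent_frame` and `triangle_vertices`, and the equilateral-triangle parameterization, are hypothetical and are not the paper's method.

```python
import math

Vec3 = tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(dot(v, v))
    if length < 1e-12:
        raise ValueError("degenerate neighborhood: normal is undefined")
    return (v[0] / length, v[1] / length, v[2] / length)


def point_map_normals(points: list[list[Vec3]]) -> list[list[Vec3]]:
    """Per-pixel unit normals of an H x W point map (H, W >= 2).

    Tangents are central differences (one-sided at the border). Taking
    cross(dv, du) orients normals toward an OpenCV-style camera
    (x right, y down, z forward); this sign convention is an assumption.
    """
    h, w = len(points), len(points[0])
    normals = []
    for i in range(h):
        row = []
        for j in range(w):
            du = sub(points[i][min(j + 1, w - 1)], points[i][max(j - 1, 0)])
            dv = sub(points[min(i + 1, h - 1)][j], points[max(i - 1, 0)][j])
            row.append(normalize(cross(dv, du)))
        normals.append(row)
    return normals


def tangent_frame(n: Vec3) -> tuple[Vec3, Vec3]:
    """Branchless orthonormal basis for a unit normal n (Duff et al., 2017).

    Returns (t, b) with t x b = n. Because sign has the same sign as n[2],
    the denominator sign + n[2] never reaches zero for a unit n.
    """
    sign = math.copysign(1.0, n[2])
    a = -1.0 / (sign + n[2])
    xy = n[0] * n[1] * a
    t = (1.0 + sign * n[0] * n[0] * a, sign * xy, -sign * n[0])
    b = (xy, sign + n[1] * n[1] * a, -n[1])
    return t, b


def triangle_vertices(center: Vec3, n: Vec3, scale: float, angle: float) -> list[Vec3]:
    """Hypothetical parameterization: an equilateral triangle in the tangent
    plane of n, sized by `scale`, rotated by `angle`, wound counter-clockwise
    around n (so its geometric normal agrees with n)."""
    t, b = tangent_frame(n)
    verts = []
    for k in range(3):
        theta = angle + 2.0 * math.pi * k / 3.0
        u, v = scale * math.cos(theta), scale * math.sin(theta)
        verts.append(tuple(center[d] + u * t[d] + v * b[d] for d in range(3)))
    return verts
```

In this sketch, the frame comes from the normal, so orientation is tied to the predicted surface. A network following this scheme would only need to regress in-plane quantities, here a scale and a rotation angle, rather than a free 3D rotation.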
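The abstract does not specify the form of the mono-normal bootstrap or of the opacity and blur schedules. One plausible form is sketched below under explicitly hypothetical assumptions. A teacher-normal loss weight decays linearly to zero. An edge-blur width is annealed geometrically toward a sharp edge. An opacity floor is ramped toward 1 so that triangles end training opaque enough to export as a surface. All function names, step counts and value ranges are illustrative and are not taken from the paper.

```python
def linear_ramp(step: int, start: int, end: int, v0: float, v1: float) -> float:
    """v0 up to `start`, v1 from `end` on, linear in between.

    The `end` check comes first so that a zero-length ramp (start == end)
    yields v1 immediately.
    """
    if step >= end:
        return v1
    if step <= start:
        return v0
    alpha = (step - start) / (end - start)
    return v0 + alpha * (v1 - v0)


def mono_normal_weight(step: int, bootstrap_steps: int = 5_000) -> float:
    """Hypothetical weight on a monocular-teacher normal loss: decays 1 -> 0
    over `bootstrap_steps`, after which the teacher term is switched off."""
    return linear_ramp(step, 0, bootstrap_steps, 1.0, 0.0)


def blur_sigma(step: int, start: int = 2_000, end: int = 20_000,
               sigma_max: float = 2.0, sigma_min: float = 0.05) -> float:
    """Hypothetical edge-blur width in pixels, annealed geometrically from
    sigma_max to sigma_min so triangle edges become progressively sharper."""
    alpha = linear_ramp(step, start, end, 0.0, 1.0)
    return sigma_max * (sigma_min / sigma_max) ** alpha


def opacity_floor(step: int, start: int = 10_000, end: int = 30_000) -> float:
    """Hypothetical lower bound on opacity, ramped 0 -> 1 late in training."""
    return linear_ramp(step, start, end, 0.0, 1.0)


def effective_opacity(raw_opacity: float, step: int) -> float:
    """Clamp a predicted opacity to the range [opacity_floor(step), 1]."""
    return max(min(raw_opacity, 1.0), opacity_floor(step))
```

Calling `mono_normal_weight(step, bootstrap_steps=0)` returns 0 from the first step. This removes the teacher term entirely, which is one simple way to run the bootstrap ablation suggested in the Community comment.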
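Because the primitives are already triangles, exporting them requires no surface-extraction step. The sketch below assumes the predicted triangles arrive as a list of vertex triples in world coordinates and writes them to Wavefront OBJ, a plain-text format that most mesh tools read. The function name `triangles_to_obj` and the optional vertex welding (merging vertices that coincide after rounding) are assumptions for illustration, not part of the paper's pipeline.

```python
Vec3 = tuple[float, float, float]


def triangles_to_obj(triangles: list[tuple[Vec3, Vec3, Vec3]], decimals: int = 6) -> str:
    """Serialize triangles (three world-space vertices each) as Wavefront OBJ text.

    Vertices that coincide after rounding to `decimals` places are welded into
    one shared index (an assumption; skip it to write a raw triangle soup).
    """
    index: dict[tuple[float, ...], int] = {}
    vertex_lines: list[str] = []
    face_lines: list[str] = []
    for tri in triangles:
        face = []
        for vertex in tri:
            key = tuple(round(c, decimals) for c in vertex)
            if key not in index:
                index[key] = len(index) + 1  # OBJ vertex indices are 1-based
                x, y, z = key
                vertex_lines.append(f"v {x:.{decimals}f} {y:.{decimals}f} {z:.{decimals}f}")
            face.append(index[key])
        # Keep the input winding so each face's orientation is preserved.
        face_lines.append(f"f {face[0]} {face[1]} {face[2]}")
    return "\n".join(vertex_lines + face_lines) + "\n"
```

For two triangles that share an edge, the output contains four `v` lines and two `f` lines. Welding produces indexed faces, which mesh-processing and collision tools generally handle more gracefully than an unconnected soup.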
Community
the idea of anchoring triangle orientation to predicted local geometry, instead of letting it float as an unconstrained latent, is the heart of trisplat and it pays off. deriving stable local tangent frames from per-pixel point normals and then feeding an image-conditioned normal head with monocular teacher bootstrapping to sharpen those frames is a really clever stability hack for training. an ablation that isolates the monocular bootstrap, e.g., remove the teacher normals or swap in a fixed normal prior, would tell us how much that component actually drives geometry fidelity. the arxivlens breakdown helped me parse the method details and the shader-like triangle rasterization, and that walkthrough lines up nicely with the paper: https://arxivlens.com/PaperView/Details/trisplat-simulation-ready-feed-forward-3d-scene-reconstruction-7032-2fa55da0
Get this paper in your agent:
hf papers read 2605.26115
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper 1
Datasets citing this paper 1
lhmd/re10k_torch
Spaces citing this paper 0