Arbor: Explicit Geometric Conditioning for Controllable 3D Asset Generation
Abstract
Arbor enables explicit 3D spatial control in text-conditioned latent generation through constraint meshes that define occupancy, avoidance, and contact regions, maintaining object quality while improving constraint adherence.
Text- and image-conditioned 3D models now generate convincing assets, but they still offer little direct control over the space an object should occupy or avoid. In authoring, this spatial intent is often known before generation starts: a chair should fit a seating envelope, a prop should leave clearance for motion, or a part should expose a contact surface. Prompts and image views are poor carriers for such constraints, which calls for an explicit control interface. We present Arbor, a trainable attachment for text-conditioned latent 3D generation. Arbor introduces constraint meshes as a native 3D control interface. The interface uses hull regions where geometry should exist, avoidance regions that should remain empty, and touch regions the object should contact. Unlike completion or whole-object scaffold control, these meshes are not target evidence. They are local, typed requirements and can include regions where no surface should appear. Arbor keeps this signal as geometry by converting constraint meshes into tokens and learning a routed attachment inside a frozen denoiser. Each latent region can therefore receive the part of the constraint that matters for its spatial location. We evaluate Arbor on automatic and artist-curated control benchmarks with hull, avoidance, and touch constraints, and compare the metric trends to a user preference study. Even without dedicated compliance losses, Arbor improves constraint obedience while preserving object quality and variation under fixed constraints.
Community
Hey everyone, today we share our newest work:
Arbor: Explicit Geometric Conditioning for Controllable 3D Asset Generation
Current 3D generation methods can create 3D objects from text prompts, but this often behaves like a slot machine. You ask for an object, but you do not know whether it will satisfy the spatial requirements needed for production. For movies, games, animation, or asset design, this is a problem: an object may need to fit a fixed envelope, leave space for motion, or touch a specific surface.
Arbor addresses this by adding explicit geometry constraints to text-to-3D generation. Users provide constraint meshes that mark:
- HULL regions where geometry should exist
- AVOID regions that should remain empty
- TOUCH regions the object should contact
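To make the three region types concrete, here is a minimal sketch of how one might score a generated shape against them. It assumes the object and each constraint region have already been voxelised into boolean grids of the same shape; the function name `constraint_scores` and the three score definitions are illustrative assumptions, not the metrics used in the paper.

```python
import numpy as np

def constraint_scores(occ, hull, avoid, touch):
    """Score an occupancy grid against HULL / AVOID / TOUCH masks.

    All four inputs are boolean arrays of identical shape (a hypothetical
    voxelisation of the generated object and of each constraint mesh).
    """
    occ, hull, avoid, touch = (np.asarray(a, dtype=bool)
                               for a in (occ, hull, avoid, touch))

    def occupied_fraction(mask):
        # Fraction of the masked voxels that the object fills.
        n = int(mask.sum())
        return float((occ & mask).sum() / n) if n else float("nan")

    return {
        # HULL: geometry should exist here, so higher is better.
        "hull_coverage": occupied_fraction(hull),
        # AVOID: should stay empty, so lower is better.
        "avoid_violation": occupied_fraction(avoid),
        # TOUCH: satisfied if the object reaches at least one voxel.
        "touch_contact": bool((occ & touch).any()),
    }

# Toy 4x4x4 example: the object fills the lower half of the grid.
occ = np.zeros((4, 4, 4), dtype=bool)
occ[0:2] = True
hull = np.zeros_like(occ);  hull[0:2, 0:2] = True   # inside the object
avoid = np.zeros_like(occ); avoid[3] = True         # top slab, kept empty
touch = np.zeros_like(occ); touch[1:3, 0, 0] = True # straddles the surface
scores = constraint_scores(occ, hull, avoid, touch)
```

In this toy case the hull is fully covered, the avoid slab is untouched, and the touch region is reached. Real constraint meshes would need a voxelisation or distance-field step first, which is left out here.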
The method builds on the TRELLIS family. It keeps the text generator and geometry encoders frozen, turns the constraint meshes into compact geometry tokens, and routes local constraint evidence into the generator.
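The idea of routing local constraint evidence can be pictured with a small sketch. This is not Arbor's architecture: the paper describes a learned routed attachment, whereas the code below uses a fixed, distance-based stand-in, and every name in it (`route_constraints`, `k`, `temperature`, the token layout) is a hypothetical choice for illustration. Each latent position simply gathers features from its `k` nearest constraint tokens, weighted by a softmax over negative squared distance.

```python
import numpy as np

def route_constraints(latent_xyz, token_xyz, token_feat, k=2, temperature=0.1):
    """Give each latent position a blend of its k nearest constraint tokens.

    latent_xyz: (N, 3) positions of latent cells
    token_xyz:  (M, 3) positions of constraint tokens
    token_feat: (M, D) per-token features (e.g. type and geometry encoding)
    Returns (routed, idx, weights) with shapes (N, D), (N, k), (N, k).
    """
    latent_xyz = np.asarray(latent_xyz, dtype=float)
    token_xyz = np.asarray(token_xyz, dtype=float)
    token_feat = np.asarray(token_feat, dtype=float)
    k = min(k, len(token_xyz))

    # Pairwise squared distances between latent cells and tokens: (N, M).
    d2 = ((latent_xyz[:, None, :] - token_xyz[None, :, :]) ** 2).sum(-1)

    # Indices and distances of the k closest tokens per latent cell.
    idx = np.argsort(d2, axis=1)[:, :k]
    near = np.take_along_axis(d2, idx, axis=1)

    # Numerically stable softmax over negative distance: closer tokens dominate.
    logits = -near / temperature
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    # Weighted sum of the selected token features: (N, D).
    routed = (weights[:, :, None] * token_feat[idx]).sum(axis=1)
    return routed, idx, weights

# Two latent cells, two constraint tokens with one-hot features.
latents = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
tokens = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
feats = [[1.0, 0.0], [0.0, 1.0]]
routed, idx, weights = route_constraints(latents, tokens, feats, k=1)
```

With `k=1` each latent cell receives exactly the feature of the token at its own location. A learned router would replace the fixed distance weighting with trainable parameters, but the locality principle is the same.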