---
license: mit
pipeline_tag: image-to-3d
library_name: diffusers
---

# UNICA: A Unified Neural Framework for Controllable 3D Avatars


*Teaser figure*

## Abstract

Controllable 3D human avatars have found widespread applications in 3D games, the metaverse, and AR/VR scenarios. The conventional approach to creating such a 3D avatar requires a lengthy, intricate pipeline encompassing appearance modeling, motion planning, rigging, and physical simulation. In this paper, we introduce **UNICA** (**UNI**fied neural **C**ontrollable **A**vatar), a skeleton-free generative model that unifies all avatar control components into a single neural framework. Given keyboard inputs akin to video game controls, UNICA generates the next frame of a 3D avatar's geometry through an action-conditioned diffusion model operating on 2D position maps. A point transformer then maps the resulting geometry to 3D Gaussian Splatting for high-fidelity free-view rendering. Our approach naturally captures hair and loose-clothing dynamics without manually designed physical simulation, and supports extra-long autoregressive generation.

## Resources

- **Paper:** [UNICA: A Unified Neural Framework for Controllable 3D Avatars](https://huggingface.co/papers/2604.02799)
- **GitHub Repository:** [https://github.com/zjh21/UNICA](https://github.com/zjh21/UNICA)

## Installation and Usage

Please refer to the official [GitHub repository](https://github.com/zjh21/UNICA) for detailed installation instructions and inference scripts. The pipeline generally involves two stages (see the sketch below):

1. **Geometry Generation:** The action-conditioned diffusion model generates the next frame's 2D position map from the previous geometry and the current keyboard action.
2. **Appearance Mapping:** The point transformer maps the generated geometry to 3D Gaussian Splatting parameters via a point transformer for rendering.
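Conceptually, inference is an autoregressive loop: each keyboard action conditions a diffusion step that produces the next position map, which is then lifted to Gaussian parameters for splatting. The following is a minimal, self-contained sketch of that loop. The toy modules, tensor shapes, action encoding, and Gaussian parameter layout here are illustrative assumptions only and do not reflect the actual UNICA code, API, or checkpoints.

```python
# Minimal sketch of the two-stage UNICA inference loop described above.
# All modules below are toy stand-ins, not the real networks or weights.
import torch
import torch.nn as nn

H = W = 256        # position-map resolution (assumed)
NUM_ACTIONS = 8    # size of the keyboard action vocabulary (assumed)


class ToyGeometryDiffusion(nn.Module):
    """Stand-in for the action-conditioned diffusion model over position maps."""

    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3 + NUM_ACTIONS, 3, kernel_size=3, padding=1)

    @torch.no_grad()
    def sample(self, prev_map: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Broadcast the action one-hot over the map and predict the next geometry.
        a = action.view(1, -1, 1, 1).expand(1, NUM_ACTIONS, H, W)
        return self.net(torch.cat([prev_map, a], dim=1))


class ToyPointTransformer(nn.Module):
    """Stand-in for the point transformer that lifts geometry to 3DGS parameters."""

    def __init__(self):
        super().__init__()
        # Assumed per-point layout: 3 means + 3 scales + 4 rotation (quaternion)
        # + 1 opacity + 3 color = 14 channels.
        self.net = nn.Conv2d(3, 14, kernel_size=1)

    @torch.no_grad()
    def forward(self, position_map: torch.Tensor) -> torch.Tensor:
        return self.net(position_map)


geometry = ToyGeometryDiffusion()
appearance = ToyPointTransformer()

position_map = torch.zeros(1, 3, H, W)  # initial avatar geometry (per-pixel 3D positions)
for key in [0, 1, 1, 2]:                # a short sequence of keyboard actions
    action = torch.nn.functional.one_hot(torch.tensor(key), NUM_ACTIONS).float()
    # Stage 1: autoregressively generate the next frame's position map.
    position_map = geometry.sample(position_map, action)
    # Stage 2: map the geometry to 3D Gaussian Splatting parameters.
    gaussians = appearance(position_map)
    # `gaussians` would then be splatted from any camera for free-view rendering.
```

For the real models, weights, and rendering code, use the scripts provided in the GitHub repository.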