---
license: mit
pipeline_tag: image-to-3d
library_name: diffusers
---
<h1 align="center">UNICA: A Unified Neural Framework for Controllable 3D Avatars</h1>
<p align="center">
<a href="https://github.com/zjh21/UNICA"><img src="https://img.shields.io/badge/GitHub-Code-blue?logo=github&logoColor=white" alt="GitHub"></a>
<a href="https://huggingface.co/papers/2604.02799"><img src="https://img.shields.io/badge/arXiv-Paper-b31b1b?logo=arxiv&logoColor=white" alt="arXiv"></a>
</p>
<p align="center">
<img src="https://huggingface.co/zjh21/UNICA/resolve/main/assets/teaser.png" alt="Teaser" width="100%">
</p>
## Abstract
Controllable 3D human avatars have found widespread applications in 3D games, the metaverse, and AR/VR scenarios. The conventional approach to creating such a 3D avatar requires a lengthy, intricate pipeline encompassing appearance modeling, motion planning, rigging, and physical simulation. In this paper, we introduce **UNICA** (**UNI**fied neural **C**ontrollable **A**vatar), a skeleton-free generative model that unifies all avatar control components into a single neural framework. Given keyboard inputs akin to video game controls, UNICA generates the next frame of a 3D avatar's geometry through an action-conditioned diffusion model operating on 2D position maps. A point transformer then maps the resulting geometry to 3D Gaussian Splatting for high-fidelity free-view rendering. Our approach naturally captures hair and loose clothing dynamics without manually designed physical simulation, and supports extra-long autoregressive generation.
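For intuition, the snippet below sketches the control loop the abstract describes in PyTorch: a keyboard action is embedded and conditions a denoiser that predicts the next 2D position map from a sliding window of recent frames. Everything here (the class names, channel layout, and the single-call "sampler") is an illustrative assumption, not the released UNICA API; see the repository for the actual inference code.

```python
import torch
import torch.nn as nn

# Illustrative stand-in for UNICA's action-conditioned denoiser (assumed
# names and shapes); the real model, weights, and sampler live in the repo.
class ActionConditionedDenoiser(nn.Module):
    def __init__(self, channels=3, num_actions=8, context=4):
        super().__init__()
        self.action_embed = nn.Embedding(num_actions, channels)
        # consumes the noisy map concatenated with `context` previous frames
        self.net = nn.Conv2d(channels * (context + 1), channels, 3, padding=1)

    def forward(self, noisy_map, prev_frames, action_id):
        a = self.action_embed(action_id)[:, :, None, None]   # (B, C, 1, 1)
        x = torch.cat([noisy_map + a, prev_frames], dim=1)   # condition on history
        return self.net(x)                                   # next position map

def generate(model, actions, init_frames, size=(256, 256)):
    """Autoregressively roll out one position map per keyboard action."""
    frames, history = [], init_frames                        # (1, 4*3, H, W) history
    for action in actions:
        noise = torch.randn(1, 3, *size)                     # start from noise
        # a real diffusion sampler iterates many denoising steps;
        # collapsed to a single call here for brevity
        nxt = model(noise, history, action.view(1))
        history = torch.cat([history[:, 3:], nxt], dim=1)    # slide the window
        frames.append(nxt)
    return torch.cat(frames)                                  # (T, 3, H, W)

model = ActionConditionedDenoiser()
actions = torch.tensor([0, 1, 1, 2])                          # e.g. W, A, A, S keys
maps = generate(model, actions, torch.zeros(1, 12, 256, 256))
```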
## Resources
- **Paper:** [UNICA: A Unified Neural Framework for Controllable 3D Avatars](https://huggingface.co/papers/2604.02799)
- **GitHub Repository:** [https://github.com/zjh21/UNICA](https://github.com/zjh21/UNICA)
## Installation and Usage
Please refer to the official [GitHub repository](https://github.com/zjh21/UNICA) for detailed installation instructions and inference scripts. The pipeline generally involves two stages (a hypothetical sketch of stage 2 follows the list):
1. **Geometry Generation:** Using the action-conditioned diffusion model to generate position maps from keyboard actions.
2. **Appearance Mapping:** Mapping the generated geometry to 3D Gaussian Splatting via a point transformer for free-view rendering.
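The sketch below illustrates stage 2 under stated assumptions: each pixel of a generated position map is treated as a 3D point, and a small transformer predicts per-point Gaussian attributes (opacity, scale, rotation, color). The module name, channel layout, and attribute parameterization are assumptions for illustration; the actual point transformer is in the repository.

```python
import torch
import torch.nn as nn

# Illustrative stage-2 mapper (assumed names and layout, not UNICA's API):
# lifts each position-map pixel to 3D Gaussian Splatting attributes.
class PointToGaussians(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.encode = nn.Linear(3, dim)                      # xyz -> point features
        self.mix = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 11)                       # opacity+scale+quat+rgb

    def forward(self, position_map):                         # (B, 3, H, W)
        pts = position_map.flatten(2).transpose(1, 2)        # (B, H*W, 3) points
        feats = self.mix(self.encode(pts))                   # attention over points
        out = self.head(feats)
        return {
            "means": pts,                                    # Gaussian centers
            "opacity": out[..., 0:1].sigmoid(),
            "scales": out[..., 1:4].exp(),
            "rotations": nn.functional.normalize(out[..., 4:8], dim=-1),
            "colors": out[..., 8:11].sigmoid(),
        }

# toy resolution: full attention over all pixels only scales to small maps
gaussians = PointToGaussians()(torch.rand(1, 3, 32, 32))
```

The returned dictionary mirrors what a typical 3DGS rasterizer consumes (means, opacities, scales, rotations, colors). Note that the full-attention layer here is only workable at toy resolution; a real point transformer would use local or windowed attention over the point set.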