base_model:
  - microsoft/TRELLIS-image-large
license: mit
pipeline_tag: image-to-3d
library_name: diffusers

SyncHuman: Synchronizing 2D and 3D Generative Models for Single-view Human Reconstruction

This model is presented in the paper SyncHuman: Synchronizing 2D and 3D Generative Models for Single-view Human Reconstruction.

[Paper] - [Project Page] - [Code]

Abstract

Photorealistic 3D full-body human reconstruction from a single image is a critical yet challenging task for applications in films and video games due to inherent ambiguities and severe self-occlusions. While recent approaches leverage SMPL estimation and SMPL-conditioned image generative models to hallucinate novel views, they suffer from inaccurate 3D priors estimated from SMPL meshes and struggle to handle difficult human poses and reconstruct fine details. In this paper, we propose SyncHuman, a novel framework that combines a 2D multiview generative model and a 3D native generative model for the first time, enabling high-quality clothed human mesh reconstruction from single-view images even under challenging human poses. The multiview generative model excels at capturing fine 2D details but struggles with structural consistency, whereas the 3D native generative model generates coarse yet structurally consistent 3D shapes. By integrating the complementary strengths of these two approaches, we develop a more effective generation framework. Specifically, we first jointly fine-tune the multiview generative model and the 3D native generative model with the proposed pixel-aligned 2D-3D synchronization attention to produce geometrically aligned 3D shapes and 2D multiview images. To further improve details, we introduce a feature injection mechanism that lifts fine details from 2D multiview images onto the aligned 3D shapes, enabling accurate and high-fidelity reconstruction. Extensive experiments demonstrate that SyncHuman achieves robust and photorealistic 3D human reconstruction, even for images with challenging poses. Our method outperforms baseline methods in geometric accuracy and visual fidelity, demonstrating a promising direction for future 3D generation models.

Environment Setup

We tested on an NVIDIA H800 GPU with CUDA 12.1. Follow the steps below to set up the environment.

1) Create Conda env and install PyTorch (CUDA 12.1)

conda create -n SyncHuman python=3.10
conda activate SyncHuman

# PyTorch 2.1.1 + CUDA 12.1
conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=12.1 -c pytorch -c nvidia

2) Follow the TRELLIS installation instructions to set up the rest of the environment

3) Install remaining Python packages

pip install accelerate safetensors==0.4.5 diffusers==0.29.1 transformers==4.36.0
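To confirm that the pinned versions above were actually installed, a check along the following lines may help. This is a minimal sketch and not part of the SyncHuman repository: the helper name `check_pins` is hypothetical, the pins are copied from the install commands above, and it assumes the Conda PyTorch packages register themselves under the distribution names `torch`, `torchvision`, and `torchaudio`.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install commands above.
PINNED = {
    "torch": "2.1.1",
    "torchvision": "0.16.1",
    "torchaudio": "2.1.1",
    "safetensors": "0.4.5",
    "diffusers": "0.29.1",
    "transformers": "4.36.0",
}


def check_pins(pins):
    """Return {package: status}; status is 'ok', 'missing', or a mismatch note."""
    report = {}
    for name, expected in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        # Drop local build tags such as "+cu121" before comparing.
        base = found.split("+", 1)[0]
        if base == expected:
            report[name] = "ok"
        else:
            report[name] = f"found {found}, expected {expected}"
    return report


if __name__ == "__main__":
    for name, status in check_pins(PINNED).items():
        print(f"{name:<14} {status}")
```

Run it inside the activated SyncHuman environment; any line that does not read `ok` points to a package worth reinstalling.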

Inference

git clone https://github.com/xishuxishu/SyncHuman.git

1) Download the checkpoints

cd SyncHuman
python download.py

After the download, the repository is organized as follows:

SyncHuman
├── ckpts
│   ├── OneStage
│   └── SecondStage
├── SyncHuman
├── examples
├── inference_OneStage.py
├── inference_SecondStage.py
└── download.py
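The layout above can be verified before running inference. The following is a minimal sketch under the assumption that it is run from the repository root; `missing_entries` is a hypothetical helper, not something the repository ships, and the entries are taken directly from the tree above.

```python
from pathlib import Path

# Entries taken from the tree above, relative to the repository root.
EXPECTED = [
    "ckpts/OneStage",
    "ckpts/SecondStage",
    "SyncHuman",
    "examples",
    "inference_OneStage.py",
    "inference_SecondStage.py",
    "download.py",
]


def missing_entries(repo_root, expected=EXPECTED):
    """Return the expected entries that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Layout looks complete.")
```

If either `ckpts` subdirectory is reported missing, re-running `python download.py` is the likely fix.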

2) Run the inference code

python inference_OneStage.py

python inference_SecondStage.py
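The two commands above can also be chained from Python so that the second stage is skipped when the first one fails. This is a sketch only: `run_stages` is a hypothetical wrapper, and it assumes (as the script names suggest, though this README does not state it) that the second stage depends on the first stage having completed.

```python
import subprocess
import sys
from pathlib import Path

# Script names as listed above, in the order they are meant to run.
STAGES = ["inference_OneStage.py", "inference_SecondStage.py"]


def run_stages(repo_root, stages=STAGES):
    """Run each stage script in order from repo_root; stop at the first failure."""
    root = Path(repo_root)
    for script in stages:
        print(f"Running {script} ...")
        # check=True raises CalledProcessError on a non-zero exit code,
        # which prevents later stages from starting.
        subprocess.run([sys.executable, script], cwd=root, check=True)
```

Calling `run_stages(".")` from the repository root is equivalent to running the two commands by hand, except that a failure in the first stage surfaces as an exception.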

To run inference on a different image, modify the image_path in inference_OneStage.py.

The final generated result is written to outputs/SecondStage/output.glb.
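A quick way to sanity-check the resulting file without a 3D viewer is to read its header. The sketch below uses only the standard library and follows the binary glTF 2.0 container layout (a 12-byte header followed by a JSON chunk). The helper names `read_glb_json` and `summarize_glb` are hypothetical and not part of the SyncHuman code, and which fields the exported file actually contains is not specified here.

```python
import json
import struct


def read_glb_json(path):
    """Return (version, gltf_dict) parsed from the header and JSON chunk of a .glb file."""
    with open(path, "rb") as f:
        # Header: 4-byte magic, uint32 version, uint32 total length (little-endian).
        magic, version, _total_length = struct.unpack("<4sII", f.read(12))
        if magic != b"glTF":
            raise ValueError(f"{path} is not a binary glTF file")
        # First chunk: uint32 length, 4-byte type, then the chunk data.
        chunk_length, chunk_type = struct.unpack("<I4s", f.read(8))
        if chunk_type != b"JSON":
            raise ValueError("first chunk is not JSON")
        gltf = json.loads(f.read(chunk_length).decode("utf-8"))
    return version, gltf


def summarize_glb(path):
    """Count the top-level meshes, materials, and images declared in the file."""
    version, gltf = read_glb_json(path)
    return {
        "gltf_version": version,
        "meshes": len(gltf.get("meshes", [])),
        "materials": len(gltf.get("materials", [])),
        "images": len(gltf.get("images", [])),
    }
```

For example, `summarize_glb("outputs/SecondStage/output.glb")` returns a small dictionary; a mesh count of zero would suggest that the export did not complete as expected.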

Acknowledgement

Our code is based on these wonderful works:

TRELLIS
PSHuman

Citation

If you find this work useful, please cite our paper:

@article{chen2025synchuman,
  title={SyncHuman: Synchronizing 2D and 3D Generative Models for Single-view Human Reconstruction},
  author={Wenyue Chen and Peng Li and Wangguandong Zheng and Chengfeng Zhao and Mengfei Li and Yaolong Zhu and Zhiyang Dou and Ronggang Wang and Yuan Liu},
  journal={arXiv preprint arXiv:2510.07723},
  year={2025}
}