# SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation
## Check out [SMPLest-X](https://github.com/wqyin/SMPLest-X), an extension of SMPLer-X with stronger foundation models!

## Useful links
<div align="center">
<a href="https://caizhongang.github.io/projects/SMPLer-X/" class="button"><b>[Homepage]</b></a>
<a href="https://huggingface.co/spaces/caizhongang/SMPLer-X" class="button"><b>[HuggingFace]</b></a>
<a href="https://arxiv.org/abs/2309.17448" class="button"><b>[arXiv]</b></a>
<a href="https://youtu.be/DepTqbPpVzY" class="button"><b>[Video]</b></a>
<a href="https://github.com/wqyin/SMPLest-X" class="button"><b>[SMPLest-X]</b></a>
<a href="https://github.com/open-mmlab/mmhuman3d" class="button"><b>[MMHuman3D]</b></a>
</div>

## News
- [2025-10-21] [SMPLest-X](https://github.com/wqyin/SMPLest-X) accepted to TPAMI.
- [2025-02-17] Pretrained models of [SMPLest-X](https://github.com/wqyin/SMPLest-X) available for download.
- [2025-02-14] Brand new codebase of [SMPLest-X](https://github.com/wqyin/SMPLest-X) released for training, testing and inference.
- [2025-01-20] [SMPLest-X](https://github.com/wqyin/SMPLest-X) released on [arXiv](https://arxiv.org/abs/2501.09782).
- [2025-01-08] Project page of [SMPLest-X](https://github.com/wqyin/SMPLest-X) created.
- [2024-03-29] An updated version of SMPLer-X-H32 is released to fix camera estimation on 3DPW-like data.
- [2024-02-29] [HuggingFace](https://huggingface.co/spaces/caizhongang/SMPLer-X) demo is online!
- [2023-10-23] Support visualization through SMPL-X mesh overlay and add inference docker.
- [2023-10-02] [arXiv](https://arxiv.org/abs/2309.17448) preprint is online!
- [2023-09-28] [Homepage](https://caizhongang.github.io/projects/SMPLer-X/) and [Video](https://youtu.be/DepTqbPpVzY) are online!
- [2023-07-19] Pretrained models are released.
- [2023-06-15] Training and testing code is released.

## Install
```bash
conda create -n smplerx python=3.8 -y
conda activate smplerx
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch -y
pip install mmcv-full==1.7.1 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.12.0/index.html
pip install -r requirements.txt
# install mmpose
cd main/transformer_utils
pip install -v -e .
cd ../..
```
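After installation, a quick check can confirm that the versions pinned in the commands above are the ones actually present in the environment. The snippet below is a minimal sketch and not part of this repository: `PINNED` is copied from the install commands, while `installed_version` and `check_pins` are hypothetical helpers. It assumes the packages expose the distribution names `torch`, `torchvision`, `torchaudio` and `mmcv-full`.

```python
from importlib import metadata

# Versions pinned by the install commands above.
PINNED = {
    "torch": "1.12.0",
    "torchvision": "0.13.0",
    "torchaudio": "0.12.0",
    "mmcv-full": "1.7.1",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {name: (expected, found)} for every package that does not match.

    Some builds append a local suffix such as "+cu113", so only the part
    before "+" is compared.
    """
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means all four pinned packages match. The `lookup` parameter exists only so the comparison logic can be exercised without the packages installed.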
## Docker Support (Early Stage)
```
docker pull wcwcw/smplerx_inference:v0.2
docker run --gpus all -v <vid_input_folder>:/smplerx_inference/vid_input \
-v <vid_output_folder>:/smplerx_inference/vid_output \
wcwcw/smplerx_inference:v0.2 --vid <video_name>.mp4
# Currently, any customization needs to be applied to /smplerx_inference/smplerx/inference_docker.py
```
- A Docker image for inference is available on Docker Hub.
- The image uses SMPLer-X-H32 as the inference baseline and was tested on an RTX 3090 under WSL2 (Ubuntu 20.04).
## Pretrained Models
| Model | Backbone | #Datasets | #Inst. | #Params | MPE | Download | FPS |
|:-------------:|:--------:|:---------:|:------:|:-------:|:----:|:--------:|:-----:|
| SMPLer-X-S32 | ViT-S | 32 | 4.5M | 32M | 82.6 | [model](https://huggingface.co/caizhongang/SMPLer-X/resolve/main/smpler_x_s32.pth.tar?download=true) | 36.17 |
| SMPLer-X-B32 | ViT-B | 32 | 4.5M | 103M | 74.3 | [model](https://huggingface.co/caizhongang/SMPLer-X/resolve/main/smpler_x_b32.pth.tar?download=true) | 33.09 |
| SMPLer-X-L32 | ViT-L | 32 | 4.5M | 327M | 66.2 | [model](https://huggingface.co/caizhongang/SMPLer-X/resolve/main/smpler_x_l32.pth.tar?download=true) | 24.44 |
| SMPLer-X-H32 | ViT-H | 32 | 4.5M | 662M | 63.0 | [model](https://huggingface.co/caizhongang/SMPLer-X/resolve/main/smpler_x_h32.pth.tar?download=true) | 17.47 |
| SMPLer-X-H32* | ViT-H | 32 | 4.5M | 662M | 59.7 | [model](https://huggingface.co/caizhongang/SMPLer-X/resolve/main/smpler_x_h32_correct.pth.tar?download=true) | 17.47 |
* MPE (Mean Primary Error): the average of the primary errors on five benchmarks (AGORA, EgoBody, UBody, 3DPW, and EHF)
* FPS (Frames Per Second): the average inference speed on a single Tesla V100 GPU, batch size = 1
* SMPLer-X-H32* is the updated version of SMPLer-X-H32, which fixes the camera estimation issue on 3DPW-like data.
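
The download links in the table all follow the same URL pattern, so fetching a checkpoint can be scripted. The following is a sketch under that assumption: `CHECKPOINTS` and `BASE_URL` are read off the links above, and `checkpoint_url` and `download_checkpoint` are hypothetical helpers, not part of this repository.

```python
import urllib.request
from pathlib import Path

# Checkpoint file stems as they appear in the download links of the table above.
CHECKPOINTS = {
    "SMPLer-X-S32": "smpler_x_s32",
    "SMPLer-X-B32": "smpler_x_b32",
    "SMPLer-X-L32": "smpler_x_l32",
    "SMPLer-X-H32": "smpler_x_h32",
    "SMPLer-X-H32*": "smpler_x_h32_correct",
}
BASE_URL = "https://huggingface.co/caizhongang/SMPLer-X/resolve/main"


def checkpoint_url(model_name):
    """Build the download URL for a model name from the table."""
    return f"{BASE_URL}/{CHECKPOINTS[model_name]}.pth.tar?download=true"


def download_checkpoint(model_name, target_dir="pretrained_models"):
    """Download a checkpoint into target_dir unless the file already exists."""
    target = Path(target_dir) / f"{CHECKPOINTS[model_name]}.pth.tar"
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        urllib.request.urlretrieve(checkpoint_url(model_name), str(target))
    return target
```

For example, `download_checkpoint("SMPLer-X-S32")` run from the repository root would place `smpler_x_s32.pth.tar` under `pretrained_models/`.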
## Preparation
- download all datasets
  - [3DPW](https://virtualhumans.mpi-inf.mpg.de/3DPW/)
  - [AGORA](https://agora.is.tue.mpg.de/index.html)
  - [ARCTIC](https://arctic.is.tue.mpg.de/)
  - [BEDLAM](https://bedlam.is.tue.mpg.de/index.html)
  - [BEHAVE](https://github.com/xiexh20/behave-dataset)
  - [CHI3D](https://ci3d.imar.ro/)
  - [CrowdPose](https://github.com/Jeff-sjtu/CrowdPose)
  - [EgoBody](https://sanweiliti.github.io/egobody/egobody.html)
  - [EHF](https://smpl-x.is.tue.mpg.de/index.html)
  - [FIT3D](https://fit3d.imar.ro/)
  - [GTA-Human](https://caizhongang.github.io/projects/GTA-Human/)
  - [Human3.6M](http://vision.imar.ro/human3.6m/description.php)
  - [HumanSC3D](https://sc3d.imar.ro/)
  - [InstaVariety](https://github.com/akanazawa/human_dynamics/blob/master/doc/insta_variety.md)
  - [LSPET](http://sam.johnson.io/research/lspet.html)
  - [MPII](http://human-pose.mpi-inf.mpg.de/)
  - [MPI-INF-3DHP](https://vcai.mpi-inf.mpg.de/3dhp-dataset/)
  - [MSCOCO](https://cocodataset.org/#home)
  - [MTP](https://tuch.is.tue.mpg.de/)
  - [MuCo-3DHP](https://vcai.mpi-inf.mpg.de/projects/SingleShotMultiPerson/)
  - [OCHuman](https://github.com/liruilong940607/OCHumanApi)
  - [PoseTrack](https://posetrack.net/)
  - [PROX](https://prox.is.tue.mpg.de/)
  - [RenBody](https://magichub.com/datasets/openxd-renbody/)
  - [RICH](https://rich.is.tue.mpg.de/index.html)
  - [SPEC](https://spec.is.tue.mpg.de/index.html)
  - [SSP3D](https://github.com/akashsengupta1997/SSP-3D)
  - [SynBody](https://maoxie.github.io/SynBody/)
  - [Talkshow](https://talkshow.is.tue.mpg.de/)
  - [UBody](https://github.com/IDEA-Research/OSX)
  - [UP3D](https://files.is.tuebingen.mpg.de/classner/up/)
- process all datasets into [HumanData](https://github.com/open-mmlab/mmhuman3d/blob/main/docs/human_data.md) format, except the following:
  - AGORA, MSCOCO, MPII, Human3.6M, UBody.
  - follow [OSX](https://github.com/IDEA-Research/OSX) in preparing these 5 datasets.
- follow [OSX](https://github.com/IDEA-Research/OSX) in preparing pretrained ViTPose models. Download the ViTPose pretrained weights for ViT-small and ViT-huge from [here](https://github.com/ViTAE-Transformer/ViTPose).
- download [SMPL-X](https://smpl-x.is.tue.mpg.de/) and [SMPL](https://smpl.is.tue.mpg.de/) body models.
- download mmdet pretrained [model](https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth) and [config](https://github.com/openxrlab/xrmocap/blob/main/configs/modules/human_perception/mmdet_faster_rcnn_r50_fpn_coco.py) for inference.
The file structure should look like this:
```
SMPLer-X/
├── common/
│   └── utils/
│       └── human_model_files/  # body model
│           ├── smpl/
│           │   ├── SMPL_NEUTRAL.pkl
│           │   ├── SMPL_MALE.pkl
│           │   └── SMPL_FEMALE.pkl
│           └── smplx/
│               ├── MANO_SMPLX_vertex_ids.pkl
│               ├── SMPL-X__FLAME_vertex_ids.npy
│               ├── SMPLX_NEUTRAL.pkl
│               ├── SMPLX_to_J14.pkl
│               ├── SMPLX_NEUTRAL.npz
│               ├── SMPLX_MALE.npz
│               └── SMPLX_FEMALE.npz
├── data/
├── main/
├── demo/
│   ├── videos/
│   ├── images/
│   └── results/
├── pretrained_models/  # pretrained ViT-Pose, SMPLer_X and mmdet models
│   ├── mmdet/
│   │   ├── faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
│   │   └── mmdet_faster_rcnn_r50_fpn_coco.py
│   ├── smpler_x_s32.pth.tar
│   ├── smpler_x_b32.pth.tar
│   ├── smpler_x_l32.pth.tar
│   ├── smpler_x_h32.pth.tar
│   ├── vitpose_small.pth
│   ├── vitpose_base.pth
│   ├── vitpose_large.pth
│   └── vitpose_huge.pth
└── dataset/
    ├── AGORA/
    ├── ARCTIC/
    ├── BEDLAM/
    ├── Behave/
    ├── CHI3D/
    ├── CrowdPose/
    ├── EgoBody/
    ├── EHF/
    ├── FIT3D/
    ├── GTA_Human2/
    ├── Human36M/
    ├── HumanSC3D/
    ├── InstaVariety/
    ├── LSPET/
    ├── MPII/
    ├── MPI_INF_3DHP/
    ├── MSCOCO/
    ├── MTP/
    ├── MuCo/
    ├── OCHuman/
    ├── PoseTrack/
    ├── PROX/
    ├── PW3D/
    ├── RenBody/
    ├── RICH/
    ├── SPEC/
    ├── SSP3D/
    ├── SynBody/
    ├── Talkshow/
    ├── UBody/
    ├── UP3D/
    └── preprocessed_datasets/  # HumanData files
```
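Since a missing body model or detector file typically only surfaces at run time, it can help to check the layout up front. The sketch below is not part of the repository: `EXPECTED` lists a small subset of the tree above, and `missing_paths` is a hypothetical helper that reports which of those entries are absent under a given root.

```python
from pathlib import Path

# A subset of the layout shown above; extend it with the datasets you use.
EXPECTED = [
    "common/utils/human_model_files/smpl/SMPL_NEUTRAL.pkl",
    "common/utils/human_model_files/smplx/SMPLX_NEUTRAL.pkl",
    "common/utils/human_model_files/smplx/SMPLX_NEUTRAL.npz",
    "pretrained_models/mmdet/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth",
    "pretrained_models/mmdet/mmdet_faster_rcnn_r50_fpn_coco.py",
    "demo/videos",
    "dataset",
]


def missing_paths(root, expected=EXPECTED):
    """Return the entries of `expected` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the SMPLer-X/ root directory.
    for rel in missing_paths("."):
        print("missing:", rel)
```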
## Inference
- Place the video for inference under `SMPLer-X/demo/videos`
- Prepare the pretrained models to be used for inference under `SMPLer-X/pretrained_models`
- Prepare the mmdet pretrained model and config under `SMPLer-X/pretrained_models`
- Inference output will be saved in `SMPLer-X/demo/results`
```bash
cd main
sh slurm_inference.sh {VIDEO_FILE} {FORMAT} {FPS} {PRETRAINED_CKPT}
# Example: run inference on test_video.mp4 (24 FPS) with smpler_x_h32
sh slurm_inference.sh test_video mp4 24 smpler_x_h32
```
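As the example shows, the script takes the video name and its extension as separate arguments, and the checkpoint name without `.pth.tar`. The sketch below derives those arguments from a file path; `build_inference_cmd` and `run_inference` are hypothetical helpers, not part of the repository, and they only assemble the same command line shown above.

```python
import subprocess
from pathlib import Path


def build_inference_cmd(video_path, fps, ckpt_name):
    """Build the argument list for slurm_inference.sh from a video file path."""
    video = Path(video_path)
    return [
        "sh", "slurm_inference.sh",
        video.stem,                # {VIDEO_FILE}: file name without extension
        video.suffix.lstrip("."),  # {FORMAT}: extension without the dot
        str(fps),                  # {FPS}
        ckpt_name,                 # {PRETRAINED_CKPT}: name without .pth.tar
    ]


def run_inference(video_path, fps, ckpt_name, main_dir="main"):
    """Invoke the script from SMPLer-X/main, as in the commands above."""
    cmd = build_inference_cmd(video_path, fps, ckpt_name)
    subprocess.run(cmd, cwd=main_dir, check=True)
```

Calling `build_inference_cmd("demo/videos/test_video.mp4", 24, "smpler_x_h32")` yields the arguments of the example command above.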
## 2D SMPL-X Overlay
We provide a lightweight visualization script for mesh overlay based on pyrender.
- Use ffmpeg to split video into images
- The visualization script takes inference results (see above) as the input.
```bash
ffmpeg -i {VIDEO_FILE} -f image2 -vf fps=30 \
{SMPLERX INFERENCE DIR}/{VIDEO NAME (no extension)}/orig_img/%06d.jpg \
-hide_banner -loglevel error
cd main && python render.py \
--data_path {SMPLERX INFERENCE DIR} --seq {VIDEO NAME} \
--image_path {SMPLERX INFERENCE DIR}/{VIDEO NAME} \
--render_biggest_person False
```
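The placeholders in the two commands above are related: the frame directory and `--image_path` both sit under `{SMPLERX INFERENCE DIR}/{VIDEO NAME}`. The sketch below makes that relationship explicit by building both argument lists from a video file and an inference directory. `build_overlay_cmds` is a hypothetical helper, not part of the repository; it only mirrors the flags shown above and does not run anything.

```python
from pathlib import Path


def build_overlay_cmds(video_file, inference_dir, fps=30):
    """Return (ffmpeg_cmd, render_cmd) mirroring the shell commands above."""
    video_name = Path(video_file).stem         # video name without extension
    seq_dir = Path(inference_dir) / video_name  # per-video inference folder
    frame_pattern = seq_dir / "orig_img" / "%06d.jpg"
    ffmpeg_cmd = [
        "ffmpeg", "-i", str(video_file), "-f", "image2", "-vf", f"fps={fps}",
        str(frame_pattern), "-hide_banner", "-loglevel", "error",
    ]
    # render.py is run from the main/ directory, as in the commands above.
    render_cmd = [
        "python", "render.py",
        "--data_path", str(inference_dir), "--seq", video_name,
        "--image_path", str(seq_dir),
        "--render_biggest_person", "False",
    ]
    return ffmpeg_cmd, render_cmd
```

Each list can be passed to `subprocess.run(..., check=True)`, with `cwd` set to `main` for the render step.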
## Training
```bash
cd main
sh slurm_train.sh {JOB_NAME} {NUM_GPU} {CONFIG_FILE}
# Example: train SMPLer-X-H32 with 16 GPUs
sh slurm_train.sh smpler_x_h32 16 config_smpler_x_h32.py
```
- CONFIG_FILE is the file name under `SMPLer-X/main/config`
- Logs and checkpoints will be saved to `SMPLer-X/output/train_{JOB_NAME}_{DATE_TIME}`
## Testing
```bash
# To evaluate the model ../output/{TRAIN_OUTPUT_DIR}/model_dump/snapshot_{CKPT_ID}.pth.tar
# with config ../output/{TRAIN_OUTPUT_DIR}/code/config_base.py
cd main
sh slurm_test.sh {JOB_NAME} {NUM_GPU} {TRAIN_OUTPUT_DIR} {CKPT_ID}
```
- NUM_GPU = 1 is recommended for testing
- Logs and results will be saved to `SMPLer-X/output/test_{JOB_NAME}_ep{CKPT_ID}_{TEST_DATASET}`
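
Before submitting a test job, it can be useful to confirm that the checkpoint and config named in the comments above exist. The sketch below resolves both paths from `TRAIN_OUTPUT_DIR` and `CKPT_ID`; `eval_paths` and `check_eval_inputs` are hypothetical helpers, not part of the repository, and `output_root` defaults to `../output` on the assumption that the script is run from `main/`.

```python
from pathlib import Path


def eval_paths(train_output_dir, ckpt_id, output_root="../output"):
    """Return (checkpoint_path, config_path) for a finished training run."""
    run_dir = Path(output_root) / train_output_dir
    ckpt = run_dir / "model_dump" / f"snapshot_{ckpt_id}.pth.tar"
    config = run_dir / "code" / "config_base.py"
    return ckpt, config


def check_eval_inputs(train_output_dir, ckpt_id, output_root="../output"):
    """Return the paths the test run would need but that do not exist."""
    paths = eval_paths(train_output_dir, ckpt_id, output_root)
    return [p for p in paths if not p.exists()]
```

An empty list from `check_eval_inputs` means both files are in place.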
## FAQ
- `RuntimeError: Subtraction, the '-' operator, with a bool tensor is not supported. If you are trying to invert a mask, use the '~' or 'logical_not()' operator instead.`
  Follow [this post](https://github.com/mks0601/I2L-MeshNet_RELEASE/issues/6#issuecomment-675152527) and modify `torchgeometry`.
- `KeyError: 'SinePositionalEncoding is already registered in position encoding'` or any other similar KeyErrors due to duplicate module registration.
  Manually add `force=True` to the respective module registration under `main/transformer_utils/mmpose/models/utils`, e.g. `@POSITIONAL_ENCODING.register_module(force=True)` in [this file](main/transformer_utils/mmpose/models/utils/positional_encoding.py).
- How do I animate my virtual characters with SMPLer-X output (like that in the demo video)?
  - We are working on that, please stay tuned!
    Currently, this repo supports SMPL-X estimation and a simple visualization (overlay of SMPL-X vertices).
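
To illustrate why `force=True` resolves the duplicate-registration `KeyError` above, here is a toy registry. It is a deliberately simplified stand-in and NOT the mmcv implementation: the `Registry` class and the `reregister` helper are hypothetical, and only the decorator name and the `force` flag mirror the FAQ entry.

```python
class Registry:
    """Toy stand-in for a module registry; not the mmcv implementation."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, force=False):
        def decorator(cls):
            # Without force, a second registration under the same name fails.
            if not force and cls.__name__ in self._modules:
                raise KeyError(
                    f"{cls.__name__} is already registered in {self.name}"
                )
            self._modules[cls.__name__] = cls
            return cls
        return decorator


POSITIONAL_ENCODING = Registry("position encoding")


@POSITIONAL_ENCODING.register_module()
class SinePositionalEncoding:
    pass


def reregister(force):
    """Simulate a second registration of a class with the same name."""
    @POSITIONAL_ENCODING.register_module(force=force)
    class SinePositionalEncoding:  # intentional redefinition
        pass
    return SinePositionalEncoding
```

Here `reregister(force=False)` raises a `KeyError`, while `reregister(force=True)` overwrites the existing entry, which is the behavior the FAQ fix relies on.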
## References
- [Hand4Whole](https://github.com/mks0601/Hand4Whole_RELEASE)
- [OSX](https://github.com/IDEA-Research/OSX)
- [MMHuman3D](https://github.com/open-mmlab/mmhuman3d)
## Citation
```text
# SMPLest-X
@article{yin2025smplest,
title={SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation},
author={Yin, Wanqi and Cai, Zhongang and Wang, Ruisi and Zeng, Ailing and Wei, Chen and Sun, Qingping and Mei, Haiyi and Wang, Yanjun and Pang, Hui En and Zhang, Mingyuan and Zhang, Lei and Loy, Chen Change and Yamashita, Atsushi and Yang, Lei and Liu, Ziwei},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2026},
volume={48},
number={2},
pages={1778-1794},
doi={10.1109/TPAMI.2025.3618174}
}
# SMPLer-X
@inproceedings{cai2023smplerx,
title={{SMPLer-X}: Scaling up expressive human pose and shape estimation},
author={Cai, Zhongang and Yin, Wanqi and Zeng, Ailing and Wei, Chen and Sun, Qingping and Wang, Yanjun and Pang, Hui En and Mei, Haiyi and Zhang, Mingyuan and Zhang, Lei and Loy, Chen Change and Yang, Lei and Liu, Ziwei},
booktitle={Advances in Neural Information Processing Systems},
year={2023}
}
```
## Explore More [Motrix](https://github.com/MotrixLab) Projects
### Motion Capture
- [SMPL-X] [TPAMI'25] [SMPLest-X](https://github.com/MotrixLab/SMPLest-X): An extended version of [SMPLer-X](https://github.com/MotrixLab/SMPLer-X) with stronger foundation models.
- [SMPL-X] [NeurIPS'23] [SMPLer-X](https://github.com/MotrixLab/SMPLer-X): Scaling up EHPS towards a family of generalist foundation models.
- [SMPL-X] [ECCV'24] [WHAC](https://github.com/MotrixLab/WHAC): World-grounded human pose and camera estimation from monocular videos.
- [SMPL-X] [CVPR'24] [AiOS](https://github.com/MotrixLab/AiOS): An all-in-one-stage pipeline combining detection and 3D human reconstruction.
- [SMPL-X] [NeurIPS'23] [RoboSMPLX](https://github.com/MotrixLab/RoboSMPLX): A framework to enhance the robustness of whole-body pose and shape estimation.
- [SMPL-X] [ICML'25] [ADHMR](https://github.com/MotrixLab/ADHMR): A framework to align diffusion-based human mesh recovery methods via direct preference optimization.
- [SMPL-X] [MKA](https://github.com/MotrixLab/MKA): Full-body 3D mesh reconstruction from single- or multi-view RGB videos.
- [SMPL] [ICCV'23] [Zolly](https://github.com/MotrixLab/Zolly): 3D human mesh reconstruction from perspective-distorted images.
- [SMPL] [IJCV'26] [PointHPS](https://github.com/MotrixLab/PointHPS): 3D HPS from point clouds captured in real-world settings.
- [SMPL] [NeurIPS'22] [HMR-Benchmarks](https://github.com/MotrixLab/hmr-benchmarks): A comprehensive benchmark of HPS datasets, backbones, and training strategies.
### Motion Generation
- [SMPL-X] [ICLR'26] [ViMoGen](https://github.com/MotrixLab/ViMoGen): A comprehensive framework that transfers knowledge from ViGen to MoGen across data, modeling, and evaluation.
- [SMPL-X] [ECCV'24] [LMM](https://github.com/MotrixLab/LMM): Large Motion Model for Unified Multi-Modal Motion Generation.
- [SMPL-X] [NeurIPS'23] [FineMoGen](https://github.com/MotrixLab/FineMoGen): Fine-Grained Spatio-Temporal Motion Generation and Editing.
- [SMPL] [InfiniteDance](https://github.com/MotrixLab/InfiniteDance): A large-scale 3D dance dataset and an MLLM-based music-to-dance model designed for robust in-the-wild generalization.
- [SMPL] [NeurIPS'23] [InsActor](https://github.com/MotrixLab/insactor): Generating physics-based human motions from language and waypoint conditions via diffusion policies.
- [SMPL] [ICCV'23] [ReMoDiffuse](https://github.com/MotrixLab/ReMoDiffuse): Retrieval-Augmented Motion Diffusion Model.
- [SMPL] [TPAMI'24] [MotionDiffuse](https://github.com/MotrixLab/MotionDiffuse): Text-Driven Human Motion Generation with Diffusion Model.
### Motion Dataset
- [SMPL] [ECCV'22] [HuMMan](https://github.com/MotrixLab/humman_toolbox): Toolbox for HuMMan, a large-scale multi-modal 4D human dataset.
- [SMPL-X] [TPAMI'24] [GTA-Human](https://github.com/MotrixLab/gta-human_toolbox): Toolbox for GTA-Human, a large-scale 3D human dataset generated with the GTA-V game engine.