Nebularer committed on
Commit
8af32c5
·
verified ·
1 Parent(s): c7b32c7

Add files using upload-large-folder tool

Files changed (50)
  1. Depth-Anything-V2/DA-2K.md +51 -0
  2. Depth-Anything-V2/LICENSE +201 -0
  3. Depth-Anything-V2/README.md +201 -0
  4. Depth-Anything-V2/app.py +88 -0
  5. Depth-Anything-V2/requirements.txt +9 -0
  6. Depth-Anything-V2/run.py +73 -0
  7. Depth-Anything-V2/run_video.py +92 -0
  8. SMPLest-X/.DS_Store +0 -0
  9. SMPLest-X/.gitignore +8 -0
  10. SMPLest-X/LICENSE.txt +9 -0
  11. SMPLest-X/README.md +152 -0
  12. SMPLest-X/datasets/SynHand.py +39 -0
  13. SMPLest-X/datasets/dataset.py +103 -0
  14. SMPLest-X/datasets/humandata.py +1076 -0
  15. SMPLest-X/humandata_prep/README.md +64 -0
  16. SMPLest-X/humandata_prep/check.py +298 -0
  17. SMPLest-X/main/__init__.py +0 -0
  18. SMPLest-X/main/base.py +234 -0
  19. SMPLest-X/main/config.py +101 -0
  20. SMPLest-X/main/constants.py +37 -0
  21. SMPLest-X/main/inference.py +188 -0
  22. SMPLest-X/main/test.py +107 -0
  23. SMPLest-X/main/train.py +138 -0
  24. SMPLest-X/requirements.txt +13 -0
  25. SMPLest-X/requirements_py310.txt +14 -0
  26. SMPLest-X/utils/distribute_utils.py +171 -0
  27. SMPLest-X/utils/timer.py +31 -0
  28. SMPLest-X/utils/transforms.py +366 -0
  29. WiLoR/.DS_Store +0 -0
  30. WiLoR/README.md +93 -0
  31. WiLoR/demo.py +139 -0
  32. WiLoR/demo.sh +2 -0
  33. WiLoR/download_videos.py +58 -0
  34. WiLoR/gradio_demo.py +192 -0
  35. WiLoR/license.txt +402 -0
  36. WiLoR/requirements.txt +20 -0
  37. WiLoR/requirements_my.txt +11 -0
  38. __init__.py +11 -0
  39. convert_img_to_videos.py +90 -0
  40. corrupted_videos.log +7 -0
  41. corrupted_videos_csl_news.log +7 -0
  42. extract_smplx_20260212_165824.log +0 -0
  43. extract_smplx_20260212_165911_gpu_monitor.log +0 -0
  44. extract_smplx_20260213_144424.log +0 -0
  45. extract_smplx_20260213_144424_gpu_monitor.log +0 -0
  46. extract_smplx_pose.py +657 -0
  47. extract_smplx_pose.sh +27 -0
  48. log/extract_smplx_20260211_195012.log +0 -0
  49. log/extract_smplx_20260212_034356.log +0 -0
  50. pretrained_weight/.DS_Store +0 -0
Depth-Anything-V2/DA-2K.md ADDED
@@ -0,0 +1,51 @@
1
+ # DA-2K Evaluation Benchmark
2
+
3
+ ## Introduction
4
+
5
+ ![DA-2K](assets/DA-2K.png)
6
+
7
+ DA-2K is proposed in [Depth Anything V2](https://depth-anything-v2.github.io) to evaluate the relative depth estimation capability. It encompasses eight representative scenarios of `indoor`, `outdoor`, `non_real`, `transparent_reflective`, `adverse_style`, `aerial`, `underwater`, and `object`. It consists of 1K diverse high-quality images and 2K precise pair-wise relative depth annotations.
8
+
9
+ Please refer to our [paper](https://arxiv.org/abs/2406.09414) for details on how this benchmark was constructed.
10
+
11
+
12
+ ## Usage
13
+
14
+ Please first [download the benchmark](https://huggingface.co/datasets/depth-anything/DA-2K/tree/main).
15
+
16
+ All annotations are stored in `annotations.json`. The annotation file is a JSON object where each key is the path to an image file, and the value is a list of annotations associated with that image. Each annotation describes two points and identifies which point is closer to the camera. The structure is detailed below:
17
+
18
+ ```
19
+ {
20
+ "image_path": [
21
+ {
22
+ "point1": [h1, w1], # (vertical position, horizontal position)
23
+ "point2": [h2, w2], # (vertical position, horizontal position)
24
+ "closer_point": "point1" # we always set "point1" as the closer one
25
+ },
26
+ ...
27
+ ],
28
+ ...
29
+ }
30
+ ```
31
+
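As a quick check, the annotation file can be read with the standard `json` module. The snippet below is a minimal sketch that assumes `annotations.json` sits in the current directory; it simply tallies images and point pairs and inspects one entry.

```python
import json

with open("annotations.json") as f:
    annotations = json.load(f)

num_images = len(annotations)
num_pairs = sum(len(pairs) for pairs in annotations.values())
print(f"{num_images} images, {num_pairs} annotated point pairs")

# Inspect one entry: points are (row, column) pixel positions,
# and point1 is always the closer point per the format above.
image_path, pairs = next(iter(annotations.items()))
for pair in pairs:
    (h1, w1), (h2, w2) = pair["point1"], pair["point2"]
    assert pair["closer_point"] == "point1"
```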
32
+ To visualize the annotations:
33
+ ```bash
34
+ python visualize.py [--scene-type <type>]
35
+ ```
36
+
37
+ **Options**
38
+ - `--scene-type <type>` (optional): Specify the scene type (`indoor`, `outdoor`, `non_real`, `transparent_reflective`, `adverse_style`, `aerial`, `underwater`, or `object`). Skip this argument or set `<type>` to `""` to include all scene types.
39
+
40
+ ## Citation
41
+
42
+ If you find this benchmark useful, please consider citing:
43
+
44
+ ```bibtex
45
+ @article{depth_anything_v2,
46
+ title={Depth Anything V2},
47
+ author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
48
+ journal={arXiv:2406.09414},
49
+ year={2024}
50
+ }
51
+ ```
Depth-Anything-V2/LICENSE ADDED
@@ -0,0 +1,201 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction,
10
+ and distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by
13
+ the copyright owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all
16
+ other entities that control, are controlled by, or are under common
17
+ control with that entity. For the purposes of this definition,
18
+ "control" means (i) the power, direct or indirect, to cause the
19
+ direction or management of such entity, whether by contract or
20
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
21
+ outstanding shares, or (iii) beneficial ownership of such entity.
22
+
23
+ "You" (or "Your") shall mean an individual or Legal Entity
24
+ exercising permissions granted by this License.
25
+
26
+ "Source" form shall mean the preferred form for making modifications,
27
+ including but not limited to software source code, documentation
28
+ source, and configuration files.
29
+
30
+ "Object" form shall mean any form resulting from mechanical
31
+ transformation or translation of a Source form, including but
32
+ not limited to compiled object code, generated documentation,
33
+ and conversions to other media types.
34
+
35
+ "Work" shall mean the work of authorship, whether in Source or
36
+ Object form, made available under the License, as indicated by a
37
+ copyright notice that is included in or attached to the work
38
+ (an example is provided in the Appendix below).
39
+
40
+ "Derivative Works" shall mean any work, whether in Source or Object
41
+ form, that is based on (or derived from) the Work and for which the
42
+ editorial revisions, annotations, elaborations, or other modifications
43
+ represent, as a whole, an original work of authorship. For the purposes
44
+ of this License, Derivative Works shall not include works that remain
45
+ separable from, or merely link (or bind by name) to the interfaces of,
46
+ the Work and Derivative Works thereof.
47
+
48
+ "Contribution" shall mean any work of authorship, including
49
+ the original version of the Work and any modifications or additions
50
+ to that Work or Derivative Works thereof, that is intentionally
51
+ submitted to Licensor for inclusion in the Work by the copyright owner
52
+ or by an individual or Legal Entity authorized to submit on behalf of
53
+ the copyright owner. For the purposes of this definition, "submitted"
54
+ means any form of electronic, verbal, or written communication sent
55
+ to the Licensor or its representatives, including but not limited to
56
+ communication on electronic mailing lists, source code control systems,
57
+ and issue tracking systems that are managed by, or on behalf of, the
58
+ Licensor for the purpose of discussing and improving the Work, but
59
+ excluding communication that is conspicuously marked or otherwise
60
+ designated in writing by the copyright owner as "Not a Contribution."
61
+
62
+ "Contributor" shall mean Licensor and any individual or Legal Entity
63
+ on behalf of whom a Contribution has been received by Licensor and
64
+ subsequently incorporated within the Work.
65
+
66
+ 2. Grant of Copyright License. Subject to the terms and conditions of
67
+ this License, each Contributor hereby grants to You a perpetual,
68
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69
+ copyright license to reproduce, prepare Derivative Works of,
70
+ publicly display, publicly perform, sublicense, and distribute the
71
+ Work and such Derivative Works in Source or Object form.
72
+
73
+ 3. Grant of Patent License. Subject to the terms and conditions of
74
+ this License, each Contributor hereby grants to You a perpetual,
75
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76
+ (except as stated in this section) patent license to make, have made,
77
+ use, offer to sell, sell, import, and otherwise transfer the Work,
78
+ where such license applies only to those patent claims licensable
79
+ by such Contributor that are necessarily infringed by their
80
+ Contribution(s) alone or by combination of their Contribution(s)
81
+ with the Work to which such Contribution(s) was submitted. If You
82
+ institute patent litigation against any entity (including a
83
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
84
+ or a Contribution incorporated within the Work constitutes direct
85
+ or contributory patent infringement, then any patent licenses
86
+ granted to You under this License for that Work shall terminate
87
+ as of the date such litigation is filed.
88
+
89
+ 4. Redistribution. You may reproduce and distribute copies of the
90
+ Work or Derivative Works thereof in any medium, with or without
91
+ modifications, and in Source or Object form, provided that You
92
+ meet the following conditions:
93
+
94
+ (a) You must give any other recipients of the Work or
95
+ Derivative Works a copy of this License; and
96
+
97
+ (b) You must cause any modified files to carry prominent notices
98
+ stating that You changed the files; and
99
+
100
+ (c) You must retain, in the Source form of any Derivative Works
101
+ that You distribute, all copyright, patent, trademark, and
102
+ attribution notices from the Source form of the Work,
103
+ excluding those notices that do not pertain to any part of
104
+ the Derivative Works; and
105
+
106
+ (d) If the Work includes a "NOTICE" text file as part of its
107
+ distribution, then any Derivative Works that You distribute must
108
+ include a readable copy of the attribution notices contained
109
+ within such NOTICE file, excluding those notices that do not
110
+ pertain to any part of the Derivative Works, in at least one
111
+ of the following places: within a NOTICE text file distributed
112
+ as part of the Derivative Works; within the Source form or
113
+ documentation, if provided along with the Derivative Works; or,
114
+ within a display generated by the Derivative Works, if and
115
+ wherever such third-party notices normally appear. The contents
116
+ of the NOTICE file are for informational purposes only and
117
+ do not modify the License. You may add Your own attribution
118
+ notices within Derivative Works that You distribute, alongside
119
+ or as an addendum to the NOTICE text from the Work, provided
120
+ that such additional attribution notices cannot be construed
121
+ as modifying the License.
122
+
123
+ You may add Your own copyright statement to Your modifications and
124
+ may provide additional or different license terms and conditions
125
+ for use, reproduction, or distribution of Your modifications, or
126
+ for any such Derivative Works as a whole, provided Your use,
127
+ reproduction, and distribution of the Work otherwise complies with
128
+ the conditions stated in this License.
129
+
130
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
131
+ any Contribution intentionally submitted for inclusion in the Work
132
+ by You to the Licensor shall be under the terms and conditions of
133
+ this License, without any additional terms or conditions.
134
+ Notwithstanding the above, nothing herein shall supersede or modify
135
+ the terms of any separate license agreement you may have executed
136
+ with Licensor regarding such Contributions.
137
+
138
+ 6. Trademarks. This License does not grant permission to use the trade
139
+ names, trademarks, service marks, or product names of the Licensor,
140
+ except as required for reasonable and customary use in describing the
141
+ origin of the Work and reproducing the content of the NOTICE file.
142
+
143
+ 7. Disclaimer of Warranty. Unless required by applicable law or
144
+ agreed to in writing, Licensor provides the Work (and each
145
+ Contributor provides its Contributions) on an "AS IS" BASIS,
146
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147
+ implied, including, without limitation, any warranties or conditions
148
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149
+ PARTICULAR PURPOSE. You are solely responsible for determining the
150
+ appropriateness of using or redistributing the Work and assume any
151
+ risks associated with Your exercise of permissions under this License.
152
+
153
+ 8. Limitation of Liability. In no event and under no legal theory,
154
+ whether in tort (including negligence), contract, or otherwise,
155
+ unless required by applicable law (such as deliberate and grossly
156
+ negligent acts) or agreed to in writing, shall any Contributor be
157
+ liable to You for damages, including any direct, indirect, special,
158
+ incidental, or consequential damages of any character arising as a
159
+ result of this License or out of the use or inability to use the
160
+ Work (including but not limited to damages for loss of goodwill,
161
+ work stoppage, computer failure or malfunction, or any and all
162
+ other commercial damages or losses), even if such Contributor
163
+ has been advised of the possibility of such damages.
164
+
165
+ 9. Accepting Warranty or Additional Liability. While redistributing
166
+ the Work or Derivative Works thereof, You may choose to offer,
167
+ and charge a fee for, acceptance of support, warranty, indemnity,
168
+ or other liability obligations and/or rights consistent with this
169
+ License. However, in accepting such obligations, You may act only
170
+ on Your own behalf and on Your sole responsibility, not on behalf
171
+ of any other Contributor, and only if You agree to indemnify,
172
+ defend, and hold each Contributor harmless for any liability
173
+ incurred by, or claims asserted against, such Contributor by reason
174
+ of your accepting any such warranty or additional liability.
175
+
176
+ END OF TERMS AND CONDITIONS
177
+
178
+ APPENDIX: How to apply the Apache License to your work.
179
+
180
+ To apply the Apache License to your work, attach the following
181
+ boilerplate notice, with the fields enclosed by brackets "[]"
182
+ replaced with your own identifying information. (Don't include
183
+ the brackets!) The text should be enclosed in the appropriate
184
+ comment syntax for the file format. We also recommend that a
185
+ file or class name and description of purpose be included on the
186
+ same "printed page" as the copyright notice for easier
187
+ identification within third-party archives.
188
+
189
+ Copyright [yyyy] [name of copyright owner]
190
+
191
+ Licensed under the Apache License, Version 2.0 (the "License");
192
+ you may not use this file except in compliance with the License.
193
+ You may obtain a copy of the License at
194
+
195
+ http://www.apache.org/licenses/LICENSE-2.0
196
+
197
+ Unless required by applicable law or agreed to in writing, software
198
+ distributed under the License is distributed on an "AS IS" BASIS,
199
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200
+ See the License for the specific language governing permissions and
201
+ limitations under the License.
Depth-Anything-V2/README.md ADDED
@@ -0,0 +1,201 @@
1
+ <div align="center">
2
+ <h1>Depth Anything V2</h1>
3
+
4
+ [**Lihe Yang**](https://liheyoung.github.io/)<sup>1</sup> · [**Bingyi Kang**](https://bingykang.github.io/)<sup>2&dagger;</sup> · [**Zilong Huang**](http://speedinghzl.github.io/)<sup>2</sup>
5
+ <br>
6
+ [**Zhen Zhao**](http://zhaozhen.me/) · [**Xiaogang Xu**](https://xiaogang00.github.io/) · [**Jiashi Feng**](https://sites.google.com/site/jshfeng/)<sup>2</sup> · [**Hengshuang Zhao**](https://hszhao.github.io/)<sup>1*</sup>
7
+
8
+ <sup>1</sup>HKU&emsp;&emsp;&emsp;<sup>2</sup>TikTok
9
+ <br>
10
+ &dagger;project lead&emsp;*corresponding author
11
+
12
+ <a href="https://arxiv.org/abs/2406.09414"><img src='https://img.shields.io/badge/arXiv-Depth Anything V2-red' alt='Paper PDF'></a>
13
+ <a href='https://depth-anything-v2.github.io'><img src='https://img.shields.io/badge/Project_Page-Depth Anything V2-green' alt='Project Page'></a>
14
+ <a href='https://huggingface.co/spaces/depth-anything/Depth-Anything-V2'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-blue'></a>
15
+ <a href='https://huggingface.co/datasets/depth-anything/DA-2K'><img src='https://img.shields.io/badge/Benchmark-DA--2K-yellow' alt='Benchmark'></a>
16
+ </div>
17
+
18
+ This work presents Depth Anything V2. It significantly outperforms [V1](https://github.com/LiheYoung/Depth-Anything) in fine-grained details and robustness. Compared with SD-based models, it enjoys faster inference speed, fewer parameters, and higher depth accuracy.
19
+
20
+ ![teaser](assets/teaser.png)
21
+
22
+
23
+ ## News
24
+ - **2025-01-22:** [Video Depth Anything](https://videodepthanything.github.io) has been released. It generates consistent depth maps for super-long videos (e.g., over 5 minutes).
25
+ - **2024-12-22:** [Prompt Depth Anything](https://promptda.github.io/) has been released. It supports 4K resolution metric depth estimation when low-res LiDAR is used to prompt the DA models.
26
+ - **2024-07-06:** Depth Anything V2 is supported in [Transformers](https://github.com/huggingface/transformers/). See the [instructions](https://huggingface.co/docs/transformers/main/en/model_doc/depth_anything_v2) for convenient usage.
27
+ - **2024-06-25:** Depth Anything is integrated into [Apple Core ML Models](https://developer.apple.com/machine-learning/models/). See the instructions ([V1](https://huggingface.co/apple/coreml-depth-anything-small), [V2](https://huggingface.co/apple/coreml-depth-anything-v2-small)) for usage.
28
+ - **2024-06-22:** We release [smaller metric depth models](https://github.com/DepthAnything/Depth-Anything-V2/tree/main/metric_depth#pre-trained-models) based on Depth-Anything-V2-Small and Base.
29
+ - **2024-06-20:** Our repository and project page were flagged by GitHub and removed from public access for 6 days. Sorry for the inconvenience.
30
+ - **2024-06-14:** Paper, project page, code, models, demo, and benchmark are all released.
31
+
32
+
33
+ ## Pre-trained Models
34
+
35
+ We provide **four models** of varying scales for robust relative depth estimation:
36
+
37
+ | Model | Params | Checkpoint |
38
+ |:-|-:|:-:|
39
+ | Depth-Anything-V2-Small | 24.8M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Small/resolve/main/depth_anything_v2_vits.pth?download=true) |
40
+ | Depth-Anything-V2-Base | 97.5M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Base/resolve/main/depth_anything_v2_vitb.pth?download=true) |
41
+ | Depth-Anything-V2-Large | 335.3M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Large/resolve/main/depth_anything_v2_vitl.pth?download=true) |
42
+ | Depth-Anything-V2-Giant | 1.3B | Coming soon |
43
+
44
+
45
+ ## Usage
46
+
47
+ ### Preparation
48
+
49
+ ```bash
50
+ git clone https://github.com/DepthAnything/Depth-Anything-V2
51
+ cd Depth-Anything-V2
52
+ pip install -r requirements.txt
53
+ ```
54
+
55
+ Download the checkpoints listed [here](#pre-trained-models) and put them under the `checkpoints` directory.
56
+
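For instance, the Small checkpoint can be fetched with the URL from the table above. The snippet below is a minimal sketch (run from the repository root); the target file name matches the one expected by the loading code, and the other checkpoints follow the same pattern.

```python
import os
import urllib.request

# Download the Small checkpoint (URL taken from the Pre-trained Models table) into ./checkpoints
url = ("https://huggingface.co/depth-anything/Depth-Anything-V2-Small/"
       "resolve/main/depth_anything_v2_vits.pth?download=true")
os.makedirs("checkpoints", exist_ok=True)
urllib.request.urlretrieve(url, "checkpoints/depth_anything_v2_vits.pth")
```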
57
+ ### Use our models
58
+ ```python
59
+ import cv2
60
+ import torch
61
+
62
+ from depth_anything_v2.dpt import DepthAnythingV2
63
+
64
+ DEVICE = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
65
+
66
+ model_configs = {
67
+ 'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
68
+ 'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
69
+ 'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
70
+ 'vitg': {'encoder': 'vitg', 'features': 384, 'out_channels': [1536, 1536, 1536, 1536]}
71
+ }
72
+
73
+ encoder = 'vitl' # or 'vits', 'vitb', 'vitg'
74
+
75
+ model = DepthAnythingV2(**model_configs[encoder])
76
+ model.load_state_dict(torch.load(f'checkpoints/depth_anything_v2_{encoder}.pth', map_location='cpu'))
77
+ model = model.to(DEVICE).eval()
78
+
79
+ raw_img = cv2.imread('your/image/path')
80
+ depth = model.infer_image(raw_img) # HxW raw depth map in numpy
81
+ ```
82
+
83
+ If you do not want to clone this repository, you can also load our models through [Transformers](https://github.com/huggingface/transformers/). Below is a simple code snippet. Please refer to the [official page](https://huggingface.co/docs/transformers/main/en/model_doc/depth_anything_v2) for more details.
84
+
85
+ - Note 1: Make sure you can connect to Hugging Face and have installed the latest Transformers.
86
+ - Note 2: Due to the [upsampling difference](https://github.com/huggingface/transformers/pull/31522#issuecomment-2184123463) between OpenCV (which we use) and Pillow (which HF uses), predictions may differ slightly, so we recommend loading our models in the way introduced above.
87
+ ```python
88
+ from transformers import pipeline
89
+ from PIL import Image
90
+
91
+ pipe = pipeline(task="depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")
92
+ image = Image.open('your/image/path')
93
+ depth = pipe(image)["depth"]
94
+ ```
95
+
96
+ ### Running script on *images*
97
+
98
+ ```bash
99
+ python run.py \
100
+ --encoder <vits | vitb | vitl | vitg> \
101
+ --img-path <path> --outdir <outdir> \
102
+ [--input-size <size>] [--pred-only] [--grayscale]
103
+ ```
104
+ Options:
105
+ - `--img-path`: You can point it to 1) an image directory containing all images of interest, 2) a single image, or 3) a text file listing all image paths.
106
+ - `--input-size` (optional): By default, we use input size `518` for model inference. ***You can increase the size for even more fine-grained results.***
107
+ - `--pred-only` (optional): Only save the predicted depth map, without raw image.
108
+ - `--grayscale` (optional): Save the grayscale depth map, without applying color palette.
109
+
110
+ For example:
111
+ ```bash
112
+ python run.py --encoder vitl --img-path assets/examples --outdir depth_vis
113
+ ```
114
+
115
+ ### Running script on *videos*
116
+
117
+ ```bash
118
+ python run_video.py \
119
+ --encoder <vits | vitb | vitl | vitg> \
120
+ --video-path assets/examples_video --outdir video_depth_vis \
121
+ [--input-size <size>] [--pred-only] [--grayscale]
122
+ ```
123
+
124
+ ***Our larger model has better temporal consistency on videos.***
125
+
126
+ ### Gradio demo
127
+
128
+ To use our gradio demo locally:
129
+
130
+ ```bash
131
+ python app.py
132
+ ```
133
+
134
+ You can also try our [online demo](https://huggingface.co/spaces/Depth-Anything/Depth-Anything-V2).
135
+
136
+ ***Note: Compared to V1, we have made a minor modification to the DINOv2-DPT architecture (originating from this [issue](https://github.com/LiheYoung/Depth-Anything/issues/81)).*** In V1, we *unintentionally* used features from the last four layers of DINOv2 for decoding. In V2, we use [intermediate features](https://github.com/DepthAnything/Depth-Anything-V2/blob/2cbc36a8ce2cec41d38ee51153f112e87c8e42d8/depth_anything_v2/dpt.py#L164-L169) instead. Although this modification did not improve details or accuracy, we decided to follow this common practice.
137
+
138
+
139
+ ## Fine-tuned to Metric Depth Estimation
140
+
141
+ Please refer to [metric depth estimation](./metric_depth).
142
+
143
+
144
+ ## DA-2K Evaluation Benchmark
145
+
146
+ Please refer to [DA-2K benchmark](./DA-2K.md).
147
+
148
+
149
+ ## Community Support
150
+
151
+ **We sincerely appreciate all the community support for our Depth Anything series. Thank you very much!**
152
+
153
+ - Apple Core ML:
154
+ - https://developer.apple.com/machine-learning/models
155
+ - https://huggingface.co/apple/coreml-depth-anything-v2-small
156
+ - https://huggingface.co/apple/coreml-depth-anything-small
157
+ - Transformers:
158
+ - https://huggingface.co/docs/transformers/main/en/model_doc/depth_anything_v2
159
+ - https://huggingface.co/docs/transformers/main/en/model_doc/depth_anything
160
+ - TensorRT:
161
+ - https://github.com/spacewalk01/depth-anything-tensorrt
162
+ - https://github.com/zhujiajian98/Depth-Anythingv2-TensorRT-python
163
+ - ONNX: https://github.com/fabio-sim/Depth-Anything-ONNX
164
+ - ComfyUI: https://github.com/kijai/ComfyUI-DepthAnythingV2
165
+ - Transformers.js (real-time depth in web): https://huggingface.co/spaces/Xenova/webgpu-realtime-depth-estimation
166
+ - Android:
167
+ - https://github.com/shubham0204/Depth-Anything-Android
168
+ - https://github.com/FeiGeChuanShu/ncnn-android-depth_anything
169
+
170
+
171
+ ## Acknowledgement
172
+
173
+ We are sincerely grateful to the awesome Hugging Face team ([@Pedro Cuenca](https://huggingface.co/pcuenq), [@Niels Rogge](https://huggingface.co/nielsr), [@Merve Noyan](https://huggingface.co/merve), [@Amy Roberts](https://huggingface.co/amyeroberts), et al.) for their huge efforts in supporting our models in Transformers and Apple Core ML.
174
+
175
+ We also thank the [DINOv2](https://github.com/facebookresearch/dinov2) team for contributing such impressive models to our community.
176
+
177
+
178
+ ## LICENSE
179
+
180
+ Depth-Anything-V2-Small model is under the Apache-2.0 license. Depth-Anything-V2-Base/Large/Giant models are under the CC-BY-NC-4.0 license.
181
+
182
+
183
+ ## Citation
184
+
185
+ If you find this project useful, please consider citing:
186
+
187
+ ```bibtex
188
+ @article{depth_anything_v2,
189
+ title={Depth Anything V2},
190
+ author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
191
+ journal={arXiv:2406.09414},
192
+ year={2024}
193
+ }
194
+
195
+ @inproceedings{depth_anything_v1,
196
+ title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
197
+ author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
198
+ booktitle={CVPR},
199
+ year={2024}
200
+ }
201
+ ```
Depth-Anything-V2/app.py ADDED
@@ -0,0 +1,88 @@
1
+ import glob
2
+ import gradio as gr
3
+ import matplotlib
4
+ import numpy as np
5
+ from PIL import Image
6
+ import torch
7
+ import tempfile
8
+ from gradio_imageslider import ImageSlider
9
+
10
+ from depth_anything_v2.dpt import DepthAnythingV2
11
+
12
+ css = """
13
+ #img-display-container {
14
+ max-height: 100vh;
15
+ }
16
+ #img-display-input {
17
+ max-height: 80vh;
18
+ }
19
+ #img-display-output {
20
+ max-height: 80vh;
21
+ }
22
+ #download {
23
+ height: 62px;
24
+ }
25
+ """
26
+ DEVICE = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
27
+ model_configs = {
28
+ 'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
29
+ 'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
30
+ 'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
31
+ 'vitg': {'encoder': 'vitg', 'features': 384, 'out_channels': [1536, 1536, 1536, 1536]}
32
+ }
33
+ encoder = 'vitl'
34
+ model = DepthAnythingV2(**model_configs[encoder])
35
+ state_dict = torch.load(f'checkpoints/depth_anything_v2_{encoder}.pth', map_location="cpu")
36
+ model.load_state_dict(state_dict)
37
+ model = model.to(DEVICE).eval()
38
+
39
+ title = "# Depth Anything V2"
40
+ description = """Official demo for **Depth Anything V2**.
41
+ Please refer to our [paper](https://arxiv.org/abs/2406.09414), [project page](https://depth-anything-v2.github.io), or [github](https://github.com/DepthAnything/Depth-Anything-V2) for more details."""
42
+
43
+ def predict_depth(image):
44
+ return model.infer_image(image)
45
+
46
+ with gr.Blocks(css=css) as demo:
47
+ gr.Markdown(title)
48
+ gr.Markdown(description)
49
+ gr.Markdown("### Depth Prediction demo")
50
+
51
+ with gr.Row():
52
+ input_image = gr.Image(label="Input Image", type='numpy', elem_id='img-display-input')
53
+ depth_image_slider = ImageSlider(label="Depth Map with Slider View", elem_id='img-display-output', position=0.5)
54
+ submit = gr.Button(value="Compute Depth")
55
+ gray_depth_file = gr.File(label="Grayscale depth map", elem_id="download",)
56
+ raw_file = gr.File(label="16-bit raw output (can be considered as disparity)", elem_id="download",)
57
+
58
+ cmap = matplotlib.colormaps.get_cmap('Spectral_r')
59
+
60
+ def on_submit(image):
61
+ original_image = image.copy()
62
+
63
+ h, w = image.shape[:2]
64
+
65
+ depth = predict_depth(image[:, :, ::-1])  # Gradio provides RGB; flip to BGR as expected by infer_image
66
+
67
+ raw_depth = Image.fromarray(depth.astype('uint16'))
68
+ tmp_raw_depth = tempfile.NamedTemporaryFile(suffix='.png', delete=False)
69
+ raw_depth.save(tmp_raw_depth.name)
70
+
71
+ depth = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0
72
+ depth = depth.astype(np.uint8)
73
+ colored_depth = (cmap(depth)[:, :, :3] * 255).astype(np.uint8)
74
+
75
+ gray_depth = Image.fromarray(depth)
76
+ tmp_gray_depth = tempfile.NamedTemporaryFile(suffix='.png', delete=False)
77
+ gray_depth.save(tmp_gray_depth.name)
78
+
79
+ return [(original_image, colored_depth), tmp_gray_depth.name, tmp_raw_depth.name]
80
+
81
+ submit.click(on_submit, inputs=[input_image], outputs=[depth_image_slider, gray_depth_file, raw_file])
82
+
83
+ example_files = glob.glob('assets/examples/*')
84
+ examples = gr.Examples(examples=example_files, inputs=[input_image], outputs=[depth_image_slider, gray_depth_file, raw_file], fn=on_submit)
85
+
86
+
87
+ if __name__ == '__main__':
88
+ demo.queue().launch()
Depth-Anything-V2/requirements.txt ADDED
@@ -0,0 +1,9 @@
1
+ gradio_imageslider
2
+ gradio==4.29.0
3
+ matplotlib
4
+ opencv-python
5
+ torch
6
+ torchvision
7
+
8
+
9
+
Depth-Anything-V2/run.py ADDED
@@ -0,0 +1,73 @@
1
+ import argparse
2
+ import cv2
3
+ import glob
4
+ import matplotlib
5
+ import numpy as np
6
+ import os
7
+ import torch
8
+
9
+ from depth_anything_v2.dpt import DepthAnythingV2
10
+
11
+
12
+ if __name__ == '__main__':
13
+ parser = argparse.ArgumentParser(description='Depth Anything V2')
14
+
15
+ parser.add_argument('--img-path', type=str, default=r"D:\SMPL-X_pose_extraction\demo\inputs\000049.jpg")
16
+ parser.add_argument('--input-size', type=int, default=256)
17
+ parser.add_argument('--outdir', type=str, default='./demo')
18
+
19
+ parser.add_argument('--encoder', type=str, default='vitl', choices=['vits', 'vitb', 'vitl', 'vitg'])
20
+
21
+ parser.add_argument('--pred-only', dest='pred_only', action='store_true', help='only display the prediction')
22
+ parser.add_argument('--grayscale', dest='grayscale', action='store_true', help='do not apply colorful palette')
23
+
24
+ args = parser.parse_args()
25
+
26
+ DEVICE = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
27
+
28
+ model_configs = {
29
+ 'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
30
+ 'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
31
+ 'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
32
+ 'vitg': {'encoder': 'vitg', 'features': 384, 'out_channels': [1536, 1536, 1536, 1536]}
33
+ }
34
+
35
+ depth_anything = DepthAnythingV2(**model_configs[args.encoder])
36
+ depth_anything.load_state_dict(torch.load(f'/mnt/shared-storage-user/mllm/zangyuhang/pmx/SMPL-X_pose_extraction/pretrained_weight/depth_anything-v2/depth_anything_v2_{args.encoder}.pth', map_location='cpu'))
37
+ depth_anything = depth_anything.to(DEVICE).eval()
38
+
39
+ if os.path.isfile(args.img_path):
40
+ if args.img_path.endswith('txt'):
41
+ with open(args.img_path, 'r') as f:
42
+ filenames = f.read().splitlines()
43
+ else:
44
+ filenames = [args.img_path]
45
+ else:
46
+ filenames = glob.glob(os.path.join(args.img_path, '**/*'), recursive=True)
47
+
48
+ os.makedirs(args.outdir, exist_ok=True)
49
+
50
+ cmap = matplotlib.colormaps.get_cmap('Spectral_r')
51
+
52
+ for k, filename in enumerate(filenames):
53
+ print(f'Progress {k+1}/{len(filenames)}: {filename}')
54
+
55
+ raw_image = cv2.imread(filename)
56
+
57
+ depth = depth_anything.infer_image(raw_image, args.input_size)
58
+
59
+ depth = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0
60
+ depth = depth.astype(np.uint8)
61
+
62
+ if args.grayscale:
63
+ depth = np.repeat(depth[..., np.newaxis], 3, axis=-1)
64
+ else:
65
+ depth = (cmap(depth)[:, :, :3] * 255)[:, :, ::-1].astype(np.uint8)
66
+
67
+ if args.pred_only:
68
+ cv2.imwrite(os.path.join(args.outdir, os.path.splitext(os.path.basename(filename))[0] + '.png'), depth)
69
+ else:
70
+ split_region = np.ones((raw_image.shape[0], 50, 3), dtype=np.uint8) * 255
71
+ combined_result = cv2.hconcat([raw_image, split_region, depth])
72
+
73
+ cv2.imwrite(os.path.join(args.outdir, os.path.splitext(os.path.basename(filename))[0] + '.png'), combined_result)
Depth-Anything-V2/run_video.py ADDED
@@ -0,0 +1,92 @@
1
+ import argparse
2
+ import cv2
3
+ import glob
4
+ import matplotlib
5
+ import numpy as np
6
+ import os
7
+ import torch
8
+
9
+ from depth_anything_v2.dpt import DepthAnythingV2
10
+
11
+
12
+ if __name__ == '__main__':
13
+ parser = argparse.ArgumentParser(description='Depth Anything V2')
14
+
15
+ parser.add_argument('--video-path', type=str)
16
+ parser.add_argument('--input-size', type=int, default=518)
17
+ parser.add_argument('--outdir', type=str, default='./vis_video_depth')
18
+
19
+ parser.add_argument('--encoder', type=str, default='vitl', choices=['vits', 'vitb', 'vitl', 'vitg'])
20
+
21
+ parser.add_argument('--pred-only', dest='pred_only', action='store_true', help='only display the prediction')
22
+ parser.add_argument('--grayscale', dest='grayscale', action='store_true', help='do not apply colorful palette')
23
+
24
+ args = parser.parse_args()
25
+
26
+ DEVICE = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
27
+
28
+ model_configs = {
29
+ 'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
30
+ 'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
31
+ 'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
32
+ 'vitg': {'encoder': 'vitg', 'features': 384, 'out_channels': [1536, 1536, 1536, 1536]}
33
+ }
34
+
35
+ depth_anything = DepthAnythingV2(**model_configs[args.encoder])
36
+ depth_anything.load_state_dict(torch.load(f'checkpoints/depth_anything_v2_{args.encoder}.pth', map_location='cpu'))
37
+ depth_anything = depth_anything.to(DEVICE).eval()
38
+
39
+ if os.path.isfile(args.video_path):
40
+ if args.video_path.endswith('txt'):
41
+ with open(args.video_path, 'r') as f:
42
+ filenames = f.read().splitlines()
43
+ else:
44
+ filenames = [args.video_path]
45
+ else:
46
+ filenames = glob.glob(os.path.join(args.video_path, '**/*'), recursive=True)
47
+
48
+ os.makedirs(args.outdir, exist_ok=True)
49
+
50
+ margin_width = 50
51
+ cmap = matplotlib.colormaps.get_cmap('Spectral_r')
52
+
53
+ for k, filename in enumerate(filenames):
54
+ print(f'Progress {k+1}/{len(filenames)}: {filename}')
55
+
56
+ raw_video = cv2.VideoCapture(filename)
57
+ frame_width, frame_height = int(raw_video.get(cv2.CAP_PROP_FRAME_WIDTH)), int(raw_video.get(cv2.CAP_PROP_FRAME_HEIGHT))
58
+ frame_rate = int(raw_video.get(cv2.CAP_PROP_FPS))
59
+
60
+ if args.pred_only:
61
+ output_width = frame_width
62
+ else:
63
+ output_width = frame_width * 2 + margin_width
64
+
65
+ output_path = os.path.join(args.outdir, os.path.splitext(os.path.basename(filename))[0] + '.mp4')
66
+ out = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*"mp4v"), frame_rate, (output_width, frame_height))
67
+
68
+ while raw_video.isOpened():
69
+ ret, raw_frame = raw_video.read()
70
+ if not ret:
71
+ break
72
+
73
+ depth = depth_anything.infer_image(raw_frame, args.input_size)
74
+
75
+ depth = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0
76
+ depth = depth.astype(np.uint8)
77
+
78
+ if args.grayscale:
79
+ depth = np.repeat(depth[..., np.newaxis], 3, axis=-1)
80
+ else:
81
+ depth = (cmap(depth)[:, :, :3] * 255)[:, :, ::-1].astype(np.uint8)
82
+
83
+ if args.pred_only:
84
+ out.write(depth)
85
+ else:
86
+ split_region = np.ones((frame_height, margin_width, 3), dtype=np.uint8) * 255
87
+ combined_frame = cv2.hconcat([raw_frame, split_region, depth])
88
+
89
+ out.write(combined_frame)
90
+
91
+ raw_video.release()
92
+ out.release()
SMPLest-X/.DS_Store ADDED
Binary file (6.15 kB).
 
SMPLest-X/.gitignore ADDED
@@ -0,0 +1,8 @@
1
+ data
2
+ outputs
3
+ pretrained_models
4
+ demo
5
+ *.pyc
6
+ **/__pycache__
7
+ **/.DS_Store
8
+ **/human_model_files
SMPLest-X/LICENSE.txt ADDED
@@ -0,0 +1,9 @@
1
+ S-Lab License 1.0
2
+
3
+ Copyright 2022 S-Lab
4
+ Redistribution and use for non-commercial purpose in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
5
+ 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
6
+ 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
7
+ 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
8
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
9
+ 4. In the event that redistribution and/or use for commercial purpose in source or binary forms, with or without modification is required, please contact the contributor(s) of the work.
SMPLest-X/README.md ADDED
@@ -0,0 +1,152 @@
1
+ # SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation
2
+
3
+ This work is the extended version of [SMPLer-X](https://arxiv.org/abs/2309.17448). This new codebase is designed for easy installation and flexible development, enabling seamless integration of new methods with the pretrained SMPLest-X model.
4
+
5
+ ![Teaser](./assets/teaser.png)
6
+
7
+
8
+ ## Useful links
9
+
10
+ <div align="center">
11
+ <a href="https://arxiv.org/abs/2501.09782" class="button"><b>[arXiv]</b></a> &nbsp;&nbsp;&nbsp;&nbsp;
12
+ <a href="https://caizhongang.github.io/projects/SMPLer-X/" class="button"><b>[Homepage]</b></a> &nbsp;&nbsp;&nbsp;&nbsp;
13
+ <a href="https://youtu.be/DepTqbPpVzY" class="button"><b>[Video]</b></a> &nbsp;&nbsp;&nbsp;&nbsp;
14
+ <a href="https://github.com/caizhongang/SMPLer-X" class="button"><b>[SMPLer-X]</b></a> &nbsp;&nbsp;&nbsp;&nbsp;
15
+ <a href="https://github.com/open-mmlab/mmhuman3d" class="button"><b>[MMHuman3D]</b></a> &nbsp;&nbsp;&nbsp;&nbsp;
16
+ <a href="https://github.com/wqyin/WHAC/tree/main" class="button"><b>[WHAC]</b></a>
17
+
18
+ </div>
19
+
20
+
21
+ ## News
22
+
23
+ - [2025-10-21] SMPLest-X accepted to TPAMI.
24
+ - [2025-02-17] Pretrained model available for download.
25
+ - [2025-02-14] 💌💌💌 Brand new codebase released for training, testing and inference.
26
+ - [2025-01-20] Paper released on [arXiv](https://arxiv.org/abs/2501.09782).
27
+ - [2025-01-08] Project page created.
28
+
29
+
30
+ ## Install
31
+ ```bash
32
+ bash scripts/install.sh
33
+ ```
34
+
35
+ ## Preparation
36
+
37
+ #### SMPLest-X pretrained models
38
+ - Download the pretrained **SMPLest-X-Huge model** weight from [here](https://huggingface.co/waanqii/SMPLest-X/tree/main) (8.2G).
39
+ - Place the pretrained weight and its config file according to the file structure shown below.
40
+
41
+ #### Parametric human models
42
+ - Download [SMPL-X](https://smpl-x.is.tue.mpg.de/) and [SMPL](https://smpl.is.tue.mpg.de/) body models.
43
+
44
+ #### ViT-Pose pretrained models (For training only)
45
+ - Follow [OSX](https://github.com/IDEA-Research/OSX) in preparing pretrained ViTPose models. Download the ViTPose pretrained weights from [here](https://github.com/ViTAE-Transformer/ViTPose).
46
+
47
+ #### HumanData
48
+ - Please refer to [this guide](humandata_prep/README.md) for instructions on preparing the data in the HumanData format.
49
+
50
+ The final file structure should be like:
51
+ ```
52
+ .
53
+ ├── assets
54
+ ├── configs
55
+ ├── data
56
+ │   ├── annot # humandata.npz files
57
+ │   ├── cache # cached humandata
58
+ │   └── img # original data files
59
+ ├── datasets
60
+ ├── demo
61
+ ├── human_models
62
+ │   └── human_model_files # parametric human models
63
+ ├── main
64
+ ├── models
65
+ ├── outputs
66
+ │   └── smplest_x_h
67
+ ├── pretrained_models
68
+ │   ├── vitpose_huge.pth # for training only
69
+ │   ├── yolov8x.pt # auto download during inference
70
+ │   └── smplest_x_h
71
+ │      ├── smplest_x_h.pth.tar
72
+ │      └── config_base.py
73
+ ├── scripts
74
+ ├── utils
75
+ ├── README.md
76
+ └── requirements.txt
77
+ ```
78
+
79
+ ## Inference
80
+
81
+ - Place the video for inference under `SMPLest-X/demo`
82
+ - Prepare the pretrained model under `SMPLest-X/pretrained_models`
83
+ - The pretrained YOLO model will be downloaded automatically on first use.
84
+ - Inference output will be saved in `SMPLest-X/demo`
85
+
86
+ ```bash
87
+ sh scripts/inference.sh {MODEL_DIR} {FILE_NAME} {FPS}
88
+
89
+ # For running inference on test_video.mp4 (30 FPS) with SMPLest-X/pretrained_models/smplest_x_h/smplest_x_h.pth.tar
90
+ sh scripts/inference.sh smplest_x_h test_video.mp4 30
91
+ ```
92
+
93
+
94
+ ## Training
95
+ ```bash
96
+ bash scripts/train.sh {JOB_NAME} {NUM_GPUS} {CONFIG_FILE}
97
+
98
+ # For training SMPLest-X-H with 16 GPUS
99
+ bash scripts/train.sh smplest_x_h 16 config_smplest_x_h.py
100
+ ```
101
+ - CONFIG_FILE is the file name under `SMPLest-X/configs`
102
+ - Logs and checkpoints will be saved to `SMPLest-X/outputs/train_{JOB_NAME}_{DATE_TIME}`
103
+
104
+
105
+ ## Testing
106
+ ```bash
107
+ sh scripts/test.sh {TEST_DATASET} {MODEL_DIR} {CKPT_ID}
108
+
109
+ # For testing the model SMPLest-X/outputs/smplest_x_h/model_dump/snapshot_5.pth.tar
110
+ # on dataset SynHand
111
+ sh scripts/test.sh SynHand smplest_x_h 5
112
+ ```
113
+ - NUM_GPU = 1 is used by default for testing
114
+ - Logs and results will be saved to `SMPLest-X/outputs/test_{TEST_DATASET}_ep{CKPT_ID}_{DATE_TIME}`
115
+
116
+
117
+ ## FAQ
118
+ - How do I animate my virtual characters with SMPLest-X output (like that in the demo video)?
119
+ - We are working on that, please stay tuned!
120
+ Currently, this repo supports SMPL-X estimation and a simple visualization (overlay of SMPL-X vertices).
121
+
122
+
123
+ ## Citation
124
+ ```text
125
+ # SMPLest-X
126
+ @article{yin2025smplest,
127
+ title={SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation},
128
+ author={Yin, Wanqi and Cai, Zhongang and Wang, Ruisi and Zeng, Ailing and Wei, Chen and Sun, Qingping and Mei, Haiyi and Wang, Yanjun and Pang, Hui En and Zhang, Mingyuan and Zhang, Lei and Loy, Chen Change and Yamashita, Atsushi and Yang, Lei and Liu, Ziwei},
129
+ journal={arXiv preprint arXiv:2501.09782},
130
+ year={2025}
131
+ }
132
+
133
+ # SMPLer-X
134
+ @inproceedings{cai2023smplerx,
135
+ title={{SMPLer-X}: Scaling up expressive human pose and shape estimation},
136
+ author={Cai, Zhongang and Yin, Wanqi and Zeng, Ailing and Wei, Chen and Sun, Qingping and Yanjun, Wang and Pang, Hui En and Mei, Haiyi and Zhang, Mingyuan and Zhang, Lei and Loy, Chen Change and Yang, Lei and Liu, Ziwei},
137
+ booktitle={Advances in Neural Information Processing Systems},
138
+ year={2023}
139
+ }
140
+ ```
141
+
142
+ ## Explore More [SMPLCap](https://github.com/SMPLCap) Projects
143
+
144
+ - [TPAMI'25] [SMPLest-X](https://github.com/SMPLCap/SMPLest-X): An extended version of [SMPLer-X](https://github.com/SMPLCap/SMPLer-X) with stronger foundation models.
145
+ - [ECCV'24] [WHAC](https://github.com/SMPLCap/WHAC): World-grounded human pose and camera estimation from monocular videos.
146
+ - [CVPR'24] [AiOS](https://github.com/SMPLCap/AiOS): An all-in-one-stage pipeline combining detection and 3D human reconstruction.
147
+ - [NeurIPS'23] [SMPLer-X](https://github.com/SMPLCap/SMPLer-X): Scaling up EHPS towards a family of generalist foundation models.
148
+ - [NeurIPS'23] [RoboSMPLX](https://github.com/SMPLCap/RoboSMPLX): A framework to enhance the robustness of
149
+ whole-body pose and shape estimation.
150
+ - [ICCV'23] [Zolly](https://github.com/SMPLCap/Zolly): 3D human mesh reconstruction from perspective-distorted images.
151
+ - [arXiv'23] [PointHPS](https://github.com/SMPLCap/PointHPS): 3D HPS from point clouds captured in real-world settings.
152
+ - [NeurIPS'22] [HMR-Benchmarks](https://github.com/SMPLCap/hmr-benchmarks): A comprehensive benchmark of HPS datasets, backbones, and training strategies.
SMPLest-X/datasets/SynHand.py ADDED
@@ -0,0 +1,39 @@
1
+ import os.path as osp
2
+ from datasets.humandata import HumanDataset
3
+
4
+ class SynHand(HumanDataset):
5
+ def __init__(self, transform, data_split, cfg):
6
+ super(SynHand, self).__init__(transform, data_split, cfg)
7
+
8
+ self.cfg = cfg
9
+
10
+ self.use_cache = getattr(self.cfg.data, 'use_cache', False)
11
+ self.annot_path_cache = osp.join(self.cfg.data.data_dir, 'cache', f'synhand_{self.data_split}.npz')
12
+
13
+ self.img_shape = None #(h, w)
14
+ self.cam_param = {}
15
+
16
+ # load data or cache
17
+ if self.use_cache and osp.isfile(self.annot_path_cache):
18
+ print(f'[{self.__class__.__name__}] Loading cache from {self.annot_path_cache}')
19
+ self.datalist = self.load_cache(self.annot_path_cache)
20
+ else:
21
+ if self.use_cache:
22
+ print(f'[{self.__class__.__name__}] Cache not found, generating cache...')
23
+
24
+ self.datalist = []
25
+ self.img_dir = osp.join(self.cfg.data.data_dir, 'img', 'synbody')
26
+
27
+ if self.data_split == 'train':
28
+ filename = f'synhand_20240927_241004_4628_fix_betas.npz'
29
+ else:
30
+ filename = f'synhand_20241018_test_241023_1188_fix_betas.npz'
31
+
32
+ self.annot_path = osp.join(self.cfg.data.data_dir, 'annot', filename)
33
+
34
+ self.datalist = self.load_data(
35
+ train_sample_interval=getattr(self.cfg.data, f'{self.__class__.__name__}_train_sample_interval', 1),
36
+ test_sample_interval=getattr(self.cfg.data, f'{self.__class__.__name__}_test_sample_interval', 10))
37
+
38
+ if self.use_cache:
39
+ self.save_cache(self.annot_path_cache, self.datalist)
SMPLest-X/datasets/dataset.py ADDED
@@ -0,0 +1,103 @@
1
+ import random
2
+ import numpy as np
3
+ from torch.utils.data.dataset import Dataset
4
+
5
+ class MultipleDatasets(Dataset):
6
+ def __init__(self, dbs, make_same_len=True, total_len=None, verbose=False, length_dict=None):
7
+ self.dbs = dbs
8
+ self.db_num = len(self.dbs)
9
+ self.max_db_data_num = max([len(db) for db in dbs])
10
+ self.make_same_len = make_same_len
11
+ self.length_dict = length_dict
12
+
13
+ if length_dict is not None: # weighted
14
+ self.db_length = []
15
+ for db in dbs:
16
+ name = db.__class__.__name__
17
+ length = length_dict[name]
18
+ self.db_length.append(length)
19
+
20
+ self.db_len_cumsum = np.cumsum(self.db_length)
21
+ else:
22
+ self.db_len_cumsum = np.cumsum([len(db) for db in dbs])
23
+
24
+ if total_len == 'auto': #concat/balance
25
+ self.total_len = self.db_len_cumsum[-1]
26
+ self.auto_total_len = True
27
+ else: #balance/weighted
28
+ self.total_len = total_len
29
+ self.auto_total_len = False
30
+
31
+ if total_len is not None:
32
+ self.per_db_len = self.total_len // self.db_num
33
+ if verbose:
34
+ print('datasets original:', [len(self.dbs[i]) for i in range(self.db_num)])
35
+ if length_dict is not None:
36
+ print('defined length:', length_dict)
37
+ print(f'Auto total length: {self.auto_total_len}, {self.total_len}')
38
+
39
+
40
+ def __len__(self):
41
+ # all dbs have the same length
42
+ if self.make_same_len:
43
+ if self.total_len is None:
44
+ # match the longest length
45
+ return self.max_db_data_num * self.db_num
46
+ else:
47
+ # each dataset has the same length and total len is fixed
48
+ return self.total_len
49
+ else:
50
+ if self.total_len is None:
51
+ # each db has different length, simply concat
52
+ return sum([len(db) for db in self.dbs])
53
+ else:
54
+ # defined or calculated db length
55
+ return self.total_len
56
+
57
+ def __getitem__(self, index):
58
+ if self.make_same_len:
59
+ if self.total_len is None:
60
+ # match the longest length
61
+ db_idx = index // self.max_db_data_num
62
+ data_idx = index % self.max_db_data_num
63
+ if data_idx >= len(self.dbs[db_idx]) * (self.max_db_data_num // len(self.dbs[db_idx])): # last batch: random sampling
64
+ data_idx = random.randint(0,len(self.dbs[db_idx])-1)
65
+ else: # before last batch: use modular
66
+ data_idx = data_idx % len(self.dbs[db_idx])
67
+ else:
68
+ db_idx = index // self.per_db_len
69
+ data_idx = index % self.per_db_len
70
+ if db_idx > (self.db_num - 1):
71
+ # last batch: randomly choose one dataset
72
+ db_idx = random.randint(0,self.db_num - 1)
73
+
74
+ if len(self.dbs[db_idx]) < self.per_db_len and \
75
+ data_idx >= len(self.dbs[db_idx]) * (self.per_db_len // len(self.dbs[db_idx])):
76
+ # last batch: random sampling in this dataset
77
+ data_idx = random.randint(0,len(self.dbs[db_idx]) - 1)
78
+ else:
79
+ # before last batch: use modular
80
+ data_idx = data_idx % len(self.dbs[db_idx])
81
+
82
+
83
+ else:
84
+ for i in range(self.db_num):
85
+ if index < self.db_len_cumsum[i]:
86
+ db_idx = i
87
+ break
88
+ if db_idx == 0:
89
+ data_idx = index
90
+ else:
91
+ data_idx = index - self.db_len_cumsum[db_idx-1]
92
+
93
+ if self.length_dict is not None:
94
+ # make the data idx valid if total data less than defined data length
95
+ if len(self.dbs[db_idx]) < self.db_length[db_idx] and \
96
+ data_idx >= len(self.dbs[db_idx]) * (self.db_length[db_idx] // len(self.dbs[db_idx])):
97
+ # last batch: random sampling in this dataset
98
+ data_idx = random.randint(0,len(self.dbs[db_idx]) - 1)
99
+ else:
100
+ # before last batch: use modular
101
+ data_idx = data_idx % len(self.dbs[db_idx])
102
+
103
+ return self.dbs[db_idx][data_idx]
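For orientation, here is a hypothetical usage sketch of `MultipleDatasets` with `make_same_len=True`, which stretches every dataset to the length of the longest one. The two `TensorDataset` stand-ins are placeholders for the real `HumanDataset` subclasses, and the import assumes the SMPLest-X root is on `PYTHONPATH`.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader
from datasets.dataset import MultipleDatasets

# Stand-ins for HumanDataset instances (100 and 40 samples respectively)
db_a = TensorDataset(torch.randn(100, 3))
db_b = TensorDataset(torch.randn(40, 3))

# Each dataset is stretched to the longest length (100), so len(merged) == 200;
# the shorter dataset is revisited via modulo indexing plus random sampling.
merged = MultipleDatasets([db_a, db_b], make_same_len=True, verbose=True)
print(len(merged))

loader = DataLoader(merged, batch_size=8, shuffle=True)
batch = next(iter(loader))
```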
SMPLest-X/datasets/humandata.py ADDED
@@ -0,0 +1,1076 @@
1
+ import os
2
+ import os.path as osp
3
+ import numpy as np
4
+ import torch
5
+ import copy
6
+ from human_models.human_models import SMPL, SMPLX
7
+ from utils.data_utils import load_img, process_bbox, augmentation, \
8
+ process_db_coord, process_human_model_output, \
9
+ process_db_coord_crop, gen_cropped_hands
10
+ from utils.transforms import rigid_align, batch_rodrigues
11
+ import tqdm
12
+ import time
13
+ import random
14
+ import pickle
15
+ from constants import *
16
+
17
+
18
+
19
+
20
+ class Cache():
21
+ """ A custom implementation for OSX pipeline
22
+ Need to run tool/cache/fix_cache.py to fix paths
23
+ """
24
+ def __init__(self, load_path=None):
25
+ if load_path is not None:
26
+ self.load(load_path)
27
+
28
+ def load(self, load_path):
29
+ self.load_path = load_path
30
+ self.cache = np.load(load_path, allow_pickle=True)
31
+ self.data_len = self.cache['data_len']
32
+ self.data_strategy = self.cache['data_strategy']
33
+ assert self.data_len == len(self.cache) - 2 # data_len, data_strategy
34
+ self.cache = None
35
+
36
+ @classmethod
37
+ def save(cls, save_path, data_list, data_strategy):
38
+ assert save_path is not None, 'save_path is None'
39
+ data_len = len(data_list)
40
+ cache = {}
41
+ for i, data in enumerate(data_list):
42
+ cache[str(i)] = data
43
+ assert len(cache) == data_len
44
+ # update meta
45
+ cache.update({
46
+ 'data_len': data_len,
47
+ 'data_strategy': data_strategy})
48
+
49
+ np.savez_compressed(save_path, **cache)
50
+ print(f'Cache saved to {save_path}.')
51
+
52
+ # def shuffle(self):
53
+ # random.shuffle(self.mapping)
54
+
55
+ def __len__(self):
56
+ return self.data_len
57
+
58
+ def __getitem__(self, idx):
59
+ if self.cache is None:
60
+ self.cache = np.load(self.load_path, allow_pickle=True)
61
+ # mapped_idx = self.mapping[idx]
62
+ # cache_data = self.cache[str(mapped_idx)]
63
+ cache_data = self.cache[str(idx)]
64
+ data = cache_data.item()
65
+ return data
66
+
67
+
68
+ class HumanDataset(torch.utils.data.Dataset):
69
+
70
+ def __init__(self, transform, data_split, cfg):
71
+ self.transform = transform
72
+ self.data_split = data_split
73
+ self.cfg = cfg
74
+
75
+ # dataset information, to be filled by child class
76
+ self.img_dir = None
77
+ self.annot_path = None
78
+ self.annot_path_cache = None
79
+ self.use_cache = False
80
+ self.save_idx = 0
81
+ self.img_shape = None # (h, w)
82
+ self.cam_param = None # {'focal_length': (fx, fy), 'princpt': (cx, cy)}
83
+ self.use_betas_neutral = False
84
+
85
+ self.smpl_x = SMPLX.get_instance()
86
+ self.smpl = SMPL.get_instance()
87
+
88
+ self.joint_set = {
89
+ 'joint_num': self.smpl_x.joint_num,
90
+ 'joints_name': self.smpl_x.joints_name,
91
+ 'flip_pairs': self.smpl_x.flip_pairs}
92
+ self.joint_set['root_joint_idx'] = self.joint_set['joints_name'].index('Pelvis')
93
+
94
+ self.downsample_mat = pickle.load(open(f'{self.cfg.model.human_model_path}/smplx2smpl.pkl',
95
+ 'rb'))['matrix']
96
+
97
+ def load_cache(self, annot_path_cache):
98
+ datalist = Cache(annot_path_cache)
99
+ return datalist
100
+
101
+ def save_cache(self, annot_path_cache, datalist):
102
+ print(f'[{self.__class__.__name__}] Caching datalist to {self.annot_path_cache}...')
103
+ Cache.save(
104
+ annot_path_cache,
105
+ datalist,
106
+ data_strategy=getattr(self.cfg.data, 'data_strategy', None)
107
+ )
108
+
109
+ def load_data(self, train_sample_interval=1, test_sample_interval=1):
110
+
111
+ content = np.load(self.annot_path, allow_pickle=True)
112
+ num_examples = len(content['image_path'])
113
+
114
+ if 'meta' in content:
115
+ meta = content['meta'].item()
116
+ print('meta keys:', meta.keys())
117
+ if 'annot_valid' in meta.keys(): # agora
118
+ annot_valid = meta['annot_valid']
119
+ else:
120
+ annot_valid = None
121
+
122
+ if 'valid_label' in meta.keys(): # Ubody
123
+ invalid_label = np.array(meta['valid_label']) == 0 # skip when True
124
+ iscrowd = np.array(meta['iscrowd']) # skip when True
125
+ num_keypoints_zero = np.array(meta['num_keypoints']) == 0 # skip when True
126
+
127
+ skip_ubody = [iscrowd[i] or num_keypoints_zero[i] or invalid_label[i] for i in range(len(iscrowd))]
128
+ else:
129
+ skip_ubody = None
130
+
131
+ if 'iscrowd' in meta.keys(): # mscoco
132
+ iscrowd = np.array(meta['iscrowd']) # skip when True
133
+ num_keypoints_zero = np.array(meta['num_keypoints']) == 0 # skip when True
134
+
135
+ skip_mscoco = [iscrowd[i] or num_keypoints_zero[i] for i in range(len(iscrowd))]
136
+ else:
137
+ skip_mscoco = None
138
+
139
+ else:
140
+ meta = None
141
+ annot_valid = None
142
+ skip_ubody = None
143
+ skip_mscoco = None
144
+ print('No meta info provided! Please give height and width manually')
145
+
146
+ # ARCTIC val set
147
+ if 'vertices3d_path' in content:
148
+ vertices3d_path = content['vertices3d_path']
149
+ else:
150
+ vertices3d_path = None
151
+
152
+ print(f'Start loading humandata {self.annot_path} into memory...\nDataset includes: {content.files}'); tic = time.time()
153
+ image_path = content['image_path']
154
+
155
+ if meta is not None and 'height' in meta:
156
+ height = np.array(meta['height'])
157
+ width = np.array(meta['width'])
158
+ image_shape = np.stack([height, width], axis=-1)
159
+ else:
160
+ image_shape = None
161
+
162
+ if self.__class__.__name__ == 'Hi4D':
163
+ image_shape = None
164
+
165
+
166
+ if 'smplx' in content:
167
+ smplx = content['smplx'].item()
168
+ as_smplx = 'smplx'
169
+ if self.__class__.__name__ == 'UBody':
170
+ smplx.pop('leye_pose')
171
+ smplx.pop('reye_pose')
172
+ elif 'smpl' in content:
173
+ smplx = content['smpl'].item()
174
+ as_smplx = 'smpl'
175
+ elif 'smplh' in content:
176
+ smplx = content['smplh'].item()
177
+ as_smplx = 'smplh'
178
+
179
+ # TODO: temp solution, should be more general. But SHAPY is very special
180
+ elif self.__class__.__name__ == 'SHAPY':
181
+ smplx = {}
182
+
183
+ else:
184
+ raise KeyError('No SMPL or SMPLX annotation available, please check keys:\n'
185
+ f'{content.files}')
186
+
187
+ if self.__class__.__name__ == 'PW3D' and 'test' in self.annot_path:
188
+ print('load smpl for PW3d!')
189
+ smplx = content['smpl'].item()
190
+ as_smplx = 'smpl'
191
+ gender = content['meta'].item()['gender']
192
+ else:
193
+ gender = None
194
+
195
+ print('Smplx param', smplx.keys())
196
+
197
+ # mano
198
+ if 'mano' in content:
199
+ mano = content['mano']
200
+ else:
201
+ mano = None
202
+
203
+ # bbox
204
+ if 'bbox_xywh' in content:
205
+ bbox_xywh = content['bbox_xywh']
206
+ else:
207
+ raise KeyError(f'Necessary key [bbox_xywh] is missing in HumanData for {self.__class__.__name__}.')
208
+
209
+ if 'lhand_bbox_xywh' in content:
210
+ lhand_bbox_xywh = content['lhand_bbox_xywh']
211
+ else:
212
+ lhand_bbox_xywh = np.zeros((num_examples, 5))
213
+
214
+ if 'rhand_bbox_xywh' in content:
215
+ rhand_bbox_xywh = content['rhand_bbox_xywh']
216
+ else:
217
+ rhand_bbox_xywh = np.zeros((num_examples, 5))
218
+
219
+ if 'face_bbox_xywh' in content:
220
+ face_bbox_xywh = content['face_bbox_xywh']
221
+ else:
222
+ face_bbox_xywh = np.zeros((num_examples, 5))
223
+
224
+ decompressed = False
225
+ if content['__keypoints_compressed__']:
226
+ decompressed_kps = self.decompress_keypoints(content)
227
+ decompressed = True
228
+
229
+ keypoints3d = None
230
+ valid_kps3d = False
231
+ keypoints3d_mask = None
232
+ valid_kps3d_mask = False
233
+ for kps3d_key in KPS3D_KEYS:
234
+ if kps3d_key in content:
235
+ keypoints3d = decompressed_kps[kps3d_key][:, SMPLX_137_MAPPING, :3] if decompressed \
236
+ else content[kps3d_key][:, SMPLX_137_MAPPING, :3]
237
+ valid_kps3d = True
238
+
239
+ if f'{kps3d_key}_mask' in content:
240
+ keypoints3d_mask = content[f'{kps3d_key}_mask'][SMPLX_137_MAPPING]
241
+ valid_kps3d_mask = True
242
+ elif 'keypoints3d_mask' in content:
243
+ keypoints3d_mask = content['keypoints3d_mask'][SMPLX_137_MAPPING]
244
+ valid_kps3d_mask = True
245
+ break
246
+
247
+ for kps2d_key in KPS2D_KEYS:
248
+ if kps2d_key in content:
249
+ keypoints2d = decompressed_kps[kps2d_key][:, SMPLX_137_MAPPING, :2] if decompressed \
250
+ else content[kps2d_key][:, SMPLX_137_MAPPING, :2]
251
+
252
+ if f'{kps2d_key}_mask' in content:
253
+ keypoints2d_mask = content[f'{kps2d_key}_mask'][SMPLX_137_MAPPING]
254
+ elif 'keypoints2d_mask' in content:
255
+ keypoints2d_mask = content['keypoints2d_mask'][SMPLX_137_MAPPING]
256
+ break
257
+
258
+ mask = keypoints3d_mask if valid_kps3d_mask \
259
+ else keypoints2d_mask
260
+
261
+ print('Done. Time: {:.2f}s'.format(time.time() - tic))
262
+
263
+ datalist = []
264
+
265
+ for i in tqdm.tqdm(range(int(num_examples))):
266
+ if annot_valid is not None and not annot_valid[i]: continue # for agora
267
+ if skip_ubody is not None and skip_ubody[i]: continue # for ubody
268
+ if skip_mscoco is not None and skip_mscoco[i]: continue # for mscoco
269
+
270
+ if self.data_split == 'train' and i % train_sample_interval != 0:
271
+ continue
272
+ if self.data_split == 'test' and i % test_sample_interval != 0:
273
+ continue
274
+
275
+ if vertices3d_path is not None:
276
+ vertices3d = np.load(osp.join(self.img_dir, vertices3d_path[i]))
277
+ else:
278
+ vertices3d = None
279
+
280
+ if 'MPI_INF_3DHP' in self.__class__.__name__:
281
+ img_path = osp.join(self.img_dir, image_path[i][1:]) # remove the first /
282
+ else:
283
+ img_path = osp.join(self.img_dir, image_path[i])
284
+
285
+ # import pdb; pdb.set_trace()
286
+ img_shape = image_shape[i] if image_shape is not None else self.img_shape
287
+
288
+ joint_img = keypoints2d[i]
289
+ joint_valid = mask.reshape(-1, 1)
290
+
291
+ bbox = bbox_xywh[i][:4]
292
+ lhand_bbox = lhand_bbox_xywh[i]
293
+ rhand_bbox = rhand_bbox_xywh[i]
294
+ face_bbox = face_bbox_xywh[i]
295
+ if hasattr(self.cfg.data, 'bbox_ratio'):
296
+ bbox_ratio = self.cfg.data.bbox_ratio * 0.833 # preprocessed body bbox already comes with 1.2x padding
297
+ else:
298
+ bbox_ratio = 1.25
299
+ left_hand_chosen = None
300
+
301
+ bbox = process_bbox(bbox, img_width=img_shape[1], img_height=img_shape[0], ratio=bbox_ratio,
302
+ input_img_shape=self.cfg.model.input_img_shape)
303
+ if bbox is None:
304
+ print("skip since no bbox")
305
+ continue
306
+ # if hasattr(cfg, 'do_crop'):
307
+ # if cfg.do_crop:
308
+ # joint_valid_temp = process_db_coord_crop(bbox, joint_img)
309
+
310
+ if lhand_bbox[-1] > 0: # conf > 0
311
+ lhand_bbox = lhand_bbox[:4]
312
+ if hasattr(self.cfg.data, 'bbox_ratio'):
313
+ lhand_bbox = process_bbox(lhand_bbox, img_width=img_shape[1], img_height=img_shape[0], ratio=self.cfg.data.bbox_ratio,
314
+ input_img_shape=self.cfg.model.input_img_shape)
315
+ if lhand_bbox is not None:
316
+ lhand_bbox[2:] += lhand_bbox[:2] # xywh -> xyxy
317
+ else:
318
+ lhand_bbox = None
319
+ if rhand_bbox[-1] > 0:
320
+ rhand_bbox = rhand_bbox[:4]
321
+ if hasattr(self.cfg.data, 'bbox_ratio'):
322
+ rhand_bbox = process_bbox(rhand_bbox, img_width=img_shape[1], img_height=img_shape[0], ratio=self.cfg.data.bbox_ratio,
323
+ input_img_shape=self.cfg.model.input_img_shape)
324
+ if rhand_bbox is not None:
325
+ rhand_bbox[2:] += rhand_bbox[:2] # xywh -> xyxy
326
+ else:
327
+ rhand_bbox = None
328
+ if face_bbox[-1] > 0:
329
+ face_bbox = face_bbox[:4]
330
+ if hasattr(self.cfg.data, 'bbox_ratio'):
331
+ face_bbox = process_bbox(face_bbox, img_width=img_shape[1], img_height=img_shape[0], ratio=self.cfg.data.bbox_ratio,
332
+ input_img_shape=self.cfg.model.input_img_shape)
333
+ if face_bbox is not None:
334
+ face_bbox[2:] += face_bbox[:2] # xywh -> xyxy
335
+ else:
336
+ face_bbox = None
337
+
338
+ if valid_kps3d:
339
+ joint_cam = keypoints3d[i]
340
+ else:
341
+ joint_cam = None
342
+
343
+ smplx_param = {k: v[i] for k, v in smplx.items()}
344
+
345
+ # agora skip kids
346
+ is_kids = smplx_param.pop('betas_extra', 0)
347
+ # import pdb; pdb.set_trace()
348
+ if is_kids != 0:
349
+ print('skip kids')
350
+ continue
351
+
352
+ # TODO: set invalid if None?
353
+ smplx_param['body_pose'] = smplx_param.pop('body_pose', None)
354
+ smplx_param['root_pose'] = smplx_param.pop('global_orient', None)
355
+ smplx_param['shape'] = smplx_param.pop('betas', np.zeros(10, dtype=np.float32))
356
+ smplx_param['shape'] = smplx_param['shape'][:10]
357
+ smplx_param['trans'] = smplx_param.pop('transl', np.zeros(3))
358
+ smplx_param['lhand_pose'] = smplx_param.pop('left_hand_pose', None)
359
+ smplx_param['rhand_pose'] = smplx_param.pop('right_hand_pose', None)
360
+ smplx_param['expr'] = smplx_param.pop('expression', None)
361
+
362
+ # TODO do not fix betas, give up shape supervision
363
+ if 'betas_neutral' in smplx_param:
364
+ smplx_param['shape'] = smplx_param.pop('betas_neutral')
365
+ # smplx_param['shape'] = np.zeros(10, dtype=np.float32)
366
+ smplx_param['shape'] = smplx_param['shape'][:10]
367
+
368
+ # # TODO fix shape of poses
369
+ if self.__class__.__name__ == 'Talkshow':
370
+ smplx_param['body_pose'] = smplx_param['body_pose'].reshape(21, 3)
371
+ smplx_param['lhand_pose'] = smplx_param['lhand_pose'].reshape(15, 3)
372
+ smplx_param['rhand_pose'] = smplx_param['rhand_pose'].reshape(15, 3)
373
+ smplx_param['expr'] = smplx_param['expr'][:10]
374
+
375
+ if self.__class__.__name__ == 'ARCTIC':
376
+ smplx_param['shape'] = np.zeros(10, dtype=np.float32)
377
+
378
+ # 'BEDLAM'
379
+ if self.__class__.__name__ in ['GTA_Human2','GTA_Human_full',
380
+ 'SynBody_whac', 'SynBody_Magic1','SynBody', 'SynBody_full', 'SynHand',
381
+ 'CHI3D', 'FIT3D', 'HumanSC3D',
382
+ 'MOYO', 'ARCTIC',]:
383
+ smplx_param['shape'] = smplx_param['shape'][:10]
384
+ # print('[Flat Hand Mean]:manually set flat_hand_mean = True -> flat_hand_mean = False')
385
+ # manually set flat_hand_mean = True -> flat_hand_mean = False
386
+ smplx_param['lhand_pose'] -= HANDS_MEAN_L
387
+ smplx_param['rhand_pose'] -= HANDS_MEAN_R
388
+
389
+
390
+ if as_smplx == 'smpl':
391
+ smplx_param['smpl_pose'] = smplx_param['body_pose']
392
+ smplx_param['body_pose'] = smplx_param['body_pose'].reshape(-1, 3)
393
+ smplx_param['body_pose'] = smplx_param['body_pose'][:21, :] # use smpl body_pose on smplx
394
+
395
+ smplx_param['smpl_shape'] = smplx_param['shape']
396
+ smplx_param['shape'] = np.zeros(10, dtype=np.float32) # drop smpl betas for smplx
397
+
398
+ if gender is not None:
399
+ smplx_param['gender'] = gender[i]
400
+
401
+ if as_smplx == 'smplh':
402
+ smplx_param['shape'] = np.zeros(10, dtype=np.float32) # drop smpl betas for smplx
403
+
405
+ # for hand datasets, set shape and pose to all zero
406
+ if self.__class__.__name__ in ['FreiHand', 'InterHand', 'BlurHand', 'HanCo']:
407
+ smplx_param['shape'] = np.zeros((10, ))
408
+ smplx_param['root_pose'] = np.zeros((3))
409
+ smplx_param['body_pose'] = np.zeros((21, 3))
410
+
411
+ if smplx_param['lhand_pose'] is None or (smplx_param['lhand_pose'] == 0).all():
412
+ smplx_param['lhand_valid'] = False
413
+ # TODO: manually set joint_valid to 0
414
+ joint_valid[self.smpl_x.joint_part['lhand'], :] = 0
415
+ joint_valid[self.smpl_x.lwrist_idx, :] = 0
416
+ else:
417
+ smplx_param['lhand_valid'] = True
418
+ joint_valid[self.smpl_x.joint_part['lhand'], :] = 1
419
+ joint_valid[self.smpl_x.lwrist_idx, :] = 1
420
+
421
+ if smplx_param['rhand_pose'] is None or (smplx_param['rhand_pose'] == 0).all():
422
+ smplx_param['rhand_valid'] = False
423
+ joint_valid[self.smpl_x.joint_part['rhand'], :] = 0
424
+ joint_valid[self.smpl_x.rwrist_idx, :] = 0
425
+ else:
426
+ smplx_param['rhand_valid'] = True
427
+ joint_valid[self.smpl_x.joint_part['rhand'], :] = 1
428
+ joint_valid[self.smpl_x.rwrist_idx, :] = 1
429
+
430
+ if smplx_param['expr'] is None:
431
+ smplx_param['face_valid'] = False
432
+ else:
433
+ smplx_param['face_valid'] = True
434
+
435
+ if joint_cam is not None and np.any(np.isnan(joint_cam)):
436
+ print("skip since no kps")
437
+ continue
438
+
439
+ datalist.append({
440
+ 'img_path': img_path,
441
+ 'img_shape': img_shape,
442
+ 'bbox': bbox,
443
+ 'lhand_bbox': lhand_bbox,
444
+ 'rhand_bbox': rhand_bbox,
445
+ 'face_bbox': face_bbox,
446
+ 'joint_img': joint_img,
447
+ 'joint_cam': joint_cam,
448
+ 'joint_valid': joint_valid,
449
+ 'smplx_param': smplx_param,
450
+ 'model': as_smplx,
451
+ 'extrinsic_r': extrinsic_r[i] if 'extrinsic_r' in locals() else np.eye(3,3),
452
+ 'vertices3d': vertices3d if vertices3d is not None else -1,
453
+ 'idx': i})
454
+
455
+ # save memory
456
+ del content, image_path, bbox_xywh, lhand_bbox_xywh, rhand_bbox_xywh, face_bbox_xywh, keypoints3d, keypoints2d
457
+
458
+ if self.data_split == 'train':
459
+ print(f'[{self.__class__.__name__} train] original size:', int(num_examples),
460
+ '. Sample interval:', train_sample_interval,
461
+ '. Sampled size:', len(datalist))
462
+
463
+ if (getattr(self.cfg.data, 'data_strategy', None) == 'balance' and self.data_split == 'train') or \
464
+ (getattr(self.cfg.data, 'data_strategy', None) == 'weighted' and self.data_split == 'train'):
465
+ print(f'[{self.__class__.__name__}] Using [balance/weighted] strategy with datalist shuffled...')
466
+ random.seed(2023)
467
+ random.shuffle(datalist)
468
+
469
+ return datalist
470
+
471
+ def __len__(self):
472
+ return len(self.datalist)
473
+
474
+ def __getitem__(self, idx):
475
+ try:
476
+ data = copy.deepcopy(self.datalist[idx])
477
+ except Exception as e:
478
+ print(f'[{self.__class__.__name__}] Error loading data {idx}')
479
+ print(e)
480
+ exit(0)
481
+
482
+ img_path, img_shape, bbox = data['img_path'], data['img_shape'], data['bbox']
483
+ img = load_img(img_path)
484
+ no_aug = getattr(self.cfg.data, 'no_aug', False)
485
+ img, img2bb_trans, bb2img_trans, rot, do_flip = augmentation(no_aug, img, bbox,
486
+ self.data_split,
487
+ self.cfg.model.input_img_shape)
488
+ img = self.transform(img.astype(np.float32)) / 255.
489
+
490
+ ## for vis on original img
491
+ focal = [self.cfg.model.focal[0] / self.cfg.model.input_body_shape[1] * bbox[2],
492
+ self.cfg.model.focal[1] / self.cfg.model.input_body_shape[0] * bbox[3]]
493
+ princpt = [self.cfg.model.princpt[0] / self.cfg.model.input_body_shape[1] * bbox[2] + bbox[0],
494
+ self.cfg.model.princpt[1] / self.cfg.model.input_body_shape[0] * bbox[3] + bbox[1]]
495
+
496
+ if self.data_split == 'train':
497
+ # h36m gt
498
+ joint_cam = data['joint_cam']
499
+ if joint_cam is not None:
500
+ dummy_cord = False
501
+ joint_cam = joint_cam - joint_cam[self.joint_set['root_joint_idx'], None, :] # root-relative
502
+ else:
503
+ # dummy cord as joint_cam
504
+ dummy_cord = True
505
+ joint_cam = np.zeros((self.joint_set['joint_num'], 3), dtype=np.float32)
506
+
507
+ joint_img = data['joint_img']
508
+ joint_img = np.concatenate((joint_img[:, :2], joint_cam[:, 2:]), 1) # x, y, depth
509
+ if not dummy_cord:
510
+ joint_img[:, 2] = (joint_img[:, 2] / (self.cfg.model.body_3d_size / 2) + 1) / 2. * self.cfg.model.output_hm_shape[0] # discretize depth
511
+
512
+ joint_img_aug, joint_cam_wo_ra, \
513
+ joint_cam_ra, joint_valid, joint_trunc = process_db_coord(
514
+ joint_img=joint_img,
515
+ joint_cam=joint_cam,
516
+ joint_valid=data['joint_valid'],
517
+ do_flip=do_flip,
518
+ img_shape=img_shape,
519
+ flip_pairs=self.joint_set['flip_pairs'],
520
+ img2bb_trans=img2bb_trans,
521
+ rot=rot,
522
+ src_joints_name=self.joint_set['joints_name'],
523
+ target_joints_name=self.smpl_x.joints_name,
524
+ input_img_shape=self.cfg.model.input_img_shape,
525
+ output_hm_shape=self.cfg.model.output_hm_shape,
526
+ input_body_shape=self.cfg.model.input_body_shape)
527
+
528
+ # smplx coordinates and parameters
529
+ smplx_param = data['smplx_param']
530
+ smplx_joint_img, smplx_joint_cam, smplx_joint_trunc, smplx_pose, smplx_shape, smplx_expr, \
531
+ smplx_pose_valid, smplx_joint_valid, smplx_expr_valid, \
532
+ smplx_mesh_cam_orig = process_human_model_output(
533
+ human_model_param=smplx_param,
534
+ cam_param=self.cam_param,
535
+ do_flip=do_flip,
536
+ img_shape=img_shape,
537
+ img2bb_trans=img2bb_trans,
538
+ rot=rot,
539
+ human_model_type='smplx',
540
+ joint_img=None if self.cam_param else joint_img,
541
+ body_3d_size=self.cfg.model.body_3d_size,
542
+ hand_3d_size=self.cfg.model.hand_3d_size,
543
+ face_3d_size=self.cfg.model.face_3d_size,
544
+ input_img_shape=self.cfg.model.input_img_shape,
545
+ output_hm_shape=self.cfg.model.output_hm_shape,
546
+ )
547
+
548
+ # TODO temp fix keypoints3d for renbody
549
+ if 'RenBody' in self.__class__.__name__:
550
+ joint_cam_ra = smplx_joint_cam.copy()
551
+ joint_cam_wo_ra = smplx_joint_cam.copy()
552
+ joint_cam_wo_ra[self.smpl_x.joint_part['lhand'], :] = joint_cam_wo_ra[self.smpl_x.joint_part['lhand'], :] \
553
+ + joint_cam_wo_ra[self.smpl_x.lwrist_idx, None, :] # left hand root-relative
554
+ joint_cam_wo_ra[self.smpl_x.joint_part['rhand'], :] = joint_cam_wo_ra[self.smpl_x.joint_part['rhand'], :] \
555
+ + joint_cam_wo_ra[self.smpl_x.rwrist_idx, None, :] # right hand root-relative
556
+ joint_cam_wo_ra[self.smpl_x.joint_part['face'], :] = joint_cam_wo_ra[self.smpl_x.joint_part['face'], :] \
557
+ + joint_cam_wo_ra[self.smpl_x.neck_idx, None,: ] # face root-relative
558
+ # change smplx_shape if use_betas_neutral
559
+ # processing follows that in process_human_model_output
560
+ if self.use_betas_neutral:
561
+ smplx_shape = smplx_param['betas_neutral'].reshape(1, -1)
562
+ smplx_shape[(np.abs(smplx_shape) > 3).any(axis=1)] = 0.
563
+ smplx_shape = smplx_shape.reshape(-1)
564
+
565
+ # SMPLX pose parameter validity
566
+ smplx_pose_valid = np.tile(smplx_pose_valid[:, None], (1, 9)).reshape(-1)
567
+ smplx_joint_valid = smplx_joint_valid[:, None]
568
+ smplx_joint_trunc = smplx_joint_valid * smplx_joint_trunc
569
+ if not (smplx_shape == 0).all():
570
+ smplx_shape_valid = True
571
+ else:
572
+ smplx_shape_valid = False
573
+
574
+ # hand and face bbox transform
575
+ lhand_bbox, lhand_bbox_valid = self.process_hand_face_bbox(data['lhand_bbox'], do_flip, img_shape, img2bb_trans,
576
+ self.cfg.model.input_img_shape, self.cfg.model.output_hm_shape)
577
+ rhand_bbox, rhand_bbox_valid = self.process_hand_face_bbox(data['rhand_bbox'], do_flip, img_shape, img2bb_trans,
578
+ self.cfg.model.input_img_shape, self.cfg.model.output_hm_shape)
579
+ face_bbox, face_bbox_valid = self.process_hand_face_bbox(data['face_bbox'], do_flip, img_shape, img2bb_trans,
580
+ self.cfg.model.input_img_shape, self.cfg.model.output_hm_shape)
581
+ if do_flip:
582
+ lhand_bbox, rhand_bbox = rhand_bbox, lhand_bbox
583
+ lhand_bbox_valid, rhand_bbox_valid = rhand_bbox_valid, lhand_bbox_valid
584
+ lhand_bbox_center = (lhand_bbox[0] + lhand_bbox[1]) / 2.
585
+ rhand_bbox_center = (rhand_bbox[0] + rhand_bbox[1]) / 2.
586
+ face_bbox_center = (face_bbox[0] + face_bbox[1]) / 2.
587
+ lhand_bbox_size = lhand_bbox[1] - lhand_bbox[0]
588
+ rhand_bbox_size = rhand_bbox[1] - rhand_bbox[0]
589
+ face_bbox_size = face_bbox[1] - face_bbox[0]
590
+
591
+
592
+ joint_img_aug = np.nan_to_num(joint_img_aug, nan=0.0)
593
+ smplx_pose = np.nan_to_num(smplx_pose, nan=0.0)
594
+ joint_cam_wo_ra = np.nan_to_num(joint_cam_wo_ra, nan=0.0)
595
+ joint_cam_ra = np.nan_to_num(joint_cam_ra, nan=0.0)
596
+
597
+ smplx_cam_trans = np.array(smplx_param['trans']) if 'trans' in smplx_param else None
598
+ inputs = {'img': img}
599
+ targets = {'joint_img': joint_img_aug, # keypoints2d
600
+ 'smplx_joint_img': joint_img_aug, #smplx_joint_img, # projected smplx if valid cam_param, else same as keypoints2d
601
+ 'joint_cam': joint_cam_wo_ra, # joint_cam actually not used in any loss, # raw kps3d probably without ra
602
+ 'smplx_joint_cam': joint_cam_ra, # kps3d with body, face, hand ra # smplx_joint_cam if (dummy_cord or getattr(cfg, 'debug', False)) else
603
+ 'smplx_pose': smplx_pose,
604
+ 'smplx_shape': smplx_shape,
605
+ 'smplx_expr': smplx_expr,
606
+ 'lhand_bbox_center': lhand_bbox_center, 'lhand_bbox_size': lhand_bbox_size,
607
+ 'rhand_bbox_center': rhand_bbox_center, 'rhand_bbox_size': rhand_bbox_size,
608
+ 'face_bbox_center': face_bbox_center, 'face_bbox_size': face_bbox_size,
609
+ 'lhand_root': smplx_param['lhand_root'] if 'lhand_root' in smplx_param else np.zeros((1, 3)),
610
+ 'rhand_root': smplx_param['rhand_root'] if 'rhand_root' in smplx_param else np.zeros((1, 3)),
611
+ 'smplx_cam_trans': smplx_cam_trans}
612
+ meta_info = {'joint_valid': joint_valid,
613
+ 'joint_trunc': joint_trunc,
614
+ 'smplx_joint_valid': smplx_joint_valid if dummy_cord else joint_valid,
615
+ 'smplx_joint_trunc': smplx_joint_trunc if dummy_cord else joint_trunc,
616
+ 'smplx_pose_valid': smplx_pose_valid,
617
+ 'smplx_shape_valid': float(smplx_shape_valid),
618
+ 'smplx_expr_valid': float(smplx_expr_valid),
619
+ 'is_3D': float(False) if dummy_cord else float(True),
620
+ 'lhand_bbox_valid': lhand_bbox_valid,
621
+ 'rhand_bbox_valid': rhand_bbox_valid, 'face_bbox_valid': face_bbox_valid,
622
+ }
623
+
624
+ return inputs, targets, meta_info
625
+
626
+ # test
627
+ else:
628
+ joint_cam = data['joint_cam']
629
+ if joint_cam is not None:
630
+ dummy_cord = False
631
+ joint_cam = joint_cam - joint_cam[self.joint_set['root_joint_idx'], None, :] # root-relative
632
+ else:
633
+ # dummy cord as joint_cam
634
+ dummy_cord = True
635
+ joint_cam = np.zeros((self.joint_set['joint_num'], 3), dtype=np.float32)
636
+
637
+ joint_img = data['joint_img']
638
+ joint_img = np.concatenate((joint_img[:, :2], joint_cam[:, 2:]), 1) # x, y, depth
639
+ if not dummy_cord:
640
+ joint_img[:, 2] = (joint_img[:, 2] / (self.cfg.model.body_3d_size / 2) + 1) / 2. * self.cfg.model.output_hm_shape[0] # discretize depth
641
+
642
+ joint_img, joint_cam, joint_cam_ra, joint_valid, joint_trunc = process_db_coord(
643
+ joint_img=joint_img,
644
+ joint_cam=joint_cam,
645
+ joint_valid=data['joint_valid'],
646
+ do_flip=do_flip,
647
+ img_shape=img_shape,
648
+ flip_pairs=self.joint_set['flip_pairs'],
649
+ img2bb_trans=img2bb_trans,
650
+ rot=rot,
651
+ src_joints_name=self.joint_set['joints_name'],
652
+ target_joints_name=self.smpl_x.joints_name,
653
+ input_img_shape=self.cfg.model.input_img_shape,
654
+ output_hm_shape=self.cfg.model.output_hm_shape,
655
+ input_body_shape=self.cfg.model.input_body_shape)
656
+
657
+ # smplx coordinates and parameters
658
+ smplx_param = data['smplx_param']
659
+ smplx_cam_trans = np.array(smplx_param['trans']) if 'trans' in smplx_param else None
660
+
661
+ model_type = data['model']
662
+ if model_type == 'smplx':
663
+ smplx_joint_img, smplx_joint_cam, smplx_joint_trunc, smplx_pose, smplx_shape, smplx_expr, \
664
+ smplx_pose_valid, smplx_joint_valid, \
665
+ smplx_expr_valid, smplx_mesh_cam_orig = process_human_model_output(
666
+ human_model_param=smplx_param,
667
+ cam_param=self.cam_param,
668
+ do_flip=do_flip,
669
+ img_shape=img_shape,
670
+ img2bb_trans=img2bb_trans,
671
+ rot=rot,
672
+ human_model_type=model_type,
673
+ joint_img=None if self.cam_param else joint_img,
674
+ body_3d_size=self.cfg.model.body_3d_size,
675
+ hand_3d_size=self.cfg.model.hand_3d_size,
676
+ face_3d_size=self.cfg.model.face_3d_size,
677
+ input_img_shape=self.cfg.model.input_img_shape,
678
+ output_hm_shape=self.cfg.model.output_hm_shape,
679
+ )
680
+ smplx_pose_valid = np.tile(smplx_pose_valid[:, None], (1, 9)).reshape(-1)
681
+
682
+ elif model_type == 'smpl':
683
+ _, _, _, _, _, smplx_mesh_cam_orig = process_human_model_output(
684
+ human_model_param=smplx_param,
685
+ cam_param=self.cam_param,
686
+ do_flip=do_flip,
687
+ img_shape=img_shape,
688
+ img2bb_trans=img2bb_trans,
689
+ rot=rot,
690
+ human_model_type=model_type,
691
+ joint_img=None if self.cam_param else joint_img,
692
+ body_3d_size=self.cfg.model.body_3d_size,
693
+ hand_3d_size=self.cfg.model.hand_3d_size,
694
+ face_3d_size=self.cfg.model.face_3d_size,
695
+ input_img_shape=self.cfg.model.input_img_shape,
696
+ output_hm_shape=self.cfg.model.output_hm_shape,
697
+ )
698
+
699
+ lhand_valid = 1.0
700
+ rhand_valid = 1.0
701
+ # process the hand mesh for mano dataset
702
+ if self.__class__.__name__ in ['FreiHand', 'InterHand', 'BlurHand', 'HanCo']:
703
+ if (data['smplx_param']['lhand_root']==0).all():
704
+ lhand_valid = 0.0
705
+ if (data['smplx_param']['rhand_root']==0).all():
706
+ rhand_valid = 0.0
707
+
708
+ # build smplx but redo the hand rotation with global orientation
709
+
710
+ smplx_pose_rotmat = batch_rodrigues(torch.Tensor(smplx_pose.reshape(-1,3))).reshape(smplx_pose.shape[0], -1)
711
+
712
+ # redo the hand orientation: R_gt x R_inv x hand mesh
713
+ R_gt_l = data['smplx_param']['lhand_root'] if 'lhand_root' in smplx_param else np.zeros((1, 3))
714
+ R_gt_r = data['smplx_param']['rhand_root'] if 'rhand_root' in smplx_param else np.zeros((1, 3))
715
+
716
+ R_gt_l = batch_rodrigues(torch.Tensor(R_gt_l.reshape(-1,3))).reshape(R_gt_l.shape[0], 3, 3)
717
+ R_gt_r = batch_rodrigues(torch.Tensor(R_gt_r.reshape(-1,3))).reshape(R_gt_r.shape[0], 3, 3)
718
+ # import pdb; pdb.set_trace()
719
+
720
+ # get hand mesh with wrong global orientation
721
+ lhand_mesh = smplx_mesh_cam_orig[self.smpl_x.hand_vertex_idx['left_hand'], :]
722
+ rhand_mesh = smplx_mesh_cam_orig[self.smpl_x.hand_vertex_idx['right_hand'], :]
723
+
724
+ # get wrist offset and align hand mesh to pelvis
725
+ lwrist_offset = np.dot(self.smpl_x.J_regressor, smplx_mesh_cam_orig)[self.smpl_x.J_regressor_idx['lwrist'], None, :]
726
+ rwrist_offset = np.dot(self.smpl_x.J_regressor, smplx_mesh_cam_orig)[self.smpl_x.J_regressor_idx['rwrist'], None, :]
727
+ mesh_out_lhand_align = lhand_mesh - lwrist_offset
728
+ mesh_out_rhand_align = rhand_mesh - rwrist_offset
729
+
730
+ # redo the rotation and align to wrist position world->cam
731
+ R_gt_l = np.dot(data['extrinsic_r'], R_gt_l.squeeze())
732
+ R_gt_r = np.dot(data['extrinsic_r'], R_gt_r.squeeze())
733
+
734
+ mesh_global_lhand = np.dot(R_gt_l, mesh_out_lhand_align.T).T #+ lwrist_offset
735
+ mesh_global_rhand = np.dot(R_gt_r, mesh_out_rhand_align.T).T #+ rwrist_offset
736
+
737
+ # replace hand mesh in smplx mesh
738
+ smplx_mesh_cam_orig[self.smpl_x.hand_vertex_idx['left_hand'], :] = mesh_global_lhand
739
+ smplx_mesh_cam_orig[self.smpl_x.hand_vertex_idx['right_hand'], :] = mesh_global_rhand
740
+
741
+ if self.__class__.__name__ in ['ARCTIC'] and (data['vertices3d'] != -1).all():
742
+ smplx_mesh_cam_orig = data['vertices3d']
743
+
744
+ data['joint_cam'][self.smpl_x.joint_part['lhand'], :] = (data['joint_cam'][self.smpl_x.joint_part['lhand'], :] - \
745
+ data['joint_cam'][self.smpl_x.lwrist_idx, None,:]) * lhand_valid# left hand root-relative
746
+ data['joint_cam'][self.smpl_x.joint_part['rhand'], :] = (data['joint_cam'][self.smpl_x.joint_part['rhand'], :] - \
747
+ data['joint_cam'][self.smpl_x.rwrist_idx, None,:]) * rhand_valid
748
+
749
+
750
+ inputs = {'img': img}
751
+ targets = {'smplx_cam_trans' : smplx_cam_trans,
752
+ 'smplx_mesh_cam': smplx_mesh_cam_orig,
753
+ 'joint_cam': data['joint_cam'],}
754
+ meta_info = {'bb2img_trans': bb2img_trans,
755
+ 'gt_smplx_transl':smplx_cam_trans,
756
+ 'lhand_valid': lhand_valid,
757
+ 'rhand_valid': rhand_valid,
758
+ 'focal': focal, 'principal_pt': princpt,
759
+ 'img_id': data['idx']}
760
+
761
+ return inputs, targets, meta_info
762
+
763
+ def process_hand_face_bbox(self, bbox, do_flip, img_shape, img2bb_trans, input_img_shape, output_hm_shape):
764
+ if bbox is None:
765
+ bbox = np.array([0, 0, 1, 1], dtype=np.float32).reshape(2, 2) # dummy value
766
+ bbox_valid = float(False) # dummy value
767
+ else:
768
+ # reshape to top-left (x,y) and bottom-right (x,y)
769
+ bbox = bbox.reshape(2, 2)
770
+
771
+ # flip augmentation
772
+ if do_flip:
773
+ bbox[:, 0] = img_shape[1] - bbox[:, 0] - 1
774
+ bbox[0, 0], bbox[1, 0] = bbox[1, 0].copy(), bbox[0, 0].copy() # xmin <-> xmax swap
775
+
776
+ # make four points of the bbox
777
+ bbox = bbox.reshape(4).tolist()
778
+ xmin, ymin, xmax, ymax = bbox
779
+ bbox = np.array([[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]], dtype=np.float32).reshape(4, 2)
780
+
781
+ # affine transformation (crop, rotation, scale)
782
+ bbox_xy1 = np.concatenate((bbox, np.ones_like(bbox[:, :1])), 1)
783
+ bbox = np.dot(img2bb_trans, bbox_xy1.transpose(1, 0)).transpose(1, 0)[:, :2]
784
+ bbox[:, 0] = bbox[:, 0] / input_img_shape[1] * output_hm_shape[2]
785
+ bbox[:, 1] = bbox[:, 1] / input_img_shape[0] * output_hm_shape[1]
786
+
787
+ # make box a rectangle without rotation
788
+ xmin = np.min(bbox[:, 0])
789
+ xmax = np.max(bbox[:, 0])
790
+ ymin = np.min(bbox[:, 1])
791
+ ymax = np.max(bbox[:, 1])
792
+ bbox = np.array([xmin, ymin, xmax, ymax], dtype=np.float32)
793
+
794
+ bbox_valid = float(True)
795
+ bbox = bbox.reshape(2, 2)
796
+
797
+ return bbox, bbox_valid
798
+
799
+ def evaluate(self, outs, cur_sample_idx=None):
800
+ sample_num = len(outs)
801
+ eval_result = {'pa_mpvpe_all': [], 'pa_mpvpe_l_hand': [], 'pa_mpvpe_r_hand': [], 'pa_mpvpe_hand': [], 'pa_mpvpe_face': [],
802
+ 'mpvpe_all': [], 'mpvpe_l_hand': [], 'mpvpe_r_hand': [], 'mpvpe_hand': [], 'mpvpe_face': [],
803
+ 'pa_mpjpe_body': [], 'pa_mpjpe_l_hand': [], 'pa_mpjpe_r_hand': [], 'pa_mpjpe_hand': [],
804
+ 'mpjpe_body':[], 'mpjpe_l_hand': [], 'mpjpe_r_hand': [], 'mpjpe_hand': [],}
805
+
806
+
807
+ for n in range(sample_num):
808
+ out = outs[n]
809
+ mesh_gt = out['smplx_mesh_cam_pseudo_gt']
810
+ mesh_out = out['smplx_mesh_cam']
811
+
812
+
813
+ if mesh_gt.shape[0] == 6890:
814
+ face = self.smpl.face
815
+
816
+ # root align -> ds (better for pve and mpjpe)
817
+ mesh_out_root_align = mesh_out - np.dot(self.smpl_x.J_regressor, mesh_out)[self.smpl_x.J_regressor_idx['pelvis'], None,
818
+ :] + np.dot(self.smpl.joint_regressor, mesh_gt)[self.smpl.orig_root_joint_idx, None,:]
819
+ mesh_out_root_align = np.matmul(self.downsample_mat, mesh_out_root_align)
820
+
821
+ # PVE from body
822
+ mpvpe_all = np.sqrt(np.sum((mesh_out_root_align - mesh_gt) ** 2, 1)).mean() * 1000
823
+ eval_result['mpvpe_all'].append(mpvpe_all)
824
+ mesh_out_pa_align = rigid_align(mesh_out_root_align, mesh_gt)
825
+ pa_mpvpe_all = np.sqrt(np.sum((mesh_out_pa_align - mesh_gt) ** 2, 1)).mean() * 1000
826
+ eval_result['pa_mpvpe_all'].append(pa_mpvpe_all)
827
+
828
+ # MPJPE from body joints
829
+ joint_gt_body = np.dot(self.smpl.joint_regressor, mesh_gt)[LSP_MAPPIMG, :]
830
+ joint_out_body_root_align = np.dot(self.smpl.joint_regressor, mesh_out_root_align)[LSP_MAPPIMG, :]
831
+ joint_out_body_pa_align = rigid_align(joint_out_body_root_align, joint_gt_body)
832
+
833
+ eval_result['mpjpe_body'].append(
834
+ np.sqrt(np.sum((joint_out_body_root_align - joint_gt_body) ** 2, 1)).mean() * 1000)
835
+ eval_result['pa_mpjpe_body'].append(
836
+ np.sqrt(np.sum((joint_out_body_pa_align - joint_gt_body) ** 2, 1)).mean() * 1000)
837
+
838
+ else:
839
+
840
+ # MPVPE from all vertices
841
+ mesh_out_align = mesh_out - np.dot(self.smpl_x.J_regressor, mesh_out)[self.smpl_x.J_regressor_idx['pelvis'], None,
842
+ :] + np.dot(self.smpl_x.J_regressor, mesh_gt)[self.smpl_x.J_regressor_idx['pelvis'], None, :]
843
+ joint_out_body_root_align = np.dot(self.smpl_x.j14_regressor, mesh_out_align)
844
+
845
+ mpvpe_all = np.sqrt(np.sum((mesh_out_align - mesh_gt) ** 2, 1)).mean() * 1000
846
+ eval_result['mpvpe_all'].append(mpvpe_all)
847
+ mesh_out_align = rigid_align(mesh_out, mesh_gt)
848
+ pa_mpvpe_all = np.sqrt(np.sum((mesh_out_align - mesh_gt) ** 2, 1)).mean() * 1000
849
+ eval_result['pa_mpvpe_all'].append(pa_mpvpe_all)
850
+
851
+
852
+ mesh_gt_lhand = mesh_gt[self.smpl_x.hand_vertex_idx['left_hand'], :] - np.dot(
853
+ self.smpl_x.J_regressor, mesh_gt)[self.smpl_x.J_regressor_idx['lwrist'], None, :]
854
+ mesh_gt_rhand = mesh_gt[self.smpl_x.hand_vertex_idx['right_hand'], :] - np.dot(
855
+ self.smpl_x.J_regressor, mesh_gt)[self.smpl_x.J_regressor_idx['rwrist'], None, :]
856
+
857
+ mesh_out_lhand = mesh_out[self.smpl_x.hand_vertex_idx['left_hand'], :]
858
+ mesh_out_rhand = mesh_out[self.smpl_x.hand_vertex_idx['right_hand'], :]
859
+ mesh_out_lhand_align = mesh_out_lhand - np.dot(self.smpl_x.J_regressor, mesh_out)[
860
+ self.smpl_x.J_regressor_idx['lwrist'], None, :]
861
+ mesh_out_rhand_align = mesh_out_rhand - np.dot(self.smpl_x.J_regressor, mesh_out)[
862
+ self.smpl_x.J_regressor_idx['rwrist'], None, :]
863
+
864
+ if out['lhand_valid']:
865
+ eval_result['mpvpe_l_hand'].append(np.sqrt(
866
+ np.sum((mesh_out_lhand_align - mesh_gt_lhand) ** 2, 1)).mean() * 1000)
867
+ if out['rhand_valid']:
868
+ eval_result['mpvpe_r_hand'].append(np.sqrt(
869
+ np.sum((mesh_out_rhand_align - mesh_gt_rhand) ** 2, 1)).mean() * 1000)
870
+ hand_mpve_all = (np.sqrt(
871
+ np.sum((mesh_out_lhand_align - mesh_gt_lhand) ** 2, 1)).mean() * 1000 * out['lhand_valid'] + np.sqrt(
872
+ np.sum((mesh_out_rhand_align - mesh_gt_rhand) ** 2, 1)).mean() * 1000 * out['rhand_valid']
873
+ ) / (out['lhand_valid'] + out['rhand_valid'])
874
+
875
+ eval_result['mpvpe_hand'].append(hand_mpve_all)
876
+
877
+ mesh_out_lhand_align = rigid_align(mesh_out_lhand, mesh_gt_lhand)
878
+ mesh_out_rhand_align = rigid_align(mesh_out_rhand, mesh_gt_rhand)
879
+
880
+ if out['lhand_valid']:
881
+ eval_result['pa_mpvpe_l_hand'].append(np.sqrt(
882
+ np.sum((mesh_out_lhand_align - mesh_gt_lhand) ** 2, 1)).mean() * 1000)
883
+ if out['rhand_valid']:
884
+ eval_result['pa_mpvpe_r_hand'].append(np.sqrt(
885
+ np.sum((mesh_out_rhand_align - mesh_gt_rhand) ** 2, 1)).mean() * 1000)
886
+
887
+ eval_result['pa_mpvpe_hand'].append((np.sqrt(
888
+ np.sum((mesh_out_lhand_align - mesh_gt_lhand) ** 2, 1)).mean() * 1000 * out['lhand_valid'] + np.sqrt(
889
+ np.sum((mesh_out_rhand_align - mesh_gt_rhand) ** 2, 1)).mean() * 1000 * out['rhand_valid']) /
890
+ (out['lhand_valid'] + out['rhand_valid']))
891
+
892
+ # MPVPE from face vertices
893
+ mesh_gt_face = mesh_gt[self.smpl_x.face_vertex_idx, :]
894
+ mesh_out_face = mesh_out[self.smpl_x.face_vertex_idx, :]
895
+ mesh_out_face_align = mesh_out_face - np.dot(self.smpl_x.J_regressor, mesh_out)[self.smpl_x.J_regressor_idx['neck'],
896
+ None, :] + np.dot(self.smpl_x.J_regressor, mesh_gt)[
897
+ self.smpl_x.J_regressor_idx['neck'], None, :]
898
+ eval_result['mpvpe_face'].append(
899
+ np.sqrt(np.sum((mesh_out_face_align - mesh_gt_face) ** 2, 1)).mean() * 1000)
900
+ mesh_out_face_align = rigid_align(mesh_out_face, mesh_gt_face)
901
+ eval_result['pa_mpvpe_face'].append(
902
+ np.sqrt(np.sum((mesh_out_face_align - mesh_gt_face) ** 2, 1)).mean() * 1000)
903
+
904
+ joint_gt_body = np.dot(self.smpl_x.j14_regressor, mesh_gt)
905
+ joint_out_body = np.dot(self.smpl_x.j14_regressor, mesh_out)
906
+ joint_out_body_align = rigid_align(joint_out_body, joint_gt_body)
907
+
908
+ eval_result['mpjpe_body'].append(
909
+ np.sqrt(np.sum((joint_out_body_root_align - joint_gt_body) ** 2, 1)).mean() * 1000)
910
+ eval_result['pa_mpjpe_body'].append(
911
+ np.sqrt(np.sum((joint_out_body_align - joint_gt_body) ** 2, 1)).mean() * 1000)
912
+
913
+ joint_gt_lhand = np.dot(self.smpl_x.orig_hand_regressor['left'], mesh_gt)[1:]
914
+ joint_gt_rhand = np.dot(self.smpl_x.orig_hand_regressor['right'], mesh_gt)[1:]
915
+
916
+
917
+ joint_out_lhand = np.dot(self.smpl_x.orig_hand_regressor['left'], mesh_out)[1:] - np.dot(self.smpl_x.J_regressor, mesh_out)[
918
+ self.smpl_x.J_regressor_idx['lwrist'], None, :]
919
+
920
+ joint_out_rhand = np.dot(self.smpl_x.orig_hand_regressor['right'], mesh_out)[1:] - np.dot(self.smpl_x.J_regressor, mesh_out)[
921
+ self.smpl_x.J_regressor_idx['rwrist'], None, :]
922
+
923
+
924
+ joint_out_lhand_align = rigid_align(joint_out_lhand, joint_gt_lhand)
925
+ joint_out_rhand_align = rigid_align(joint_out_rhand, joint_gt_rhand)
926
+
927
+ if out['lhand_valid']:
928
+ eval_result['mpjpe_l_hand'].append(np.sqrt(
929
+ np.sum((joint_out_lhand - joint_gt_lhand) ** 2, 1)).mean() * 1000)
930
+ if out['rhand_valid']:
931
+ eval_result['mpjpe_r_hand'].append(np.sqrt(
932
+ np.sum((joint_out_rhand - joint_gt_rhand) ** 2, 1)).mean() * 1000)
933
+
934
+ hand_pa_mpve_all = (np.sqrt(
935
+ np.sum((joint_out_lhand - joint_gt_lhand) ** 2, 1)).mean() * 1000 * out['lhand_valid'] + np.sqrt(
936
+ np.sum((joint_out_rhand - joint_gt_rhand) ** 2, 1)).mean() * 1000 * out['rhand_valid']
937
+ ) / (out['lhand_valid'] + out['rhand_valid'])
938
+
939
+ eval_result['mpjpe_hand'].append(hand_pa_mpve_all)
940
+
941
+ if out['lhand_valid']:
942
+ value = np.sqrt(np.sum((joint_out_lhand_align - joint_gt_lhand) ** 2, 1)).mean() * 1000
943
+
944
+ if value < 100:
945
+ eval_result['pa_mpjpe_l_hand'].append(value)
946
+ if value > 100:
947
+ print("lhand:",value)
948
+ continue
949
+
950
+ if out['rhand_valid']:
951
+ value = np.sqrt(np.sum((joint_out_rhand_align - joint_gt_rhand) ** 2, 1)).mean() * 1000
952
+
953
+ if value < 100:
954
+ eval_result['pa_mpjpe_r_hand'].append(value)
955
+ if value > 100:
956
+ print("rhand:",value)
957
+ continue
958
+
959
+ eval_result['pa_mpjpe_hand'].append((np.sqrt(
960
+ np.sum((joint_out_lhand_align - joint_gt_lhand) ** 2, 1)).mean() * 1000 * out['lhand_valid'] + np.sqrt(
961
+ np.sum((joint_out_rhand_align - joint_gt_rhand) ** 2, 1)).mean() * 1000 * out['rhand_valid']
962
+ ) / (out['lhand_valid'] + out['rhand_valid']))
963
+
964
+
965
+ return eval_result
966
+
967
+ def print_eval_result(self, eval_result):
968
+ print(f'======{self.cfg.data.testset}======')
969
+ print('PA MPVPE (All): %.2f mm' % np.mean(eval_result['pa_mpvpe_all']))
970
+ print('PA MPVPE (L-Hands): %.2f mm' % np.mean(eval_result['pa_mpvpe_l_hand']))
971
+ print('PA MPVPE (R-Hands): %.2f mm' % np.mean(eval_result['pa_mpvpe_r_hand']))
972
+ print('PA MPVPE (Hands): %.2f mm' % np.mean(eval_result['pa_mpvpe_hand']))
973
+ print('PA MPVPE (Face): %.2f mm' % np.mean(eval_result['pa_mpvpe_face']))
974
+ print()
975
+
976
+ print('MPVPE (All): %.2f mm' % np.mean(eval_result['mpvpe_all']))
977
+ print('MPVPE (L-Hands): %.2f mm' % np.mean(eval_result['mpvpe_l_hand']))
978
+ print('MPVPE (R-Hands): %.2f mm' % np.mean(eval_result['mpvpe_r_hand']))
979
+ print('MPVPE (Hands): %.2f mm' % np.mean(eval_result['mpvpe_hand']))
980
+ print('MPVPE (Face): %.2f mm' % np.mean(eval_result['mpvpe_face']))
981
+ print()
982
+
983
+ print('PA MPJPE (Body): %.2f mm' % np.mean(eval_result['pa_mpjpe_body']))
984
+ print('PA MPJPE (L-Hands): %.2f mm' % np.mean(eval_result['pa_mpjpe_l_hand']))
985
+ print('PA MPJPE (R-Hands): %.2f mm' % np.mean(eval_result['pa_mpjpe_r_hand']))
986
+ print('PA MPJPE (Hands): %.2f mm' % np.mean(eval_result['pa_mpjpe_hand']))
987
+ print()
988
+
989
+ print('MPJPE (Body): %.2f mm' % np.mean(eval_result['mpjpe_body']))
990
+ print('MPJPE (L-Hands): %.2f mm' % np.mean(eval_result['mpjpe_l_hand']))
991
+ print('MPJPE (R-Hands): %.2f mm' % np.mean(eval_result['mpjpe_r_hand']))
992
+ print('MPJPE (Hands): %.2f mm' % np.mean(eval_result['mpjpe_hand']))
993
+ print()
994
+
995
+ print(f"{np.mean(eval_result['pa_mpvpe_all'])},{np.mean(eval_result['pa_mpvpe_l_hand'])},{np.mean(eval_result['pa_mpvpe_r_hand'])},{np.mean(eval_result['pa_mpvpe_hand'])},{np.mean(eval_result['pa_mpvpe_face'])},"
996
+ f"{np.mean(eval_result['mpvpe_all'])},{np.mean(eval_result['mpvpe_l_hand'])},{np.mean(eval_result['mpvpe_r_hand'])},{np.mean(eval_result['mpvpe_hand'])},{np.mean(eval_result['mpvpe_face'])},"
997
+ f"{np.mean(eval_result['pa_mpjpe_body'])},{np.mean(eval_result['pa_mpjpe_l_hand'])},{np.mean(eval_result['pa_mpjpe_r_hand'])},{np.mean(eval_result['pa_mpjpe_hand'])}")
998
+ print()
999
+
1000
+
1001
+ f = open(os.path.join(self.cfg.log.result_dir, 'result.txt'), 'w')
1002
+ f.write(f'{self.cfg.data.testset} dataset \n')
1003
+ f.write('PA MPVPE (All): %.2f mm\n' % np.mean(eval_result['pa_mpvpe_all']))
1004
+ f.write('PA MPVPE (L-Hands): %.2f mm' % np.mean(eval_result['pa_mpvpe_l_hand']))
1005
+ f.write('PA MPVPE (R-Hands): %.2f mm' % np.mean(eval_result['pa_mpvpe_r_hand']))
1006
+ f.write('PA MPVPE (Hands): %.2f mm\n' % np.mean(eval_result['pa_mpvpe_hand']))
1007
+ f.write('PA MPVPE (Face): %.2f mm\n' % np.mean(eval_result['pa_mpvpe_face']))
1008
+ f.write('MPVPE (All): %.2f mm\n' % np.mean(eval_result['mpvpe_all']))
1009
+ f.write('MPVPE (L-Hands): %.2f mm' % np.mean(eval_result['mpvpe_l_hand']))
1010
+ f.write('MPVPE (R-Hands): %.2f mm' % np.mean(eval_result['mpvpe_r_hand']))
1011
+ f.write('MPVPE (Hands): %.2f mm' % np.mean(eval_result['mpvpe_hand']))
1012
+ f.write('MPVPE (Face): %.2f mm\n' % np.mean(eval_result['mpvpe_face']))
1013
+ f.write('PA MPJPE (Body): %.2f mm\n' % np.mean(eval_result['pa_mpjpe_body']))
1014
+ f.write('PA MPJPE (L-Hands): %.2f mm' % np.mean(eval_result['pa_mpjpe_l_hand']))
1015
+ f.write('PA MPJPE (R-Hands): %.2f mm' % np.mean(eval_result['pa_mpjpe_r_hand']))
1016
+ f.write('PA MPJPE (Hands): %.2f mm\n' % np.mean(eval_result['pa_mpjpe_hand']))
1017
+ f.write(f"{np.mean(eval_result['pa_mpvpe_all'])},{np.mean(eval_result['pa_mpvpe_l_hand'])},{np.mean(eval_result['pa_mpvpe_r_hand'])},{np.mean(eval_result['pa_mpvpe_hand'])},{np.mean(eval_result['pa_mpvpe_face'])},"
1018
+ f"{np.mean(eval_result['mpvpe_all'])},{np.mean(eval_result['mpvpe_l_hand'])},{np.mean(eval_result['mpvpe_r_hand'])},{np.mean(eval_result['mpvpe_hand'])},{np.mean(eval_result['mpvpe_face'])},"
1019
+ f"{np.mean(eval_result['pa_mpjpe_body'])},{np.mean(eval_result['pa_mpjpe_l_hand'])},{np.mean(eval_result['pa_mpjpe_r_hand'])},{np.mean(eval_result['pa_mpjpe_hand'])}")
1020
+ f.close()
1021
+
1022
+ def decompress_keypoints(self, humandata) -> dict:
1023
+ """If a key contains 'keypoints', and f'{key}_mask' is in self.keys(),
1024
+ invalid zeros will be inserted to the right places and f'{key}_mask'
1025
+ will be unlocked.
1026
+
1027
+ Raises:
1028
+ KeyError:
1029
+ A key containing 'keypoints' has been found
1030
+ but its corresponding mask is missing.
1031
+ """
1032
+ assert bool(humandata['__keypoints_compressed__']) is True
1033
+ key_pairs = []
1034
+ for key in humandata.files:
1035
+ if key not in KPS2D_KEYS + KPS3D_KEYS:
1036
+ continue
1037
+ mask_key = f'{key}_mask'
1038
+ if mask_key in humandata.files:
1039
+ print(f'Decompress {key}...')
1040
+ key_pairs.append([key, mask_key])
1041
+ decompressed_dict = {}
1042
+ for kpt_key, mask_key in key_pairs:
1043
+ mask_array = np.asarray(humandata[mask_key])
1044
+ compressed_kpt = humandata[kpt_key]
1045
+ kpt_array = \
1046
+ self.add_zero_pad(compressed_kpt, mask_array)
1047
+ decompressed_dict[kpt_key] = kpt_array
1048
+ del humandata
1049
+ return decompressed_dict
1050
+
1051
+ def add_zero_pad(self, compressed_array: np.ndarray,
1052
+ mask_array: np.ndarray) -> np.ndarray:
1053
+ """Pad zeros to a compressed keypoints array.
1054
+
1055
+ Args:
1056
+ compressed_array (np.ndarray):
1057
+ A compressed keypoints array.
1058
+ mask_array (np.ndarray):
1059
+ The mask records compression relationship.
1060
+
1061
+ Returns:
1062
+ np.ndarray:
1063
+ A keypoints array in full-size.
1064
+ """
1065
+ if compressed_array.shape[1] == mask_array.shape[0]:
1066
+ print("No need to decompress")
1067
+ return compressed_array
1068
+ else:
1069
+ assert mask_array.sum() == compressed_array.shape[1]
1070
+ data_len, _, dim = compressed_array.shape
1071
+ mask_len = mask_array.shape[0]
1072
+ ret_value = np.zeros(
1073
+ shape=[data_len, mask_len, dim], dtype=compressed_array.dtype)
1074
+ valid_mask_index = np.where(mask_array == 1)[0]
1075
+ ret_value[:, valid_mask_index, :] = compressed_array
1076
+ return ret_value
SMPLest-X/humandata_prep/README.md ADDED
@@ -0,0 +1,64 @@
1
+ Guide to HumanData and View tools
2
+ ========================
3
+
4
+ ## What is HumanData?
5
+
6
+ HumanData is designed to provide a unified format for SMPL/SMPLX datasets to support joint training and evaluation.
7
+
8
+ The project is maintained in MMHuman3D.
9
+ See [detailed info](https://github.com/open-mmlab/mmhuman3d/blob/convertors/docs/human_data.md) for data structure and sample usage.
10
+
11
+ If you want to create your own humandata file, please refer to the sample below and keep a similar structure. It is essentially a big dictionary holding lists (or dicts of lists); any dict with the correct structure works (not necessarily an instance of the `HumanData` class).
12
+
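+ As a rough illustration, a minimal humandata-style dict can be assembled and saved with plain NumPy. This is a sketch based on the keys the loader in this repo expects; the keypoint key names and joint counts below are assumptions and must match your own convention:
+
+ ```
+ import numpy as np
+
+ n = 2  # two hypothetical instances
+ humandata = {
+     'image_path': np.array(['img/0001.png', 'img/0002.png']),  # relative to the image root
+     'bbox_xywh': np.zeros((n, 5)),                              # x, y, w, h, conf
+     'keypoints2d_smplx': np.zeros((n, 144, 3)),                 # key name / joint count are assumptions
+     'keypoints2d_smplx_mask': np.ones(144),
+     'smplx': {'betas': np.zeros((n, 10)), 'body_pose': np.zeros((n, 21, 3)),
+               'global_orient': np.zeros((n, 3)), 'transl': np.zeros((n, 3))},
+     'meta': {'height': np.array([1080, 1080]), 'width': np.array([1920, 1920])},
+     'misc': {'flat_hand_mean': False},
+     '__keypoints_compressed__': False,
+ }
+ np.savez_compressed('my_humandata.npz', **humandata)  # read back with np.load(..., allow_pickle=True)
+ ```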
13
+ ## Sample Visualization Script
14
+
15
+ We provide a simple script to check the annotation and visualize the results. The script will read the annotation from HumanData and render it on the corresponding image using pyrender.
16
+
17
+ ### Download
18
+
19
+ Download sample here: [Hugging Face](https://huggingface.co/waanqii/SMPLest-X/resolve/main/hd_sample_humandata.zip?download=true)
20
+
21
+ ### Extract
22
+ Follow the file structure described on the main page. Extract the archive to the `data` folder; the structure should look like this:
23
+ ```
24
+ ├── data
25
+ │ ├── annot
26
+ │ │ └── hd_10sample.npz # sample annotation
27
+ │ └── img # original data files
28
+ │ └── egocentric_color
29
+ ```
30
+
31
+ ### Environment
32
+ You can usually install pyrender and trimesh directly into your existing environment; this has been tested on many platforms without conflicts.
33
+ The CPU version of PyTorch is also supported.
34
+ ```
35
+ conda create -n hd_vis python=3.9
36
+ conda activate hd_vis
37
+ conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
38
+ pip install pyrender trimesh numpy opencv-python tqdm smplx
39
+ ```
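+
+ If you render on a headless machine (or inside an IDE), you may need to select an offscreen OpenGL backend before running the script; this mirrors the commented-out line in `check.py`:
+ ```
+ export PYOPENGL_PLATFORM=osmesa
+ ```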
40
+
41
+ ### Visualization
42
+ A ready-to-run command for the demo sample:
43
+ ```
44
+ python humandata_prep/check.py \
45
+ --hd_path data/annot/hd_10sample.npz \
46
+ --image_folder data/img \
47
+ --output_folder data/vis_output \
48
+ --body_model_path human_models/human_model_files
49
+ ```
50
+ - Rendered images will be saved in the output folder.
51
+
52
+
53
+ ## Important Points: when visualizing other humandata files
54
+ This section is for those who want to debug or create their own humandata files.
55
+
56
+ - Check that `flat_hand_mean` is correctly set; for humandata it should be specified in `hd['misc']['flat_hand_mean']` and defaults to `False` (see the check snippet after this list)
57
+ - Check `gender`
58
+ - Some datasets provide mesh vertices instead of SMPL/SMPLX parameters; we suggest fitting the mesh to parameters for every instance to keep the visualization consistent. Some of those datasets are:
59
+ - Arctic: They provide `vtemplate` instead of `betas`
60
+ - EHF: They provide mesh files
61
+ - Standalone [SMPLX parameters fitting script](https://github.com/open-mmlab/mmhuman3d/blob/convertors/tools/preprocess/fit_shape2smplx.py)
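+
+ A minimal check for the `flat_hand_mean` flag (a sketch; `hd_10sample.npz` stands in for your own file):
+ ```
+ import numpy as np
+
+ hd = dict(np.load('data/annot/hd_10sample.npz', allow_pickle=True))
+ misc = hd['misc'].item() if 'misc' in hd else {}
+ print('flat_hand_mean:', misc.get('flat_hand_mean', False))  # defaults to False if absent
+ ```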
62
+
63
+
64
+
SMPLest-X/humandata_prep/check.py ADDED
@@ -0,0 +1,298 @@
1
+ import numpy as np
2
+ import random
3
+ import cv2
4
+ import os
5
+ import argparse
6
+ import torch
7
+ import pyrender
8
+ import trimesh
9
+ import smplx
10
+
11
+ from tqdm import tqdm
12
+
13
+ # for visualization and checking purposes; jaw and eye poses are not needed
14
+ smpl_shape = {'betas': (-1, 10), 'transl': (-1, 3), 'global_orient': (-1, 3), 'body_pose': (-1, 69)}
15
+ smplx_shape = {'betas': (-1, 10), 'transl': (-1, 3), 'global_orient': (-1, 3),
16
+ 'body_pose': (-1, 21, 3), 'left_hand_pose': (-1, 15, 3), 'right_hand_pose': (-1, 15, 3)}
17
+
18
+ def get_cam_params(param, idx):
19
+
20
+ '''
21
+ Read camera parameters from humandata
22
+ Input: param (dict loaded from a humandata npz), idx (instance index)
23
+ Output: focal_length, camera_center, R, T (R and T may be None)
24
+ '''
25
+
26
+ R, T = None, None
27
+
28
+ # read cam params
29
+ try:
30
+ focal_length = param['meta'].item()['focal_length'][idx]
31
+ camera_center = param['meta'].item()['principal_point'][idx]
32
+ except TypeError:
33
+ focal_length = param['meta'].item()['focal_length']
34
+ camera_center = param['meta'].item()['princpt']
35
+ try:
36
+ R = param['meta'].item()['R'][idx]
37
+ T = param['meta'].item()['T'][idx]
38
+ except KeyError:
39
+ R = None
40
+ T = None
41
+ except IndexError:
42
+ R = None
43
+ T = None
44
+
45
+ focal_length = np.asarray(focal_length).reshape(-1)
46
+ camera_center = np.asarray(camera_center).reshape(-1)
47
+
48
+ if len(focal_length)==1:
49
+ focal_length = [focal_length, focal_length]
50
+ if len(camera_center)==1:
51
+ camera_center = [camera_center, camera_center]
52
+
53
+ return focal_length, camera_center, R, T
54
+
55
+
56
+ def render_pose(img, body_model_param, body_model, camera, return_mask=False,
57
+ R=None, T=None):
58
+
59
+ # the inverse is the same
60
+ pyrender2opencv = np.array([[1.0, 0, 0, 0],
61
+ [0, -1, 0, 0],
62
+ [0, 0, -1, 0],
63
+ [0, 0, 0, 1]])
64
+
65
+ output = body_model(**body_model_param, return_verts=True)
66
+ faces = body_model.faces
67
+
68
+ vertices = output['vertices'].detach().cpu().numpy().squeeze()
69
+
70
+ # adjust vertices based on R and T
71
+ if R is not None:
72
+ joints = output['joints'].detach().cpu().numpy().squeeze()
73
+ root_joints = joints[0]
74
+ verts_T = np.dot(np.array(R), root_joints) - root_joints + np.array(T)
75
+ vertices = vertices + verts_T
76
+
77
+ # render material
78
+ base_color = (1.0, 193/255, 193/255, 1.0)
79
+ material = pyrender.MetallicRoughnessMaterial(
80
+ metallicFactor=0.3,
81
+ alphaMode='OPAQUE',
82
+ baseColorFactor=base_color)
83
+
84
+ # transfer to trimesh
85
+ body_trimesh = trimesh.Trimesh(vertices, faces, process=False)
86
+ body_mesh = pyrender.Mesh.from_trimesh(body_trimesh, material=material)
87
+
88
+ # prepare camera and light
89
+ light = pyrender.DirectionalLight(color=np.ones(3), intensity=2.0)
90
+ cam_pose = pyrender2opencv @ np.eye(4)
91
+
92
+ # build scene
93
+ scene = pyrender.Scene(bg_color=[0.0, 0.0, 0.0, 0.0],
94
+ ambient_light=(0.3, 0.3, 0.3))
95
+ scene.add(camera, pose=cam_pose)
96
+ scene.add(light, pose=cam_pose)
97
+ scene.add(body_mesh, 'mesh')
98
+
99
+ # render scene
100
+ # os.environ["PYOPENGL_PLATFORM"] = "osmesa" # include this line if use in vscode
101
+ r = pyrender.OffscreenRenderer(viewport_width=img.shape[1],
102
+ viewport_height=img.shape[0],
103
+ point_size=1.0)
104
+
105
+ #
106
+ color, _ = r.render(scene, flags=pyrender.RenderFlags.RGBA)
107
+ # depth = r.render(scene, flags=pyrender.RenderFlags.DEPTH_ONLY)
108
+ # normal, _ = r.render(scene, flags=pyrender.RenderFlags.FACE_NORMALS)
109
+
110
+ color = color.astype(np.float32) / 255.0
111
+ # depth = np.asarray(depth, dtype=np.float32)
112
+ # normal = np.asarray(normal, dtype=np.float32)
113
+
114
+ # set transparency in [0.0, 1.0]
115
+ alpha = 0.8
116
+ valid_mask = (color[:, :, -1] > 0)[:, :, np.newaxis]
117
+ valid_mask = valid_mask * alpha
118
+
119
+ img = img / 255
120
+ output_img = (color[:, :, :] * valid_mask + (1 - valid_mask) * img)
121
+
122
+ img = (output_img * 255).astype(np.uint8)
123
+ img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
124
+
125
+ if return_mask:
126
+ return img, valid_mask, (color * 255).astype(np.uint8)
127
+
128
+ return img
129
+
130
+
131
+ def visualize_humandata(args):
132
+
133
+ '''
134
+ '''
135
+
136
+ # TODO: load from args.path
137
+ param = dict(np.load(args.hd_path, allow_pickle=True))
138
+
139
+ # check for annot and type
140
+ has_smplx, has_smpl, has_gender = False, False, False
141
+ if 'smpl' in param.keys():
142
+ has_smpl = True
143
+ elif 'smplx' in param.keys():
144
+ has_smplx = True
145
+ if 'meta' in param.keys():
146
+ if 'gender' in param['meta'].item().keys():
147
+ has_gender = True
148
+ assert has_smpl or has_smplx, 'No body model annotation found in the dataset'
149
+
150
+ # load params
151
+ if has_smpl:
152
+ body_model_param_smpl = param['smpl'].item()
153
+ if has_smplx:
154
+ body_model_param_smplx = param['smplx'].item()
155
+
156
+ # read smplx only if has both smpl and smplx
157
+ if has_smpl and has_smplx:
158
+ has_smpl = False
159
+
160
+ device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
161
+
162
+ flat_hand_mean = args.flat_hand_mean
163
+ if 'misc' in param.keys():
164
+ if 'flat_hand_mean' in param['misc'].item().keys():
165
+ flat_hand_mean = param['misc'].item()['flat_hand_mean']
166
+
167
+
168
+ # build smpl model TODO: args for model path
169
+ gendered_smpl = {}
170
+ for gender in ['male', 'female', 'neutral']:
171
+ kwargs_smpl = dict(
172
+ gender=gender,
173
+ num_betas=10,
174
+ use_face_contour=True,
175
+ use_pca=False,
176
+ batch_size=1)
177
+ gendered_smpl[gender] = smplx.create(
178
+ args.body_model_path, 'smpl',
179
+ **kwargs_smpl).to(device)
180
+
181
+ # build smplx model TODO: model path
182
+ gendered_smplx = {}
183
+ for gender in ['male', 'female', 'neutral']:
184
+ kwargs_smplx = dict(
185
+ gender=gender,
186
+ num_betas=10,
187
+ use_face_contour=True,
188
+ flat_hand_mean=flat_hand_mean,
189
+ use_pca=False,
190
+ batch_size=1)
191
+ gendered_smplx[gender] = smplx.create(
192
+ args.body_model_path, 'smplx',
193
+ **kwargs_smplx).to(device)
194
+
195
+ # for idx in idx_list:
196
+ sample_size = args.render_num
197
+ if sample_size > len(param['image_path']):
198
+ idxs = range(len(param['image_path']))
199
+ else:
200
+ idxs = random.sample(range(len(param['image_path'])), sample_size)
201
+
202
+ for idx in tqdm(sorted(idxs), desc=f'Processing npz {os.path.basename(args.hd_path)}, sample size: {sample_size}',
203
+ position=0, leave=False):
204
+
205
+ # Load image
206
+ image_p = param['image_path'][idx]
207
+ image_path = os.path.join(args.image_folder, image_p)
208
+
209
+ image = cv2.imread(image_path)
210
+ image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
211
+
212
+ # ---------------------- render single pose ------------------------
213
+ # read cam params
214
+ focal_length, camera_center, R, T = get_cam_params(param, idx)
215
+
216
+ # read gender
217
+ if has_gender:
218
+ try:
219
+ gender = param['meta'].item()['gender'][idx]
220
+ except IndexError:
221
+ gender = 'neutral'
222
+ else:
223
+ gender = 'neutral'
224
+
225
+ # prepare for mesh projection
226
+ camera = pyrender.camera.IntrinsicsCamera(
227
+ fx=focal_length[0], fy=focal_length[1],
228
+ cx=camera_center[0], cy=camera_center[1])
229
+
230
+ if has_smpl:
231
+ intersect_key = list(set(body_model_param_smpl.keys()) & set(smpl_shape.keys()))
232
+ body_model_param_tensor = {key: torch.tensor(
233
+ np.array(body_model_param_smpl[key][idx:idx+1]).reshape(smpl_shape[key]),
234
+ device=device, dtype=torch.float32)
235
+ for key in intersect_key
236
+ if len(body_model_param_smpl[key][idx:idx+1]) > 0}
237
+
238
+ rendered_image = render_pose(img=image,
239
+ body_model_param=body_model_param_tensor,
240
+ body_model=gendered_smpl[gender],
241
+ camera=camera,
242
+ R=R, T=T)
243
+ if has_smplx:
244
+ intersect_key = list(set(body_model_param_smplx.keys()) & set(smplx_shape.keys()))
245
+ body_model_param_tensor = {key: torch.tensor(
246
+ np.array(body_model_param_smplx[key][idx:idx+1]).reshape(smplx_shape[key]),
247
+ device=device, dtype=torch.float32)
248
+ for key in intersect_key
249
+ if len(body_model_param_smplx[key][idx:idx+1]) > 0}
250
+
251
+ rendered_image = render_pose(img=image,
252
+ body_model_param=body_model_param_tensor,
253
+ body_model=gendered_smplx[gender],
254
+ camera=camera,
255
+ R=R, T=T)
256
+
257
+ # ---------------------- render results ----------------------
258
+ os.makedirs(args.output_folder, exist_ok=True)
259
+
260
+ # save image
261
+ out_image_path = os.path.join(args.output_folder,
262
+ f'{os.path.basename(args.hd_path)[:-4]}_{idx}.png')
263
+ # print(f'Saving image to {out_image_path}')
264
+ cv2.imwrite(out_image_path, rendered_image)
265
+
266
+
267
+ if __name__ == '__main__':
268
+
269
+ parser = argparse.ArgumentParser()
270
+ # path args
271
+ parser.add_argument('--hd_path', type=str, required=False,
272
+ help='path to humandata npz file',
273
+ default='/mnt/d/test_area/hd_sample_SMPLestX/hd_10sample.npz')
274
+ parser.add_argument('--image_folder', type=str, required=False,
275
+ help='path to the image base folder',
276
+ default='/mnt/d/test_area/hd_sample_SMPLestX')
277
+ parser.add_argument('--output_folder', type=str, required=False,
278
+ help='path to folder that writes the rendered image',
279
+ default='/mnt/d/test_area/hd_sample_SMPLestX/output')
280
+ # TODO: add default bm path
281
+ parser.add_argument('--body_model_path', type=str, required=False,
282
+ help='path to smpl/smplx models folder, if you follow repo file structure, \
283
+ no need to specify',
284
+ default='/home/weichen/wc_workspace/models/human_model')
285
+
286
+ # render args
287
+ parser.add_argument('--flat_hand_mean', type=bool, required=False,
288
+ help='use flat hand mean for smplx, will try to load from humandata["misc"] \
289
+ if not found, will use the value from args',
290
+ default=False)
291
+ parser.add_argument('--render_num', type=int, required=False,
292
+ help='Randomly select how many instances to render',
293
+ default='10')
294
+
295
+ args = parser.parse_args()
296
+
297
+ visualize_humandata(args)
298
+
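Before rendering, it can be worth confirming that the HumanData archive actually carries the keys `visualize_humandata` checks for (`smpl`/`smplx`, `meta`, `misc`, `image_path`). A minimal inspection sketch; the npz path below is a placeholder, not a file shipped with the repo:

```python
import numpy as np

# placeholder path to a HumanData .npz file
param = dict(np.load('hd_sample.npz', allow_pickle=True))

print(sorted(param.keys()))                # e.g. ['image_path', 'meta', 'misc', 'smplx', ...]
print(len(param['image_path']), 'frames')  # number of annotated images

if 'smplx' in param:
    smplx_params = param['smplx'].item()   # per-frame SMPL-X parameter arrays
    print(sorted(smplx_params.keys()))
```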
SMPLest-X/main/__init__.py ADDED
File without changes
SMPLest-X/main/base.py ADDED
@@ -0,0 +1,234 @@
1
+ import os.path as osp
2
+ import math
3
+ import abc
4
+ from torch.utils.data import DataLoader
5
+ from torch.nn.parallel.data_parallel import DataParallel
6
+ import torch.optim
7
+ import torchvision.transforms as transforms
8
+ from utils.timer import Timer
9
+ from utils.logger import colorlogger
10
+ from datasets.dataset import MultipleDatasets
11
+ import importlib
12
+ from models.SMPLest_X import get_model
13
+
14
+ # ddp
15
+ import torch.cuda
16
+ import torch.distributed as dist
17
+ from torch.utils.data import DistributedSampler
18
+ import torch.utils.data.distributed
19
+ from utils.distribute_utils import (
20
+ get_rank, is_main_process, time_synchronized, get_group_idx, get_process_groups, get_dist_info
21
+ )
22
+
23
+ def dynamic_import(module_name, object_name):
24
+ """Dynamically import a module and access a specific object."""
25
+ module = importlib.import_module(module_name)
26
+ return getattr(module, object_name)
27
+
28
+
29
+ class Base(object):
30
+ __metaclass__ = abc.ABCMeta
31
+
32
+ def __init__(self, cfg, log_name='logs.txt'):
33
+ self.cur_epoch = 0
34
+
35
+ # timer
36
+ self.tot_timer = Timer()
37
+ self.gpu_timer = Timer()
38
+ self.read_timer = Timer()
39
+
40
+ # logger
41
+ self.logger = colorlogger(cfg.log.log_dir, log_name=log_name)
42
+
43
+ @abc.abstractmethod
44
+ def _make_batch_generator(self):
45
+ return
46
+
47
+ @abc.abstractmethod
48
+ def _make_model(self):
49
+ return
50
+
51
+
52
+ class Trainer(Base):
53
+ def __init__(self, cfg, distributed=False, gpu_idx=None):
54
+ super(Trainer, self).__init__(cfg, log_name='train_logs.txt')
55
+ self.distributed = distributed
56
+ self.gpu_idx = gpu_idx
57
+ self.cfg = cfg
58
+
59
+ def get_optimizer(self, model):
60
+ normal_param = []
61
+
62
+ for module in model.module.trainable_modules:
63
+ normal_param += list(module.parameters())
64
+ optim_params = [
65
+ {
66
+ 'params': normal_param,
67
+ 'lr': self.cfg.train.lr
68
+ }
69
+ ]
70
+ optimizer = torch.optim.Adam(optim_params, lr=self.cfg.train.lr)
71
+ return optimizer
72
+
73
+ def save_model(self, state, epoch):
74
+ file_path = osp.join(self.cfg.log.model_dir, f'snapshot_{str(epoch)}.pth.tar')
75
+
76
+ # do not save smplx layer weights
77
+ dump_key = []
78
+ for k in state['network'].keys():
79
+ if 'smplx_layer' in k:
80
+ dump_key.append(k)
81
+ for k in dump_key:
82
+ state['network'].pop(k, None)
83
+
84
+ torch.save(state, file_path)
85
+ self.logger.info(f"Write snapshot into {file_path}")
86
+
87
+ def load_model(self, model, optimizer):
88
+ if self.cfg.model.pretrained_model_path is not None:
89
+ ckpt_path = self.cfg.model.pretrained_model_path
90
+ ckpt = torch.load(ckpt_path, map_location=torch.device('cpu'), weights_only=False) # solve CUDA OOM error in DDP
91
+ model.load_state_dict(ckpt['network'], strict=False)
92
+ model.cuda()
93
+ self.logger.info(f'Load checkpoint from {ckpt_path}')
94
+ torch.cuda.empty_cache()
95
+ if getattr(self.cfg.train, 'start_over', True):
96
+ start_epoch = 0
97
+ else:
98
+ optimizer.load_state_dict(ckpt['optimizer'])
99
+ start_epoch = ckpt['epoch'] + 1
100
+ self.logger.info(f'Load optimizer, start from {start_epoch}')
101
+ else:
102
+ start_epoch = 0
103
+
104
+ return start_epoch, model, optimizer
105
+
106
+ def get_lr(self):
107
+ for g in self.optimizer.param_groups:
108
+ cur_lr = g['lr']
109
+ return cur_lr
110
+
111
+ def _make_batch_generator(self):
112
+ # data load and construct batch generator
113
+ self.logger_info("Creating dataset...")
114
+ trainset_humandata_loader = []
115
+ for humandata_dataset in self.cfg.data.trainset_humandata:
116
+ trainset_humandata_loader.append(dynamic_import(
117
+ f"datasets.{humandata_dataset}", humandata_dataset)(transforms.ToTensor(), "train", self.cfg))
118
+
119
+ data_strategy = getattr(self.cfg.data, 'data_strategy', 'balance')
120
+ if data_strategy == 'concat':
121
+ print("Using [concat] strategy...")
122
+ trainset_loader = MultipleDatasets(trainset_humandata_loader,
123
+ make_same_len=False, verbose=True)
124
+ elif data_strategy == 'balance':
125
+ total_len = getattr(self.cfg.data, 'total_data_len', 'auto')
126
+ print(f"Using [balance] strategy with total_data_len : {total_len}...")
127
+ trainset_loader = MultipleDatasets(trainset_humandata_loader,
128
+ make_same_len=True, total_len=total_len, verbose=True)
129
+
130
+ self.itr_per_epoch = math.ceil(
131
+ len(trainset_loader) / self.cfg.train.num_gpus / self.cfg.train.train_batch_size)
132
+
133
+ if self.distributed:
134
+ self.logger_info(f"Total data length {len(trainset_loader)}.")
135
+ rank, world_size = get_dist_info()
136
+ self.logger_info("Using distributed data sampler.")
137
+
138
+ sampler_train = DistributedSampler(trainset_loader, world_size, rank, shuffle=True)
139
+ self.batch_generator = DataLoader(dataset=trainset_loader, batch_size=self.cfg.train.train_batch_size,
140
+ shuffle=False, num_workers=self.cfg.train.num_thread, sampler=sampler_train,
141
+ pin_memory=True, persistent_workers=True if self.cfg.train.num_thread > 0 else False,
142
+ drop_last=True)
143
+ else:
144
+ self.batch_generator = DataLoader(dataset=trainset_loader,
145
+ batch_size=self.cfg.train.num_gpus * self.cfg.train.train_batch_size,
146
+ shuffle=True, num_workers=self.cfg.train.num_thread,
147
+ pin_memory=True, drop_last=True)
148
+
149
+ def _make_model(self):
150
+ # prepare network
151
+ self.logger_info("Creating graph and optimizer...")
152
+ model = get_model(self.cfg, 'train')
153
+
154
+ if self.distributed:
155
+ self.logger_info("Using distributed data parallel.")
156
+ model.cuda()
157
+ model = torch.nn.parallel.DistributedDataParallel(
158
+ model, device_ids=[self.gpu_idx],
159
+ find_unused_parameters=True)
160
+ else:
161
+ model = DataParallel(model).cuda()
162
+
163
+ optimizer = self.get_optimizer(model)
164
+ scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
165
+ self.cfg.train.end_epoch * self.itr_per_epoch,
166
+ eta_min=getattr(self.cfg.train,'min_lr',1e-6))
167
+
168
+ if self.cfg.train.continue_train:
169
+ start_epoch, model, optimizer = self.load_model(model, optimizer)
170
+ else:
171
+ start_epoch = 0
172
+ model.train()
173
+
174
+ self.scheduler = scheduler
175
+ self.start_epoch = start_epoch
176
+ self.model = model
177
+ self.optimizer = optimizer
178
+
179
+ def logger_info(self, info):
180
+ if self.distributed:
181
+ if is_main_process():
182
+ self.logger.info(info)
183
+ else:
184
+ self.logger.info(info)
185
+
186
+
187
+ class Tester(Base):
188
+ def __init__(self, cfg):
189
+ super(Tester, self).__init__(cfg, log_name='test_logs.txt')
190
+
191
+ self.cfg = cfg
192
+
193
+ def _make_batch_generator(self):
194
+ # data load and construct batch generator
195
+ self.logger.info("Creating dataset...")
196
+ testset_loader = dynamic_import(
197
+ f"datasets.{self.cfg.data.testset}", self.cfg.data.testset)(transforms.ToTensor(), "test", self.cfg)
198
+ batch_generator = DataLoader(dataset=testset_loader, batch_size=self.cfg.test.test_batch_size,
199
+ shuffle=False, num_workers=1, pin_memory=True)
200
+
201
+ self.testset = testset_loader
202
+ self.batch_generator = batch_generator
203
+
204
+ def _make_model(self):
205
+ self.logger.info('Load checkpoint from {}'.format(self.cfg.model.pretrained_model_path))
206
+
207
+ # prepare network
208
+ self.logger.info("Creating graph...")
209
+ model = get_model(self.cfg, 'test')
210
+ model = DataParallel(model).cuda()
211
+
212
+ ckpt = torch.load(self.cfg.model.pretrained_model_path, map_location=torch.device('cpu'), weights_only=False)
213
+
214
+ from collections import OrderedDict
215
+ new_state_dict = OrderedDict()
216
+ for k, v in ckpt['network'].items():
217
+ if 'module' not in k:
218
+ k = 'module.' + k
219
+ k = k.replace('backbone', 'encoder').replace('body_rotation_net', 'body_regressor').replace(
220
+ 'hand_rotation_net', 'hand_regressor')
221
+ new_state_dict[k] = v
222
+ self.logger.warning("Attention: Strict=False is set for checkpoint loading. Please check manually.")
223
+ model.load_state_dict(new_state_dict, strict=False)
224
+ model.cuda()
225
+ model.eval()
226
+
227
+ self.model = model
228
+
229
+ def _evaluate(self, outs, cur_sample_idx):
230
+ eval_result = self.testset.evaluate(outs, cur_sample_idx)
231
+ return eval_result
232
+
233
+ def _print_eval_result(self, eval_result):
234
+ self.testset.print_eval_result(eval_result)
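Datasets are resolved by name through `dynamic_import`, so the training config only needs to list module/class names under `datasets/`. A rough sketch of that lookup step in isolation; the dataset name is illustrative and the import assumes the SMPLest-X root is on `PYTHONPATH`:

```python
import importlib

def dynamic_import(module_name, object_name):
    """Import a module by dotted path and fetch one attribute from it."""
    module = importlib.import_module(module_name)
    return getattr(module, object_name)

# hypothetical entry from cfg.data.trainset_humandata
name = 'SynHand'
dataset_cls = dynamic_import(f'datasets.{name}', name)
# dataset = dataset_cls(transforms.ToTensor(), 'train', cfg)  # as done in _make_batch_generator
```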
SMPLest-X/main/config.py ADDED
@@ -0,0 +1,101 @@
1
+ import os
2
+ import importlib.util
3
+ from pathlib import Path
4
+ import json
5
+
6
+
7
+ class Config(dict):
8
+ """A dictionary that allows dot notation access for configuration settings."""
9
+ def __init__(self, data=None):
10
+ super().__init__()
11
+ if data:
12
+ for key, value in data.items():
13
+ # Set the key-value pair using the key and the converted value
14
+ self[key] = self._convert(value)
15
+
16
+ def _convert(self, value):
17
+ """Recursively convert nested dictionaries to Config."""
18
+ if isinstance(value, dict):
19
+ return Config(value) # Convert all nested dicts to Config
20
+ elif isinstance(value, list):
21
+ return [self._convert(item) for item in value] # Convert items in lists
22
+ elif isinstance(value, Path):
23
+ return str(value) # Convert Path objects to string
24
+ return value
25
+
26
+ def __getattr__(self, item):
27
+ """Allow access to dictionary keys via dot notation."""
28
+ if item in self:
29
+ return self[item]
30
+ raise AttributeError(f"'{self.__class__.__name__}' object has no attribute '{item}'")
31
+
32
+ def __setattr__(self, key, value):
33
+ """Allow setting dictionary keys via dot notation."""
34
+ self[key] = self._convert(value)
35
+
36
+ @classmethod
37
+ def load_config(cls, file_path):
38
+ """Load a Python config file and return it as a Config instance."""
39
+ spec = importlib.util.spec_from_file_location("config_module", file_path)
40
+ config_module = importlib.util.module_from_spec(spec)
41
+ spec.loader.exec_module(config_module)
42
+
43
+ if hasattr(config_module, "config") and isinstance(config_module.config, dict):
44
+ return cls(config_module.config) # Ensure full conversion of nested dicts
45
+ else:
46
+ raise ValueError("The config file does not define a 'config' dictionary.")
47
+
48
+ def update_config(self, new_data):
49
+ """Recursively update Config with new dictionary values."""
50
+ for key, value in new_data.items():
51
+ if isinstance(value, dict) and isinstance(self.get(key), Config):
52
+ self[key].update_config(value) # Recursive update for nested dicts
53
+ else:
54
+ self[key] = self._convert(value) # Convert and assign
55
+
56
+ def dump_config(self, file_path=None):
57
+ """Dump the Config object into a .py file with a Pythonic and readable format."""
58
+ # Ensure the provided path is valid
59
+ if file_path is None:
60
+ file_path = self.log.output_dir + '/config.py'
61
+ else:
62
+ dir_name = os.path.dirname(file_path)
63
+ if dir_name and not os.path.exists(dir_name):
64
+ os.makedirs(dir_name)
65
+
66
+ # Convert the Config instance into a regular dictionary
67
+ def config_to_dict(config):
68
+ """Recursively convert a Config instance into a regular dictionary."""
69
+ if isinstance(config, Config):
70
+ return {key: config_to_dict(value) if isinstance(value, Config) else value
71
+ for key, value in config.items()}
72
+ return config
73
+
74
+ config_dict = config_to_dict(self)
75
+
76
+ # Write the config dictionary to a .py file in a formatted, readable way
77
+ with open(file_path, 'w') as f:
78
+ # Use json.dumps to pretty-print the dictionary with indentation and spaces
79
+ f.write("config = ")
80
+ f.write(json.dumps(config_dict, indent=4)) # Pretty print with indentation
81
+ f.write("\n")
82
+
83
+ print(f"Config has been saved to {file_path}")
84
+
85
+ def prepare_log(self):
86
+
87
+ def make_folder(folder):
88
+ if not os.path.exists(folder):
89
+ os.makedirs(folder)
90
+
91
+ if self.log.output_dir is not None:
92
+ make_folder(self.log.output_dir)
93
+ if self.log.model_dir is not None:
94
+ make_folder(self.log.model_dir)
95
+ if self.log.log_dir is not None:
96
+ make_folder(self.log.log_dir)
97
+ if self.log.result_dir is not None:
98
+ make_folder(self.log.result_dir)
99
+
100
+
101
+
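A short sketch of the intended round trip through `Config`; the config path, the `train.lr` key, and the override values are placeholders, and the import assumes the repo root is on `PYTHONPATH`:

```python
from main.config import Config

cfg = Config.load_config('configs/config_base.py')  # the file must define a `config` dict
print(cfg.train.lr)                                  # dot access into nested dicts

cfg.update_config({'train': {'lr': 5e-5},            # recursive override
                   'log': {'exp_name': 'debug_run'}})
cfg.dump_config('outputs/debug_run/config.py')       # re-serialized as `config = { ... }`
```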
SMPLest-X/main/constants.py ADDED
@@ -0,0 +1,37 @@
1
+ import numpy as np
2
+
3
+ # keypoints3d_cam with root-align has higher priority, followed by old version key keypoints3d
4
+ # when there is keypoints3d_smplx, use this rather than keypoints3d_original
5
+ KPS2D_KEYS = ['keypoints2d', 'keypoints2d_smplx', 'keypoints2d_smpl', 'keypoints2d_original']
6
+ KPS3D_KEYS = ['keypoints3d_cam', 'keypoints3d', 'keypoints3d_smplx','keypoints3d_smpl' ,'keypoints3d_original']
7
+
8
+ HANDS_MEAN_R = np.array([ 0.11167871, -0.04289218, 0.41644183, 0.10881133, 0.06598568,
9
+ 0.75622 , -0.09639297, 0.09091566, 0.18845929, -0.11809504,
10
+ -0.05094385, 0.5295845 , -0.14369841, -0.0552417 , 0.7048571 ,
11
+ -0.01918292, 0.09233685, 0.3379135 , -0.45703298, 0.19628395,
12
+ 0.6254575 , -0.21465237, 0.06599829, 0.50689423, -0.36972436,
13
+ 0.06034463, 0.07949023, -0.1418697 , 0.08585263, 0.63552827,
14
+ -0.3033416 , 0.05788098, 0.6313892 , -0.17612089, 0.13209307,
15
+ 0.37335458, 0.8509643 , -0.27692273, 0.09154807, -0.49983943,
16
+ -0.02655647, -0.05288088, 0.5355592 , -0.04596104, 0.27735803]).reshape(15, -1)
17
+ HANDS_MEAN_L = np.array([ 0.11167871, 0.04289218, -0.41644183, 0.10881133, -0.06598568,
18
+ -0.75622 , -0.09639297, -0.09091566, -0.18845929, -0.11809504,
19
+ 0.05094385, -0.5295845 , -0.14369841, 0.0552417 , -0.7048571 ,
20
+ -0.01918292, -0.09233685, -0.3379135 , -0.45703298, -0.19628395,
21
+ -0.6254575 , -0.21465237, -0.06599829, -0.50689423, -0.36972436,
22
+ -0.06034463, -0.07949023, -0.1418697 , -0.08585263, -0.63552827,
23
+ -0.3033416 , -0.05788098, -0.6313892 , -0.17612089, -0.13209307,
24
+ -0.37335458, 0.8509643 , 0.27692273, -0.09154807, -0.49983943,
25
+ 0.02655647, 0.05288088, 0.5355592 , 0.04596104, -0.27735803]).reshape(15, -1)
26
+
27
+ # same mapping for 144->137 and 190->137
28
+ SMPLX_137_MAPPING = [
29
+ 0, 1, 2, 4, 5, 7, 8, 12, 16, 17, 18, 19, 20, 21, 60, 61, 62, 63, 64, 65, 59, 58, 57, 56, 55, 37, 38, 39, 66,
30
+ 25, 26, 27, 67, 28, 29, 30, 68, 34, 35, 36, 69, 31, 32, 33, 70, 52, 53, 54, 71, 40, 41, 42, 72, 43, 44, 45,
31
+ 73, 49, 50, 51, 74, 46, 47, 48, 75, 22, 15, 56, 57, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
32
+ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113,
33
+ 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135,
34
+ 136, 137, 138, 139, 140, 141, 142, 143]
35
+
36
+ # smplx to lsp body keypoints mapping
37
+ LSP_MAPPIMG = [1, 2, 4, 5, 7, 8, 12, 15, 16, 17, 18, 19, 20, 21]
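Since `SMPLX_137_MAPPING` is just an index list, collapsing a full 144-joint SMPL-X keypoint array to the 137-keypoint convention is a single fancy-indexing step. A minimal sketch with random stand-in keypoints (the import assumes the SMPLest-X root is on `PYTHONPATH`):

```python
import numpy as np
from main.constants import SMPLX_137_MAPPING

kps3d_full = np.random.rand(144, 3)         # stand-in for a 144-joint keypoints3d array
kps3d_137 = kps3d_full[SMPLX_137_MAPPING]   # select/reorder into the 137-keypoint layout
print(kps3d_137.shape)                      # (137, 3)
```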
SMPLest-X/main/inference.py ADDED
@@ -0,0 +1,188 @@
1
+ import os
2
+ import sys
3
+ PROJECT_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
4
+ sys.path.insert(0, PROJECT_ROOT)
5
+ import os.path as osp
6
+ import argparse
7
+ import numpy as np
8
+ import torchvision.transforms as transforms
9
+ import torch.backends.cudnn as cudnn
10
+ import torch
11
+ import cv2
12
+ import datetime
13
+ from tqdm import tqdm
14
+ from pathlib import Path
15
+ from human_models.human_models import SMPLX
16
+ from ultralytics import YOLO
17
+ from main.base import Tester
18
+ from main.config import Config
19
+ from utils.data_utils import load_img, process_bbox, generate_patch_image
20
+ # from utils.visualization_utils import render_mesh
21
+ from utils.inference_utils import non_max_suppression
22
+ import pickle
23
+
24
+ def parse_args():
25
+ parser = argparse.ArgumentParser()
26
+ parser.add_argument('--num_gpus', type=int, dest='num_gpus')
27
+ parser.add_argument('--file_name', type=str, default='test')
28
+ parser.add_argument('--ckpt_name', type=str, default='smplest_x_h')
29
+ parser.add_argument('--ckpt_path', type=str, default='model_dump')
30
+ parser.add_argument('--start', type=str, default=1)
31
+ parser.add_argument('--end', type=str, default=1)
32
+ parser.add_argument('--multi_person', action='store_true')
33
+ args = parser.parse_args()
34
+ return args
35
+
36
+ def main():
37
+ args = parse_args()
38
+ cudnn.benchmark = True
39
+
40
+ # init config
41
+ time_str = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')
42
+ root_dir = Path(__file__).resolve().parent.parent
43
+ config_path = osp.join(args.ckpt_path, 'config_base.py')
44
+ cfg = Config.load_config(config_path)
45
+ img_folder = osp.join(root_dir, 'demo', 'input_frames', args.file_name)
46
+ # output_folder = osp.join(root_dir, 'demo', 'output_pkls', args.file_name)
47
+ output_pkls_folder = osp.join(root_dir, 'demo', 'output_pkls')
48
+ # os.makedirs(output_folder, exist_ok=True)
49
+ exp_name = f'inference_{args.file_name}_{args.ckpt_name}_{time_str}'
50
+
51
+ new_config = {
52
+ "log":{
53
+ 'exp_name': exp_name,
54
+ 'log_dir': osp.join(root_dir, 'outputs', exp_name, 'log'),
55
+ }
56
+ }
57
+ cfg.update_config(new_config)
58
+ cfg.prepare_log()
59
+
60
+ # init human models
61
+ smpl_x = SMPLX(cfg.model.human_model_path)
62
+
63
+ # init tester
64
+ demoer = Tester(cfg)
65
+ demoer.logger.info(f"Using 1 GPU.")
66
+ demoer.logger.info(f'Inference [{args.file_name}] with [{cfg.model.pretrained_model_path}].')
67
+ demoer._make_model()
68
+
69
+ # init detector
70
+ bbox_model = getattr(cfg.inference.detection, "model_path",
71
+ '/mnt/shared-storage-user/mllm/zangyuhang/pmx/pretrained_weight/yolo/yolo26l.pt')
72
+ detector = YOLO(bbox_model)
73
+
74
+ start = int(args.start)
75
+ end = int(args.end) + 1
76
+
77
+ results = []
78
+ for frame in tqdm(range(start, end)):
79
+
80
+ # prepare input image
81
+ img_path = osp.join(img_folder, f'{int(frame):06d}.jpg')
82
+
83
+ transform = transforms.ToTensor()
84
+ original_img = load_img(img_path)
85
+ vis_img = original_img.copy()
86
+ original_img_height, original_img_width = original_img.shape[:2]
87
+
88
+ # detection, xyxy
89
+ yolo_bbox = detector.predict(original_img,
90
+ device='cuda',
91
+ classes=0,
92
+ conf=cfg.inference.detection.conf,
93
+ save=cfg.inference.detection.save,
94
+ verbose=cfg.inference.detection.verbose
95
+ )[0].boxes.xyxy.detach().cpu().numpy()
96
+
97
+ if len(yolo_bbox)<1:
98
+ # save original image if no bbox
99
+ num_bbox = 0
100
+ elif not args.multi_person:
101
+ # only select the largest bbox
102
+ num_bbox = 1
103
+ # yolo_bbox = yolo_bbox[0]
104
+ else:
105
+ # keep bbox by NMS with iou_thr
106
+ yolo_bbox = non_max_suppression(yolo_bbox, cfg.inference.detection.iou_thr)
107
+ num_bbox = len(yolo_bbox)
108
+
109
+ # loop all detected bboxes
110
+ for bbox_id in range(num_bbox):
111
+ yolo_bbox_xywh = np.zeros((4))
112
+ yolo_bbox_xywh[0] = yolo_bbox[bbox_id][0]
113
+ yolo_bbox_xywh[1] = yolo_bbox[bbox_id][1]
114
+ yolo_bbox_xywh[2] = abs(yolo_bbox[bbox_id][2] - yolo_bbox[bbox_id][0])
115
+ yolo_bbox_xywh[3] = abs(yolo_bbox[bbox_id][3] - yolo_bbox[bbox_id][1])
116
+
117
+ # xywh
118
+ bbox = process_bbox(bbox=yolo_bbox_xywh,
119
+ img_width=original_img_width,
120
+ img_height=original_img_height,
121
+ input_img_shape=cfg.model.input_img_shape,
122
+ ratio=getattr(cfg.data, "bbox_ratio", 1.25))
123
+ img, _, _ = generate_patch_image(cvimg=original_img,
124
+ bbox=bbox,
125
+ scale=1.0,
126
+ rot=0.0,
127
+ do_flip=False,
128
+ out_shape=cfg.model.input_img_shape)
129
+
130
+ img = transform(img.astype(np.float32))/255
131
+ img = img.cuda()[None,:,:,:]
132
+ inputs = {'img': img}
133
+ targets = {}
134
+ meta_info = {}
135
+
136
+ # mesh recovery
137
+ with torch.no_grad():
138
+ out = demoer.model(inputs, targets, meta_info, 'test')
139
+
140
+ mesh = out['smplx_mesh_cam'].detach().cpu().numpy()[0]
141
+
142
+ result = {
143
+ "frame_id": frame,
144
+ "bbox_xyxy": yolo_bbox[bbox_id].copy(),
145
+ "smplx_mesh_cam": out['smplx_mesh_cam'].detach().cpu().numpy()[0],
146
+ }
147
+
148
+ # optional outputs (saved only if present)
149
+ for k in [
150
+ 'smplx_joint_cam',
151
+ 'smplx_pose',
152
+ 'smplx_shape',
153
+ 'smplx_expr'
154
+ ]:
155
+ if k in out:
156
+ result[k] = out[k].detach().cpu().numpy()[0]
157
+
158
+ results.append(result)
159
+
160
+ # render mesh
161
+ # focal = [cfg.model.focal[0] / cfg.model.input_body_shape[1] * bbox[2],
162
+ # cfg.model.focal[1] / cfg.model.input_body_shape[0] * bbox[3]]
163
+ # princpt = [cfg.model.princpt[0] / cfg.model.input_body_shape[1] * bbox[2] + bbox[0],
164
+ # cfg.model.princpt[1] / cfg.model.input_body_shape[0] * bbox[3] + bbox[1]]
165
+
166
+ # draw the bbox on img
167
+ # vis_img = cv2.rectangle(vis_img, (int(yolo_bbox[bbox_id][0]), int(yolo_bbox[bbox_id][1])),
168
+ # (int(yolo_bbox[bbox_id][2]), int(yolo_bbox[bbox_id][3])), (0, 255, 0), 1)
169
+ # draw mesh
170
+ # vis_img = None #render_mesh(vis_img, mesh, smpl_x.face, {'focal': focal, 'princpt': princpt}, mesh_as_vertices=False)
171
+
172
+ # save rendered image
173
+ # frame_name = os.path.basename(img_path)
174
+ # cv2.imwrite(os.path.join(output_folder, frame_name), vis_img[:, :, ::-1])
175
+
176
+ # save as pkl
177
+ pkl_path = os.path.join(
178
+ output_pkls_folder,
179
+ f'{args.file_name}.pkl'
180
+ )
181
+
182
+ with open(pkl_path, 'wb') as f:
183
+ pickle.dump(results, f)
184
+
185
+ print(f"✅ Saved results to {pkl_path}")
186
+
187
+ if __name__ == "__main__":
188
+ main()
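The per-frame dictionaries written above can be read back directly; the pickle path below mirrors the `demo/output_pkls/<file_name>.pkl` layout used in `main()` and the file name is a placeholder:

```python
import pickle

with open('demo/output_pkls/test.pkl', 'rb') as f:   # placeholder file name
    results = pickle.load(f)

for r in results[:3]:
    # each entry holds the frame id, the detected bbox and the SMPL-X mesh in camera space
    print(r['frame_id'], r['bbox_xyxy'], r['smplx_mesh_cam'].shape)
```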
SMPLest-X/main/test.py ADDED
@@ -0,0 +1,107 @@
1
+ import argparse
2
+ import torch
3
+ import torch.backends.cudnn as cudnn
4
+ from main.config import Config
5
+ import os.path as osp
6
+ import datetime
7
+ from pathlib import Path
8
+ from main.base import Tester
9
+ from human_models.human_models import SMPL, SMPLX
10
+ from tqdm import tqdm
11
+
12
+ def parse_args():
13
+ parser = argparse.ArgumentParser()
14
+ parser.add_argument('--num_gpus', type=int, dest='num_gpus')
15
+ parser.add_argument('--exp_name', type=str, default='output/test')
16
+ parser.add_argument('--result_path', type=str, default='output/test')
17
+ parser.add_argument('--ckpt_idx', type=int, default=0)
18
+ parser.add_argument('--test_batch_size', type=int, default=64)
19
+ parser.add_argument('--testset', type=str, default='EHF')
20
+ parser.add_argument('--use_cache', action='store_true')
21
+ args = parser.parse_args()
22
+ return args
23
+
24
+ def main():
25
+ args = parse_args()
26
+ cudnn.benchmark = True
27
+
28
+ # init config
29
+ time_str = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')
30
+ root_dir = Path(__file__).resolve().parent.parent
31
+ config_path = osp.join('./outputs',args.result_path, 'code', 'config_base.py')
32
+ cfg = Config.load_config(config_path)
33
+ checkpoint_path = osp.join('./outputs', args.result_path, 'model_dump', f'snapshot_{int(args.ckpt_idx)}.pth.tar')
34
+ exp_name = f'{args.exp_name}_ep{int(args.ckpt_idx)}_{time_str}'
35
+
36
+ if args.testset in ['AGORA_test', 'BEDLAM_test']:
37
+ print(f'Test on {args.testset} set...')
38
+
39
+ new_config = {
40
+ "data": {
41
+ "testset": str(args.testset),
42
+ "use_cache": args.use_cache,
43
+ },
44
+ "test":{
45
+ "test_batch_size": int(args.test_batch_size),
46
+ },
47
+ "model": {
48
+ "pretrained_model_path": checkpoint_path,
49
+ },
50
+ "log":{
51
+ 'exp_name': exp_name,
52
+ 'output_dir': osp.join(root_dir, 'outputs', exp_name),
53
+ 'model_dir': osp.join(root_dir, 'outputs', exp_name, 'model_dump'),
54
+ 'log_dir': osp.join(root_dir, 'outputs', exp_name, 'log'),
55
+ 'result_dir': osp.join(root_dir, 'outputs', exp_name, 'result'),
56
+ }
57
+ }
58
+
59
+ cfg.update_config(new_config)
60
+ cfg.prepare_log()
61
+ cfg.dump_config()
62
+
63
+ # init human models
64
+ smpl = SMPL(cfg.model.human_model_path)
65
+ smpl_x = SMPLX(cfg.model.human_model_path)
66
+
67
+ # init tester
68
+ tester = Tester(cfg)
69
+ tester.logger.info(f"Using 1 GPU with bs={cfg.test.test_batch_size} per GPU.")
70
+ tester.logger.info(f'Testing [{checkpoint_path}] on datasets [{cfg.data.testset}]')
71
+
72
+ tester._make_batch_generator()
73
+ tester._make_model()
74
+
75
+ eval_result = {}
76
+ cur_sample_idx = 0
77
+ for itr, (inputs, targets, meta_info) in enumerate(tqdm(tester.batch_generator)):
78
+
79
+ with torch.no_grad():
80
+ model_out = tester.model(inputs, targets, meta_info, 'test')
81
+
82
+ batch_size = model_out['img'].shape[0]
83
+
84
+ out = {}
85
+ for k, v in model_out.items():
86
+ if isinstance(v, torch.Tensor):
87
+ out[k] = v.cpu().numpy()
88
+ elif isinstance(v, list):
89
+ out[k] = v
90
+ else:
91
+ raise ValueError('Undefined type in out. Key: {}; Type: {}.'.format(k, type(v)))
92
+
93
+ out = [{k: v[bid] for k, v in out.items()} for bid in range(batch_size)]
94
+
95
+ # evaluate
96
+ cur_eval_result = tester._evaluate(out, cur_sample_idx)
97
+ for k, v in cur_eval_result.items():
98
+ if k in eval_result:
99
+ eval_result[k] += v
100
+ else:
101
+ eval_result[k] = v
102
+ cur_sample_idx += len(out)
103
+
104
+ tester._print_eval_result(eval_result)
105
+
106
+ if __name__ == "__main__":
107
+ main()
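The batched model outputs are split into one dictionary per sample before evaluation; that unbatching pattern can be reproduced in isolation. A small sketch with dummy arrays standing in for the model output (shapes are stand-ins):

```python
import numpy as np

# stand-in for a model output dict of batched arrays (batch size 4)
model_out = {'smplx_mesh_cam': np.zeros((4, 10475, 3)),
             'cam_trans': np.zeros((4, 3))}

batch_size = model_out['smplx_mesh_cam'].shape[0]
out = [{k: v[bid] for k, v in model_out.items()} for bid in range(batch_size)]
print(len(out), out[0]['smplx_mesh_cam'].shape)   # 4 (10475, 3)
```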
SMPLest-X/main/train.py ADDED
@@ -0,0 +1,138 @@
1
+ import argparse
2
+ import torch.backends.cudnn as cudnn
3
+ from main.config import Config
4
+ import os.path as osp
5
+ import os
6
+ import datetime
7
+ from pathlib import Path
8
+ import torch.distributed as dist
9
+ from utils.distribute_utils import init_distributed_mode, \
10
+ is_main_process, set_seed, get_dist_info
11
+ from main.base import Trainer
12
+ from human_models.human_models import SMPL, SMPLX
13
+
14
+ def parse_args():
15
+ parser = argparse.ArgumentParser()
16
+ parser.add_argument('--local_rank', type=int, dest='num_gpus')
17
+ parser.add_argument('--num_gpus', type=int, dest='num_gpus')
18
+ parser.add_argument('--master_port', type=int, dest='master_port')
19
+ parser.add_argument('--exp_name', type=str, default='output/test')
20
+ parser.add_argument('--config', type=str, default='./config/config_base.py')
21
+ args = parser.parse_args()
22
+
23
+ return args
24
+
25
+ def main():
26
+ args = parse_args()
27
+ set_seed(2023)
28
+ cudnn.benchmark = True
29
+
30
+ # process config
31
+ time_str = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')
32
+ root_dir = Path(__file__).resolve().parent.parent
33
+ config_path = osp.join('./configs', args.config) # TODO: move config folder outside main
34
+ cfg = Config.load_config(config_path)
35
+ new_config = {
36
+ "train": {
37
+ "num_gpus": int(args.num_gpus),
38
+ },
39
+ "log":{
40
+ 'exp_name': f'{args.exp_name}_{time_str}',
41
+ 'output_dir': osp.join(root_dir, 'outputs', f'{args.exp_name}_{time_str}'),
42
+ 'model_dir': osp.join(root_dir, 'outputs', f'{args.exp_name}_{time_str}', 'model_dump'),
43
+ 'log_dir': osp.join(root_dir, 'outputs', f'{args.exp_name}_{time_str}', 'log'),
44
+ 'result_dir': osp.join(root_dir, 'outputs', f'{args.exp_name}_{time_str}', 'result'),
45
+ }
46
+ }
47
+ cfg.update_config(new_config)
48
+ cfg.prepare_log()
49
+ cfg.dump_config()
50
+
51
+ # init ddp
52
+ distributed, gpu_idx = init_distributed_mode(args.master_port)
53
+
54
+ # init human models
55
+ smpl = SMPL(cfg.model.human_model_path)
56
+ smpl_x = SMPLX(cfg.model.human_model_path)
57
+
58
+ # init trainer
59
+ trainer = Trainer(cfg, distributed, gpu_idx)
60
+ trainer.logger_info(f"Using {cfg.train.num_gpus} GPUs with bs={cfg.train.train_batch_size} per GPU.")
61
+ trainer.logger_info(f'Training with datasets: {cfg.data.trainset_humandata}')
62
+
63
+ trainer._make_batch_generator()
64
+ trainer._make_model()
65
+
66
+ for epoch in range(trainer.start_epoch, cfg.train.end_epoch):
67
+ trainer.tot_timer.tic()
68
+ trainer.read_timer.tic()
69
+
70
+ # ddp, align random seed between devices
71
+ trainer.batch_generator.sampler.set_epoch(epoch)
72
+
73
+ for itr, (inputs, targets, meta_info) in enumerate(trainer.batch_generator):
74
+ trainer.read_timer.toc()
75
+ trainer.gpu_timer.tic()
76
+
77
+ # forward
78
+ trainer.optimizer.zero_grad()
79
+ loss = trainer.model(inputs, targets, meta_info, 'train')
80
+ loss_mean = {k: v.mean() for k, v in loss.items()}
81
+ loss_sum = sum(v for k, v in loss_mean.items())
82
+
83
+ # backward
84
+ loss_sum.backward()
85
+ trainer.optimizer.step()
86
+ trainer.scheduler.step()
87
+
88
+ trainer.gpu_timer.toc()
89
+
90
+ if (itr + 1) % cfg.train.print_iters == 0:
91
+ # loss of all ranks
92
+ rank, world_size = get_dist_info()
93
+ loss_print = loss_mean.copy()
94
+ for k in loss_print:
95
+ dist.all_reduce(loss_print[k])
96
+
97
+ total_loss = 0
98
+ for k in loss_print:
99
+ loss_print[k] = loss_print[k] / world_size
100
+ total_loss += loss_print[k]
101
+ loss_print['total'] = total_loss
102
+
103
+ screen = [
104
+ 'Epoch %d/%d itr %d/%d:' % (epoch, cfg.train.end_epoch, itr, trainer.itr_per_epoch),
105
+ 'lr: %g' % (trainer.get_lr()),
106
+ 'speed: %.2f(%.2fs r%.2f)s/itr' % (
107
+ trainer.tot_timer.average_time, trainer.gpu_timer.average_time,
108
+ trainer.read_timer.average_time),
109
+ '%.2fh/epoch' % (trainer.tot_timer.average_time / 3600. * trainer.itr_per_epoch),
110
+ ]
111
+ screen += ['%s: %.4f' % ('loss_' + k, v.detach()) for k, v in loss_print.items()]
112
+ trainer.logger_info(' '.join(screen))
113
+
114
+ trainer.tot_timer.toc()
115
+ trainer.tot_timer.tic()
116
+ trainer.read_timer.tic()
117
+
118
+ # save model ddp, save model.module on rank 0 only
119
+ save_epoch = getattr(cfg.train, 'save_epoch', 5)
120
+ previous_saved_epoch = None
121
+ remove_previous = getattr(cfg.train, 'remove_checkpoint', False)
122
+ if is_main_process() and (epoch % save_epoch == 0 or epoch == cfg.train.end_epoch - 1):
123
+ trainer.save_model({
124
+ 'epoch': epoch,
125
+ 'network': trainer.model.state_dict(),
126
+ 'optimizer': trainer.optimizer.state_dict(),
127
+ }, epoch)
128
+
129
+ # remove previous
130
+ if previous_saved_epoch is not None and remove_previous:
131
+ to_remove = osp.join(cfg.log.model_dir, f'snapshot_{str(previous_saved_epoch)}.pth.tar')
132
+ os.remove(to_remove)
133
+ previous_saved_epoch = epoch
134
+
135
+ dist.barrier()
136
+
137
+ if __name__ == "__main__":
138
+ main()
SMPLest-X/requirements.txt ADDED
@@ -0,0 +1,13 @@
1
+ numpy==1.23.1
2
+ smplx==0.1.28
3
+ tqdm==4.67.1
4
+ opencv-python==4.11.0.86
5
+ chumpy==0.70
6
+ trimesh==4.6.2
7
+ pyrender==0.1.45
8
+ matplotlib==3.7.5
9
+ json_tricks==3.17.3
10
+ einops==0.8.1
11
+ timm==1.0.14
12
+ ultralytics==8.3.75
13
+ pyopengl
SMPLest-X/requirements_py310.txt ADDED
@@ -0,0 +1,14 @@
1
+ numpy>=1.23.1,<2.0
2
+ smplx>=0.1.28
3
+ tqdm>=4.67.1
4
+ opencv-python
5
+ chumpy>=0.70
6
+ trimesh>=4.6.2
7
+ pyrender>=0.1.45
8
+ matplotlib>=3.7.5
9
+ json_tricks>=3.17.3
10
+ einops>=0.8.1
11
+ timm>=1.0.14
12
+ ultralytics>=8.3.75
13
+ scipy
14
+ pandas
SMPLest-X/utils/distribute_utils.py ADDED
@@ -0,0 +1,171 @@
1
+ import os
2
+ import os.path as osp
3
+ import pickle
4
+ import shutil
5
+ import tempfile
6
+ import time
7
+ import torch
8
+ import torch.distributed as dist
9
+ import random
10
+ import numpy as np
11
+
12
+
13
+ def get_dist_info():
14
+ """
15
+ Get the rank and world size in the current distributed training setup.
16
+
17
+ Returns:
18
+ tuple: (rank, world_size)
19
+ rank: int, the rank of the current process.
20
+ world_size: int, the total number of processes in the group.
21
+ """
22
+ if dist.is_available() and dist.is_initialized():
23
+ rank = dist.get_rank()
24
+ world_size = dist.get_world_size()
25
+ else:
26
+ rank = 0
27
+ world_size = 1
28
+ return rank, world_size
29
+
30
+ def set_seed(seed):
31
+ random.seed(seed)
32
+ np.random.seed(seed)
33
+ torch.manual_seed(seed)
34
+ torch.cuda.manual_seed_all(seed)
35
+
36
+
37
+ def time_synchronized():
38
+ torch.cuda.synchronize() if torch.cuda.is_available() else None
39
+ return time.time()
40
+
41
+
42
+ def setup_for_distributed(is_master):
43
+ """This function disables printing when not in master process."""
44
+ import builtins as __builtin__
45
+ builtin_print = __builtin__.print
46
+
47
+ def print(*args, **kwargs):
48
+ force = kwargs.pop('force', False)
49
+ if is_master or force:
50
+ builtin_print(*args, **kwargs)
51
+
52
+ __builtin__.print = print
53
+
54
+
55
+ def init_distributed_mode(port = None, master_port=29500):
56
+ """Initialize slurm distributed training environment.
57
+
58
+ If argument ``port`` is not specified, then the master port will be system
59
+ environment variable ``MASTER_PORT``. If ``MASTER_PORT`` is not in system
60
+ environment variable, then a default port ``29500`` will be used.
61
+
62
+ Args:
63
+ port (int, optional): Master port. Defaults to None.
65
+ """
66
+ # import pdb; pdb.set_trace()
67
+ dist_backend = 'nccl'
68
+ rank = int(os.environ['RANK'])
69
+ num_gpus = torch.cuda.device_count()
70
+ torch.cuda.set_device(rank % num_gpus)
71
+
72
+ dist.init_process_group(backend=dist_backend)
73
+ distributed = True
74
+ gpu_idx = rank % num_gpus
75
+
76
+ return distributed, gpu_idx
77
+
78
+
79
+ def is_dist_avail_and_initialized():
80
+ if not dist.is_available():
81
+ return False
82
+ if not dist.is_initialized():
83
+ return False
84
+ return True
85
+
86
+
87
+ def get_world_size():
88
+ if not is_dist_avail_and_initialized():
89
+ return 1
90
+ return dist.get_world_size()
91
+
92
+
93
+ def get_rank():
94
+ if not is_dist_avail_and_initialized():
95
+ return 0
96
+ return dist.get_rank()
97
+
98
+ def get_process_groups():
99
+ world_size = int(os.environ['WORLD_SIZE'])
100
+ ranks = list(range(world_size))
101
+ num_gpus = torch.cuda.device_count()
102
+ num_nodes = world_size // num_gpus
103
+ if world_size % num_gpus != 0:
104
+ raise NotImplementedError('Not implemented for node not fully used.')
105
+
106
+ groups = []
107
+ for node_idx in range(num_nodes):
108
+ groups.append(ranks[node_idx*num_gpus : (node_idx+1)*num_gpus])
109
+ process_groups = [torch.distributed.new_group(group) for group in groups]
110
+
111
+ return process_groups
112
+
113
+ def get_group_idx():
114
+ num_gpus = torch.cuda.device_count()
115
+ proc_id = get_rank()
116
+ group_idx = proc_id // num_gpus
117
+
118
+ return group_idx
119
+
120
+
121
+ def is_main_process():
122
+ return get_rank() == 0
123
+
124
+ def cleanup():
125
+ dist.destroy_process_group()
126
+
127
+ def all_gather(data):
128
+ """
129
+ Run all_gather on arbitrary picklable data (not necessarily tensors)
130
+ Args:
131
+ data:
132
+ Any picklable object
133
+ Returns:
134
+ data_list(list):
135
+ List of data gathered from each rank
136
+ """
137
+ world_size = get_world_size()
138
+ if world_size == 1:
139
+ return [data]
140
+
141
+ # serialized to a Tensor
142
+ buffer = pickle.dumps(data)
143
+ storage = torch.ByteStorage.from_buffer(buffer)
144
+ tensor = torch.ByteTensor(storage).to('cuda')
145
+
146
+ # obtain Tensor size of each rank
147
+ local_size = torch.tensor([tensor.numel()], device='cuda')
148
+ size_list = [torch.tensor([0], device='cuda') for _ in range(world_size)]
149
+ dist.all_gather(size_list, local_size)
150
+ size_list = [int(size.item()) for size in size_list]
151
+ max_size = max(size_list)
152
+
153
+ # receiving Tensor from all ranks
154
+ # we pad the tensor because torch all_gather does not support
155
+ # gathering tensors of different shapes
156
+ tensor_list = []
157
+ for _ in size_list:
158
+ tensor_list.append(
159
+ torch.empty((max_size, ), dtype=torch.uint8, device='cuda'))
160
+ if local_size != max_size:
161
+ padding = torch.empty(
162
+ size=(max_size - local_size, ), dtype=torch.uint8, device='cuda')
163
+ tensor = torch.cat((tensor, padding), dim=0)
164
+ dist.all_gather(tensor_list, tensor)
165
+
166
+ data_list = []
167
+ for size, tensor in zip(size_list, tensor_list):
168
+ buffer = tensor.cpu().numpy().tobytes()[:size]
169
+ data_list.append(pickle.loads(buffer))
170
+
171
+ return data_list
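`all_gather` serializes arbitrary picklable objects to byte tensors and pads them to a common length, and it simply returns `[data]` when not running distributed, so the same call works in both modes. A minimal sketch (the payload is illustrative; a multi-GPU run assumes the process group is already initialized and the repo root is on `PYTHONPATH`):

```python
from utils.distribute_utils import all_gather, get_dist_info

rank, world_size = get_dist_info()
local_result = {'rank': rank, 'num_samples': 128}   # any picklable per-rank payload
gathered = all_gather(local_result)                 # list with one entry per rank

if rank == 0:
    total = sum(r['num_samples'] for r in gathered)
    print(f'{world_size} rank(s), {total} samples in total')
```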
SMPLest-X/utils/timer.py ADDED
@@ -0,0 +1,31 @@
1
+ import time
2
+
3
+ class Timer(object):
4
+ """A simple timer."""
5
+ def __init__(self):
6
+ self.total_time = 0.
7
+ self.calls = 0
8
+ self.start_time = 0.
9
+ self.diff = 0.
10
+ self.average_time = 0.
11
+ self.warm_up = 0
12
+
13
+ def tic(self):
14
+ # using time.time instead of time.clock because time.clock
15
+ # does not normalize for multithreading
16
+ self.start_time = time.time()
17
+
18
+ def toc(self, average=True):
19
+ self.diff = time.time() - self.start_time
20
+ if self.warm_up < 10:
21
+ self.warm_up += 1
22
+ return self.diff
23
+ else:
24
+ self.total_time += self.diff
25
+ self.calls += 1
26
+ self.average_time = self.total_time / self.calls
27
+
28
+ if average:
29
+ return self.average_time
30
+ else:
31
+ return self.diff
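Typical use of `Timer` brackets a unit of work with `tic()`/`toc()`; note that the first 10 calls are treated as warm-up and are excluded from the running average. A minimal sketch (the sleep stands in for an iteration; the import assumes the repo root is on `PYTHONPATH`):

```python
import time
from utils.timer import Timer

timer = Timer()
for _ in range(20):
    timer.tic()
    time.sleep(0.01)        # stand-in for one training iteration
    timer.toc()

print(f'average iteration time: {timer.average_time:.4f}s')
```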
SMPLest-X/utils/transforms.py ADDED
@@ -0,0 +1,366 @@
1
+ import torch
2
+ import numpy as np
3
+ from torch.nn import functional as F
4
+ from einops.einops import rearrange
5
+
6
+
7
+ def cam2pixel(cam_coord, f, c):
8
+ x = cam_coord[:, 0] / cam_coord[:, 2] * f[0] + c[0]
9
+ y = cam_coord[:, 1] / cam_coord[:, 2] * f[1] + c[1]
10
+ z = cam_coord[:, 2]
11
+ return np.stack((x, y, z), 1)
12
+
13
+
14
+ def pixel2cam(pixel_coord, f, c):
15
+ x = (pixel_coord[:, 0] - c[0]) / f[0] * pixel_coord[:, 2]
16
+ y = (pixel_coord[:, 1] - c[1]) / f[1] * pixel_coord[:, 2]
17
+ z = pixel_coord[:, 2]
18
+ return np.stack((x, y, z), 1)
19
+
20
+
21
+ def world2cam(world_coord, R, t):
22
+ cam_coord = np.dot(R, world_coord.transpose(1, 0)).transpose(1, 0) + t.reshape(1, 3)
23
+ return cam_coord
24
+
25
+
26
+ def cam2world(cam_coord, R, t):
27
+ world_coord = np.dot(np.linalg.inv(R), (cam_coord - t.reshape(1, 3)).transpose(1, 0)).transpose(1, 0)
28
+ return world_coord
29
+
30
+
31
+ def rigid_transform_3D(A, B):
32
+ n, dim = A.shape
33
+ centroid_A = np.mean(A, axis=0)
34
+ centroid_B = np.mean(B, axis=0)
35
+ H = np.dot(np.transpose(A - centroid_A), B - centroid_B) / n
36
+ U, s, V = np.linalg.svd(H)
37
+ R = np.dot(np.transpose(V), np.transpose(U))
38
+ if np.linalg.det(R) < 0:
39
+ s[-1] = -s[-1]
40
+ V[2] = -V[2]
41
+ R = np.dot(np.transpose(V), np.transpose(U))
42
+
43
+ varP = np.var(A, axis=0).sum()
44
+ c = 1 / varP * np.sum(s)
45
+
46
+ t = -np.dot(c * R, np.transpose(centroid_A)) + np.transpose(centroid_B)
47
+ return c, R, t
48
+
49
+
50
+ def rigid_align(A, B):
51
+ c, R, t = rigid_transform_3D(A, B)
52
+ A2 = np.transpose(np.dot(c * R, np.transpose(A))) + t
53
+ return A2
54
+
55
+
56
+ def transform_joint_to_other_db(src_joint, src_name, dst_name):
57
+ src_joint_num = len(src_name)
58
+ dst_joint_num = len(dst_name)
59
+
60
+ new_joint = np.zeros(((dst_joint_num,) + src_joint.shape[1:]), dtype=np.float32)
61
+ for src_idx in range(len(src_name)):
62
+ name = src_name[src_idx]
63
+ if name in dst_name:
64
+ dst_idx = dst_name.index(name)
65
+ new_joint[dst_idx] = src_joint[src_idx]
66
+
67
+ return new_joint
68
+
69
+
70
+ def rotation_matrix_to_angle_axis(rotation_matrix):
71
+ """Convert 3x4 rotation matrix to Rodrigues vector
72
+
73
+ Args:
74
+ rotation_matrix (Tensor): rotation matrix.
75
+
76
+ Returns:
77
+ Tensor: Rodrigues vector transformation.
78
+
79
+ Shape:
80
+ - Input: :math:`(N, 3, 4)`
81
+ - Output: :math:`(N, 3)`
82
+
83
+ Example:
84
+ >>> input = torch.rand(2, 3, 4) # Nx4x4
85
+ >>> output = tgm.rotation_matrix_to_angle_axis(input) # Nx3
86
+ """
87
+ # todo add check that matrix is a valid rotation matrix
88
+ quaternion = rotation_matrix_to_quaternion(rotation_matrix)
89
+ return quaternion_to_angle_axis(quaternion)
90
+
91
+ def quaternion_to_angle_axis(quaternion: torch.Tensor) -> torch.Tensor:
92
+ """Convert quaternion vector to angle axis of rotation.
93
+
94
+ Adapted from ceres C++ library: ceres-solver/include/ceres/rotation.h
95
+
96
+ Args:
97
+ quaternion (torch.Tensor): tensor with quaternions.
98
+
99
+ Return:
100
+ torch.Tensor: tensor with angle axis of rotation.
101
+
102
+ Shape:
103
+ - Input: :math:`(*, 4)` where `*` means, any number of dimensions
104
+ - Output: :math:`(*, 3)`
105
+
106
+ Example:
107
+ >>> quaternion = torch.rand(2, 4) # Nx4
108
+ >>> angle_axis = tgm.quaternion_to_angle_axis(quaternion) # Nx3
109
+ """
110
+ if not torch.is_tensor(quaternion):
111
+ raise TypeError("Input type is not a torch.Tensor. Got {}".format(
112
+ type(quaternion)))
113
+
114
+ if not quaternion.shape[-1] == 4:
115
+ raise ValueError("Input must be a tensor of shape Nx4 or 4. Got {}"
116
+ .format(quaternion.shape))
117
+ # unpack input and compute conversion
118
+ q1: torch.Tensor = quaternion[..., 1]
119
+ q2: torch.Tensor = quaternion[..., 2]
120
+ q3: torch.Tensor = quaternion[..., 3]
121
+ sin_squared_theta: torch.Tensor = q1 * q1 + q2 * q2 + q3 * q3
122
+
123
+ sin_theta: torch.Tensor = torch.sqrt(sin_squared_theta)
124
+ cos_theta: torch.Tensor = quaternion[..., 0]
125
+ two_theta: torch.Tensor = 2.0 * torch.where(
126
+ cos_theta < 0.0,
127
+ torch.atan2(-sin_theta, -cos_theta),
128
+ torch.atan2(sin_theta, cos_theta))
129
+
130
+ k_pos: torch.Tensor = two_theta / sin_theta
131
+ k_neg: torch.Tensor = 2.0 * torch.ones_like(sin_theta)
132
+ k: torch.Tensor = torch.where(sin_squared_theta > 0.0, k_pos, k_neg)
133
+
134
+ angle_axis: torch.Tensor = torch.zeros_like(quaternion)[..., :3]
135
+ angle_axis[..., 0] += q1 * k
136
+ angle_axis[..., 1] += q2 * k
137
+ angle_axis[..., 2] += q3 * k
138
+ return angle_axis
139
+
140
+ def rotation_matrix_to_quaternion(rotation_matrix, eps=1e-6):
141
+ """Convert 3x4 rotation matrix to 4d quaternion vector
142
+
143
+ This algorithm is based on algorithm described in
144
+ https://github.com/KieranWynn/pyquaternion/blob/master/pyquaternion/quaternion.py#L201
145
+
146
+ Args:
147
+ rotation_matrix (Tensor): the rotation matrix to convert.
148
+
149
+ Return:
150
+ Tensor: the rotation in quaternion
151
+
152
+ Shape:
153
+ - Input: :math:`(N, 3, 4)`
154
+ - Output: :math:`(N, 4)`
155
+
156
+ Example:
157
+ >>> input = torch.rand(4, 3, 4) # Nx3x4
158
+ >>> output = tgm.rotation_matrix_to_quaternion(input) # Nx4
159
+ """
160
+ if not torch.is_tensor(rotation_matrix):
161
+ raise TypeError("Input type is not a torch.Tensor. Got {}".format(
162
+ type(rotation_matrix)))
163
+
164
+ input_shape = rotation_matrix.shape
165
+ if len(input_shape) == 2:
166
+ rotation_matrix = rotation_matrix.unsqueeze(0)
167
+
168
+ if len(rotation_matrix.shape) > 3:
169
+ raise ValueError(
170
+ "Input size must be a three dimensional tensor. Got {}".format(
171
+ rotation_matrix.shape))
172
+ if not rotation_matrix.shape[-2:] == (3, 4):
173
+ raise ValueError(
174
+ "Input size must be a N x 3 x 4 tensor. Got {}".format(
175
+ rotation_matrix.shape))
176
+
177
+ rmat_t = torch.transpose(rotation_matrix, 1, 2)
178
+
179
+ mask_d2 = rmat_t[:, 2, 2] < eps
180
+
181
+ mask_d0_d1 = rmat_t[:, 0, 0] > rmat_t[:, 1, 1]
182
+ mask_d0_nd1 = rmat_t[:, 0, 0] < -rmat_t[:, 1, 1]
183
+
184
+ t0 = 1 + rmat_t[:, 0, 0] - rmat_t[:, 1, 1] - rmat_t[:, 2, 2]
185
+ q0 = torch.stack([rmat_t[:, 1, 2] - rmat_t[:, 2, 1],
186
+ t0, rmat_t[:, 0, 1] + rmat_t[:, 1, 0],
187
+ rmat_t[:, 2, 0] + rmat_t[:, 0, 2]], -1)
188
+ t0_rep = t0.repeat(4, 1).t()
189
+
190
+ t1 = 1 - rmat_t[:, 0, 0] + rmat_t[:, 1, 1] - rmat_t[:, 2, 2]
191
+ q1 = torch.stack([rmat_t[:, 2, 0] - rmat_t[:, 0, 2],
192
+ rmat_t[:, 0, 1] + rmat_t[:, 1, 0],
193
+ t1, rmat_t[:, 1, 2] + rmat_t[:, 2, 1]], -1)
194
+ t1_rep = t1.repeat(4, 1).t()
195
+
196
+ t2 = 1 - rmat_t[:, 0, 0] - rmat_t[:, 1, 1] + rmat_t[:, 2, 2]
197
+ q2 = torch.stack([rmat_t[:, 0, 1] - rmat_t[:, 1, 0],
198
+ rmat_t[:, 2, 0] + rmat_t[:, 0, 2],
199
+ rmat_t[:, 1, 2] + rmat_t[:, 2, 1], t2], -1)
200
+ t2_rep = t2.repeat(4, 1).t()
201
+
202
+ t3 = 1 + rmat_t[:, 0, 0] + rmat_t[:, 1, 1] + rmat_t[:, 2, 2]
203
+ q3 = torch.stack([t3, rmat_t[:, 1, 2] - rmat_t[:, 2, 1],
204
+ rmat_t[:, 2, 0] - rmat_t[:, 0, 2],
205
+ rmat_t[:, 0, 1] - rmat_t[:, 1, 0]], -1)
206
+ t3_rep = t3.repeat(4, 1).t()
207
+
208
+ mask_c0 = mask_d2.float() * mask_d0_d1.float()
209
+ mask_c1 = mask_d2.float() * (1 - mask_d0_d1.float())
210
+ mask_c2 = (1 - mask_d2.float()) * mask_d0_nd1.float()
211
+ mask_c3 = (1 - mask_d2.float()) * (1 - mask_d0_nd1.float())
212
+ mask_c0 = mask_c0.view(-1, 1).type_as(q0)
213
+ mask_c1 = mask_c1.view(-1, 1).type_as(q1)
214
+ mask_c2 = mask_c2.view(-1, 1).type_as(q2)
215
+ mask_c3 = mask_c3.view(-1, 1).type_as(q3)
216
+
217
+ q = q0 * mask_c0 + q1 * mask_c1 + q2 * mask_c2 + q3 * mask_c3
218
+ q /= torch.sqrt(t0_rep * mask_c0 + t1_rep * mask_c1 + # noqa
219
+ t2_rep * mask_c2 + t3_rep * mask_c3)
220
+ q *= 0.5
221
+
222
+ if len(input_shape) == 2:
223
+ q = q.squeeze(0)
224
+ return q
225
+
226
+ def rot6d_to_axis_angle(x):
227
+ batch_size = x.shape[0]
228
+
229
+ x = x.view(-1, 3, 2)
230
+ a1 = x[:, :, 0]
231
+ a2 = x[:, :, 1]
232
+ b1 = F.normalize(a1)
233
+ b2 = F.normalize(a2 - torch.einsum('bi,bi->b', b1, a2).unsqueeze(-1) * b1)
234
+ b3 = torch.cross(b1, b2)
235
+ rot_mat = torch.stack((b1, b2, b3), dim=-1) # 3x3 rotation matrix
236
+
237
+ rot_mat = torch.cat([rot_mat, torch.zeros((batch_size, 3, 1)).cuda().float()], 2) # 3x4 rotation matrix
238
+ axis_angle = rotation_matrix_to_angle_axis(rot_mat).reshape(-1, 3) # axis-angle
239
+ axis_angle[torch.isnan(axis_angle)] = 0.0
240
+ return axis_angle
241
+
242
+ def rot6d_to_rotmat(x):
243
+ """Convert 6D rotation representation to 3x3 rotation matrix.
244
+ Based on Zhou et al., "On the Continuity of Rotation Representations in Neural Networks", CVPR 2019
245
+ Input:
246
+ (B,6) Batch of 6-D rotation representations
247
+ Output:
248
+ (B,3,3) Batch of corresponding rotation matrices
249
+ """
250
+ if x.shape[-1] == 6:
251
+ batch_size = x.shape[0]
252
+ if len(x.shape) == 3:
253
+ num = x.shape[1]
254
+ x = rearrange(x, 'b n d -> (b n) d', d=6)
255
+ else:
256
+ num = 1
257
+ x = rearrange(x, 'b (k l) -> b k l', k=3, l=2)
258
+ # x = x.view(-1,3,2)
259
+ a1 = x[:, :, 0]
260
+ a2 = x[:, :, 1]
261
+ b1 = F.normalize(a1)
262
+ b2 = F.normalize(a2 - torch.einsum('bi,bi->b', b1, a2).unsqueeze(-1) * b1)
263
+ b3 = torch.cross(b1, b2, dim=-1)
264
+
265
+ mat = torch.stack((b1, b2, b3), dim=-1)
266
+ if num > 1:
267
+ mat = rearrange(mat, '(b n) h w-> b n h w', b=batch_size, n=num, h=3, w=3)
268
+ else:
269
+ x = x.view(-1,3,2)
270
+ a1 = x[:, :, 0]
271
+ a2 = x[:, :, 1]
272
+ b1 = F.normalize(a1)
273
+ b2 = F.normalize(a2 - torch.einsum('bi,bi->b', b1, a2).unsqueeze(-1) * b1)
274
+ b3 = torch.cross(b1, b2, dim=-1)
275
+ mat = torch.stack((b1, b2, b3), dim=-1)
276
+ return mat
277
+
278
+ def batch_rodrigues(theta):
279
+ """Convert axis-angle representation to rotation matrix.
280
+ Args:
281
+ theta: size = [B, 3]
282
+ Returns:
283
+ Rotation matrix corresponding to the quaternion -- size = [B, 3, 3]
284
+ """
285
+ l1norm = torch.norm(theta + 1e-8, p = 2, dim = 1)
286
+ angle = torch.unsqueeze(l1norm, -1)
287
+ normalized = torch.div(theta, angle)
288
+ angle = angle * 0.5
289
+ v_cos = torch.cos(angle)
290
+ v_sin = torch.sin(angle)
291
+ quat = torch.cat([v_cos, v_sin * normalized], dim = 1)
292
+ return quat_to_rotmat(quat)
293
+
294
+ def quat_to_rotmat(quat):
295
+ """Convert quaternion coefficients to rotation matrix.
296
+ Args:
297
+ quat: size = [B, 4] 4 <===>(w, x, y, z)
298
+ Returns:
299
+ Rotation matrix corresponding to the quaternion -- size = [B, 3, 3]
300
+ """
301
+ norm_quat = quat
302
+ norm_quat = norm_quat/norm_quat.norm(p=2, dim=1, keepdim=True)
303
+ w, x, y, z = norm_quat[:,0], norm_quat[:,1], norm_quat[:,2], norm_quat[:,3]
304
+
305
+ B = quat.size(0)
306
+
307
+ w2, x2, y2, z2 = w.pow(2), x.pow(2), y.pow(2), z.pow(2)
308
+ wx, wy, wz = w*x, w*y, w*z
309
+ xy, xz, yz = x*y, x*z, y*z
310
+
311
+ rotMat = torch.stack([w2 + x2 - y2 - z2, 2*xy - 2*wz, 2*wy + 2*xz,
312
+ 2*wz + 2*xy, w2 - x2 + y2 - z2, 2*yz - 2*wx,
313
+ 2*xz - 2*wy, 2*wx + 2*yz, w2 - x2 - y2 + z2], dim=1).view(B, 3, 3)
314
+ return rotMat
315
+
316
+ def sample_joint_features(img_feat, joint_xy):
317
+ height, width = img_feat.shape[2:]
318
+ x = joint_xy[:, :, 0] / (width - 1) * 2 - 1
319
+ y = joint_xy[:, :, 1] / (height - 1) * 2 - 1
320
+ grid = torch.stack((x, y), 2)[:, :, None, :]
321
+ img_feat = F.grid_sample(img_feat, grid, align_corners=True)[:, :, :, 0] # batch_size, channel_dim, joint_num
322
+ img_feat = img_feat.permute(0, 2, 1).contiguous() # batch_size, joint_num, channel_dim
323
+ return img_feat
324
+
325
+
326
+ def soft_argmax_2d(heatmap2d):
327
+ batch_size = heatmap2d.shape[0]
328
+ height, width = heatmap2d.shape[2:]
329
+ heatmap2d = heatmap2d.reshape((batch_size, -1, height * width))
330
+ heatmap2d = F.softmax(heatmap2d, 2)
331
+ heatmap2d = heatmap2d.reshape((batch_size, -1, height, width))
332
+
333
+ accu_x = heatmap2d.sum(dim=(2))
334
+ accu_y = heatmap2d.sum(dim=(3))
335
+
336
+ accu_x = accu_x * torch.arange(width).float().cuda()[None, None, :]
337
+ accu_y = accu_y * torch.arange(height).float().cuda()[None, None, :]
338
+
339
+ accu_x = accu_x.sum(dim=2, keepdim=True)
340
+ accu_y = accu_y.sum(dim=2, keepdim=True)
341
+
342
+ coord_out = torch.cat((accu_x, accu_y), dim=2)
343
+ return coord_out
344
+
345
+
346
+ def soft_argmax_3d(heatmap3d):
347
+ batch_size = heatmap3d.shape[0]
348
+ depth, height, width = heatmap3d.shape[2:]
349
+ heatmap3d = heatmap3d.reshape((batch_size, -1, depth * height * width))
350
+ heatmap3d = F.softmax(heatmap3d, 2)
351
+ heatmap3d = heatmap3d.reshape((batch_size, -1, depth, height, width))
352
+
353
+ accu_x = heatmap3d.sum(dim=(2, 3))
354
+ accu_y = heatmap3d.sum(dim=(2, 4))
355
+ accu_z = heatmap3d.sum(dim=(3, 4))
356
+
357
+ accu_x = accu_x * torch.arange(width).float().cuda()[None, None, :]
358
+ accu_y = accu_y * torch.arange(height).float().cuda()[None, None, :]
359
+ accu_z = accu_z * torch.arange(depth).float().cuda()[None, None, :]
360
+
361
+ accu_x = accu_x.sum(dim=2, keepdim=True)
362
+ accu_y = accu_y.sum(dim=2, keepdim=True)
363
+ accu_z = accu_z.sum(dim=2, keepdim=True)
364
+
365
+ coord_out = torch.cat((accu_x, accu_y, accu_z), dim=2)
366
+ return coord_out
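`cam2pixel` and `pixel2cam` are inverse pinhole projections, which makes a quick round-trip check easy; the intrinsics below are made up for illustration, and the import assumes the SMPLest-X root is on `PYTHONPATH`:

```python
import numpy as np
from utils.transforms import cam2pixel, pixel2cam

f = (1500.0, 1500.0)                       # hypothetical focal lengths (fx, fy)
c = (960.0, 540.0)                         # hypothetical principal point (cx, cy)

cam_coord = np.array([[0.1, -0.2, 2.5],
                      [0.3,  0.1, 3.0]])   # 3D points in camera space (metres)

pix = cam2pixel(cam_coord, f, c)           # columns: x_px, y_px, depth
back = pixel2cam(pix, f, c)
print(np.allclose(back, cam_coord))        # True
```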
WiLoR/.DS_Store ADDED
Binary file (6.15 kB).
 
WiLoR/README.md ADDED
@@ -0,0 +1,93 @@
1
+ <div align="center">
2
+
3
+ # WiLoR: End-to-end 3D hand localization and reconstruction in-the-wild
4
+
5
+ [Rolandos Alexandros Potamias](https://rolpotamias.github.io)<sup>1</sup> &emsp; [Jinglei Zhang]()<sup>2</sup> &emsp; [Jiankang Deng](https://jiankangdeng.github.io/)<sup>1</sup> &emsp; [Stefanos Zafeiriou](https://www.imperial.ac.uk/people/s.zafeiriou)<sup>1</sup>
6
+
7
+ <sup>1</sup>Imperial College London, UK <br>
8
+ <sup>2</sup>Shanghai Jiao Tong University, China
9
+
10
+ <font color="blue"><strong>CVPR 2025</strong></font>
11
+
12
+ <a href='https://rolpotamias.github.io/WiLoR/'><img src='https://img.shields.io/badge/Project-Page-blue'></a>
13
+ <a href='https://arxiv.org/abs/2409.12259'><img src='https://img.shields.io/badge/Paper-arXiv-red'></a>
14
+ <a href='https://huggingface.co/spaces/rolpotamias/WiLoR'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-green'></a>
15
+ <a href='https://colab.research.google.com/drive/1bNnYFECmJbbvCNZAKtQcxJGxf0DZppsB?usp=sharing'><img src='https://colab.research.google.com/assets/colab-badge.svg'></a>
16
+ </div>
17
+
18
+ <div align="center">
19
+
20
+ [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wilor-end-to-end-3d-hand-localization-and/3d-hand-pose-estimation-on-freihand)](https://paperswithcode.com/sota/3d-hand-pose-estimation-on-freihand?p=wilor-end-to-end-3d-hand-localization-and)
21
+ [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wilor-end-to-end-3d-hand-localization-and/3d-hand-pose-estimation-on-ho-3d)](https://paperswithcode.com/sota/3d-hand-pose-estimation-on-ho-3d?p=wilor-end-to-end-3d-hand-localization-and)
22
+
23
+ </div>
24
+
25
+ This is the official implementation of **[WiLoR](https://rolpotamias.github.io/WiLoR/)**, a state-of-the-art hand localization and reconstruction model:
26
+
27
+ ![teaser](assets/teaser.png)
28
+
29
+ ## Installation
30
+ ### [Update] Quick Installation
31
+ Thanks to [@warmshao](https://github.com/warmshao), WiLoR can now be installed with a single pip command:
32
+ ```
33
+ pip install git+https://github.com/warmshao/WiLoR-mini
34
+ ```
35
+ Please head to [WiLoR-mini](https://github.com/warmshao/WiLoR-mini) for additional details.
36
+
37
+ **Note:** the above code is a simplified version of WiLoR and is intended for demo use only.
38
+ If you wish to use WiLoR for other tasks, it is suggested to follow the original installation instructions below:
39
+ ### Original Installation
40
+ ```
41
+ git clone --recursive https://github.com/rolpotamias/WiLoR.git
42
+ cd WiLoR
43
+ ```
44
+
45
+ The code has been tested with PyTorch 2.0.0 and CUDA 11.7. It is suggested to use an anaconda environment to install the required dependencies:
46
+ ```bash
47
+ conda create --name wilor python=3.10
48
+ conda activate wilor
49
+
50
+ pip install torch torchvision --index-url https://download.pytorch.org/whl/cu117
51
+ # Install requirements
52
+ pip install -r requirements.txt
53
+ ```
54
+ Download the pretrained models using:
55
+ ```bash
56
+ wget https://huggingface.co/spaces/rolpotamias/WiLoR/resolve/main/pretrained_models/detector.pt -P ./pretrained_models/
57
+ wget https://huggingface.co/spaces/rolpotamias/WiLoR/resolve/main/pretrained_models/wilor_final.ckpt -P ./pretrained_models/
58
+ ```
59
+ It is also required to download the MANO model from the [MANO website](https://mano.is.tue.mpg.de).
60
+ Create an account by clicking Sign Up and download the models (mano_v*_*.zip). Unzip and place the right hand model `MANO_RIGHT.pkl` under the `mano_data/` folder.
61
+ Note that MANO model falls under the [MANO license](https://mano.is.tue.mpg.de/license.html).
62
+ ## Demo
63
+ ```bash
64
+ python demo.py --img_folder demo_img --out_folder demo_out --save_mesh
65
+ ```
66
+ ## Start a local gradio demo
67
+ You can start a local demo for inference by running:
68
+ ```bash
69
+ python gradio_demo.py
70
+ ```
71
+ ## WHIM Dataset
72
+ To download the WHIM dataset, please follow the instructions [here](./whim/Dataset_instructions.md).
73
+
74
+ ## Acknowledgements
75
+ Parts of the code are taken or adapted from the following repos:
76
+ - [HaMeR](https://github.com/geopavlakos/hamer/)
77
+ - [Ultralytics](https://github.com/ultralytics/ultralytics)
78
+
79
+ ## License
80
+ WiLoR models fall under the [CC-BY-NC-ND License](./license.txt). This repository also depends on the [Ultralytics library](https://github.com/ultralytics/ultralytics) and the [MANO model](https://mano.is.tue.mpg.de/license.html), which fall under their own licenses. By using this repository, you must also comply with the terms of these external licenses.
81
+ ## Citing
82
+ If you find WiLoR useful for your research, please consider citing our paper:
83
+
84
+ ```bibtex
85
+ @misc{potamias2024wilor,
86
+ title={WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild},
87
+ author={Rolandos Alexandros Potamias and Jinglei Zhang and Jiankang Deng and Stefanos Zafeiriou},
88
+ year={2024},
89
+ eprint={2409.12259},
90
+ archivePrefix={arXiv},
91
+ primaryClass={cs.CV}
92
+ }
93
+ ```
WiLoR/demo.py ADDED
@@ -0,0 +1,139 @@
1
+ from pathlib import Path
2
+ import torch
3
+ import argparse
4
+ import os
5
+ import cv2
6
+ import numpy as np
7
+ import joblib # used to save results
8
+ from typing import Dict, Optional
9
+
10
+ from wilor.models import WiLoR, load_wilor
11
+ from wilor.utils import recursive_to
12
+ from wilor.datasets.vitdet_dataset import ViTDetDataset
13
+ from wilor.utils.renderer import cam_crop_to_full # keep only this purely mathematical helper
14
+ # The Renderer import is removed to avoid pulling in OpenGL
15
+ # from wilor.utils.renderer import Renderer
16
+ from ultralytics import YOLO
17
+
18
+ def main():
19
+ parser = argparse.ArgumentParser(description='WiLoR demo code (No Render)')
20
+ parser.add_argument('--img_folder', type=str, default=r'D:\SMPL-X_pose_extraction\demo\inputs', help='Folder with input images')
21
+ parser.add_argument('--out_folder', type=str, default=r'D:\SMPL-X_pose_extraction\demo\wilor_outputs', help='Output folder to save prediction results')
22
+ parser.add_argument('--rescale_factor', type=float, default=2.0, help='Factor for padding the bbox')
23
+ parser.add_argument('--file_type', nargs='+', default=['*.jpg', '*.png', '*.jpeg'], help='List of file extensions to consider')
24
+
25
+ args = parser.parse_args()
26
+
27
+ # 1. Load Checkpoints
28
+ print("Loading models...")
29
+ model, model_cfg = load_wilor(checkpoint_path=r'D:\SMPL-X_pose_extraction\pretrained_weight\wilor\wilor_final.ckpt', cfg_path='./pretrained_models/model_config.yaml')
30
+ detector = YOLO(r'D:\SMPL-X_pose_extraction\pretrained_weight\wilor\detector.pt')
31
+
32
+ # 2. Setup Device (No Renderer init here)
33
+ device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
34
+ model = model.to(device)
35
+ detector = detector.to(device)
36
+ model.eval()
37
+
38
+ # Make output directory
39
+ os.makedirs(args.out_folder, exist_ok=True)
40
+
41
+ # Get images
42
+ img_paths = [img for end in args.file_type for img in Path(args.img_folder).glob(end)]
43
+ print(f"Found {len(img_paths)} images.")
44
+
45
+ # Iterate over images
46
+ for img_path in img_paths:
47
+ print(f"Processing {img_path.name}...")
48
+ img_cv2 = cv2.imread(str(img_path))
49
+
50
+ # Detect hands
51
+ detections = detector(img_cv2, conf=0.3, verbose=False)[0]
52
+ bboxes = []
53
+ is_right = []
54
+ for det in detections:
55
+ Bbox = det.boxes.data.cpu().detach().squeeze().numpy()
56
+ is_right.append(det.boxes.cls.cpu().detach().squeeze().item())
57
+ bboxes.append(Bbox[:4].tolist())
58
+
59
+ if len(bboxes) == 0:
60
+ print(f"No hands detected in {img_path.name}")
61
+ continue
62
+
63
+ boxes = np.stack(bboxes)
64
+ right = np.stack(is_right)
65
+
66
+ # Create Dataset & Loader
67
+ dataset = ViTDetDataset(model_cfg, img_cv2, boxes, right, rescale_factor=args.rescale_factor)
68
+ dataloader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=False, num_workers=0)
69
+
70
+ results_list = []
71
+
72
+ # Inference Loop
73
+ for batch in dataloader:
74
+ batch = recursive_to(batch, device)
75
+
76
+ with torch.no_grad():
77
+ out = model(batch)
78
+
79
+ # Post-process Camera Parameters
80
+ multiplier = (2*batch['right']-1)
81
+ pred_cam = out['pred_cam']
82
+ pred_cam[:,1] = multiplier*pred_cam[:,1]
83
+
84
+ box_center = batch["box_center"].float()
85
+ box_size = batch["box_size"].float()
86
+ img_size = batch["img_size"].float()
87
+
88
+ # Calculate focal length & full image camera translation
89
+ scaled_focal_length = model_cfg.EXTRA.FOCAL_LENGTH / model_cfg.MODEL.IMAGE_SIZE * img_size.max()
90
+ pred_cam_t_full = cam_crop_to_full(pred_cam, box_center, box_size, img_size, scaled_focal_length).detach().cpu().numpy()
91
+
92
+ # Collect Results
93
+ batch_size_curr = batch['img'].shape[0]
94
+ for n in range(batch_size_curr):
95
+ verts = out['pred_vertices'][n].detach().cpu().numpy()
96
+ joints = out['pred_keypoints_3d'][n].detach().cpu().numpy()
97
+
98
+ # Correct orientation for left hands
99
+ is_right_curr = batch['right'][n].cpu().numpy()
100
+ verts[:, 0] = (2 * is_right_curr - 1) * verts[:, 0]
101
+ joints[:, 0] = (2 * is_right_curr - 1) * joints[:, 0]
102
+
103
+ cam_t = pred_cam_t_full[n]
104
+
105
+ # Store data needed for later visualization
106
+ hand_data = {
107
+ 'vertices': verts, # [778, 3] mesh vertices
108
+ 'joints_3d': joints, # [21, 3] 3D joints
109
+ 'cam_t': cam_t, # [3] Camera translation
110
+ 'focal_length': scaled_focal_length.cpu().item(),
111
+ 'is_right': int(is_right_curr), # 1 for right, 0 for left
112
+ 'img_res': img_size[n].cpu().numpy(),
113
+ 'faces': model.mano.faces # MANO faces indices
114
+ }
115
+ results_list.append(hand_data)
116
+
117
+ # Save results to disk (PKL file)
118
+ if len(results_list) > 0:
119
+ img_fn, _ = os.path.splitext(os.path.basename(img_path))
120
+ save_path = os.path.join(args.out_folder, f'{img_fn}_results.pkl')
121
+ joblib.dump(results_list, save_path)
122
+ print(f"Saved results to {save_path}")
123
+
124
+ def project_full_img(points, cam_trans, focal_length, img_res):
125
+ # This helper is kept in case you also want to store the 2D projected points
126
+ camera_center = [img_res[0] / 2., img_res[1] / 2.]
127
+ K = torch.eye(3)
128
+ K[0,0] = focal_length
129
+ K[1,1] = focal_length
130
+ K[0,2] = camera_center[0]
131
+ K[1,2] = camera_center[1]
132
+ points = points + cam_trans
133
+ points = points / points[..., -1:]
134
+
135
+ V_2d = (K @ points.T).T
136
+ return V_2d[..., :-1]
137
+
138
+ if __name__ == '__main__':
139
+ main()
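A short sketch of consuming the `*_results.pkl` files written above (the path is a placeholder). It reloads the per-hand dictionaries and reuses the `project_full_img` helper from this script to obtain 2D keypoints in full-image pixel coordinates:

```python
import joblib
import torch

hands = joblib.load('demo_out/example_results.pkl')   # list of per-hand dicts
for hand in hands:
    joints = torch.from_numpy(hand['joints_3d']).float()   # (21, 3) 3D joints
    cam_t = torch.from_numpy(hand['cam_t']).float()         # (3,) camera translation
    joints_2d = project_full_img(joints, cam_t,
                                 hand['focal_length'], hand['img_res'])  # (21, 2) pixels
```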
WiLoR/demo.sh ADDED
@@ -0,0 +1,2 @@
1
+ $env:PYOPENGL_PLATFORM = "wgl"
2
+ python demo.py
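The `$env:` line above is PowerShell syntax, so this helper is presumably meant to be run on Windows despite the `.sh` extension. A rough Python-side sketch of the same idea (mirroring how `gradio_demo.py` picks its PyOpenGL backend) is to set the variable before anything imports pyrender:

```python
import os

# Must run before pyrender / OpenGL is imported anywhere in the process.
os.environ["PYOPENGL_PLATFORM"] = "wgl"   # "egl" is the usual choice on headless Linux
```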
WiLoR/download_videos.py ADDED
@@ -0,0 +1,58 @@
1
+ import os
2
+ import json
3
+ import numpy as np
4
+ import argparse
5
+ import cv2  # needed for the frame-extraction step below
+ from pytubefix import YouTube
6
+
7
+ parser = argparse.ArgumentParser()
8
+
9
+ parser.add_argument("--root", type=str, help="Directory of WiLoR")
10
+ parser.add_argument("--mode", type=str, choices=['train', 'test'], default= 'train', help="Train/Test set")
11
+
12
+ args = parser.parse_args()
13
+
14
+ with open(os.path.join(args.root, f'./whim/{args.mode}_video_ids.json')) as f:
15
+ video_dict = json.load(f)
16
+
17
+ Video_IDs = video_dict.keys()
18
+ failed_IDs = []
19
+ os.makedirs(os.path.join(args.root, 'Videos'), exist_ok=True)
20
+
21
+ for Video_ID in Video_IDs:
22
+ res = video_dict[Video_ID]['res'][0]
23
+ try:
24
+ YouTube('https://youtu.be/'+Video_ID).streams.filter(only_video=True,
25
+ file_extension='mp4',
26
+ res =f'{res}p'
27
+ ).order_by('resolution').desc().first().download(
28
+ output_path=os.path.join(args.root, 'Videos') ,
29
+ filename = Video_ID +'.mp4')
30
+ except:
31
+ print(f'Failed {Video_ID}')
32
+ failed_IDs.append(Video_ID)
33
+ continue
34
+
35
+
36
+ cap = cv2.VideoCapture(os.path.join(args.root, 'Videos', Video_ID + '.mp4'))
37
+ if (cap.isOpened()== False):
38
+ print(f"Error opening video stream {os.path.join(args.root, 'Videos', Video_ID + '.mp4')}")
39
+
40
+ VIDEO_LEN = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
41
+ length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
42
+ fps = cap.get(cv2.CAP_PROP_FPS)
43
+
44
+ fps_org = video_dict[Video_ID]['fps']
45
+ fps_rate = round(fps / fps_org)
46
+
47
+ all_frames = os.listdir(os.path.join(args.root, 'WHIM', args.mode, 'anno', Video_ID))
48
+
49
+ for frame in all_frames:
50
+ frame_gt = int(frame[:-4])
51
+ frame_idx = (frame_gt * fps_rate)
52
+
53
+ cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
54
+ ret, img_cv2 = cap.read()
55
+
56
+ cv2.imwrite(os.path.join(args.root, 'WHIM', args.mode, 'anno', Video_ID, frame +'.jpg' ), img_cv2.astype(np.float32))
57
+
58
+ np.save(os.path.join(args.root, 'failed_videos.npy'), failed_IDs)
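The frame-index mapping in the loop above rescales annotated frame indices to the frame rate of the downloaded stream. A worked example with illustrative numbers (not dataset values):

```python
fps, fps_org = 30.0, 10.0          # downloaded stream fps vs. annotation fps
fps_rate = round(fps / fps_org)    # -> 3
frame_gt = 40                      # annotated frame index
frame_idx = frame_gt * fps_rate    # -> 120: frame to seek via CAP_PROP_POS_FRAMES
```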
WiLoR/gradio_demo.py ADDED
@@ -0,0 +1,192 @@
1
+ import os
2
+ import sys
3
+ os.environ["PYOPENGL_PLATFORM"] = "egl"
4
+ os.environ["MESA_GL_VERSION_OVERRIDE"] = "4.1"
5
+ # os.system('pip install /home/user/app/pyrender')
6
+ # sys.path.append('/home/user/app/pyrender')
7
+
8
+ import gradio as gr
9
+ #import spaces
10
+ import cv2
11
+ import numpy as np
12
+ import torch
13
+ from ultralytics import YOLO
14
+ from pathlib import Path
15
+ import argparse
16
+ import json
17
+ from typing import Dict, Optional
18
+
19
+ from wilor.models import WiLoR, load_wilor
20
+ from wilor.utils import recursive_to
21
+ from wilor.datasets.vitdet_dataset import ViTDetDataset, DEFAULT_MEAN, DEFAULT_STD
22
+ from wilor.utils.renderer import Renderer, cam_crop_to_full
23
+ device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
24
+
25
+ LIGHT_PURPLE=(0.25098039, 0.274117647, 0.65882353)
26
+
27
+ model, model_cfg = load_wilor(checkpoint_path = './pretrained_models/wilor_final.ckpt' , cfg_path= './pretrained_models/model_config.yaml')
28
+ # Setup the renderer
29
+ renderer = Renderer(model_cfg, faces=model.mano.faces)
30
+ model = model.to(device)
31
+ model.eval()
32
+
33
+ detector = YOLO(f'./pretrained_models/detector.pt').to(device)
34
+
35
+ def render_reconstruction(image, conf, IoU_threshold=0.3):
36
+ input_img, num_dets, reconstructions = run_wilow_model(image, conf, IoU_threshold=IoU_threshold)
37
+ if num_dets> 0:
38
+ # Render front view
39
+
40
+ misc_args = dict(
41
+ mesh_base_color=LIGHT_PURPLE,
42
+ scene_bg_color=(1, 1, 1),
43
+ focal_length=reconstructions['focal'],
44
+ )
45
+
46
+ cam_view = renderer.render_rgba_multiple(reconstructions['verts'],
47
+ cam_t=reconstructions['cam_t'],
48
+ render_res=reconstructions['img_size'],
49
+ is_right=reconstructions['right'], **misc_args)
50
+
51
+ # Overlay image
52
+
53
+ input_img = np.concatenate([input_img, np.ones_like(input_img[:,:,:1])], axis=2) # Add alpha channel
54
+ input_img_overlay = input_img[:,:,:3] * (1-cam_view[:,:,3:]) + cam_view[:,:,:3] * cam_view[:,:,3:]
55
+
56
+ return input_img_overlay, f'{num_dets} hands detected'
57
+ else:
58
+ return input_img, f'{num_dets} hands detected'
59
+
60
+ #@spaces.GPU()
61
+ def run_wilow_model(image, conf, IoU_threshold=0.5):
62
+ img_cv2 = image[...,::-1]
63
+ img_vis = image.copy()
64
+
65
+ detections = detector(img_cv2, conf=conf, verbose=False, iou=IoU_threshold)[0]
66
+
67
+ bboxes = []
68
+ is_right = []
69
+ for det in detections:
70
+ Bbox = det.boxes.data.cpu().detach().squeeze().numpy()
71
+ Conf = det.boxes.conf.data.cpu().detach()[0].numpy().reshape(-1).astype(np.float16)
72
+ Side = det.boxes.cls.data.cpu().detach()
73
+ #Bbox[:2] -= np.int32(0.1 * Bbox[:2])
74
+ #Bbox[2:] += np.int32(0.1 * Bbox[ 2:])
75
+ is_right.append(det.boxes.cls.cpu().detach().squeeze().item())
76
+ bboxes.append(Bbox[:4].tolist())
77
+
78
+ color = (255*0.208, 255*0.647 ,255*0.603 ) if Side==0. else (255*1, 255*0.78039, 255*0.2353)
79
+ label = f'L - {Conf[0]:.3f}' if Side==0 else f'R - {Conf[0]:.3f}'
80
+
81
+ cv2.rectangle(img_vis, (int(Bbox[0]), int(Bbox[1])), (int(Bbox[2]), int(Bbox[3])), color , 3)
82
+ (w, h), _ = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.6, 1)
83
+ cv2.rectangle(img_vis, (int(Bbox[0]), int(Bbox[1]) - 20), (int(Bbox[0]) + w, int(Bbox[1])), color, -1)
84
+ cv2.putText(img_vis, label, (int(Bbox[0]), int(Bbox[1]) - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0,0,0), 2)
85
+
86
+ if len(bboxes) != 0:
87
+ boxes = np.stack(bboxes)
88
+ right = np.stack(is_right)
89
+ dataset = ViTDetDataset(model_cfg, img_cv2, boxes, right, rescale_factor=2.0 )
90
+ dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=False, num_workers=0)
91
+
92
+ all_verts = []
93
+ all_cam_t = []
94
+ all_right = []
95
+ all_joints= []
96
+
97
+ for batch in dataloader:
98
+ batch = recursive_to(batch, device)
99
+
100
+ with torch.no_grad():
101
+ out = model(batch)
102
+
103
+ multiplier = (2*batch['right']-1)
104
+ pred_cam = out['pred_cam']
105
+ pred_cam[:,1] = multiplier*pred_cam[:,1]
106
+ box_center = batch["box_center"].float()
107
+ box_size = batch["box_size"].float()
108
+ img_size = batch["img_size"].float()
109
+ scaled_focal_length = model_cfg.EXTRA.FOCAL_LENGTH / model_cfg.MODEL.IMAGE_SIZE * img_size.max()
110
+ pred_cam_t_full = cam_crop_to_full(pred_cam, box_center, box_size, img_size, scaled_focal_length).detach().cpu().numpy()
111
+
112
+
113
+ batch_size = batch['img'].shape[0]
114
+ for n in range(batch_size):
115
+
116
+ verts = out['pred_vertices'][n].detach().cpu().numpy()
117
+ joints = out['pred_keypoints_3d'][n].detach().cpu().numpy()
118
+
119
+ is_right = batch['right'][n].cpu().numpy()
120
+ verts[:,0] = (2*is_right-1)*verts[:,0]
121
+ joints[:,0] = (2*is_right-1)*joints[:,0]
122
+
123
+ cam_t = pred_cam_t_full[n]
124
+
125
+ all_verts.append(verts)
126
+ all_cam_t.append(cam_t)
127
+ all_right.append(is_right)
128
+ all_joints.append(joints)
129
+
130
+ reconstructions = {'verts': all_verts, 'cam_t': all_cam_t, 'right': all_right, 'img_size': img_size[n], 'focal': scaled_focal_length}
131
+ return img_vis.astype(np.float32)/255.0, len(detections), reconstructions
132
+ else:
133
+ return img_vis.astype(np.float32)/255.0, len(detections), None
134
+
135
+
136
+
137
+ header = ('''
138
+ <div class="embed_hidden" style="text-align: center;">
139
+ <h1> <b>WiLoR</b>: End-to-end 3D hand localization and reconstruction in-the-wild</h1>
140
+ <h3>
141
+ <a href="https://rolpotamias.github.io" target="_blank" rel="noopener noreferrer">Rolandos Alexandros Potamias</a><sup>1</sup>,
142
+ <a href="" target="_blank" rel="noopener noreferrer">Jinglei Zhang</a><sup>2</sup>,
143
+ <br>
144
+ <a href="https://jiankangdeng.github.io/" target="_blank" rel="noopener noreferrer">Jiankang Deng</a><sup>1</sup>,
145
+ <a href="https://wp.doc.ic.ac.uk/szafeiri/" target="_blank" rel="noopener noreferrer">Stefanos Zafeiriou</a><sup>1</sup>
146
+ </h3>
147
+ <h3>
148
+ <sup>1</sup>Imperial College London;
149
+ <sup>2</sup>Shanghai Jiao Tong University
150
+ </h3>
151
+ </div>
152
+ <div style="display:flex; gap: 0.3rem; justify-content: center; align-items: center;" align="center">
153
+ <a href=''><img src='https://img.shields.io/badge/Arxiv-......-A42C25?style=flat&logo=arXiv&logoColor=A42C25'></a>
154
+ <a href='https://rolpotamias.github.io/pdfs/WiLoR.pdf'><img src='https://img.shields.io/badge/Paper-PDF-yellow?style=flat&logo=arXiv&logoColor=yellow'></a>
155
+ <a href='https://rolpotamias.github.io/WiLoR/'><img src='https://img.shields.io/badge/Project-Page-%23df5b46?style=flat&logo=Google%20chrome&logoColor=%23df5b46'></a>
156
+ <a href='https://github.com/rolpotamias/WiLoR'><img src='https://img.shields.io/badge/GitHub-Code-black?style=flat&logo=github&logoColor=white'></a>
157
+ ''')
158
+
159
+
160
+ with gr.Blocks(title="WiLoR: End-to-end 3D hand localization and reconstruction in-the-wild", css=".gradio-container") as demo:
161
+
162
+ gr.Markdown(header)
163
+
164
+ with gr.Row():
165
+ with gr.Column():
166
+ input_image = gr.Image(label="Input image", type="numpy")
167
+ threshold = gr.Slider(value=0.3, minimum=0.05, maximum=0.95, step=0.05, label='Detection Confidence Threshold')
168
+ #nms = gr.Slider(value=0.5, minimum=0.05, maximum=0.95, step=0.05, label='IoU NMS Threshold')
169
+ submit = gr.Button("Submit", variant="primary")
170
+
171
+
172
+ with gr.Column():
173
+ reconstruction = gr.Image(label="Reconstructions", type="numpy")
174
+ hands_detected = gr.Textbox(label="Hands Detected")
175
+
176
+ submit.click(fn=render_reconstruction, inputs=[input_image, threshold], outputs=[reconstruction, hands_detected])
177
+
178
+ with gr.Row():
179
+ example_images = gr.Examples([
180
+
181
+ ['./demo_img/test1.jpg'],
182
+ ['./demo_img/test2.png'],
183
+ ['./demo_img/test3.jpg'],
184
+ ['./demo_img/test4.jpg'],
185
+ ['./demo_img/test5.jpeg'],
186
+ ['./demo_img/test6.jpg'],
187
+ ['./demo_img/test7.jpg'],
188
+ ['./demo_img/test8.jpg'],
189
+ ],
190
+ inputs=input_image)
191
+
192
+ demo.launch()
WiLoR/license.txt ADDED
@@ -0,0 +1,402 @@
1
+ Attribution-NonCommercial-NoDerivatives 4.0 International
2
+
3
+ =======================================================================
4
+
5
+ Creative Commons Corporation ("Creative Commons") is not a law firm and
6
+ does not provide legal services or legal advice. Distribution of
7
+ Creative Commons public licenses does not create a lawyer-client or
8
+ other relationship. Creative Commons makes its licenses and related
9
+ information available on an "as-is" basis. Creative Commons gives no
10
+ warranties regarding its licenses, any material licensed under their
11
+ terms and conditions, or any related information. Creative Commons
12
+ disclaims all liability for damages resulting from their use to the
13
+ fullest extent possible.
14
+
15
+ Using Creative Commons Public Licenses
16
+
17
+ Creative Commons public licenses provide a standard set of terms and
18
+ conditions that creators and other rights holders may use to share
19
+ original works of authorship and other material subject to copyright
20
+ and certain other rights specified in the public license below. The
21
+ following considerations are for informational purposes only, are not
22
+ exhaustive, and do not form part of our licenses.
23
+
24
+ Considerations for licensors: Our public licenses are
25
+ intended for use by those authorized to give the public
26
+ permission to use material in ways otherwise restricted by
27
+ copyright and certain other rights. Our licenses are
28
+ irrevocable. Licensors should read and understand the terms
29
+ and conditions of the license they choose before applying it.
30
+ Licensors should also secure all rights necessary before
31
+ applying our licenses so that the public can reuse the
32
+ material as expected. Licensors should clearly mark any
33
+ material not subject to the license. This includes other CC-
34
+ licensed material, or material used under an exception or
35
+ limitation to copyright. More considerations for licensors:
36
+ wiki.creativecommons.org/Considerations_for_licensors
37
+
38
+ Considerations for the public: By using one of our public
39
+ licenses, a licensor grants the public permission to use the
40
+ licensed material under specified terms and conditions. If
41
+ the licensor's permission is not necessary for any reason--for
42
+ example, because of any applicable exception or limitation to
43
+ copyright--then that use is not regulated by the license. Our
44
+ licenses grant only permissions under copyright and certain
45
+ other rights that a licensor has authority to grant. Use of
46
+ the licensed material may still be restricted for other
47
+ reasons, including because others have copyright or other
48
+ rights in the material. A licensor may make special requests,
49
+ such as asking that all changes be marked or described.
50
+ Although not required by our licenses, you are encouraged to
51
+ respect those requests where reasonable. More considerations
52
+ for the public:
53
+ wiki.creativecommons.org/Considerations_for_licensees
54
+
55
+ =======================================================================
56
+
57
+ Creative Commons Attribution-NonCommercial-NoDerivatives 4.0
58
+ International Public License
59
+
60
+ By exercising the Licensed Rights (defined below), You accept and agree
61
+ to be bound by the terms and conditions of this Creative Commons
62
+ Attribution-NonCommercial-NoDerivatives 4.0 International Public
63
+ License ("Public License"). To the extent this Public License may be
64
+ interpreted as a contract, You are granted the Licensed Rights in
65
+ consideration of Your acceptance of these terms and conditions, and the
66
+ Licensor grants You such rights in consideration of benefits the
67
+ Licensor receives from making the Licensed Material available under
68
+ these terms and conditions.
69
+
70
+
71
+ Section 1 -- Definitions.
72
+
73
+ a. Adapted Material means material subject to Copyright and Similar
74
+ Rights that is derived from or based upon the Licensed Material
75
+ and in which the Licensed Material is translated, altered,
76
+ arranged, transformed, or otherwise modified in a manner requiring
77
+ permission under the Copyright and Similar Rights held by the
78
+ Licensor. For purposes of this Public License, where the Licensed
79
+ Material is a musical work, performance, or sound recording,
80
+ Adapted Material is always produced where the Licensed Material is
81
+ synched in timed relation with a moving image.
82
+
83
+ b. Copyright and Similar Rights means copyright and/or similar rights
84
+ closely related to copyright including, without limitation,
85
+ performance, broadcast, sound recording, and Sui Generis Database
86
+ Rights, without regard to how the rights are labeled or
87
+ categorized. For purposes of this Public License, the rights
88
+ specified in Section 2(b)(1)-(2) are not Copyright and Similar
89
+ Rights.
90
+
91
+ c. Effective Technological Measures means those measures that, in the
92
+ absence of proper authority, may not be circumvented under laws
93
+ fulfilling obligations under Article 11 of the WIPO Copyright
94
+ Treaty adopted on December 20, 1996, and/or similar international
95
+ agreements.
96
+
97
+ d. Exceptions and Limitations means fair use, fair dealing, and/or
98
+ any other exception or limitation to Copyright and Similar Rights
99
+ that applies to Your use of the Licensed Material.
100
+
101
+ e. Licensed Material means the artistic or literary work, database,
102
+ or other material to which the Licensor applied this Public
103
+ License.
104
+
105
+ f. Licensed Rights means the rights granted to You subject to the
106
+ terms and conditions of this Public License, which are limited to
107
+ all Copyright and Similar Rights that apply to Your use of the
108
+ Licensed Material and that the Licensor has authority to license.
109
+
110
+ g. Licensor means the individual(s) or entity(ies) granting rights
111
+ under this Public License.
112
+
113
+ h. NonCommercial means not primarily intended for or directed towards
114
+ commercial advantage or monetary compensation. For purposes of
115
+ this Public License, the exchange of the Licensed Material for
116
+ other material subject to Copyright and Similar Rights by digital
117
+ file-sharing or similar means is NonCommercial provided there is
118
+ no payment of monetary compensation in connection with the
119
+ exchange.
120
+
121
+ i. Share means to provide material to the public by any means or
122
+ process that requires permission under the Licensed Rights, such
123
+ as reproduction, public display, public performance, distribution,
124
+ dissemination, communication, or importation, and to make material
125
+ available to the public including in ways that members of the
126
+ public may access the material from a place and at a time
127
+ individually chosen by them.
128
+
129
+ j. Sui Generis Database Rights means rights other than copyright
130
+ resulting from Directive 96/9/EC of the European Parliament and of
131
+ the Council of 11 March 1996 on the legal protection of databases,
132
+ as amended and/or succeeded, as well as other essentially
133
+ equivalent rights anywhere in the world.
134
+
135
+ k. You means the individual or entity exercising the Licensed Rights
136
+ under this Public License. Your has a corresponding meaning.
137
+
138
+
139
+ Section 2 -- Scope.
140
+
141
+ a. License grant.
142
+
143
+ 1. Subject to the terms and conditions of this Public License,
144
+ the Licensor hereby grants You a worldwide, royalty-free,
145
+ non-sublicensable, non-exclusive, irrevocable license to
146
+ exercise the Licensed Rights in the Licensed Material to:
147
+
148
+ a. reproduce and Share the Licensed Material, in whole or
149
+ in part, for NonCommercial purposes only; and
150
+
151
+ b. produce and reproduce, but not Share, Adapted Material
152
+ for NonCommercial purposes only.
153
+
154
+ 2. Exceptions and Limitations. For the avoidance of doubt, where
155
+ Exceptions and Limitations apply to Your use, this Public
156
+ License does not apply, and You do not need to comply with
157
+ its terms and conditions.
158
+
159
+ 3. Term. The term of this Public License is specified in Section
160
+ 6(a).
161
+
162
+ 4. Media and formats; technical modifications allowed. The
163
+ Licensor authorizes You to exercise the Licensed Rights in
164
+ all media and formats whether now known or hereafter created,
165
+ and to make technical modifications necessary to do so. The
166
+ Licensor waives and/or agrees not to assert any right or
167
+ authority to forbid You from making technical modifications
168
+ necessary to exercise the Licensed Rights, including
169
+ technical modifications necessary to circumvent Effective
170
+ Technological Measures. For purposes of this Public License,
171
+ simply making modifications authorized by this Section 2(a)
172
+ (4) never produces Adapted Material.
173
+
174
+ 5. Downstream recipients.
175
+
176
+ a. Offer from the Licensor -- Licensed Material. Every
177
+ recipient of the Licensed Material automatically
178
+ receives an offer from the Licensor to exercise the
179
+ Licensed Rights under the terms and conditions of this
180
+ Public License.
181
+
182
+ b. No downstream restrictions. You may not offer or impose
183
+ any additional or different terms or conditions on, or
184
+ apply any Effective Technological Measures to, the
185
+ Licensed Material if doing so restricts exercise of the
186
+ Licensed Rights by any recipient of the Licensed
187
+ Material.
188
+
189
+ 6. No endorsement. Nothing in this Public License constitutes or
190
+ may be construed as permission to assert or imply that You
191
+ are, or that Your use of the Licensed Material is, connected
192
+ with, or sponsored, endorsed, or granted official status by,
193
+ the Licensor or others designated to receive attribution as
194
+ provided in Section 3(a)(1)(A)(i).
195
+
196
+ b. Other rights.
197
+
198
+ 1. Moral rights, such as the right of integrity, are not
199
+ licensed under this Public License, nor are publicity,
200
+ privacy, and/or other similar personality rights; however, to
201
+ the extent possible, the Licensor waives and/or agrees not to
202
+ assert any such rights held by the Licensor to the limited
203
+ extent necessary to allow You to exercise the Licensed
204
+ Rights, but not otherwise.
205
+
206
+ 2. Patent and trademark rights are not licensed under this
207
+ Public License.
208
+
209
+ 3. To the extent possible, the Licensor waives any right to
210
+ collect royalties from You for the exercise of the Licensed
211
+ Rights, whether directly or through a collecting society
212
+ under any voluntary or waivable statutory or compulsory
213
+ licensing scheme. In all other cases the Licensor expressly
214
+ reserves any right to collect such royalties, including when
215
+ the Licensed Material is used other than for NonCommercial
216
+ purposes.
217
+
218
+
219
+ Section 3 -- License Conditions.
220
+
221
+ Your exercise of the Licensed Rights is expressly made subject to the
222
+ following conditions.
223
+
224
+ a. Attribution.
225
+
226
+ 1. If You Share the Licensed Material, You must:
227
+
228
+ a. retain the following if it is supplied by the Licensor
229
+ with the Licensed Material:
230
+
231
+ i. identification of the creator(s) of the Licensed
232
+ Material and any others designated to receive
233
+ attribution, in any reasonable manner requested by
234
+ the Licensor (including by pseudonym if
235
+ designated);
236
+
237
+ ii. a copyright notice;
238
+
239
+ iii. a notice that refers to this Public License;
240
+
241
+ iv. a notice that refers to the disclaimer of
242
+ warranties;
243
+
244
+ v. a URI or hyperlink to the Licensed Material to the
245
+ extent reasonably practicable;
246
+
247
+ b. indicate if You modified the Licensed Material and
248
+ retain an indication of any previous modifications; and
249
+
250
+ c. indicate the Licensed Material is licensed under this
251
+ Public License, and include the text of, or the URI or
252
+ hyperlink to, this Public License.
253
+
254
+ For the avoidance of doubt, You do not have permission under
255
+ this Public License to Share Adapted Material.
256
+
257
+ 2. You may satisfy the conditions in Section 3(a)(1) in any
258
+ reasonable manner based on the medium, means, and context in
259
+ which You Share the Licensed Material. For example, it may be
260
+ reasonable to satisfy the conditions by providing a URI or
261
+ hyperlink to a resource that includes the required
262
+ information.
263
+
264
+ 3. If requested by the Licensor, You must remove any of the
265
+ information required by Section 3(a)(1)(A) to the extent
266
+ reasonably practicable.
267
+
268
+
269
+ Section 4 -- Sui Generis Database Rights.
270
+
271
+ Where the Licensed Rights include Sui Generis Database Rights that
272
+ apply to Your use of the Licensed Material:
273
+
274
+ a. for the avoidance of doubt, Section 2(a)(1) grants You the right
275
+ to extract, reuse, reproduce, and Share all or a substantial
276
+ portion of the contents of the database for NonCommercial purposes
277
+ only and provided You do not Share Adapted Material;
278
+
279
+ b. if You include all or a substantial portion of the database
280
+ contents in a database in which You have Sui Generis Database
281
+ Rights, then the database in which You have Sui Generis Database
282
+ Rights (but not its individual contents) is Adapted Material; and
283
+
284
+ c. You must comply with the conditions in Section 3(a) if You Share
285
+ all or a substantial portion of the contents of the database.
286
+
287
+ For the avoidance of doubt, this Section 4 supplements and does not
288
+ replace Your obligations under this Public License where the Licensed
289
+ Rights include other Copyright and Similar Rights.
290
+
291
+
292
+ Section 5 -- Disclaimer of Warranties and Limitation of Liability.
293
+
294
+ a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
295
+ EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
296
+ AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
297
+ ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
298
+ IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
299
+ WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
300
+ PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
301
+ ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
302
+ KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
303
+ ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
304
+
305
+ b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
306
+ TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
307
+ NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
308
+ INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
309
+ COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
310
+ USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
311
+ ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
312
+ DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
313
+ IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
314
+
315
+ c. The disclaimer of warranties and limitation of liability provided
316
+ above shall be interpreted in a manner that, to the extent
317
+ possible, most closely approximates an absolute disclaimer and
318
+ waiver of all liability.
319
+
320
+
321
+ Section 6 -- Term and Termination.
322
+
323
+ a. This Public License applies for the term of the Copyright and
324
+ Similar Rights licensed here. However, if You fail to comply with
325
+ this Public License, then Your rights under this Public License
326
+ terminate automatically.
327
+
328
+ b. Where Your right to use the Licensed Material has terminated under
329
+ Section 6(a), it reinstates:
330
+
331
+ 1. automatically as of the date the violation is cured, provided
332
+ it is cured within 30 days of Your discovery of the
333
+ violation; or
334
+
335
+ 2. upon express reinstatement by the Licensor.
336
+
337
+ For the avoidance of doubt, this Section 6(b) does not affect any
338
+ right the Licensor may have to seek remedies for Your violations
339
+ of this Public License.
340
+
341
+ c. For the avoidance of doubt, the Licensor may also offer the
342
+ Licensed Material under separate terms or conditions or stop
343
+ distributing the Licensed Material at any time; however, doing so
344
+ will not terminate this Public License.
345
+
346
+ d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
347
+ License.
348
+
349
+
350
+ Section 7 -- Other Terms and Conditions.
351
+
352
+ a. The Licensor shall not be bound by any additional or different
353
+ terms or conditions communicated by You unless expressly agreed.
354
+
355
+ b. Any arrangements, understandings, or agreements regarding the
356
+ Licensed Material not stated herein are separate from and
357
+ independent of the terms and conditions of this Public License.
358
+
359
+
360
+ Section 8 -- Interpretation.
361
+
362
+ a. For the avoidance of doubt, this Public License does not, and
363
+ shall not be interpreted to, reduce, limit, restrict, or impose
364
+ conditions on any use of the Licensed Material that could lawfully
365
+ be made without permission under this Public License.
366
+
367
+ b. To the extent possible, if any provision of this Public License is
368
+ deemed unenforceable, it shall be automatically reformed to the
369
+ minimum extent necessary to make it enforceable. If the provision
370
+ cannot be reformed, it shall be severed from this Public License
371
+ without affecting the enforceability of the remaining terms and
372
+ conditions.
373
+
374
+ c. No term or condition of this Public License will be waived and no
375
+ failure to comply consented to unless expressly agreed to by the
376
+ Licensor.
377
+
378
+ d. Nothing in this Public License constitutes or may be interpreted
379
+ as a limitation upon, or waiver of, any privileges and immunities
380
+ that apply to the Licensor or You, including from the legal
381
+ processes of any jurisdiction or authority.
382
+
383
+ =======================================================================
384
+
385
+ Creative Commons is not a party to its public
386
+ licenses. Notwithstanding, Creative Commons may elect to apply one of
387
+ its public licenses to material it publishes and in those instances
388
+ will be considered the “Licensor.” The text of the Creative Commons
389
+ public licenses is dedicated to the public domain under the CC0 Public
390
+ Domain Dedication. Except for the limited purpose of indicating that
391
+ material is shared under a Creative Commons public license or as
392
+ otherwise permitted by the Creative Commons policies published at
393
+ creativecommons.org/policies, Creative Commons does not authorize the
394
+ use of the trademark "Creative Commons" or any other trademark or logo
395
+ of Creative Commons without its prior written consent including,
396
+ without limitation, in connection with any unauthorized modifications
397
+ to any of its public licenses or any other arrangements,
398
+ understandings, or agreements concerning use of licensed material. For
399
+ the avoidance of doubt, this paragraph does not form part of the
400
+ public licenses.
401
+
402
+ Creative Commons may be contacted at creativecommons.org.
WiLoR/requirements.txt ADDED
@@ -0,0 +1,20 @@
1
+ numpy
2
+ opencv-python
3
+ pyrender
4
+ pytorch-lightning
5
+ scikit-image
6
+ smplx==0.1.28
7
+ yacs
8
+ chumpy @ git+https://github.com/mattloper/chumpy
9
+ timm
10
+ einops
11
+ xtcocotools
12
+ pandas
13
+ hydra-core
14
+ hydra-submitit-launcher
15
+ hydra-colorlog
16
+ pyrootutils
17
+ rich
18
+ webdataset
19
+ gradio
20
+ ultralytics==8.1.34
WiLoR/requirements_my.txt ADDED
@@ -0,0 +1,11 @@
1
+ pytorch-lightning
2
+ scikit-image
3
+ yacs
4
+ xtcocotools
5
+ hydra-core
6
+ hydra-submitit-launcher
7
+ hydra-colorlog
8
+ pyrootutils
9
+ rich
10
+ webdataset
11
+ gradio
__init__.py ADDED
@@ -0,0 +1,11 @@
1
+
2
+ def deprecated_api_warning(name, cls=None):
3
+ def decorator(func):
4
+ return func
5
+ return decorator
6
+
7
+
8
+ def deprecated_api_warning(name, cls=None):
9
+ def decorator(func):
10
+ return func
11
+ return decorator
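Both definitions above are identical no-op stubs (the second simply shadows the first): the decorator hands the wrapped function back unchanged, so code written against a real `deprecated_api_warning` API can import this one without effect. A quick sketch of the call pattern (the argument is illustrative):

```python
@deprecated_api_warning('old_kwarg')
def f(x):
    return x

assert f(1) == 1   # behaves exactly as if undecorated
```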
convert_img_to_videos.py ADDED
@@ -0,0 +1,90 @@
1
+ import os
2
+ import cv2
3
+ from pathlib import Path
4
+ from tqdm import tqdm
5
+
6
+ def create_video_from_images(image_folder, output_video_path, fps=30):
7
+ """
8
+ Create a video from a folder of images
9
+
10
+ Args:
11
+ image_folder: Path to folder containing images
12
+ output_video_path: Path where video will be saved
13
+ fps: Frames per second for the output video
14
+ """
15
+ # Get all image files and sort them
16
+ image_files = sorted([f for f in os.listdir(image_folder) if f.endswith(('.jpg', '.png', '.jpeg'))])
17
+
18
+ if not image_files:
19
+ print(f"No images found in {image_folder}")
20
+ return False
21
+
22
+ # Read first image to get dimensions
23
+ first_image_path = os.path.join(image_folder, image_files[0])
24
+ first_frame = cv2.imread(first_image_path)
25
+
26
+ if first_frame is None:
27
+ print(f"Could not read {first_image_path}")
28
+ return False
29
+
30
+ height, width, channels = first_frame.shape
31
+
32
+ # Define the codec and create VideoWriter object
33
+ fourcc = cv2.VideoWriter_fourcc(*'mp4v') # or 'XVID' for .avi
34
+ out = cv2.VideoWriter(output_video_path, fourcc, fps, (width, height))
35
+
36
+ # Write each frame
37
+ for image_file in image_files:
38
+ image_path = os.path.join(image_folder, image_file)
39
+ frame = cv2.imread(image_path)
40
+
41
+ if frame is not None:
42
+ out.write(frame)
43
+ else:
44
+ print(f"Warning: Could not read {image_path}")
45
+
46
+ # Release the video writer
47
+ out.release()
48
+ print(f"Video saved to {output_video_path}")
49
+ return True
50
+
51
+ def convert_all_folders_to_videos(input_base_dir, output_base_dir, fps=30):
52
+ """
53
+ Convert all image folders to videos
54
+
55
+ Args:
56
+ input_base_dir: Base directory containing image folders
57
+ output_base_dir: Base directory where videos will be saved
58
+ fps: Frames per second for output videos
59
+ """
60
+ input_path = Path(input_base_dir)
61
+ output_path = Path(output_base_dir)
62
+
63
+ # Create output directory if it doesn't exist
64
+ output_path.mkdir(parents=True, exist_ok=True)
65
+
66
+ # Get all subdirectories in the input directory
67
+ folders = [f for f in input_path.iterdir() if f.is_dir()]
68
+
69
+ print(f"Found {len(folders)} folders to convert")
70
+
71
+ # Process each folder
72
+ for folder in tqdm(folders, desc="Converting folders to videos"):
73
+ folder_name = folder.name
74
+ output_video_path = output_path / f"{folder_name}.mp4"
75
+
76
+ print(f"\nProcessing: {folder_name}")
77
+ create_video_from_images(str(folder), str(output_video_path), fps=fps)
78
+
79
+ print(f"\n✓ All videos saved to {output_base_dir}")
80
+
81
+ if __name__ == "__main__":
82
+ # Set your paths
83
+ input_base_dir = "/mnt/shared-storage-user/mllm/zangyuhang/pmx/SLUDatasets/CSL_Daily/rgb_format/frames_512x512"
84
+ output_base_dir = "/mnt/shared-storage-user/mllm/zangyuhang/pmx/SLUDatasets/CSL_Daily/rgb_format/videos_512x512_30fps"
85
+
86
+ fps = 30
87
+
88
+ print("Starting conversion...")
89
+ convert_all_folders_to_videos(input_base_dir, output_base_dir, fps=fps)
90
+ print("Done!")
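`create_video_from_images` can also be called directly on a single folder without the batch wrapper; a minimal sketch with placeholder paths:

```python
create_video_from_images("frames/S000001", "S000001.mp4", fps=30)
```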
corrupted_videos.log ADDED
@@ -0,0 +1,7 @@
1
+ 2 weeks for cslnews
2
+
3
+ /mnt/shared-storage-user/mllm/zangyuhang/pmx/SLUDatasets/CSL_News/rgb_format/Common-Concerns_20220416_22737-23037_691030.mp4
4
+ /mnt/shared-storage-user/mllm/zangyuhang/pmx/SLUDatasets/CSL_News/rgb_format/Common-Concerns_20210613_37012-37137_564952.mp4
5
+ /mnt/shared-storage-user/mllm/zangyuhang/pmx/SLUDatasets/CSL_News/rgb_format/Common-Concerns_20240121_23537-23987_637834.mp4
6
+
7
+ 1 hr for csldaily
corrupted_videos_csl_news.log ADDED
@@ -0,0 +1,7 @@
1
+ 2 weeks for csl news
2
+
3
+
4
+
5
+ /mnt/shared-storage-user/mllm/zangyuhang/pmx/SLUDatasets/CSL_News/rgb_format/Common-Concerns_20220416_22737-23037_691030.mp4
6
+ /mnt/shared-storage-user/mllm/zangyuhang/pmx/SLUDatasets/CSL_News/rgb_format/Common-Concerns_20210613_37012-37137_564952.mp4
7
+ /mnt/shared-storage-user/mllm/zangyuhang/pmx/SLUDatasets/CSL_News/rgb_format/Common-Concerns_20240121_23537-23987_637834.mp4
extract_smplx_20260212_165824.log ADDED
The diff for this file is too large to render. See raw diff
 
extract_smplx_20260212_165911_gpu_monitor.log ADDED
The diff for this file is too large to render. See raw diff
 
extract_smplx_20260213_144424.log ADDED
The diff for this file is too large to render. See raw diff
 
extract_smplx_20260213_144424_gpu_monitor.log ADDED
The diff for this file is too large to render. See raw diff
 
extract_smplx_pose.py ADDED
@@ -0,0 +1,657 @@
1
+ import warnings
2
+ import os
3
+ import sys
4
+ import argparse
5
+
6
+ import torch
7
+ import cv2
8
+ import pickle
9
+ import smplx
10
+ import numpy as np
11
+ import time
12
+ import queue
13
+ import threading
14
+ import torch.nn.functional as F
15
+ from tqdm import tqdm
16
+ from torchvision import transforms
17
+ from ultralytics import YOLO
18
+ from accelerate import Accelerator
19
+ from accelerate.utils import set_seed
20
+ from concurrent.futures import ThreadPoolExecutor
21
+ import decord
22
+ from decord import VideoReader, gpu
23
+
24
+ torch.set_float32_matmul_precision('high')
25
+ torch.backends.cuda.matmul.allow_tf32 = True
26
+ torch.backends.cudnn.allow_tf32 = True
27
+ torch.backends.cudnn.benchmark = True
28
+ torch._inductor.config.triton.cudagraph_skip_dynamic_graphs = True
29
+ warnings.filterwarnings("ignore")
31
+
32
+ import logging
33
+ logging.getLogger("torch.utils._sympy.interp").setLevel(logging.ERROR)
34
+ logging.getLogger("torch._inductor.utils").setLevel(logging.ERROR)
35
+
36
+ PROJECT_ROOT = os.path.abspath(r"/mnt/shared-storage-user/mllm/zangyuhang/pmx/SMPL-X_pose_extraction")
37
+ SMPLEST_X_PATH = os.path.join(PROJECT_ROOT, 'SMPLest-X')
38
+ WILOR_PATH = os.path.join(PROJECT_ROOT, 'WiLoR')
39
+ DEPTH_ANYTHING_PATH = os.path.join(PROJECT_ROOT, 'Depth-Anything-V2')
40
+ MODEL_PATH = os.path.join(PROJECT_ROOT, "pretrained_weight", "smpl_models")
41
+
42
+ for p in [SMPLEST_X_PATH, WILOR_PATH, DEPTH_ANYTHING_PATH, PROJECT_ROOT]:
43
+ if p not in sys.path: sys.path.insert(0, p)
44
+
45
+ for attr in ['int', 'float', 'bool', 'complex', 'object', 'unicode', 'str']:
46
+ if not hasattr(np, attr): setattr(np, attr, eval(attr) if attr != 'unicode' else str)
47
+
48
+ from main.config import Config as SmplestConfig
49
+ from main.base import Tester as SmplestTester
50
+ from human_models.human_models import SMPLX
51
+ from utils.data_utils import process_bbox, generate_patch_image
52
+ from wilor.models import load_wilor
53
+ from wilor.utils import recursive_to
54
+ from wilor.datasets.vitdet_dataset import ViTDetDataset
55
+ from depth_anything_v2.dpt import DepthAnythingV2
56
+ from basicsr.archs.rrdbnet_arch import RRDBNet
57
+ from realesrgan.utils import RealESRGANer
58
+
59
+ class FramePrefetcher:
60
+ def __init__(self, video_path, device_id=0, buffer_size=128):
61
+ try:
62
+ self.vr = VideoReader(video_path, ctx=gpu(device_id))
63
+ except:
64
+ self.vr = VideoReader(video_path, ctx=decord.cpu(0))
65
+
66
+ self.total_frames = len(self.vr)
67
+ self.current_idx = 0
68
+ self.buffer_size = buffer_size
69
+ self.queue = queue.Queue(maxsize=buffer_size)
70
+ self.stopped = False
71
+
72
+ def start(self):
73
+ t = threading.Thread(target=self._update, args=())
74
+ t.daemon = True
75
+ t.start()
76
+ return self
77
+
78
+ def _update(self):
79
+ while not self.stopped:
80
+ if self.current_idx >= self.total_frames:
81
+ self.stopped = True
82
+ break
83
+
84
+ if not self.queue.full():
85
+ end_idx = min(self.current_idx + 16, self.total_frames)
86
+ frames = self.vr.get_batch(range(self.current_idx, end_idx)).asnumpy()
87
+
88
+ for i in range(frames.shape[0]):
89
+ frame = cv2.cvtColor(frames[i], cv2.COLOR_RGB2BGR)
90
+ self.queue.put(frame)
91
+
92
+ self.current_idx = end_idx
93
+ else:
94
+ time.sleep(0.005)
95
+
96
+ def get_batch(self, batch_size):
97
+ batch = []
98
+ for _ in range(batch_size):
99
+ try:
100
+ frame = self.queue.get(timeout=0.05)
101
+ batch.append(frame)
102
+ except queue.Empty:
103
+ break
104
+ return batch
105
+
106
+ def is_running(self):
107
+ return not (self.stopped and self.queue.empty())
108
+
109
+ def stop(self):
110
+ self.stopped = True
111
+
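A usage sketch for the prefetcher above (the video path is a placeholder). The background thread decodes ahead into the queue, so the consumer only blocks briefly in `get_batch`:

```python
prefetcher = FramePrefetcher("video.mp4", device_id=0, buffer_size=128).start()
while prefetcher.is_running():
    frames = prefetcher.get_batch(16)   # list of BGR uint8 frames; may be shorter near EOF
    if not frames:
        continue
    # ... run detection / pose models on the batch ...
prefetcher.stop()
```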
112
+ class GlobalSilence:
113
+ def __enter__(self):
114
+ self.stdout_fd = sys.stdout.fileno()
115
+ self.saved_stdout_fd = os.dup(self.stdout_fd)
116
+ os.dup2(os.open(os.devnull, os.O_WRONLY), self.stdout_fd)
117
+ def __exit__(self, type, value, traceback):
118
+ os.dup2(self.saved_stdout_fd, self.stdout_fd)
119
+ os.close(self.saved_stdout_fd)
120
+
121
+ class SMPLXPoseExtractor:
122
+ def __init__(self, args, accelerator):
123
+ self.args = args
124
+ self.accelerator = accelerator
125
+ self.device = accelerator.device
126
+ self.global_pbar = None
127
+ self.files_done = 0
128
+ self.my_total = 0
129
+ self.start_time = 0
130
+ self.pool = ThreadPoolExecutor(max_workers=self.args.num_workers)
131
+
132
+ if self.accelerator.is_main_process:
133
+ print(f"Initializing SMPL-X Pose Extractor on {self.device}...")
134
+
135
+ # 1. Load Real-ESRGAN
136
+ if self.args.apply_sr:
137
+ model_esrgan = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=4)
138
+ self.upsampler = RealESRGANer(scale=4, model_path=args.real_esrgan_ckpt, model=model_esrgan, tile=1024, tile_pad=10, pre_pad=0, half=True, device=self.device)
139
+ if self.accelerator.is_main_process: print("[1/6] Real-ESRGAN loaded.")
140
+ else:
141
+ self.upsampler = None
142
+ if self.accelerator.is_main_process: print("[1/6] Real-ESRGAN skipped.")
143
+
144
+ # 2. Load Depth Anything V2
145
+ if self.args.opt_depth:
146
+ self.depth_model = DepthAnythingV2(encoder='vitl', features=256, out_channels=[256, 512, 1024, 1024])
147
+ self.depth_model.load_state_dict(torch.load(args.depth_anything_v2_ckpt, map_location='cpu'))
148
+ self.depth_model = self.depth_model.to(self.device).eval()
149
+ if hasattr(torch, 'compile'):
150
+ self.depth_model = torch.compile(self.depth_model, mode="reduce-overhead")
151
+ if self.accelerator.is_main_process: print("[2/6] Compiled Depth-Anything-V2 loaded for optimization.")
152
+ else:
153
+ if self.accelerator.is_main_process: print("[2/6] Compiled Depth-Anything-V2 loaded for optimization.")
154
+ else:
155
+ self.depth_model = None
156
+ if self.accelerator.is_main_process: print("[2/6] Depth optimization skipped.")
157
+
158
+ # 3. Load YOLO Detectors
159
+ self.detector = YOLO(args.yolo_ckpt)
160
+ self.hand_detector = YOLO(args.hand_detector_ckpt)
161
+ if self.accelerator.is_main_process: print("[3/6] YOLO (Body & Hand) Detectors loaded successfully.")
162
+
163
+ # 4. SMPLest
164
+ self.smplest_cfg = SmplestConfig.load_config(os.path.join(PROJECT_ROOT, "pretrained_weight", "smplest-x", "config_base.py"))
165
+ log_base = os.path.join(PROJECT_ROOT, "smplest_logs")
166
+ if self.accelerator.is_main_process: os.makedirs(log_base, exist_ok=True)
167
+ self.accelerator.wait_for_everyone()
168
+ self.smplest_cfg.log.log_dir = os.path.join(log_base, f"rank_{self.accelerator.process_index}")
169
+ os.makedirs(self.smplest_cfg.log.log_dir, exist_ok=True)
170
+
171
+ _tmp_wrapper = SMPLX(MODEL_PATH)
172
+ self.smplx_model = _tmp_wrapper.layer['neutral'].to(self.device)
173
+ self.smplest_tester = SmplestTester(self.smplest_cfg)
174
+ self.smplest_tester._make_model()
175
+ self.smplest_model = self.smplest_tester.model.to(self.device)
176
+ if isinstance(self.smplest_model, torch.nn.DataParallel): self.smplest_model = self.smplest_model.module
177
+ self.smplest_model = self.smplest_model.to(self.device).eval()
178
+ if hasattr(torch, 'compile'):
179
+ if self.accelerator.is_main_process: print("[4/6] Compiled SMPLest-X model loaded successfully.")
180
+ self.smplest_model = torch.compile(self.smplest_model, mode="reduce-overhead")
181
+ else:
182
+ if self.accelerator.is_main_process: print("[4/6] SMPLest-X model loaded successfully.")
183
+
184
+ # 5. WiLoR
185
+ self.wilor_model, self.wilor_cfg = load_wilor(args.wilor_ckpt, cfg_path=os.path.join(PROJECT_ROOT, 'WiLoR', 'pretrained_models', 'model_config.yaml'))
186
+ self.wilor_model = self.wilor_model.to(self.device).eval()
187
+ self.transform = transforms.ToTensor()
188
+ if hasattr(torch, 'compile'):
189
+ self.wilor_model = torch.compile(self.wilor_model, mode="reduce-overhead")
190
+ if self.accelerator.is_main_process: print("[5/6] Compiled WiLoR Hand model loaded successfully.")
191
+ else:
192
+ if self.accelerator.is_main_process: print("[5/6] WiLoR Hand model loaded successfully.")
193
+
194
+ # 6. Optimization Layer
195
+ self.smplx_opt = smplx.create(MODEL_PATH, model_type='smplx', gender='neutral', use_pca=False, batch_size=1).to(self.device)
196
+ self.smpl_mean_r = self.smplx_model.right_hand_mean.detach().to(torch.float32).cpu().numpy().flatten()
197
+ self.smpl_mean_l = self.smplx_model.left_hand_mean.detach().to(torch.float32).cpu().numpy().flatten()
198
+ if self.accelerator.is_main_process: print("[6/6] SMPL-X Optimization Layer initialized.\n")
199
+
200
+ def _matrix_to_axis_angle(self, matrix_tensor):
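+ # cv2.Rodrigues turns each 3x3 rotation matrix into its 3-D axis-angle vector; the per-joint vectors are concatenated into one flat (N*3,) array.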
201
+ mats = matrix_tensor.detach().to(torch.float32).cpu().numpy()
202
+ if mats.ndim == 2: mats = mats[None]
203
+ return np.concatenate([cv2.Rodrigues(mats[i])[0].flatten() for i in range(mats.shape[0])])
204
+
205
+ def _batch_refine_fingers(self, hand_poses, hand_bboxes, depths_gpu, joints_2d_all):
206
+ N_hp = hand_poses.shape[0]
207
+ N_jt = joints_2d_all.shape[0]
208
+
209
+ # Align and remove padding
210
+ N = N_jt
211
+ if N == 0: return hand_poses
212
+ if N_hp != N:
213
+ hand_poses = hand_poses[:N]
214
+ hand_bboxes = hand_bboxes[:N]
215
+
216
+ # 1. Get BBox
217
+ x1, y1 = hand_bboxes[:, 0:1], hand_bboxes[:, 1:2]
218
+ w_crop, h_crop = hand_bboxes[:, 2:3] - x1, hand_bboxes[:, 3:4] - y1
219
+
220
+ # 2. Get 2d
221
+ joints_img = joints_2d_all.clone()
222
+ joints_img[:, :, 0] = joints_img[:, :, 0] * w_crop + x1
223
+ joints_img[:, :, 1] = joints_img[:, :, 1] * h_crop + y1
224
+
225
+ # 3. Get depth
226
+ img_h, img_w = depths_gpu.shape[-2:]
227
+ u = joints_img[:, :, 0].long().clamp(0, img_w - 1)
228
+ v = joints_img[:, :, 1].long().clamp(0, img_h - 1)
229
+
230
+ hand_idx = torch.arange(N, device=hand_poses.device).view(-1, 1)
231
+ depth_values = depths_gpu[hand_idx, v, u]
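+ # Advanced indexing: depth_values[i, j] is the depth sampled at joint j's pixel location for hand i.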
232
+
233
+ # 4. compute to refine Z
234
+ palm_indices = [0, 5, 9, 13, 17]
235
+ palm_depth = depth_values[:, palm_indices].mean(dim=1, keepdim=True)
236
+ finger_depths = depth_values[:, 5:20]
237
+ z_mod = ((finger_depths - palm_depth) / (palm_depth + 1e-6)).clamp(-0.2, 0.2) + 1.0
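+ # Heuristic refinement: each finger joint's pose parameters get scaled by its depth offset from the palm, clamped to +/-20% (multiplier in [0.8, 1.2]), rather than being re-solved geometrically.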
238
+
239
+ D = hand_poses.shape[1] // 15
240
+ refined_pose = hand_poses.view(N, 15, D).clone()
241
+ refined_pose *= z_mod.unsqueeze(-1)
242
+
243
+ return refined_pose.view(N, -1)
244
+
245
+ def _extract_body_patch(self, i, img, body_res):
246
+ if not body_res.boxes:
247
+ return i, None
248
+ h_img, w_img = img.shape[:2]
249
+ bbox = body_res.boxes.xyxy[0].to(torch.float32).cpu().numpy()
250
+ proc_bbox = process_bbox(np.array([bbox[0], bbox[1], bbox[2]-bbox[0], bbox[3]-bbox[1]]),
251
+ w_img, h_img, self.smplest_cfg.model.input_img_shape)
252
+ patch_img, _, _ = generate_patch_image(img, proc_bbox, 1.0, 0.0, False, self.smplest_cfg.model.input_img_shape)
253
+ p_tensor = self.transform(cv2.cvtColor(patch_img, cv2.COLOR_BGR2RGB).astype(np.float32)) / 255.0
254
+ return i, p_tensor
255
+
256
+ def _extract_hand_patches(self, i, img, hand_res):
257
+ patches = []
258
+ if not hand_res.boxes:
259
+ return i, patches
260
+ for box in hand_res.boxes:
261
+ is_right = box.cls.cpu().item()
262
+ bbox = box.xyxy[0].to(torch.float32).cpu().numpy()
263
+ patch_tensor = self._get_wilor_patch(img, bbox, is_right)
264
+ patches.append({
265
+ 'tensor': patch_tensor,
266
+ 'meta': {'batch_idx': i, 'is_right': is_right, 'bbox': bbox}
267
+ })
268
+ return i, patches
269
+
270
+ def _get_wilor_patch(self, img, bbox, is_right):
271
+ h_img, w_img = img.shape[:2]
272
+ x1, y1, x2, y2 = bbox
273
+
274
+ # 1. Compute the center point and the original width and height
275
+ center = np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])
276
+ width = x2 - x1
277
+ height = y2 - y1
278
+
279
+ # 2. Rescale factor (WiLoR / ViTDet typically uses 2.0)
280
+ # This mimics the behavior of ViTDetDataset
281
+ rescale_factor = 2.0
282
+ side = max(width, height) * rescale_factor
283
+
284
+ # 3. Compute the cropping boundaries
285
+ # Ensure the crop is square and stays within image bounds
286
+ new_x1 = max(0, int(center[0] - side / 2.0))
287
+ new_y1 = max(0, int(center[1] - side / 2.0))
288
+ new_x2 = min(w_img, int(center[0] + side / 2.0))
289
+ new_y2 = min(h_img, int(center[1] + side / 2.0))
290
+
291
+ # 4. Crop the image and pad it to a square
292
+ patch = img[new_y1:new_y2, new_x1:new_x2]
293
+
294
+ # If the cropped patch is not square (e.g., near image borders), pad it
295
+ ph, pw = patch.shape[:2]
296
+ if ph != pw:
297
+ max_side = max(ph, pw)
298
+ # Create a black background
299
+ tmp_patch = np.zeros((max_side, max_side, 3), dtype=np.uint8)
300
+ # Paste the patch into the center
301
+ start_y = (max_side - ph) // 2
302
+ start_x = (max_side - pw) // 2
303
+ tmp_patch[start_y:start_y+ph, start_x:start_x+pw] = patch
304
+ patch = tmp_patch
305
+
306
+ # 5. Resize to the model input resolution
307
+ # (assumed to be 224x224; adjust according to wilor_cfg)
308
+ input_size = self.wilor_cfg.MODEL.IMAGE_SIZE
309
+ patch_rgb = cv2.cvtColor(patch, cv2.COLOR_BGR2RGB)
310
+ patch_resized = cv2.resize(patch_rgb, (input_size, input_size), interpolation=cv2.INTER_LINEAR)
311
+
312
+ patch_tensor = torch.from_numpy(patch_resized).float().permute(2, 0, 1) / 255.0
313
+
314
+ return patch_tensor
315
+
316
+ def pad_to_fixed_buckets(self, tensor, buckets=[32, 64, 128, 256, 512]):
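+ # Pad the batch up to the next fixed bucket (or the next multiple of 8 beyond the largest bucket), presumably so torch.compile only ever sees a handful of static batch shapes instead of recompiling for every hand/body count.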
317
+ n = tensor.shape[0]
318
+ target_n = n
319
+ for b in buckets:
320
+ if n <= b:
321
+ target_n = b
322
+ break
323
+ else:
324
+ target_n = ((n + 7) // 8) * 8
325
+
326
+ if target_n == n:
327
+ return tensor, n
328
+
329
+ pad_size = target_n - n
330
+ padding = torch.zeros((pad_size, *tensor.shape[1:]), device=tensor.device, dtype=tensor.dtype)
331
+ return torch.cat([tensor, padding], dim=0), n
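+ # Example: 70 hand crops are padded to 128; the returned original count (70) is used afterwards to slice the padding back off the model outputs.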
332
+
333
+ def _process_batch(self, batch_imgs):
334
+ batch_results_data = []
335
+ with torch.cuda.amp.autocast(dtype=torch.bfloat16):
336
+ # =========================================================
337
+ # STAGE 0: Image enhancement and depth preprocessing
338
+ # =========================================================
339
+ if self.args.apply_sr and self.upsampler:
340
+ processed_imgs = []
341
+ with GlobalSilence():
342
+ for img in batch_imgs:
343
+ sr_img, _ = self.upsampler.enhance(img, outscale=3); processed_imgs.append(sr_img)
344
+ else:
345
+ processed_imgs = batch_imgs
347
+
348
+ imgs_np = np.stack(processed_imgs)
349
+ imgs_gpu = torch.from_numpy(imgs_np).to(self.device, non_blocking=True).float() / 255.0
350
+ imgs_gpu = imgs_gpu.permute(0, 3, 1, 2) # [B, 3, H, W]
351
+ h_orig, w_orig = imgs_gpu.shape[2:]
352
+
353
+ new_h = ((h_orig + 223) // 224) * 224
354
+ new_w = ((w_orig + 223) // 224) * 224
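+ # Round H/W up to multiples of 224 so the ViT patch grid divides the input evenly (Depth-Anything-V2 uses 14-px patches and 224 = 16 * 14).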
355
+ if h_orig != new_h or w_orig != new_w:
356
+ imgs_gpu = F.interpolate(imgs_gpu, size=(new_h, new_w), mode='bilinear', align_corners=False)
357
+
358
+ batch_depths_np = None
359
+ if self.args.opt_depth:
360
+ depth_input = imgs_gpu.to(memory_format=torch.contiguous_format).contiguous()
361
+ depths = self.depth_model(depth_input)
362
+ if depths.shape[-2:] != (h_orig, w_orig):
363
+ depths = F.interpolate(depths.unsqueeze(1), size=(h_orig, w_orig), mode='bilinear').squeeze(1)
364
+ batch_depths_np = depths.detach().cpu().numpy()
365
+
366
+ # =========================================================
367
+ # STAGE 1: Batched object detection and patch collection
368
+ # =========================================================
369
+ body_results = self.detector.predict(processed_imgs, device=self.device, conf=0.5, classes=0, verbose=False)
370
+ hand_results = self.hand_detector.predict(processed_imgs, device=self.device, conf=0.3, verbose=False)
371
+
372
+ # =========================================================
373
+ # STAGE 2: Aggregated SMPLest batch inference (key optimization)
374
+ # =========================================================
375
+ # 1. Use thread pool to process body patches in parallel
376
+ body_tasks = [self.pool.submit(self._extract_body_patch, i, processed_imgs[i], body_results[i])
377
+ for i in range(len(processed_imgs))]
378
+
379
+ # 2. Use thread pool to process hand patches in parallel
380
+ hand_tasks = [self.pool.submit(self._extract_hand_patches, i, processed_imgs[i], hand_results[i])
381
+ for i in range(len(processed_imgs))]
382
+
383
+ # 3. Collect body results
384
+ smpl_patch_tensors, original_to_agg_idx = [], {}
385
+ for task in body_tasks:
386
+ i, p_tensor = task.result()
387
+ if p_tensor is not None:
388
+ original_to_agg_idx[i] = len(smpl_patch_tensors)
389
+ smpl_patch_tensors.append(p_tensor)
390
+
391
+ # 4. Collect hand results
392
+ all_hand_patches, hand_meta_map = [], []
393
+ for task in hand_tasks:
394
+ _, patches = task.result()
395
+ for p_data in patches:
396
+ all_hand_patches.append(p_data['tensor'])
397
+ hand_meta_map.append(p_data['meta'])
398
+
399
+ batch_out_smpl = None
400
+ if smpl_patch_tensors:
401
+ agg_smpl_tensor = torch.stack(smpl_patch_tensors).to(self.device, dtype=torch.bfloat16)
402
+ padded_tensor, actual_num = self.pad_to_fixed_buckets(agg_smpl_tensor)
403
+ with torch.no_grad():
404
+ raw_smpl = self.smplest_model({'img': padded_tensor}, {}, {}, 'test')
405
+ batch_out_smpl = {k: v[:actual_num] if isinstance(v, torch.Tensor) else v for k, v in raw_smpl.items()}
406
+
407
+ wilor_out_all = None
408
+ if all_hand_patches:
409
+ agg_hand_tensor = torch.stack(all_hand_patches).to(self.device, dtype=torch.bfloat16)
410
+ padded_hand, actual_hand_num = self.pad_to_fixed_buckets(agg_hand_tensor)
411
+ with torch.no_grad():
412
+ raw_wilor = self.wilor_model({'img': padded_hand})
413
+ wilor_out_all = {}
414
+ for k, v in raw_wilor.items():
415
+ if isinstance(v, torch.Tensor):
416
+ wilor_out_all[k] = v[:actual_hand_num].clone()
417
+ elif isinstance(v, dict):
418
+ wilor_out_all[k] = {
419
+ nk: nv[:actual_hand_num].clone() if isinstance(nv, torch.Tensor) else nv
420
+ for nk, nv in v.items()
421
+ }
422
+ else:
423
+ wilor_out_all[k] = v
424
+
425
+ # =========================================================
426
+ # STAGE 3: Depth-based refinement
427
+ # =========================================================
428
+ if wilor_out_all and self.args.opt_depth and len(hand_meta_map) > 0:
429
+
430
+ hand_params = wilor_out_all['pred_mano_params']
431
+ all_hp = hand_params['hand_pose']
432
+ orig_shape_suffix = all_hp.shape[1:]
433
+
434
+ # The current number of hands
435
+ N = all_hp.shape[0]
436
+
437
+ # Map every detected hand back to the depth map of its source frame.
+ # pred_keypoints_2d is indexed per hand patch while `depths` is indexed per frame,
+ # so hands are gathered with their frame indices rather than a compacted index.
+ frame_indices = torch.tensor([m['batch_idx'] for m in hand_meta_map], device=self.device)
+
+ # For safety, ensure index count does not exceed hp count
+ frame_indices = frame_indices[:N]
+
+ all_joints = wilor_out_all['pred_keypoints_2d'][:N]
+ target_depth_tensors = depths[frame_indices]
448
+
449
+ refined_poses = self._batch_refine_fingers(
450
+ all_hp.reshape(N, -1),
451
+ torch.tensor([m['bbox'] for m in hand_meta_map], device=self.device)[:N],
452
+ target_depth_tensors,
453
+ all_joints
454
+ )
455
+
456
+ wilor_out_all['pred_mano_params']['hand_pose'] = refined_poses.reshape(N, *orig_shape_suffix)
457
+
458
+ # =========================================================
459
+ # STAGE 4: Result assembly
460
+ # =========================================================
461
+ # Build a fast lookup table: frame_idx -> list of hand results
462
+ frame_to_hands = [[] for _ in range(len(processed_imgs))]
463
+ if wilor_out_all:
464
+ for idx, meta in enumerate(hand_meta_map):
465
+ hp_aa = self._matrix_to_axis_angle(wilor_out_all['pred_mano_params']['hand_pose'][idx])
466
+
467
+ frame_to_hands[meta['batch_idx']].append({
468
+ 'is_right': meta['is_right'],
469
+ 'hp_aa': hp_aa
470
+ })
471
+
472
+ if batch_out_smpl is not None:
473
+ batch_out_smpl = {k: v.detach().to(torch.float32).cpu() if isinstance(v, torch.Tensor) else v
474
+ for k, v in batch_out_smpl.items()}
475
+
476
+ if wilor_out_all is not None:
477
+ wilor_out_all = {k: v.detach().to(torch.float32).cpu() if isinstance(v, torch.Tensor) else v
478
+ for k, v in wilor_out_all.items()}
479
+
480
+ for i in range(len(processed_imgs)):
481
+ img = processed_imgs[i]
482
+ if not body_results[i].boxes:
483
+ batch_results_data.append(None); continue
484
+
485
+ idx = original_to_agg_idx[i]
486
+
487
+ res_base = {
488
+ 'body_pose': batch_out_smpl['smplx_body_pose'][idx].numpy().flatten()[None],
489
+ 'root_pose': batch_out_smpl['smplx_root_pose'][idx].numpy().flatten()[None],
490
+ 'shape': batch_out_smpl['smplx_shape'][idx].numpy().flatten()[None],
491
+ 'expr': batch_out_smpl['smplx_expr'][idx].numpy().flatten()[None],
492
+ 'trans': (batch_out_smpl['smplx_trans'][idx] if 'smplx_trans' in batch_out_smpl
493
+ else batch_out_smpl['cam_trans'][idx]).numpy().flatten()[None],
494
+ 'jaw_pose': batch_out_smpl['smplx_jaw_pose'][idx].numpy().flatten()[None]
495
+ if 'smplx_jaw_pose' in batch_out_smpl else np.zeros((1, 3)),
496
+ }
497
+ pred_l = batch_out_smpl['smplx_lhand_pose'][idx].numpy().flatten()
498
+ pred_r = batch_out_smpl['smplx_rhand_pose'][idx].numpy().flatten()
499
+ lhand_p, rhand_p = None, None
500
+ for h_res in frame_to_hands[i]:
501
+ if h_res['is_right'] == 1:
502
+ rhand_p = h_res['hp_aa'] - self.smpl_mean_r
503
+ else:
504
+ lf = h_res['hp_aa'].reshape(-1, 3).copy(); lf[:, 1:3] *= -1
505
+ lhand_p = lf.flatten() - self.smpl_mean_l
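+ # WiLoR's right-hand convention is mirrored to the left hand by negating the y/z axis-angle components; subtracting the SMPL-X hand means keeps both hands in the same mean-relative convention as the SMPLest-X predictions they replace (inferred from the smpl_mean_l/r usage above).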
506
+
507
+ res_base['lhand_pose'] = (lhand_p if lhand_p is not None else pred_l)[None]
508
+ res_base['rhand_pose'] = (rhand_p if rhand_p is not None else pred_r)[None]
509
+ batch_results_data.append(res_base)
510
+
511
+ return batch_results_data
512
+
513
+ def process_single_item(self, input_path, output_root_dir):
514
+ item_name = os.path.splitext(os.path.basename(input_path.rstrip(os.sep)))[0]
515
+ final_pkl_path = os.path.join(output_root_dir, f"{item_name}.pkl")
516
+
517
+ def update_pbar():
518
+ if self.args.progress == 'overall' and self.global_pbar:
519
+ elapsed_min = (time.time() - self.start_time) / 60
520
+ self.global_pbar.set_description(f"R{self.accelerator.process_index} | Done:{self.files_done}/{self.my_total} | {elapsed_min:.1f}m")
521
+ self.global_pbar.update(1)
522
+
523
+ if os.path.exists(final_pkl_path):
524
+ self.files_done += 1; update_pbar(); return
525
+
526
+ device_id = self.device.index if self.device.index is not None else 0
527
+ try:
528
+ prefetcher = FramePrefetcher(input_path, device_id=device_id).start()
529
+ except RuntimeError as e:
530
+ print(f"Skip broken vidoe: {input_path}")
531
+ with open("corrupted_videos.log", "a") as f:
532
+ f.write(f"{input_path}\n")
533
+ return
534
+ total_frames = prefetcher.total_frames
535
+ pbar = tqdm(total=total_frames, desc=f"R{self.accelerator.process_index} | {item_name}",
536
+ disable=(not self.accelerator.is_main_process or self.args.progress == 'overall'))
537
+
538
+ all_frames_data = []
539
+ while prefetcher.is_running():
540
+ batch = prefetcher.get_batch(self.args.batch_size)
541
+ if not batch: break
542
+ batch_res = self._process_batch(batch)
543
+ for res in batch_res:
544
+ if res:
545
+ all_frames_data.append(np.concatenate([
546
+ res['root_pose'].flatten(), res['body_pose'].flatten(),
547
+ res['lhand_pose'].flatten(), res['rhand_pose'].flatten(),
548
+ res['jaw_pose'].flatten(), res['shape'].flatten(), res['expr'].flatten()
549
+ ]))
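+ # Per-frame vector layout: 3 (root) + 63 (body) + 45 (lhand) + 45 (rhand) + 3 (jaw) + 10 (shape) + 10 (expr) = 179 dims, matching the 179-dim statistics computed in main().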
550
+ if pbar: pbar.update(1)
551
+ # torch.cuda.empty_cache()
552
+
553
+ if all_frames_data:
554
+ with open(final_pkl_path, 'wb') as f:
555
+ pickle.dump(np.stack(all_frames_data, axis=0).astype(np.float32), f)
556
+
557
+ prefetcher.stop(); pbar.close()
558
+ self.files_done += 1; update_pbar()
559
+
560
+ def main(args):
561
+ accelerator = Accelerator(); set_seed(42)
562
+ input_items = sorted([os.path.join(args.input_path, f) for f in os.listdir(args.input_path) if f.endswith(('.mp4', '.mov', '.avi'))])
563
+ to_process = [it for it in input_items if not os.path.exists(os.path.join(args.output_path, f"{os.path.splitext(os.path.basename(it))[0]}.pkl"))]
564
+ if accelerator.is_main_process:
565
+ if not os.path.exists(args.output_path):
566
+ os.makedirs(args.output_path, exist_ok=True)
567
+ print(f"Created output directory: {args.output_path}")
568
+
569
+ print("\n" + "="*50)
570
+ print("Distributed Accelerator:")
571
+ print(f" - num_processes (GPU): {accelerator.num_processes}")
572
+ print(f" - mixed_precision: {accelerator.mixed_precision} (Native BF16)")
573
+ print(f" - TF32 Precision: Enabled (Hopper Optimized)")
574
+ print(f" - Apply SR: {args.apply_sr}")
575
+ print("="*50 + "\n")
576
+ print("="*50 + "\n")
577
+ print(f"Dataset Status: Total={len(input_items)} | To Process={len(to_process)} | Done={len(input_items)-len(to_process)}")
578
+ print(f"Batch Settings: Size={args.batch_size} | Mode={args.progress}")
579
+ print(f"Num workers: Size={args.num_workers}")
580
+
581
+ print("-" * 50 + "\n")
582
+
583
+ with accelerator.split_between_processes(to_process) as my_items:
584
+ extractor = SMPLXPoseExtractor(args, accelerator)
585
+ extractor.my_total = len(my_items); extractor.start_time = time.time()
586
+ if args.progress == 'overall' and accelerator.is_main_process:
587
+ extractor.global_pbar = tqdm(total=len(my_items), position=0)
588
+ for item in my_items:
589
+ extractor.process_single_item(item, args.output_path)
590
+ if extractor.global_pbar: extractor.global_pbar.close()
591
+
592
+ accelerator.wait_for_everyone()
593
+ if accelerator.is_main_process:
594
+ print("\n" + "="*50)
595
+ print("All Processing Done! Calculating final statistics (Incremental Mode)...")
596
+
597
+ pkl_files = [f for f in os.listdir(args.output_path)
598
+ if f.endswith('.pkl') and os.path.isfile(os.path.join(args.output_path, f))
599
+ and not any(x in f.lower() for x in ['wilor', 'smplest', 'final'])]
600
+
601
+ count = 0
602
+ sum_x = np.zeros(179, dtype=np.float64)
603
+ sum_x2 = np.zeros(179, dtype=np.float64)
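+ # Incremental statistics: accumulate running sums of x and x^2 so that mean = sum_x / N and var = sum_x2 / N - mean^2 can be computed without loading all frames at once.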
604
+
605
+ for p in tqdm(pkl_files, desc="Processing Statistics"):
606
+ pkl_path = os.path.join(args.output_path, p)
607
+ try:
608
+ with open(pkl_path, 'rb') as f:
609
+ data = pickle.load(f)
610
+
611
+ if isinstance(data, np.ndarray) and data.shape[-1] == 179:
612
+ sum_x += np.sum(data, axis=0)
613
+ sum_x2 += np.sum(data**2, axis=0)
614
+ count += data.shape[0]
615
+
616
+ if count == 0:
617
+ print(f"\nDebug: File={p}, Type={type(data)}, Shape={getattr(data, 'shape', 'No Shape')}")
618
+
619
+ except Exception as e:
620
+ print(f"Warning: Could not load {pkl_path}, error: {e}")
621
+
622
+ if count > 0:
623
+ mean = sum_x / count
624
+ var = (sum_x2 / count) - (mean ** 2)
625
+ std = np.sqrt(np.maximum(var, 1e-8))
626
+
627
+ stats_dir = os.path.join(args.output_path, 'stats')
628
+ os.makedirs(stats_dir, exist_ok=True)
629
+
630
+ dataset_name = args.dataset if hasattr(args, 'dataset') else "csl_news"
631
+ torch.save(torch.from_numpy(mean.astype(np.float32)), os.path.join(stats_dir, f"{dataset_name.lower()}_mean.pt"))
632
+ torch.save(torch.from_numpy(std.astype(np.float32)), os.path.join(stats_dir, f"{dataset_name.lower()}_std.pt"))
633
+
634
+ print(f"Success: Processed {count} frames from {len(pkl_files)} files.")
635
+ print(f"Statistics saved in: {stats_dir}")
636
+ else:
637
+ print("Error: No valid pose data found for statistics.")
638
+
639
+ if torch.distributed.is_initialized():
640
+ torch.distributed.destroy_process_group()
641
+
642
+ if __name__ == "__main__":
643
+ parser = argparse.ArgumentParser()
644
+ parser.add_argument('--input_path', type=str, required=True)
645
+ parser.add_argument('--output_path', type=str, required=True)
646
+ parser.add_argument('--progress', type=str, choices=['each', 'overall'], default='overall')
647
+ parser.add_argument('--num_workers', type=int)
648
+ parser.add_argument('--batch_size', type=int)
649
+
650
+ parser.add_argument('--apply_sr', action='store_true', default=False)
651
+ parser.add_argument('--opt_depth', action='store_true', default=False)
652
+ parser.add_argument('--yolo_ckpt', type=str, default=r'/mnt/shared-storage-user/mllm/zangyuhang/pmx/SMPL-X_pose_extraction/pretrained_weight/yolo/yolo26l.pt')
653
+ parser.add_argument('--hand_detector_ckpt', type=str, default=r'/mnt/shared-storage-user/mllm/zangyuhang/pmx/SMPL-X_pose_extraction/pretrained_weight/wilor/detector.pt')
654
+ parser.add_argument('--wilor_ckpt', type=str, default=r'/mnt/shared-storage-user/mllm/zangyuhang/pmx/SMPL-X_pose_extraction/pretrained_weight/wilor/wilor_final.ckpt')
655
+ parser.add_argument('--real_esrgan_ckpt', type=str, default=r'/mnt/shared-storage-user/mllm/zangyuhang/pmx/SMPL-X_pose_extraction/pretrained_weight/realesrgan/RealESRGAN_x4plus.pth')
656
+ parser.add_argument('--depth_anything_v2_ckpt', type=str, default=r'/mnt/shared-storage-user/mllm/zangyuhang/pmx/SMPL-X_pose_extraction/pretrained_weight/depth_anything-v2/depth_anything_v2_vitl.pth')
657
+ main(parser.parse_args())
extract_smplx_pose.sh ADDED
@@ -0,0 +1,27 @@
1
+ #!/bin/bash
2
+
3
+ export TZ=Asia/Shanghai
4
+
5
+ LOG_TIME=$(date +%Y%m%d_%H%M%S)
6
+ CUR_DIR="/mnt/shared-storage-user/mllm/zangyuhang/pmx/SMPL-X_pose_extraction"
7
+ cd $CUR_DIR
8
+
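+ # Background GPU monitor: append a timestamped nvidia-smi snapshot to its own log every 10 seconds for the duration of the run.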
9
+ nohup stdbuf -oL bash -c "while true; do date; nvidia-smi; sleep 10; done" \
10
+ > "${CUR_DIR}/extract_smplx_${LOG_TIME}_gpu_monitor.log" 2>&1 &
11
+
12
+ PY310_BIN="/mnt/shared-storage-user/mllm/zangyuhang/pmx/envs/py310/bin/python3.10"
13
+ export PYTHONPATH="${CUR_DIR}:${CUR_DIR}/SMPLest-X:${CUR_DIR}/WiLoR:${CUR_DIR}/Depth-Anything-V2:$PYTHONPATH"
14
+ export OPENCV_FOR_THREADS_NUM=8
15
+
16
+ $PY310_BIN -u -m accelerate.commands.launch \
17
+ --same_network \
18
+ --num_processes 4 \
19
+ --num_machines 1 \
20
+ --mixed_precision bf16 \
21
+ --dynamo_backend no \
22
+ --num_cpu_threads_per_process 24 \
23
+ extract_smplx_pose.py \
24
+ --batch_size 147456 \
25
+ --num_workers 64 \
26
+ --input_path "/mnt/shared-storage-user/mllm/zangyuhang/pmx/SLUDatasets/CSL_Daily/rgb_format/videos_512x512_30fps" \
27
+ --output_path "/mnt/shared-storage-user/mllm/zangyuhang/pmx/SLGDatasets/CSL_News/new_merged_poses"
log/extract_smplx_20260211_195012.log ADDED
The diff for this file is too large to render. See raw diff
 
log/extract_smplx_20260212_034356.log ADDED
The diff for this file is too large to render. See raw diff
 
pretrained_weight/.DS_Store ADDED
Binary file (6.15 kB). View file