Nebularer committed on
Commit
8af32c5
·
verified ·
1 Parent(s): c7b32c7

Add files using upload-large-folder tool

Files changed (50)
  1. Depth-Anything-V2/DA-2K.md +51 -0
  2. Depth-Anything-V2/LICENSE +201 -0
  3. Depth-Anything-V2/README.md +201 -0
  4. Depth-Anything-V2/app.py +88 -0
  5. Depth-Anything-V2/requirements.txt +9 -0
  6. Depth-Anything-V2/run.py +73 -0
  7. Depth-Anything-V2/run_video.py +92 -0
  8. SMPLest-X/.DS_Store +0 -0
  9. SMPLest-X/.gitignore +8 -0
  10. SMPLest-X/LICENSE.txt +9 -0
  11. SMPLest-X/README.md +152 -0
  12. SMPLest-X/datasets/SynHand.py +39 -0
  13. SMPLest-X/datasets/dataset.py +103 -0
  14. SMPLest-X/datasets/humandata.py +1076 -0
  15. SMPLest-X/humandata_prep/README.md +64 -0
  16. SMPLest-X/humandata_prep/check.py +298 -0
  17. SMPLest-X/main/__init__.py +0 -0
  18. SMPLest-X/main/base.py +234 -0
  19. SMPLest-X/main/config.py +101 -0
  20. SMPLest-X/main/constants.py +37 -0
  21. SMPLest-X/main/inference.py +188 -0
  22. SMPLest-X/main/test.py +107 -0
  23. SMPLest-X/main/train.py +138 -0
  24. SMPLest-X/requirements.txt +13 -0
  25. SMPLest-X/requirements_py310.txt +14 -0
  26. SMPLest-X/utils/distribute_utils.py +171 -0
  27. SMPLest-X/utils/timer.py +31 -0
  28. SMPLest-X/utils/transforms.py +366 -0
  29. WiLoR/.DS_Store +0 -0
  30. WiLoR/README.md +93 -0
  31. WiLoR/demo.py +139 -0
  32. WiLoR/demo.sh +2 -0
  33. WiLoR/download_videos.py +58 -0
  34. WiLoR/gradio_demo.py +192 -0
  35. WiLoR/license.txt +402 -0
  36. WiLoR/requirements.txt +20 -0
  37. WiLoR/requirements_my.txt +11 -0
  38. __init__.py +11 -0
  39. convert_img_to_videos.py +90 -0
  40. corrupted_videos.log +7 -0
  41. corrupted_videos_csl_news.log +7 -0
  42. extract_smplx_20260212_165824.log +0 -0
  43. extract_smplx_20260212_165911_gpu_monitor.log +0 -0
  44. extract_smplx_20260213_144424.log +0 -0
  45. extract_smplx_20260213_144424_gpu_monitor.log +0 -0
  46. extract_smplx_pose.py +657 -0
  47. extract_smplx_pose.sh +27 -0
  48. log/extract_smplx_20260211_195012.log +0 -0
  49. log/extract_smplx_20260212_034356.log +0 -0
  50. pretrained_weight/.DS_Store +0 -0
Depth-Anything-V2/DA-2K.md ADDED
@@ -0,0 +1,51 @@
1
+ # DA-2K Evaluation Benchmark
2
+
3
+ ## Introduction
4
+
5
+ ![DA-2K](assets/DA-2K.png)
6
+
7
+ DA-2K is proposed in [Depth Anything V2](https://depth-anything-v2.github.io) to evaluate the relative depth estimation capability. It encompasses eight representative scenarios of `indoor`, `outdoor`, `non_real`, `transparent_reflective`, `adverse_style`, `aerial`, `underwater`, and `object`. It consists of 1K diverse high-quality images and 2K precise pair-wise relative depth annotations.
8
+
9
+ Please refer to our [paper](https://arxiv.org/abs/2406.09414) for details on how this benchmark was constructed.
10
+
11
+
12
+ ## Usage
13
+
14
+ Please first [download the benchmark](https://huggingface.co/datasets/depth-anything/DA-2K/tree/main).
15
+
16
+ All annotations are stored in `annotations.json`. The annotation file is a JSON object where each key is the path to an image file, and the value is a list of annotations associated with that image. Each annotation describes two points and identifies which point is closer to the camera. The structure is detailed below:
17
+
18
+ ```
19
+ {
20
+ "image_path": [
21
+ {
22
+ "point1": [h1, w1], # (vertical position, horizontal position)
23
+ "point2": [h2, w2], # (vertical position, horizontal position)
24
+ "closer_point": "point1" # we always set "point1" as the closer one
25
+ },
26
+ ...
27
+ ],
28
+ ...
29
+ }
30
+ ```
31
+
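As a quick check, the annotation file can be read with the standard `json` module. The snippet below is a minimal sketch that assumes `annotations.json` sits in the current directory; it simply tallies images and point pairs and inspects one entry.

```python
import json

with open("annotations.json") as f:
    annotations = json.load(f)

num_images = len(annotations)
num_pairs = sum(len(pairs) for pairs in annotations.values())
print(f"{num_images} images, {num_pairs} annotated point pairs")

# Inspect one entry: points are (row, column) pixel positions,
# and point1 is always the closer point per the format above.
image_path, pairs = next(iter(annotations.items()))
for pair in pairs:
    (h1, w1), (h2, w2) = pair["point1"], pair["point2"]
    assert pair["closer_point"] == "point1"
```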
32
+ To visualize the annotations:
33
+ ```bash
34
+ python visualize.py [--scene-type <type>]
35
+ ```
36
+
37
+ **Options**
38
+ - `--scene-type <type>` (optional): Specify the scene type (`indoor`, `outdoor`, `non_real`, `transparent_reflective`, `adverse_style`, `aerial`, `underwater`, or `object`). Skip this argument or set `<type>` to `""` to include all scene types.
39
+
40
+ ## Citation
41
+
42
+ If you find this benchmark useful, please consider citing:
43
+
44
+ ```bibtex
45
+ @article{depth_anything_v2,
46
+ title={Depth Anything V2},
47
+ author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
48
+ journal={arXiv:2406.09414},
49
+ year={2024}
50
+ }
51
+ ```
Depth-Anything-V2/LICENSE ADDED
@@ -0,0 +1,201 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction,
10
+ and distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by
13
+ the copyright owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all
16
+ other entities that control, are controlled by, or are under common
17
+ control with that entity. For the purposes of this definition,
18
+ "control" means (i) the power, direct or indirect, to cause the
19
+ direction or management of such entity, whether by contract or
20
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
21
+ outstanding shares, or (iii) beneficial ownership of such entity.
22
+
23
+ "You" (or "Your") shall mean an individual or Legal Entity
24
+ exercising permissions granted by this License.
25
+
26
+ "Source" form shall mean the preferred form for making modifications,
27
+ including but not limited to software source code, documentation
28
+ source, and configuration files.
29
+
30
+ "Object" form shall mean any form resulting from mechanical
31
+ transformation or translation of a Source form, including but
32
+ not limited to compiled object code, generated documentation,
33
+ and conversions to other media types.
34
+
35
+ "Work" shall mean the work of authorship, whether in Source or
36
+ Object form, made available under the License, as indicated by a
37
+ copyright notice that is included in or attached to the work
38
+ (an example is provided in the Appendix below).
39
+
40
+ "Derivative Works" shall mean any work, whether in Source or Object
41
+ form, that is based on (or derived from) the Work and for which the
42
+ editorial revisions, annotations, elaborations, or other modifications
43
+ represent, as a whole, an original work of authorship. For the purposes
44
+ of this License, Derivative Works shall not include works that remain
45
+ separable from, or merely link (or bind by name) to the interfaces of,
46
+ the Work and Derivative Works thereof.
47
+
48
+ "Contribution" shall mean any work of authorship, including
49
+ the original version of the Work and any modifications or additions
50
+ to that Work or Derivative Works thereof, that is intentionally
51
+ submitted to Licensor for inclusion in the Work by the copyright owner
52
+ or by an individual or Legal Entity authorized to submit on behalf of
53
+ the copyright owner. For the purposes of this definition, "submitted"
54
+ means any form of electronic, verbal, or written communication sent
55
+ to the Licensor or its representatives, including but not limited to
56
+ communication on electronic mailing lists, source code control systems,
57
+ and issue tracking systems that are managed by, or on behalf of, the
58
+ Licensor for the purpose of discussing and improving the Work, but
59
+ excluding communication that is conspicuously marked or otherwise
60
+ designated in writing by the copyright owner as "Not a Contribution."
61
+
62
+ "Contributor" shall mean Licensor and any individual or Legal Entity
63
+ on behalf of whom a Contribution has been received by Licensor and
64
+ subsequently incorporated within the Work.
65
+
66
+ 2. Grant of Copyright License. Subject to the terms and conditions of
67
+ this License, each Contributor hereby grants to You a perpetual,
68
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69
+ copyright license to reproduce, prepare Derivative Works of,
70
+ publicly display, publicly perform, sublicense, and distribute the
71
+ Work and such Derivative Works in Source or Object form.
72
+
73
+ 3. Grant of Patent License. Subject to the terms and conditions of
74
+ this License, each Contributor hereby grants to You a perpetual,
75
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76
+ (except as stated in this section) patent license to make, have made,
77
+ use, offer to sell, sell, import, and otherwise transfer the Work,
78
+ where such license applies only to those patent claims licensable
79
+ by such Contributor that are necessarily infringed by their
80
+ Contribution(s) alone or by combination of their Contribution(s)
81
+ with the Work to which such Contribution(s) was submitted. If You
82
+ institute patent litigation against any entity (including a
83
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
84
+ or a Contribution incorporated within the Work constitutes direct
85
+ or contributory patent infringement, then any patent licenses
86
+ granted to You under this License for that Work shall terminate
87
+ as of the date such litigation is filed.
88
+
89
+ 4. Redistribution. You may reproduce and distribute copies of the
90
+ Work or Derivative Works thereof in any medium, with or without
91
+ modifications, and in Source or Object form, provided that You
92
+ meet the following conditions:
93
+
94
+ (a) You must give any other recipients of the Work or
95
+ Derivative Works a copy of this License; and
96
+
97
+ (b) You must cause any modified files to carry prominent notices
98
+ stating that You changed the files; and
99
+
100
+ (c) You must retain, in the Source form of any Derivative Works
101
+ that You distribute, all copyright, patent, trademark, and
102
+ attribution notices from the Source form of the Work,
103
+ excluding those notices that do not pertain to any part of
104
+ the Derivative Works; and
105
+
106
+ (d) If the Work includes a "NOTICE" text file as part of its
107
+ distribution, then any Derivative Works that You distribute must
108
+ include a readable copy of the attribution notices contained
109
+ within such NOTICE file, excluding those notices that do not
110
+ pertain to any part of the Derivative Works, in at least one
111
+ of the following places: within a NOTICE text file distributed
112
+ as part of the Derivative Works; within the Source form or
113
+ documentation, if provided along with the Derivative Works; or,
114
+ within a display generated by the Derivative Works, if and
115
+ wherever such third-party notices normally appear. The contents
116
+ of the NOTICE file are for informational purposes only and
117
+ do not modify the License. You may add Your own attribution
118
+ notices within Derivative Works that You distribute, alongside
119
+ or as an addendum to the NOTICE text from the Work, provided
120
+ that such additional attribution notices cannot be construed
121
+ as modifying the License.
122
+
123
+ You may add Your own copyright statement to Your modifications and
124
+ may provide additional or different license terms and conditions
125
+ for use, reproduction, or distribution of Your modifications, or
126
+ for any such Derivative Works as a whole, provided Your use,
127
+ reproduction, and distribution of the Work otherwise complies with
128
+ the conditions stated in this License.
129
+
130
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
131
+ any Contribution intentionally submitted for inclusion in the Work
132
+ by You to the Licensor shall be under the terms and conditions of
133
+ this License, without any additional terms or conditions.
134
+ Notwithstanding the above, nothing herein shall supersede or modify
135
+ the terms of any separate license agreement you may have executed
136
+ with Licensor regarding such Contributions.
137
+
138
+ 6. Trademarks. This License does not grant permission to use the trade
139
+ names, trademarks, service marks, or product names of the Licensor,
140
+ except as required for reasonable and customary use in describing the
141
+ origin of the Work and reproducing the content of the NOTICE file.
142
+
143
+ 7. Disclaimer of Warranty. Unless required by applicable law or
144
+ agreed to in writing, Licensor provides the Work (and each
145
+ Contributor provides its Contributions) on an "AS IS" BASIS,
146
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147
+ implied, including, without limitation, any warranties or conditions
148
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149
+ PARTICULAR PURPOSE. You are solely responsible for determining the
150
+ appropriateness of using or redistributing the Work and assume any
151
+ risks associated with Your exercise of permissions under this License.
152
+
153
+ 8. Limitation of Liability. In no event and under no legal theory,
154
+ whether in tort (including negligence), contract, or otherwise,
155
+ unless required by applicable law (such as deliberate and grossly
156
+ negligent acts) or agreed to in writing, shall any Contributor be
157
+ liable to You for damages, including any direct, indirect, special,
158
+ incidental, or consequential damages of any character arising as a
159
+ result of this License or out of the use or inability to use the
160
+ Work (including but not limited to damages for loss of goodwill,
161
+ work stoppage, computer failure or malfunction, or any and all
162
+ other commercial damages or losses), even if such Contributor
163
+ has been advised of the possibility of such damages.
164
+
165
+ 9. Accepting Warranty or Additional Liability. While redistributing
166
+ the Work or Derivative Works thereof, You may choose to offer,
167
+ and charge a fee for, acceptance of support, warranty, indemnity,
168
+ or other liability obligations and/or rights consistent with this
169
+ License. However, in accepting such obligations, You may act only
170
+ on Your own behalf and on Your sole responsibility, not on behalf
171
+ of any other Contributor, and only if You agree to indemnify,
172
+ defend, and hold each Contributor harmless for any liability
173
+ incurred by, or claims asserted against, such Contributor by reason
174
+ of your accepting any such warranty or additional liability.
175
+
176
+ END OF TERMS AND CONDITIONS
177
+
178
+ APPENDIX: How to apply the Apache License to your work.
179
+
180
+ To apply the Apache License to your work, attach the following
181
+ boilerplate notice, with the fields enclosed by brackets "[]"
182
+ replaced with your own identifying information. (Don't include
183
+ the brackets!) The text should be enclosed in the appropriate
184
+ comment syntax for the file format. We also recommend that a
185
+ file or class name and description of purpose be included on the
186
+ same "printed page" as the copyright notice for easier
187
+ identification within third-party archives.
188
+
189
+ Copyright [yyyy] [name of copyright owner]
190
+
191
+ Licensed under the Apache License, Version 2.0 (the "License");
192
+ you may not use this file except in compliance with the License.
193
+ You may obtain a copy of the License at
194
+
195
+ http://www.apache.org/licenses/LICENSE-2.0
196
+
197
+ Unless required by applicable law or agreed to in writing, software
198
+ distributed under the License is distributed on an "AS IS" BASIS,
199
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200
+ See the License for the specific language governing permissions and
201
+ limitations under the License.
Depth-Anything-V2/README.md ADDED
@@ -0,0 +1,201 @@
1
+ <div align="center">
2
+ <h1>Depth Anything V2</h1>
3
+
4
+ [**Lihe Yang**](https://liheyoung.github.io/)<sup>1</sup> · [**Bingyi Kang**](https://bingykang.github.io/)<sup>2&dagger;</sup> · [**Zilong Huang**](http://speedinghzl.github.io/)<sup>2</sup>
5
+ <br>
6
+ [**Zhen Zhao**](http://zhaozhen.me/) · [**Xiaogang Xu**](https://xiaogang00.github.io/) · [**Jiashi Feng**](https://sites.google.com/site/jshfeng/)<sup>2</sup> · [**Hengshuang Zhao**](https://hszhao.github.io/)<sup>1*</sup>
7
+
8
+ <sup>1</sup>HKU&emsp;&emsp;&emsp;<sup>2</sup>TikTok
9
+ <br>
10
+ &dagger;project lead&emsp;*corresponding author
11
+
12
+ <a href="https://arxiv.org/abs/2406.09414"><img src='https://img.shields.io/badge/arXiv-Depth Anything V2-red' alt='Paper PDF'></a>
13
+ <a href='https://depth-anything-v2.github.io'><img src='https://img.shields.io/badge/Project_Page-Depth Anything V2-green' alt='Project Page'></a>
14
+ <a href='https://huggingface.co/spaces/depth-anything/Depth-Anything-V2'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-blue'></a>
15
+ <a href='https://huggingface.co/datasets/depth-anything/DA-2K'><img src='https://img.shields.io/badge/Benchmark-DA--2K-yellow' alt='Benchmark'></a>
16
+ </div>
17
+
18
+ This work presents Depth Anything V2. It significantly outperforms [V1](https://github.com/LiheYoung/Depth-Anything) in fine-grained details and robustness. Compared with SD-based models, it enjoys faster inference speed, fewer parameters, and higher depth accuracy.
19
+
20
+ ![teaser](assets/teaser.png)
21
+
22
+
23
+ ## News
24
+ - **2025-01-22:** [Video Depth Anything](https://videodepthanything.github.io) has been released. It generates consistent depth maps for super-long videos (e.g., over 5 minutes).
25
+ - **2024-12-22:** [Prompt Depth Anything](https://promptda.github.io/) has been released. It supports 4K resolution metric depth estimation when low-res LiDAR is used to prompt the DA models.
26
+ - **2024-07-06:** Depth Anything V2 is supported in [Transformers](https://github.com/huggingface/transformers/). See the [instructions](https://huggingface.co/docs/transformers/main/en/model_doc/depth_anything_v2) for convenient usage.
27
+ - **2024-06-25:** Depth Anything is integrated into [Apple Core ML Models](https://developer.apple.com/machine-learning/models/). See the instructions ([V1](https://huggingface.co/apple/coreml-depth-anything-small), [V2](https://huggingface.co/apple/coreml-depth-anything-v2-small)) for usage.
28
+ - **2024-06-22:** We release [smaller metric depth models](https://github.com/DepthAnything/Depth-Anything-V2/tree/main/metric_depth#pre-trained-models) based on Depth-Anything-V2-Small and Base.
29
+ - **2024-06-20:** Our repository and project page were flagged by GitHub and removed from public access for 6 days. Sorry for the inconvenience.
30
+ - **2024-06-14:** Paper, project page, code, models, demo, and benchmark are all released.
31
+
32
+
33
+ ## Pre-trained Models
34
+
35
+ We provide **four models** of varying scales for robust relative depth estimation:
36
+
37
+ | Model | Params | Checkpoint |
38
+ |:-|-:|:-:|
39
+ | Depth-Anything-V2-Small | 24.8M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Small/resolve/main/depth_anything_v2_vits.pth?download=true) |
40
+ | Depth-Anything-V2-Base | 97.5M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Base/resolve/main/depth_anything_v2_vitb.pth?download=true) |
41
+ | Depth-Anything-V2-Large | 335.3M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Large/resolve/main/depth_anything_v2_vitl.pth?download=true) |
42
+ | Depth-Anything-V2-Giant | 1.3B | Coming soon |
43
+
44
+
45
+ ## Usage
46
+
47
+ ### Preparation
48
+
49
+ ```bash
50
+ git clone https://github.com/DepthAnything/Depth-Anything-V2
51
+ cd Depth-Anything-V2
52
+ pip install -r requirements.txt
53
+ ```
54
+
55
+ Download the checkpoints listed [here](#pre-trained-models) and put them under the `checkpoints` directory.
56
+
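For instance, the Small checkpoint can be fetched with the URL from the table above. The snippet below is a minimal sketch (run from the repository root); the target file name matches the one expected by the loading code, and the other checkpoints follow the same pattern.

```python
import os
import urllib.request

# Download the Small checkpoint (URL taken from the Pre-trained Models table) into ./checkpoints
url = ("https://huggingface.co/depth-anything/Depth-Anything-V2-Small/"
       "resolve/main/depth_anything_v2_vits.pth?download=true")
os.makedirs("checkpoints", exist_ok=True)
urllib.request.urlretrieve(url, "checkpoints/depth_anything_v2_vits.pth")
```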
57
+ ### Use our models
58
+ ```python
59
+ import cv2
60
+ import torch
61
+
62
+ from depth_anything_v2.dpt import DepthAnythingV2
63
+
64
+ DEVICE = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
65
+
66
+ model_configs = {
67
+ 'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
68
+ 'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
69
+ 'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
70
+ 'vitg': {'encoder': 'vitg', 'features': 384, 'out_channels': [1536, 1536, 1536, 1536]}
71
+ }
72
+
73
+ encoder = 'vitl' # or 'vits', 'vitb', 'vitg'
74
+
75
+ model = DepthAnythingV2(**model_configs[encoder])
76
+ model.load_state_dict(torch.load(f'checkpoints/depth_anything_v2_{encoder}.pth', map_location='cpu'))
77
+ model = model.to(DEVICE).eval()
78
+
79
+ raw_img = cv2.imread('your/image/path')
80
+ depth = model.infer_image(raw_img) # HxW raw depth map in numpy
81
+ ```
82
+
83
+ If you do not want to clone this repository, you can also load our models through [Transformers](https://github.com/huggingface/transformers/). Below is a simple code snippet. Please refer to the [official page](https://huggingface.co/docs/transformers/main/en/model_doc/depth_anything_v2) for more details.
84
+
85
+ - Note 1: Make sure you can connect to Hugging Face and have installed the latest Transformers.
86
+ - Note 2: Due to the [upsampling difference](https://github.com/huggingface/transformers/pull/31522#issuecomment-2184123463) between OpenCV (which we use) and Pillow (which HF uses), predictions may differ slightly, so we recommend loading our models in the way introduced above.
87
+ ```python
88
+ from transformers import pipeline
89
+ from PIL import Image
90
+
91
+ pipe = pipeline(task="depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")
92
+ image = Image.open('your/image/path')
93
+ depth = pipe(image)["depth"]
94
+ ```
95
+
96
+ ### Running script on *images*
97
+
98
+ ```bash
99
+ python run.py \
100
+ --encoder <vits | vitb | vitl | vitg> \
101
+ --img-path <path> --outdir <outdir> \
102
+ [--input-size <size>] [--pred-only] [--grayscale]
103
+ ```
104
+ Options:
105
+ - `--img-path`: You can point it to 1) an image directory containing all images of interest, 2) a single image, or 3) a text file listing all image paths.
106
+ - `--input-size` (optional): By default, we use input size `518` for model inference. ***You can increase the size for even more fine-grained results.***
107
+ - `--pred-only` (optional): Only save the predicted depth map, without raw image.
108
+ - `--grayscale` (optional): Save the grayscale depth map, without applying color palette.
109
+
110
+ For example:
111
+ ```bash
112
+ python run.py --encoder vitl --img-path assets/examples --outdir depth_vis
113
+ ```
114
+
115
+ ### Running script on *videos*
116
+
117
+ ```bash
118
+ python run_video.py \
119
+ --encoder <vits | vitb | vitl | vitg> \
120
+ --video-path assets/examples_video --outdir video_depth_vis \
121
+ [--input-size <size>] [--pred-only] [--grayscale]
122
+ ```
123
+
124
+ ***Our larger model has better temporal consistency on videos.***
125
+
126
+ ### Gradio demo
127
+
128
+ To use our gradio demo locally:
129
+
130
+ ```bash
131
+ python app.py
132
+ ```
133
+
134
+ You can also try our [online demo](https://huggingface.co/spaces/Depth-Anything/Depth-Anything-V2).
135
+
136
+ ***Note: Compared to V1, we have made a minor modification to the DINOv2-DPT architecture (originating from this [issue](https://github.com/LiheYoung/Depth-Anything/issues/81)).*** In V1, we *unintentionally* used features from the last four layers of DINOv2 for decoding. In V2, we use [intermediate features](https://github.com/DepthAnything/Depth-Anything-V2/blob/2cbc36a8ce2cec41d38ee51153f112e87c8e42d8/depth_anything_v2/dpt.py#L164-L169) instead. Although this modification did not improve details or accuracy, we decided to follow this common practice.
137
+
138
+
139
+ ## Fine-tuned to Metric Depth Estimation
140
+
141
+ Please refer to [metric depth estimation](./metric_depth).
142
+
143
+
144
+ ## DA-2K Evaluation Benchmark
145
+
146
+ Please refer to [DA-2K benchmark](./DA-2K.md).
147
+
148
+
149
+ ## Community Support
150
+
151
+ **We sincerely appreciate all the community support for our Depth Anything series. Thank you very much!**
152
+
153
+ - Apple Core ML:
154
+ - https://developer.apple.com/machine-learning/models
155
+ - https://huggingface.co/apple/coreml-depth-anything-v2-small
156
+ - https://huggingface.co/apple/coreml-depth-anything-small
157
+ - Transformers:
158
+ - https://huggingface.co/docs/transformers/main/en/model_doc/depth_anything_v2
159
+ - https://huggingface.co/docs/transformers/main/en/model_doc/depth_anything
160
+ - TensorRT:
161
+ - https://github.com/spacewalk01/depth-anything-tensorrt
162
+ - https://github.com/zhujiajian98/Depth-Anythingv2-TensorRT-python
163
+ - ONNX: https://github.com/fabio-sim/Depth-Anything-ONNX
164
+ - ComfyUI: https://github.com/kijai/ComfyUI-DepthAnythingV2
165
+ - Transformers.js (real-time depth in web): https://huggingface.co/spaces/Xenova/webgpu-realtime-depth-estimation
166
+ - Android:
167
+ - https://github.com/shubham0204/Depth-Anything-Android
168
+ - https://github.com/FeiGeChuanShu/ncnn-android-depth_anything
169
+
170
+
171
+ ## Acknowledgement
172
+
173
+ We are sincerely grateful to the awesome Hugging Face team ([@Pedro Cuenca](https://huggingface.co/pcuenq), [@Niels Rogge](https://huggingface.co/nielsr), [@Merve Noyan](https://huggingface.co/merve), [@Amy Roberts](https://huggingface.co/amyeroberts), et al.) for their huge efforts in supporting our models in Transformers and Apple Core ML.
174
+
175
+ We also thank the [DINOv2](https://github.com/facebookresearch/dinov2) team for contributing such impressive models to our community.
176
+
177
+
178
+ ## LICENSE
179
+
180
+ Depth-Anything-V2-Small model is under the Apache-2.0 license. Depth-Anything-V2-Base/Large/Giant models are under the CC-BY-NC-4.0 license.
181
+
182
+
183
+ ## Citation
184
+
185
+ If you find this project useful, please consider citing:
186
+
187
+ ```bibtex
188
+ @article{depth_anything_v2,
189
+ title={Depth Anything V2},
190
+ author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
191
+ journal={arXiv:2406.09414},
192
+ year={2024}
193
+ }
194
+
195
+ @inproceedings{depth_anything_v1,
196
+ title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
197
+ author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
198
+ booktitle={CVPR},
199
+ year={2024}
200
+ }
201
+ ```
Depth-Anything-V2/app.py ADDED
@@ -0,0 +1,88 @@
1
+ import glob
2
+ import gradio as gr
3
+ import matplotlib
4
+ import numpy as np
5
+ from PIL import Image
6
+ import torch
7
+ import tempfile
8
+ from gradio_imageslider import ImageSlider
9
+
10
+ from depth_anything_v2.dpt import DepthAnythingV2
11
+
12
+ css = """
13
+ #img-display-container {
14
+ max-height: 100vh;
15
+ }
16
+ #img-display-input {
17
+ max-height: 80vh;
18
+ }
19
+ #img-display-output {
20
+ max-height: 80vh;
21
+ }
22
+ #download {
23
+ height: 62px;
24
+ }
25
+ """
26
+ DEVICE = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
27
+ model_configs = {
28
+ 'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
29
+ 'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
30
+ 'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
31
+ 'vitg': {'encoder': 'vitg', 'features': 384, 'out_channels': [1536, 1536, 1536, 1536]}
32
+ }
33
+ encoder = 'vitl'
34
+ model = DepthAnythingV2(**model_configs[encoder])
35
+ state_dict = torch.load(f'checkpoints/depth_anything_v2_{encoder}.pth', map_location="cpu")
36
+ model.load_state_dict(state_dict)
37
+ model = model.to(DEVICE).eval()
38
+
39
+ title = "# Depth Anything V2"
40
+ description = """Official demo for **Depth Anything V2**.
41
+ Please refer to our [paper](https://arxiv.org/abs/2406.09414), [project page](https://depth-anything-v2.github.io), or [github](https://github.com/DepthAnything/Depth-Anything-V2) for more details."""
42
+
43
+ def predict_depth(image):
44
+ return model.infer_image(image)
45
+
46
+ with gr.Blocks(css=css) as demo:
47
+ gr.Markdown(title)
48
+ gr.Markdown(description)
49
+ gr.Markdown("### Depth Prediction demo")
50
+
51
+ with gr.Row():
52
+ input_image = gr.Image(label="Input Image", type='numpy', elem_id='img-display-input')
53
+ depth_image_slider = ImageSlider(label="Depth Map with Slider View", elem_id='img-display-output', position=0.5)
54
+ submit = gr.Button(value="Compute Depth")
55
+ gray_depth_file = gr.File(label="Grayscale depth map", elem_id="download",)
56
+ raw_file = gr.File(label="16-bit raw output (can be considered as disparity)", elem_id="download",)
57
+
58
+ cmap = matplotlib.colormaps.get_cmap('Spectral_r')
59
+
60
+ def on_submit(image):
61
+ original_image = image.copy()
62
+
63
+ h, w = image.shape[:2]
64
+
65
+ depth = predict_depth(image[:, :, ::-1])  # Gradio provides RGB; flip to BGR as expected by infer_image
66
+
67
+ raw_depth = Image.fromarray(depth.astype('uint16'))
68
+ tmp_raw_depth = tempfile.NamedTemporaryFile(suffix='.png', delete=False)
69
+ raw_depth.save(tmp_raw_depth.name)
70
+
71
+ depth = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0
72
+ depth = depth.astype(np.uint8)
73
+ colored_depth = (cmap(depth)[:, :, :3] * 255).astype(np.uint8)
74
+
75
+ gray_depth = Image.fromarray(depth)
76
+ tmp_gray_depth = tempfile.NamedTemporaryFile(suffix='.png', delete=False)
77
+ gray_depth.save(tmp_gray_depth.name)
78
+
79
+ return [(original_image, colored_depth), tmp_gray_depth.name, tmp_raw_depth.name]
80
+
81
+ submit.click(on_submit, inputs=[input_image], outputs=[depth_image_slider, gray_depth_file, raw_file])
82
+
83
+ example_files = glob.glob('assets/examples/*')
84
+ examples = gr.Examples(examples=example_files, inputs=[input_image], outputs=[depth_image_slider, gray_depth_file, raw_file], fn=on_submit)
85
+
86
+
87
+ if __name__ == '__main__':
88
+ demo.queue().launch()
Depth-Anything-V2/requirements.txt ADDED
@@ -0,0 +1,9 @@
1
+ gradio_imageslider
2
+ gradio==4.29.0
3
+ matplotlib
4
+ opencv-python
5
+ torch
6
+ torchvision
7
+
8
+
9
+
Depth-Anything-V2/run.py ADDED
@@ -0,0 +1,73 @@
1
+ import argparse
2
+ import cv2
3
+ import glob
4
+ import matplotlib
5
+ import numpy as np
6
+ import os
7
+ import torch
8
+
9
+ from depth_anything_v2.dpt import DepthAnythingV2
10
+
11
+
12
+ if __name__ == '__main__':
13
+ parser = argparse.ArgumentParser(description='Depth Anything V2')
14
+
15
+ parser.add_argument('--img-path', type=str, default=r"D:\SMPL-X_pose_extraction\demo\inputs\000049.jpg")
16
+ parser.add_argument('--input-size', type=int, default=256)
17
+ parser.add_argument('--outdir', type=str, default='./demo')
18
+
19
+ parser.add_argument('--encoder', type=str, default='vitl', choices=['vits', 'vitb', 'vitl', 'vitg'])
20
+
21
+ parser.add_argument('--pred-only', dest='pred_only', action='store_true', help='only display the prediction')
22
+ parser.add_argument('--grayscale', dest='grayscale', action='store_true', help='do not apply colorful palette')
23
+
24
+ args = parser.parse_args()
25
+
26
+ DEVICE = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
27
+
28
+ model_configs = {
29
+ 'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
30
+ 'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
31
+ 'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
32
+ 'vitg': {'encoder': 'vitg', 'features': 384, 'out_channels': [1536, 1536, 1536, 1536]}
33
+ }
34
+
35
+ depth_anything = DepthAnythingV2(**model_configs[args.encoder])
36
+ depth_anything.load_state_dict(torch.load(f'/mnt/shared-storage-user/mllm/zangyuhang/pmx/SMPL-X_pose_extraction/pretrained_weight/depth_anything-v2/depth_anything_v2_{args.encoder}.pth', map_location='cpu'))
37
+ depth_anything = depth_anything.to(DEVICE).eval()
38
+
39
+ if os.path.isfile(args.img_path):
40
+ if args.img_path.endswith('txt'):
41
+ with open(args.img_path, 'r') as f:
42
+ filenames = f.read().splitlines()
43
+ else:
44
+ filenames = [args.img_path]
45
+ else:
46
+ filenames = glob.glob(os.path.join(args.img_path, '**/*'), recursive=True)
47
+
48
+ os.makedirs(args.outdir, exist_ok=True)
49
+
50
+ cmap = matplotlib.colormaps.get_cmap('Spectral_r')
51
+
52
+ for k, filename in enumerate(filenames):
53
+ print(f'Progress {k+1}/{len(filenames)}: {filename}')
54
+
55
+ raw_image = cv2.imread(filename)
56
+
57
+ depth = depth_anything.infer_image(raw_image, args.input_size)
58
+
59
+ depth = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0
60
+ depth = depth.astype(np.uint8)
61
+
62
+ if args.grayscale:
63
+ depth = np.repeat(depth[..., np.newaxis], 3, axis=-1)
64
+ else:
65
+ depth = (cmap(depth)[:, :, :3] * 255)[:, :, ::-1].astype(np.uint8)
66
+
67
+ if args.pred_only:
68
+ cv2.imwrite(os.path.join(args.outdir, os.path.splitext(os.path.basename(filename))[0] + '.png'), depth)
69
+ else:
70
+ split_region = np.ones((raw_image.shape[0], 50, 3), dtype=np.uint8) * 255
71
+ combined_result = cv2.hconcat([raw_image, split_region, depth])
72
+
73
+ cv2.imwrite(os.path.join(args.outdir, os.path.splitext(os.path.basename(filename))[0] + '.png'), combined_result)
Depth-Anything-V2/run_video.py ADDED
@@ -0,0 +1,92 @@
1
+ import argparse
2
+ import cv2
3
+ import glob
4
+ import matplotlib
5
+ import numpy as np
6
+ import os
7
+ import torch
8
+
9
+ from depth_anything_v2.dpt import DepthAnythingV2
10
+
11
+
12
+ if __name__ == '__main__':
13
+ parser = argparse.ArgumentParser(description='Depth Anything V2')
14
+
15
+ parser.add_argument('--video-path', type=str)
16
+ parser.add_argument('--input-size', type=int, default=518)
17
+ parser.add_argument('--outdir', type=str, default='./vis_video_depth')
18
+
19
+ parser.add_argument('--encoder', type=str, default='vitl', choices=['vits', 'vitb', 'vitl', 'vitg'])
20
+
21
+ parser.add_argument('--pred-only', dest='pred_only', action='store_true', help='only display the prediction')
22
+ parser.add_argument('--grayscale', dest='grayscale', action='store_true', help='do not apply colorful palette')
23
+
24
+ args = parser.parse_args()
25
+
26
+ DEVICE = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
27
+
28
+ model_configs = {
29
+ 'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
30
+ 'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
31
+ 'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
32
+ 'vitg': {'encoder': 'vitg', 'features': 384, 'out_channels': [1536, 1536, 1536, 1536]}
33
+ }
34
+
35
+ depth_anything = DepthAnythingV2(**model_configs[args.encoder])
36
+ depth_anything.load_state_dict(torch.load(f'checkpoints/depth_anything_v2_{args.encoder}.pth', map_location='cpu'))
37
+ depth_anything = depth_anything.to(DEVICE).eval()
38
+
39
+ if os.path.isfile(args.video_path):
40
+ if args.video_path.endswith('txt'):
41
+ with open(args.video_path, 'r') as f:
42
+ filenames = f.read().splitlines()
43
+ else:
44
+ filenames = [args.video_path]
45
+ else:
46
+ filenames = glob.glob(os.path.join(args.video_path, '**/*'), recursive=True)
47
+
48
+ os.makedirs(args.outdir, exist_ok=True)
49
+
50
+ margin_width = 50
51
+ cmap = matplotlib.colormaps.get_cmap('Spectral_r')
52
+
53
+ for k, filename in enumerate(filenames):
54
+ print(f'Progress {k+1}/{len(filenames)}: {filename}')
55
+
56
+ raw_video = cv2.VideoCapture(filename)
57
+ frame_width, frame_height = int(raw_video.get(cv2.CAP_PROP_FRAME_WIDTH)), int(raw_video.get(cv2.CAP_PROP_FRAME_HEIGHT))
58
+ frame_rate = int(raw_video.get(cv2.CAP_PROP_FPS))
59
+
60
+ if args.pred_only:
61
+ output_width = frame_width
62
+ else:
63
+ output_width = frame_width * 2 + margin_width
64
+
65
+ output_path = os.path.join(args.outdir, os.path.splitext(os.path.basename(filename))[0] + '.mp4')
66
+ out = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*"mp4v"), frame_rate, (output_width, frame_height))
67
+
68
+ while raw_video.isOpened():
69
+ ret, raw_frame = raw_video.read()
70
+ if not ret:
71
+ break
72
+
73
+ depth = depth_anything.infer_image(raw_frame, args.input_size)
74
+
75
+ depth = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0
76
+ depth = depth.astype(np.uint8)
77
+
78
+ if args.grayscale:
79
+ depth = np.repeat(depth[..., np.newaxis], 3, axis=-1)
80
+ else:
81
+ depth = (cmap(depth)[:, :, :3] * 255)[:, :, ::-1].astype(np.uint8)
82
+
83
+ if args.pred_only:
84
+ out.write(depth)
85
+ else:
86
+ split_region = np.ones((frame_height, margin_width, 3), dtype=np.uint8) * 255
87
+ combined_frame = cv2.hconcat([raw_frame, split_region, depth])
88
+
89
+ out.write(combined_frame)
90
+
91
+ raw_video.release()
92
+ out.release()
SMPLest-X/.DS_Store ADDED
Binary file (6.15 kB).
 
SMPLest-X/.gitignore ADDED
@@ -0,0 +1,8 @@
1
+ data
2
+ outputs
3
+ pretrained_models
4
+ demo
5
+ *.pyc
6
+ **/__pycache__
7
+ **/.DS_Store
8
+ **/human_model_files
SMPLest-X/LICENSE.txt ADDED
@@ -0,0 +1,9 @@
1
+ S-Lab License 1.0
2
+
3
+ Copyright 2022 S-Lab
4
+ Redistribution and use for non-commercial purpose in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
5
+ 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
6
+ 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
7
+ 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
8
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
9
+ 4. In the event that redistribution and/or use for commercial purpose in source or binary forms, with or without modification is required, please contact the contributor(s) of the work.
SMPLest-X/README.md ADDED
@@ -0,0 +1,152 @@
1
+ # SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation
2
+
3
+ This work is the extended version of [SMPLer-X](https://arxiv.org/abs/2309.17448). This new codebase is designed for easy installation and flexible development, enabling seamless integration of new methods with the pretrained SMPLest-X model.
4
+
5
+ ![Teaser](./assets/teaser.png)
6
+
7
+
8
+ ## Useful links
9
+
10
+ <div align="center">
11
+ <a href="https://arxiv.org/abs/2501.09782" class="button"><b>[arXiv]</b></a> &nbsp;&nbsp;&nbsp;&nbsp;
12
+ <a href="https://caizhongang.github.io/projects/SMPLer-X/" class="button"><b>[Homepage]</b></a> &nbsp;&nbsp;&nbsp;&nbsp;
13
+ <a href="https://youtu.be/DepTqbPpVzY" class="button"><b>[Video]</b></a> &nbsp;&nbsp;&nbsp;&nbsp;
14
+ <a href="https://github.com/caizhongang/SMPLer-X" class="button"><b>[SMPLer-X]</b></a> &nbsp;&nbsp;&nbsp;&nbsp;
15
+ <a href="https://github.com/open-mmlab/mmhuman3d" class="button"><b>[MMHuman3D]</b></a> &nbsp;&nbsp;&nbsp;&nbsp;
16
+ <a href="https://github.com/wqyin/WHAC/tree/main" class="button"><b>[WHAC]</b></a>
17
+
18
+ </div>
19
+
20
+
21
+ ## News
22
+
23
+ - [2025-10-21] SMPLest-X accepted to TPAMI.
24
+ - [2025-02-17] Pretrained model available for download.
25
+ - [2025-02-14] 💌💌💌 Brand new codebase released for training, testing and inference.
26
+ - [2025-01-20] Paper released on [arXiv](https://arxiv.org/abs/2501.09782).
27
+ - [2025-01-08] Project page created.
28
+
29
+
30
+ ## Install
31
+ ```bash
32
+ bash scripts/install.sh
33
+ ```
34
+
35
+ ## Preparation
36
+
37
+ #### SMPLest-X pretrained models
38
+ - Download the pretrained **SMPLest-X-Huge model** weight from [here](https://huggingface.co/waanqii/SMPLest-X/tree/main) (8.2G).
39
+ - Place the pretrained weight and its config file according to the file structure shown below.
40
+
41
+ #### Parametric human models
42
+ - Download [SMPL-X](https://smpl-x.is.tue.mpg.de/) and [SMPL](https://smpl.is.tue.mpg.de/) body models.
43
+
44
+ #### ViT-Pose pretrained models (For training only)
45
+ - Follow [OSX](https://github.com/IDEA-Research/OSX) in preparing pretrained ViTPose models. Download the ViTPose pretrained weights from [here](https://github.com/ViTAE-Transformer/ViTPose).
46
+
47
+ #### HumanData
48
+ - Please refer to [this guide](humandata_prep/README.md) for instructions on preparing the data in the HumanData format.
49
+
50
+ The final file structure should be like:
51
+ ```
52
+ .
53
+ ├── assets
54
+ ├── configs
55
+ ├── data
56
+ │   ├── annot # humandata.npz files
57
+ │   ├── cache # cached humandata
58
+ │   └── img # original data files
59
+ ├── datasets
60
+ ├── demo
61
+ ├── human_models
62
+ │   └── human_model_files # parametric human models
63
+ ├── main
64
+ ├── models
65
+ ├── outputs
66
+ │   └── smplest_x_h
67
+ ├── pretrained_models
68
+ │   ├── vitpose_huge.pth # for training only
69
+ │   ├── yolov8x.pt # auto download during inference
70
+ │   └── smplest_x_h
71
+ │      ├── smplest_x_h.pth.tar
72
+ │      └── config_base.py
73
+ ├── scripts
74
+ ├── utils
75
+ ├── README.md
76
+ └── requirements.txt
77
+ ```
78
+
79
+ ## Inference
80
+
81
+ - Place the video for inference under `SMPLest-X/demo`
82
+ - Prepare the pretrained model under `SMPLest-X/pretrained_models`
83
+ - The pretrained YOLO model will be downloaded automatically on first use.
84
+ - Inference output will be saved in `SMPLest-X/demo`
85
+
86
+ ```bash
87
+ sh scripts/inference.sh {MODEL_DIR} {FILE_NAME} {FPS}
88
+
89
+ # For running inference on test_video.mp4 (30 FPS) with SMPLest-X/pretrained_models/smplest_x_h/smplest_x_h.pth.tar
90
+ sh scripts/inference.sh smplest_x_h test_video.mp4 30
91
+ ```
92
+
93
+
94
+ ## Training
95
+ ```bash
96
+ bash scripts/train.sh {JOB_NAME} {NUM_GPUS} {CONFIG_FILE}
97
+
98
+ # For training SMPLest-X-H with 16 GPUS
99
+ bash scripts/train.sh smplest_x_h 16 config_smplest_x_h.py
100
+ ```
101
+ - CONFIG_FILE is the file name under `SMPLest-X/configs`
102
+ - Logs and checkpoints will be saved to `SMPLest-X/outputs/train_{JOB_NAME}_{DATE_TIME}`
103
+
104
+
105
+ ## Testing
106
+ ```bash
107
+ sh scripts/test.sh {TEST_DATASET} {MODEL_DIR} {CKPT_ID}
108
+
109
+ # For testing the model SMPLest-X/outputs/smplest_x_h/model_dump/snapshot_5.pth.tar
110
+ # on dataset SynHand
111
+ sh scripts/test.sh SynHand smplest_x_h 5
112
+ ```
113
+ - NUM_GPU = 1 is used by default for testing
114
+ - Logs and results will be saved to `SMPLest-X/outputs/test_{TEST_DATASET}_ep{CKPT_ID}_{DATE_TIME}`
115
+
116
+
117
+ ## FAQ
118
+ - How do I animate my virtual characters with SMPLest-X output (like that in the demo video)?
119
+ - We are working on that, please stay tuned!
120
+ Currently, this repo supports SMPL-X estimation and a simple visualization (overlay of SMPL-X vertices).
121
+
122
+
123
+ ## Citation
124
+ ```text
125
+ # SMPLest-X
126
+ @article{yin2025smplest,
127
+ title={SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation},
128
+ author={Yin, Wanqi and Cai, Zhongang and Wang, Ruisi and Zeng, Ailing and Wei, Chen and Sun, Qingping and Mei, Haiyi and Wang, Yanjun and Pang, Hui En and Zhang, Mingyuan and Zhang, Lei and Loy, Chen Change and Yamashita, Atsushi and Yang, Lei and Liu, Ziwei},
129
+ journal={arXiv preprint arXiv:2501.09782},
130
+ year={2025}
131
+ }
132
+
133
+ # SMPLer-X
134
+ @inproceedings{cai2023smplerx,
135
+ title={{SMPLer-X}: Scaling up expressive human pose and shape estimation},
136
+ author={Cai, Zhongang and Yin, Wanqi and Zeng, Ailing and Wei, Chen and Sun, Qingping and Yanjun, Wang and Pang, Hui En and Mei, Haiyi and Zhang, Mingyuan and Zhang, Lei and Loy, Chen Change and Yang, Lei and Liu, Ziwei},
137
+ booktitle={Advances in Neural Information Processing Systems},
138
+ year={2023}
139
+ }
140
+ ```
141
+
142
+ ## Explore More [SMPLCap](https://github.com/SMPLCap) Projects
143
+
144
+ - [TPAMI'25] [SMPLest-X](https://github.com/SMPLCap/SMPLest-X): An extended version of [SMPLer-X](https://github.com/SMPLCap/SMPLer-X) with stronger foundation models.
145
+ - [ECCV'24] [WHAC](https://github.com/SMPLCap/WHAC): World-grounded human pose and camera estimation from monocular videos.
146
+ - [CVPR'24] [AiOS](https://github.com/SMPLCap/AiOS): An all-in-one-stage pipeline combining detection and 3D human reconstruction.
147
+ - [NeurIPS'23] [SMPLer-X](https://github.com/SMPLCap/SMPLer-X): Scaling up EHPS towards a family of generalist foundation models.
148
+ - [NeurIPS'23] [RoboSMPLX](https://github.com/SMPLCap/RoboSMPLX): A framework to enhance the robustness of
149
+ whole-body pose and shape estimation.
150
+ - [ICCV'23] [Zolly](https://github.com/SMPLCap/Zolly): 3D human mesh reconstruction from perspective-distorted images.
151
+ - [arXiv'23] [PointHPS](https://github.com/SMPLCap/PointHPS): 3D HPS from point clouds captured in real-world settings.
152
+ - [NeurIPS'22] [HMR-Benchmarks](https://github.com/SMPLCap/hmr-benchmarks): A comprehensive benchmark of HPS datasets, backbones, and training strategies.
SMPLest-X/datasets/SynHand.py ADDED
@@ -0,0 +1,39 @@
1
+ import os.path as osp
2
+ from datasets.humandata import HumanDataset
3
+
4
+ class SynHand(HumanDataset):
5
+ def __init__(self, transform, data_split, cfg):
6
+ super(SynHand, self).__init__(transform, data_split, cfg)
7
+
8
+ self.cfg = cfg
9
+
10
+ self.use_cache = getattr(self.cfg.data, 'use_cache', False)
11
+ self.annot_path_cache = osp.join(self.cfg.data.data_dir, 'cache', f'synhand_{self.data_split}.npz')
12
+
13
+ self.img_shape = None #(h, w)
14
+ self.cam_param = {}
15
+
16
+ # load data or cache
17
+ if self.use_cache and osp.isfile(self.annot_path_cache):
18
+ print(f'[{self.__class__.__name__}] Loading cache from {self.annot_path_cache}')
19
+ self.datalist = self.load_cache(self.annot_path_cache)
20
+ else:
21
+ if self.use_cache:
22
+ print(f'[{self.__class__.__name__}] Cache not found, generating cache...')
23
+
24
+ self.datalist = []
25
+ self.img_dir = osp.join(self.cfg.data.data_dir, 'img', 'synbody')
26
+
27
+ if self.data_split == 'train':
28
+ filename = f'synhand_20240927_241004_4628_fix_betas.npz'
29
+ else:
30
+ filename = f'synhand_20241018_test_241023_1188_fix_betas.npz'
31
+
32
+ self.annot_path = osp.join(self.cfg.data.data_dir, 'annot', filename)
33
+
34
+ self.datalist = self.load_data(
35
+ train_sample_interval=getattr(self.cfg.data, f'{self.__class__.__name__}_train_sample_interval', 1),
36
+ test_sample_interval=getattr(self.cfg.data, f'{self.__class__.__name__}_test_sample_interval', 10))
37
+
38
+ if self.use_cache:
39
+ self.save_cache(self.annot_path_cache, self.datalist)
SMPLest-X/datasets/dataset.py ADDED
@@ -0,0 +1,103 @@
1
+ import random
2
+ import numpy as np
3
+ from torch.utils.data.dataset import Dataset
4
+
5
+ class MultipleDatasets(Dataset):
6
+ def __init__(self, dbs, make_same_len=True, total_len=None, verbose=False, length_dict=None):
7
+ self.dbs = dbs
8
+ self.db_num = len(self.dbs)
9
+ self.max_db_data_num = max([len(db) for db in dbs])
10
+ self.make_same_len = make_same_len
11
+ self.length_dict = length_dict
12
+
13
+ if length_dict is not None: # weighted
14
+ self.db_length = []
15
+ for db in dbs:
16
+ name = db.__class__.__name__
17
+ length = length_dict[name]
18
+ self.db_length.append(length)
19
+
20
+ self.db_len_cumsum = np.cumsum(self.db_length)
21
+ else:
22
+ self.db_len_cumsum = np.cumsum([len(db) for db in dbs])
23
+
24
+ if total_len == 'auto': #concat/balance
25
+ self.total_len = self.db_len_cumsum[-1]
26
+ self.auto_total_len = True
27
+ else: #balance/weighted
28
+ self.total_len = total_len
29
+ self.auto_total_len = False
30
+
31
+ if total_len is not None:
32
+ self.per_db_len = self.total_len // self.db_num
33
+ if verbose:
34
+ print('datasets original:', [len(self.dbs[i]) for i in range(self.db_num)])
35
+ if length_dict is not None:
36
+ print('defined length:', length_dict)
37
+ print(f'Auto total length: {self.auto_total_len}, {self.total_len}')
38
+
39
+
40
+ def __len__(self):
41
+ # all dbs have the same length
42
+ if self.make_same_len:
43
+ if self.total_len is None:
44
+ # match the longest length
45
+ return self.max_db_data_num * self.db_num
46
+ else:
47
+ # each dataset has the same length and total len is fixed
48
+ return self.total_len
49
+ else:
50
+ if self.total_len is None:
51
+ # each db has different length, simply concat
52
+ return sum([len(db) for db in self.dbs])
53
+ else:
54
+ # defined or calculated db length
55
+ return self.total_len
56
+
57
+ def __getitem__(self, index):
58
+ if self.make_same_len:
59
+ if self.total_len is None:
60
+ # match the longest length
61
+ db_idx = index // self.max_db_data_num
62
+ data_idx = index % self.max_db_data_num
63
+ if data_idx >= len(self.dbs[db_idx]) * (self.max_db_data_num // len(self.dbs[db_idx])): # last batch: random sampling
64
+ data_idx = random.randint(0,len(self.dbs[db_idx])-1)
65
+ else: # before last batch: use modular
66
+ data_idx = data_idx % len(self.dbs[db_idx])
67
+ else:
68
+ db_idx = index // self.per_db_len
69
+ data_idx = index % self.per_db_len
70
+ if db_idx > (self.db_num - 1):
71
+ # last batch: randomly choose one dataset
72
+ db_idx = random.randint(0,self.db_num - 1)
73
+
74
+ if len(self.dbs[db_idx]) < self.per_db_len and \
75
+ data_idx >= len(self.dbs[db_idx]) * (self.per_db_len // len(self.dbs[db_idx])):
76
+ # last batch: random sampling in this dataset
77
+ data_idx = random.randint(0,len(self.dbs[db_idx]) - 1)
78
+ else:
79
+ # before last batch: use modular
80
+ data_idx = data_idx % len(self.dbs[db_idx])
81
+
82
+
83
+ else:
84
+ for i in range(self.db_num):
85
+ if index < self.db_len_cumsum[i]:
86
+ db_idx = i
87
+ break
88
+ if db_idx == 0:
89
+ data_idx = index
90
+ else:
91
+ data_idx = index - self.db_len_cumsum[db_idx-1]
92
+
93
+ if self.length_dict is not None:
94
+ # make the data idx valid if total data less than defined data length
95
+ if len(self.dbs[db_idx]) < self.db_length[db_idx] and \
96
+ data_idx >= len(self.dbs[db_idx]) * (self.db_length[db_idx] // len(self.dbs[db_idx])):
97
+ # last batch: random sampling in this dataset
98
+ data_idx = random.randint(0,len(self.dbs[db_idx]) - 1)
99
+ else:
100
+ # before last batch: use modular
101
+ data_idx = data_idx % len(self.dbs[db_idx])
102
+
103
+ return self.dbs[db_idx][data_idx]
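For orientation, here is a hypothetical usage sketch of `MultipleDatasets` with `make_same_len=True`, which stretches every dataset to the length of the longest one. The two `TensorDataset` stand-ins are placeholders for the real `HumanDataset` subclasses, and the import assumes the SMPLest-X root is on `PYTHONPATH`.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader
from datasets.dataset import MultipleDatasets

# Stand-ins for HumanDataset instances (100 and 40 samples respectively)
db_a = TensorDataset(torch.randn(100, 3))
db_b = TensorDataset(torch.randn(40, 3))

# Each dataset is stretched to the longest length (100), so len(merged) == 200;
# the shorter dataset is revisited via modulo indexing plus random sampling.
merged = MultipleDatasets([db_a, db_b], make_same_len=True, verbose=True)
print(len(merged))

loader = DataLoader(merged, batch_size=8, shuffle=True)
batch = next(iter(loader))
```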
SMPLest-X/datasets/humandata.py ADDED
@@ -0,0 +1,1076 @@
1
+ import os
2
+ import os.path as osp
3
+ import numpy as np
4
+ import torch
5
+ import copy
6
+ from human_models.human_models import SMPL, SMPLX
7
+ from utils.data_utils import load_img, process_bbox, augmentation, \
8
+ process_db_coord, process_human_model_output, \
9
+ process_db_coord_crop, gen_cropped_hands
10
+ from utils.transforms import rigid_align, batch_rodrigues
11
+ import tqdm
12
+ import time
13
+ import random
14
+ import pickle
15
+ from constants import *
16
+
17
+
18
+
19
+
20
+ class Cache():
21
+ """ A custom implementation for OSX pipeline
22
+ Need to run tool/cache/fix_cache.py to fix paths
23
+ """
24
+ def __init__(self, load_path=None):
25
+ if load_path is not None:
26
+ self.load(load_path)
27
+
28
+ def load(self, load_path):
29
+ self.load_path = load_path
30
+ self.cache = np.load(load_path, allow_pickle=True)
31
+ self.data_len = self.cache['data_len']
32
+ self.data_strategy = self.cache['data_strategy']
33
+ assert self.data_len == len(self.cache) - 2 # data_len, data_strategy
34
+ self.cache = None
35
+
36
+ @classmethod
37
+ def save(cls, save_path, data_list, data_strategy):
38
+ assert save_path is not None, 'save_path is None'
39
+ data_len = len(data_list)
40
+ cache = {}
41
+ for i, data in enumerate(data_list):
42
+ cache[str(i)] = data
43
+ assert len(cache) == data_len
44
+ # update meta
45
+ cache.update({
46
+ 'data_len': data_len,
47
+ 'data_strategy': data_strategy})
48
+
49
+ np.savez_compressed(save_path, **cache)
50
+ print(f'Cache saved to {save_path}.')
51
+
52
+ # def shuffle(self):
53
+ # random.shuffle(self.mapping)
54
+
55
+ def __len__(self):
56
+ return self.data_len
57
+
58
+ def __getitem__(self, idx):
59
+ if self.cache is None:
60
+ self.cache = np.load(self.load_path, allow_pickle=True)
61
+ # mapped_idx = self.mapping[idx]
62
+ # cache_data = self.cache[str(mapped_idx)]
63
+ cache_data = self.cache[str(idx)]
64
+ data = cache_data.item()
65
+ return data
66
+
67
+
68
+ class HumanDataset(torch.utils.data.Dataset):
69
+
70
+ def __init__(self, transform, data_split, cfg):
71
+ self.transform = transform
72
+ self.data_split = data_split
73
+ self.cfg = cfg
74
+
75
+ # dataset information, to be filled by child class
76
+ self.img_dir = None
77
+ self.annot_path = None
78
+ self.annot_path_cache = None
79
+ self.use_cache = False
80
+ self.save_idx = 0
81
+ self.img_shape = None # (h, w)
82
+ self.cam_param = None # {'focal_length': (fx, fy), 'princpt': (cx, cy)}
83
+ self.use_betas_neutral = False
84
+
85
+ self.smpl_x = SMPLX.get_instance()
86
+ self.smpl = SMPL.get_instance()
87
+
88
+ self.joint_set = {
89
+ 'joint_num': self.smpl_x.joint_num,
90
+ 'joints_name': self.smpl_x.joints_name,
91
+ 'flip_pairs': self.smpl_x.flip_pairs}
92
+ self.joint_set['root_joint_idx'] = self.joint_set['joints_name'].index('Pelvis')
93
+
94
+ self.downsample_mat = pickle.load(open(f'{self.cfg.model.human_model_path}/smplx2smpl.pkl',
95
+ 'rb'))['matrix']
96
+
97
+ def load_cache(self, annot_path_cache):
98
+ datalist = Cache(annot_path_cache)
99
+ return datalist
100
+
101
+ def save_cache(self, annot_path_cache, datalist):
102
+ print(f'[{self.__class__.__name__}] Caching datalist to {self.annot_path_cache}...')
103
+ Cache.save(
104
+ annot_path_cache,
105
+ datalist,
106
+ data_strategy=getattr(self.cfg.data, 'data_strategy', None)
107
+ )
108
+
109
+ def load_data(self, train_sample_interval=1, test_sample_interval=1):
110
+
111
+ content = np.load(self.annot_path, allow_pickle=True)
112
+ num_examples = len(content['image_path'])
113
+
114
+ if 'meta' in content:
115
+ meta = content['meta'].item()
116
+ print('meta keys:', meta.keys())
117
+ if 'annot_valid' in meta.keys(): # agora
118
+ annot_valid = meta['annot_valid']
119
+ else:
120
+ annot_valid = None
121
+
122
+ if 'valid_label' in meta.keys(): # Ubody
123
+ invalid_label = np.array(meta['valid_label']) == 0 # skip when True
124
+ iscrowd = np.array(meta['iscrowd']) # skip when True
125
+ num_keypoints_zero = np.array(meta['num_keypoints']) == 0 # skip when True
126
+
127
+ skip_ubody = [iscrowd[i] or num_keypoints_zero[i] or invalid_label[i] for i in range(len(iscrowd))]
128
+ else:
129
+ skip_ubody = None
130
+
131
+ if 'iscrowd' in meta.keys(): # mscoco
132
+ iscrowd = np.array(meta['iscrowd']) # skip when True
133
+ num_keypoints_zero = np.array(meta['num_keypoints']) == 0 # skip when True
134
+
135
+ skip_mscoco = [iscrowd[i] or num_keypoints_zero[i] for i in range(len(iscrowd))]
136
+ else:
137
+ skip_mscoco = None
138
+
139
+ else:
140
+ meta = None
141
+ annot_valid = None
142
+ skip_ubody = None
143
+ skip_mscoco = None
144
+ print('No meta info provided! Please give height and width manually')
145
+
146
+ # ARCTIC val set
147
+ if 'vertices3d_path' in content:
148
+ vertices3d_path = content['vertices3d_path']
149
+ else:
150
+ vertices3d_path = None
151
+
152
+ print(f'Start loading humandata {self.annot_path} into memory...\nDataset includes: {content.files}'); tic = time.time()
153
+ image_path = content['image_path']
154
+
155
+ if meta is not None and 'height' in meta:
156
+ height = np.array(meta['height'])
157
+ width = np.array(meta['width'])
158
+ image_shape = np.stack([height, width], axis=-1)
159
+ else:
160
+ image_shape = None
161
+
162
+ if self.__class__.__name__ == 'Hi4D':
163
+ image_shape = None
164
+
165
+
166
+ if 'smplx' in content:
167
+ smplx = content['smplx'].item()
168
+ as_smplx = 'smplx'
169
+ if self.__class__.__name__ == 'UBody':
170
+ smplx.pop('leye_pose')
171
+ smplx.pop('reye_pose')
172
+ elif 'smpl' in content:
173
+ smplx = content['smpl'].item()
174
+ as_smplx = 'smpl'
175
+ elif 'smplh' in content:
176
+ smplx = content['smplh'].item()
177
+ as_smplx = 'smplh'
178
+
179
+ # TODO: temp solution, should be more general. But SHAPY is very special
180
+ elif self.__class__.__name__ == 'SHAPY':
181
+ smplx = {}
182
+
183
+ else:
184
+ raise KeyError('No SMPL or SMPLX annotation available, please check keys:\n'
185
+ f'{content.files}')
186
+
187
+ if self.__class__.__name__ == 'PW3D' and 'test' in self.annot_path:
188
+ print('load smpl for PW3d!')
189
+ smplx = content['smpl'].item()
190
+ as_smplx = 'smpl'
191
+ gender = content['meta'].item()['gender']
192
+ else:
193
+ gender = None
194
+
195
+ print('Smplx param', smplx.keys())
196
+
197
+ # mano
198
+ if 'mano' in content:
199
+ mano = content['mano']
200
+ else:
201
+ mano = None
202
+
203
+ # bbox
204
+ if 'bbox_xywh' in content:
205
+ bbox_xywh = content['bbox_xywh']
206
+ else:
207
+ raise KeyError(f'Necessary key [bbox_xywh] is missing in HumanData for {self.__class__.__name__}.')
208
+
209
+ if 'lhand_bbox_xywh' in content:
210
+ lhand_bbox_xywh = content['lhand_bbox_xywh']
211
+ else:
212
+ lhand_bbox_xywh = np.zeros((num_examples, 5))
213
+
214
+ if 'rhand_bbox_xywh' in content:
215
+ rhand_bbox_xywh = content['rhand_bbox_xywh']
216
+ else:
217
+ rhand_bbox_xywh = np.zeros((num_examples, 5))
218
+
219
+ if 'face_bbox_xywh' in content:
220
+ face_bbox_xywh = content['face_bbox_xywh']
221
+ else:
222
+ face_bbox_xywh = np.zeros((num_examples, 5))
223
+
224
+ decompressed = False
225
+ if content['__keypoints_compressed__']:
226
+ decompressed_kps = self.decompress_keypoints(content)
227
+ decompressed = True
228
+
229
+ keypoints3d = None
230
+ valid_kps3d = False
231
+ keypoints3d_mask = None
232
+ valid_kps3d_mask = False
233
+ for kps3d_key in KPS3D_KEYS:
234
+ if kps3d_key in content:
235
+ keypoints3d = decompressed_kps[kps3d_key][:, SMPLX_137_MAPPING, :3] if decompressed \
236
+ else content[kps3d_key][:, SMPLX_137_MAPPING, :3]
237
+ valid_kps3d = True
238
+
239
+ if f'{kps3d_key}_mask' in content:
240
+ keypoints3d_mask = content[f'{kps3d_key}_mask'][SMPLX_137_MAPPING]
241
+ valid_kps3d_mask = True
242
+ elif 'keypoints3d_mask' in content:
243
+ keypoints3d_mask = content['keypoints3d_mask'][SMPLX_137_MAPPING]
244
+ valid_kps3d_mask = True
245
+ break
246
+
247
+ for kps2d_key in KPS2D_KEYS:
248
+ if kps2d_key in content:
249
+ keypoints2d = decompressed_kps[kps2d_key][:, SMPLX_137_MAPPING, :2] if decompressed \
250
+ else content[kps2d_key][:, SMPLX_137_MAPPING, :2]
251
+
252
+ if f'{kps2d_key}_mask' in content:
253
+ keypoints2d_mask = content[f'{kps2d_key}_mask'][SMPLX_137_MAPPING]
254
+ elif 'keypoints2d_mask' in content:
255
+ keypoints2d_mask = content['keypoints2d_mask'][SMPLX_137_MAPPING]
256
+ break
257
+
258
+ mask = keypoints3d_mask if valid_kps3d_mask \
259
+ else keypoints2d_mask
260
+
261
+ print('Done. Time: {:.2f}s'.format(time.time() - tic))
262
+
263
+ datalist = []
264
+
265
+ for i in tqdm.tqdm(range(int(num_examples))):
266
+ if annot_valid is not None and not annot_valid[i]: continue # for agora
267
+ if skip_ubody is not None and skip_ubody[i]: continue # for ubody
268
+ if skip_mscoco is not None and skip_mscoco[i]: continue # for mscoco
269
+
270
+ if self.data_split == 'train' and i % train_sample_interval != 0:
271
+ continue
272
+ if self.data_split == 'test' and i % test_sample_interval != 0:
273
+ continue
274
+
275
+ if vertices3d_path is not None:
276
+ vertices3d = np.load(osp.join(self.img_dir, vertices3d_path[i]))
277
+ else:
278
+ vertices3d = None
279
+
280
+ if 'MPI_INF_3DHP' in self.__class__.__name__:
281
+ img_path = osp.join(self.img_dir, image_path[i][1:]) # remove the first /
282
+ else:
283
+ img_path = osp.join(self.img_dir, image_path[i])
284
+
285
+ # import pdb; pdb.set_trace()
286
+ img_shape = image_shape[i] if image_shape is not None else self.img_shape
287
+
288
+ joint_img = keypoints2d[i]
289
+ joint_valid = mask.reshape(-1, 1)
290
+
291
+ bbox = bbox_xywh[i][:4]
292
+ lhand_bbox = lhand_bbox_xywh[i]
293
+ rhand_bbox = rhand_bbox_xywh[i]
294
+ face_bbox = face_bbox_xywh[i]
295
+ if hasattr(self.cfg.data, 'bbox_ratio'):
296
+ bbox_ratio = self.cfg.data.bbox_ratio * 0.833 # preprocessed body bbox already comes with 1.2x padding
297
+ else:
298
+ bbox_ratio = 1.25
299
+ left_hand_chosen = None
300
+
301
+ bbox = process_bbox(bbox, img_width=img_shape[1], img_height=img_shape[0], ratio=bbox_ratio,
302
+ input_img_shape=self.cfg.model.input_img_shape)
303
+ if bbox is None:
304
+ print("skip since no bbox")
305
+ continue
306
+ # if hasattr(cfg, 'do_crop'):
307
+ # if cfg.do_crop:
308
+ # joint_valid_temp = process_db_coord_crop(bbox, joint_img)
309
+
310
+ if lhand_bbox[-1] > 0: # conf > 0
311
+ lhand_bbox = lhand_bbox[:4]
312
+ if hasattr(self.cfg.data, 'bbox_ratio'):
313
+ lhand_bbox = process_bbox(lhand_bbox, img_width=img_shape[1], img_height=img_shape[0], ratio=self.cfg.data.bbox_ratio,
314
+ input_img_shape=self.cfg.model.input_img_shape)
315
+ if lhand_bbox is not None:
316
+ lhand_bbox[2:] += lhand_bbox[:2] # xywh -> xyxy
317
+ else:
318
+ lhand_bbox = None
319
+ if rhand_bbox[-1] > 0:
320
+ rhand_bbox = rhand_bbox[:4]
321
+ if hasattr(self.cfg.data, 'bbox_ratio'):
322
+ rhand_bbox = process_bbox(rhand_bbox, img_width=img_shape[1], img_height=img_shape[0], ratio=self.cfg.data.bbox_ratio,
323
+ input_img_shape=self.cfg.model.input_img_shape)
324
+ if rhand_bbox is not None:
325
+ rhand_bbox[2:] += rhand_bbox[:2] # xywh -> xyxy
326
+ else:
327
+ rhand_bbox = None
328
+ if face_bbox[-1] > 0:
329
+ face_bbox = face_bbox[:4]
330
+ if hasattr(self.cfg.data, 'bbox_ratio'):
331
+ face_bbox = process_bbox(face_bbox, img_width=img_shape[1], img_height=img_shape[0], ratio=self.cfg.data.bbox_ratio,
332
+ input_img_shape=self.cfg.model.input_img_shape)
333
+ if face_bbox is not None:
334
+ face_bbox[2:] += face_bbox[:2] # xywh -> xyxy
335
+ else:
336
+ face_bbox = None
337
+
338
+ if valid_kps3d:
339
+ joint_cam = keypoints3d[i]
340
+ else:
341
+ joint_cam = None
342
+
343
+ smplx_param = {k: v[i] for k, v in smplx.items()}
344
+
345
+ # agora skip kids
346
+ is_kids = smplx_param.pop('betas_extra', 0)
347
+ # import pdb; pdb.set_trace()
348
+ if is_kids != 0:
349
+ print('skip kids')
350
+ continue
351
+
352
+ # TODO: set invalid if None?
353
+ smplx_param['body_pose'] = smplx_param.pop('body_pose', None)
354
+ smplx_param['root_pose'] = smplx_param.pop('global_orient', None)
355
+ smplx_param['shape'] = smplx_param.pop('betas', np.zeros(10, dtype=np.float32))
356
+ smplx_param['shape'] = smplx_param['shape'][:10]
357
+ smplx_param['trans'] = smplx_param.pop('transl', np.zeros(3))
358
+ smplx_param['lhand_pose'] = smplx_param.pop('left_hand_pose', None)
359
+ smplx_param['rhand_pose'] = smplx_param.pop('right_hand_pose', None)
360
+ smplx_param['expr'] = smplx_param.pop('expression', None)
361
+
362
+ # TODO do not fix betas, give up shape supervision
363
+ if 'betas_neutral' in smplx_param:
364
+ smplx_param['shape'] = smplx_param.pop('betas_neutral')
365
+ # smplx_param['shape'] = np.zeros(10, dtype=np.float32)
366
+ smplx_param['shape'] = smplx_param['shape'][:10]
367
+
368
+ # # TODO fix shape of poses
369
+ if self.__class__.__name__ == 'Talkshow':
370
+ smplx_param['body_pose'] = smplx_param['body_pose'].reshape(21, 3)
371
+ smplx_param['lhand_pose'] = smplx_param['lhand_pose'].reshape(15, 3)
372
+ smplx_param['rhand_pose'] = smplx_param['rhand_pose'].reshape(15, 3)
373
+ smplx_param['expr'] = smplx_param['expr'][:10]
374
+
375
+ if self.__class__.__name__ == 'ARCTIC':
376
+ smplx_param['shape'] = np.zeros(10, dtype=np.float32)
377
+
378
+ # 'BEDLAM'
379
+ if self.__class__.__name__ in ['GTA_Human2','GTA_Human_full',
380
+ 'SynBody_whac', 'SynBody_Magic1','SynBody', 'SynBody_full', 'SynHand',
381
+ 'CHI3D', 'FIT3D', 'HumanSC3D',
382
+ 'MOYO', 'ARCTIC',]:
383
+ smplx_param['shape'] = smplx_param['shape'][:10]
384
+ # print('[Flat Hand Mean]:manually set flat_hand_mean = True -> flat_hand_mean = False')
385
+ # manually set flat_hand_mean = True -> flat_hand_mean = False
386
+ smplx_param['lhand_pose'] -= HANDS_MEAN_L
387
+ smplx_param['rhand_pose'] -= HANDS_MEAN_R
388
+
389
+
390
+ if as_smplx == 'smpl':
391
+ smplx_param['smpl_pose'] = smplx_param['body_pose']
392
+ smplx_param['body_pose'] = smplx_param['body_pose'].reshape(-1, 3)
393
+ smplx_param['body_pose'] = smplx_param['body_pose'][:21, :] # use smpl body_pose on smplx
394
+
395
+ smplx_param['smpl_shape'] = smplx_param['shape']
396
+ smplx_param['shape'] = np.zeros(10, dtype=np.float32) # drop smpl betas for smplx
397
+
398
+ if gender is not None:
399
+ smplx_param['gender'] = gender[i]
400
+
401
+ if as_smplx == 'smplh':
402
+ smplx_param['shape'] = np.zeros(10, dtype=np.float32) # drop smpl betas for smplx
403
+
405
+ # for hand datasets, set shape and pose to all zero
406
+ if self.__class__.__name__ in ['FreiHand', 'InterHand', 'BlurHand', 'HanCo']:
407
+ smplx_param['shape'] = np.zeros((10, ))
408
+ smplx_param['root_pose'] = np.zeros((3))
409
+ smplx_param['body_pose'] = np.zeros((21, 3))
410
+
411
+ if smplx_param['lhand_pose'] is None or (smplx_param['lhand_pose'] == 0).all():
412
+ smplx_param['lhand_valid'] = False
413
+ # TODO: manually set joint_valid to 0
414
+ joint_valid[self.smpl_x.joint_part['lhand'], :] = 0
415
+ joint_valid[self.smpl_x.lwrist_idx, :] = 0
416
+ else:
417
+ smplx_param['lhand_valid'] = True
418
+ joint_valid[self.smpl_x.joint_part['lhand'], :] = 1
419
+ joint_valid[self.smpl_x.lwrist_idx, :] = 1
420
+
421
+ if smplx_param['rhand_pose'] is None or (smplx_param['rhand_pose'] == 0).all():
422
+ smplx_param['rhand_valid'] = False
423
+ joint_valid[self.smpl_x.joint_part['rhand'], :] = 0
424
+ joint_valid[self.smpl_x.rwrist_idx, :] = 0
425
+ else:
426
+ smplx_param['rhand_valid'] = True
427
+ joint_valid[self.smpl_x.joint_part['rhand'], :] = 1
428
+ joint_valid[self.smpl_x.rwrist_idx, :] = 1
429
+
430
+ if smplx_param['expr'] is None:
431
+ smplx_param['face_valid'] = False
432
+ else:
433
+ smplx_param['face_valid'] = True
434
+
435
+ if joint_cam is not None and np.any(np.isnan(joint_cam)):
436
+ print("skip since no kps")
437
+ continue
438
+
439
+ datalist.append({
440
+ 'img_path': img_path,
441
+ 'img_shape': img_shape,
442
+ 'bbox': bbox,
443
+ 'lhand_bbox': lhand_bbox,
444
+ 'rhand_bbox': rhand_bbox,
445
+ 'face_bbox': face_bbox,
446
+ 'joint_img': joint_img,
447
+ 'joint_cam': joint_cam,
448
+ 'joint_valid': joint_valid,
449
+ 'smplx_param': smplx_param,
450
+ 'model': as_smplx,
451
+ 'extrinsic_r': extrinsic_r[i] if 'extrinsic_r' in locals() else np.eye(3,3),
452
+ 'vertices3d': vertices3d if vertices3d is not None else -1,
453
+ 'idx': i})
454
+
455
+ # save memory
456
+ del content, image_path, bbox_xywh, lhand_bbox_xywh, rhand_bbox_xywh, face_bbox_xywh, keypoints3d, keypoints2d
457
+
458
+ if self.data_split == 'train':
459
+ print(f'[{self.__class__.__name__} train] original size:', int(num_examples),
460
+ '. Sample interval:', train_sample_interval,
461
+ '. Sampled size:', len(datalist))
462
+
463
+ if (getattr(self.cfg.data, 'data_strategy', None) == 'balance' and self.data_split == 'train') or \
464
+ (getattr(self.cfg.data, 'data_strategy', None) == 'weighted' and self.data_split == 'train'):
465
+ print(f'[{self.__class__.__name__}] Using [balance/weighted] strategy with datalist shuffled...')
466
+ random.seed(2023)
467
+ random.shuffle(datalist)
468
+
469
+ return datalist
470
+
471
+ def __len__(self):
472
+ return len(self.datalist)
473
+
474
+ def __getitem__(self, idx):
475
+ try:
476
+ data = copy.deepcopy(self.datalist[idx])
477
+ except Exception as e:
478
+ print(f'[{self.__class__.__name__}] Error loading data {idx}')
479
+ print(e)
480
+ exit(0)
481
+
482
+ img_path, img_shape, bbox = data['img_path'], data['img_shape'], data['bbox']
483
+ img = load_img(img_path)
484
+ no_aug = getattr(self.cfg.data, 'no_aug', False)
485
+ img, img2bb_trans, bb2img_trans, rot, do_flip = augmentation(no_aug, img, bbox,
486
+ self.data_split,
487
+ self.cfg.model.input_img_shape)
488
+ img = self.transform(img.astype(np.float32)) / 255.
489
+
490
+ ## for vis on original img
491
+ focal = [self.cfg.model.focal[0] / self.cfg.model.input_body_shape[1] * bbox[2],
492
+ self.cfg.model.focal[1] / self.cfg.model.input_body_shape[0] * bbox[3]]
493
+ princpt = [self.cfg.model.princpt[0] / self.cfg.model.input_body_shape[1] * bbox[2] + bbox[0],
494
+ self.cfg.model.princpt[1] / self.cfg.model.input_body_shape[0] * bbox[3] + bbox[1]]
495
+
496
+ if self.data_split == 'train':
497
+ # h36m gt
498
+ joint_cam = data['joint_cam']
499
+ if joint_cam is not None:
500
+ dummy_cord = False
501
+ joint_cam = joint_cam - joint_cam[self.joint_set['root_joint_idx'], None, :] # root-relative
502
+ else:
503
+ # dummy cord as joint_cam
504
+ dummy_cord = True
505
+ joint_cam = np.zeros((self.joint_set['joint_num'], 3), dtype=np.float32)
506
+
507
+ joint_img = data['joint_img']
508
+ joint_img = np.concatenate((joint_img[:, :2], joint_cam[:, 2:]), 1) # x, y, depth
509
+ if not dummy_cord:
510
+ joint_img[:, 2] = (joint_img[:, 2] / (self.cfg.model.body_3d_size / 2) + 1) / 2. * self.cfg.model.output_hm_shape[0] # discretize depth
511
+
512
+ joint_img_aug, joint_cam_wo_ra, \
513
+ joint_cam_ra, joint_valid, joint_trunc = process_db_coord(
514
+ joint_img=joint_img,
515
+ joint_cam=joint_cam,
516
+ joint_valid=data['joint_valid'],
517
+ do_flip=do_flip,
518
+ img_shape=img_shape,
519
+ flip_pairs=self.joint_set['flip_pairs'],
520
+ img2bb_trans=img2bb_trans,
521
+ rot=rot,
522
+ src_joints_name=self.joint_set['joints_name'],
523
+ target_joints_name=self.smpl_x.joints_name,
524
+ input_img_shape=self.cfg.model.input_img_shape,
525
+ output_hm_shape=self.cfg.model.output_hm_shape,
526
+ input_body_shape=self.cfg.model.input_body_shape)
527
+
528
+ # smplx coordinates and parameters
529
+ smplx_param = data['smplx_param']
530
+ smplx_joint_img, smplx_joint_cam, smplx_joint_trunc, smplx_pose, smplx_shape, smplx_expr, \
531
+ smplx_pose_valid, smplx_joint_valid, smplx_expr_valid, \
532
+ smplx_mesh_cam_orig = process_human_model_output(
533
+ human_model_param=smplx_param,
534
+ cam_param=self.cam_param,
535
+ do_flip=do_flip,
536
+ img_shape=img_shape,
537
+ img2bb_trans=img2bb_trans,
538
+ rot=rot,
539
+ human_model_type='smplx',
540
+ joint_img=None if self.cam_param else joint_img,
541
+ body_3d_size=self.cfg.model.body_3d_size,
542
+ hand_3d_size=self.cfg.model.hand_3d_size,
543
+ face_3d_size=self.cfg.model.face_3d_size,
544
+ input_img_shape=self.cfg.model.input_img_shape,
545
+ output_hm_shape=self.cfg.model.output_hm_shape,
546
+ )
547
+
548
+ # TODO temp fix keypoints3d for renbody
549
+ if 'RenBody' in self.__class__.__name__:
550
+ joint_cam_ra = smplx_joint_cam.copy()
551
+ joint_cam_wo_ra = smplx_joint_cam.copy()
552
+ joint_cam_wo_ra[self.smpl_x.joint_part['lhand'], :] = joint_cam_wo_ra[self.smpl_x.joint_part['lhand'], :] \
553
+ + joint_cam_wo_ra[self.smpl_x.lwrist_idx, None, :] # left hand root-relative
554
+ joint_cam_wo_ra[self.smpl_x.joint_part['rhand'], :] = joint_cam_wo_ra[self.smpl_x.joint_part['rhand'], :] \
555
+ + joint_cam_wo_ra[self.smpl_x.rwrist_idx, None, :] # right hand root-relative
556
+ joint_cam_wo_ra[self.smpl_x.joint_part['face'], :] = joint_cam_wo_ra[self.smpl_x.joint_part['face'], :] \
557
+ + joint_cam_wo_ra[self.smpl_x.neck_idx, None,: ] # face root-relative
558
+ # change smplx_shape if use_betas_neutral
559
+ # processing follows that in process_human_model_output
560
+ if self.use_betas_neutral:
561
+ smplx_shape = smplx_param['betas_neutral'].reshape(1, -1)
562
+ smplx_shape[(np.abs(smplx_shape) > 3).any(axis=1)] = 0.
563
+ smplx_shape = smplx_shape.reshape(-1)
564
+
565
+ # SMPLX pose parameter validity
566
+ smplx_pose_valid = np.tile(smplx_pose_valid[:, None], (1, 9)).reshape(-1)
567
+ smplx_joint_valid = smplx_joint_valid[:, None]
568
+ smplx_joint_trunc = smplx_joint_valid * smplx_joint_trunc
569
+ if not (smplx_shape == 0).all():
570
+ smplx_shape_valid = True
571
+ else:
572
+ smplx_shape_valid = False
573
+
574
+ # hand and face bbox transform
575
+ lhand_bbox, lhand_bbox_valid = self.process_hand_face_bbox(data['lhand_bbox'], do_flip, img_shape, img2bb_trans,
576
+ self.cfg.model.input_img_shape, self.cfg.model.output_hm_shape)
577
+ rhand_bbox, rhand_bbox_valid = self.process_hand_face_bbox(data['rhand_bbox'], do_flip, img_shape, img2bb_trans,
578
+ self.cfg.model.input_img_shape, self.cfg.model.output_hm_shape)
579
+ face_bbox, face_bbox_valid = self.process_hand_face_bbox(data['face_bbox'], do_flip, img_shape, img2bb_trans,
580
+ self.cfg.model.input_img_shape, self.cfg.model.output_hm_shape)
581
+ if do_flip:
582
+ lhand_bbox, rhand_bbox = rhand_bbox, lhand_bbox
583
+ lhand_bbox_valid, rhand_bbox_valid = rhand_bbox_valid, lhand_bbox_valid
584
+ lhand_bbox_center = (lhand_bbox[0] + lhand_bbox[1]) / 2.
585
+ rhand_bbox_center = (rhand_bbox[0] + rhand_bbox[1]) / 2.
586
+ face_bbox_center = (face_bbox[0] + face_bbox[1]) / 2.
587
+ lhand_bbox_size = lhand_bbox[1] - lhand_bbox[0]
588
+ rhand_bbox_size = rhand_bbox[1] - rhand_bbox[0]
589
+ face_bbox_size = face_bbox[1] - face_bbox[0]
590
+
591
+
592
+ joint_img_aug = np.nan_to_num(joint_img_aug, nan=0.0)
593
+ smplx_pose = np.nan_to_num(smplx_pose, nan=0.0)
594
+ joint_cam_wo_ra = np.nan_to_num(joint_cam_wo_ra, nan=0.0)
595
+ joint_cam_ra = np.nan_to_num(joint_cam_ra, nan=0.0)
596
+
597
+ smplx_cam_trans = np.array(smplx_param['trans']) if 'trans' in smplx_param else None
598
+ inputs = {'img': img}
599
+ targets = {'joint_img': joint_img_aug, # keypoints2d
600
+ 'smplx_joint_img': joint_img_aug, #smplx_joint_img, # projected smplx if valid cam_param, else same as keypoints2d
601
+ 'joint_cam': joint_cam_wo_ra, # joint_cam actually not used in any loss, # raw kps3d probably without ra
602
+ 'smplx_joint_cam': joint_cam_ra, # kps3d with body, face, hand ra # smplx_joint_cam if (dummy_cord or getattr(cfg, 'debug', False)) else
603
+ 'smplx_pose': smplx_pose,
604
+ 'smplx_shape': smplx_shape,
605
+ 'smplx_expr': smplx_expr,
606
+ 'lhand_bbox_center': lhand_bbox_center, 'lhand_bbox_size': lhand_bbox_size,
607
+ 'rhand_bbox_center': rhand_bbox_center, 'rhand_bbox_size': rhand_bbox_size,
608
+ 'face_bbox_center': face_bbox_center, 'face_bbox_size': face_bbox_size,
609
+ 'lhand_root': smplx_param['lhand_root'] if 'lhand_root' in smplx_param else np.zeros((1, 3)),
610
+ 'rhand_root': smplx_param['rhand_root'] if 'rhand_root' in smplx_param else np.zeros((1, 3)),
611
+ 'smplx_cam_trans': smplx_cam_trans}
612
+ meta_info = {'joint_valid': joint_valid,
613
+ 'joint_trunc': joint_trunc,
614
+ 'smplx_joint_valid': smplx_joint_valid if dummy_cord else joint_valid,
615
+ 'smplx_joint_trunc': smplx_joint_trunc if dummy_cord else joint_trunc,
616
+ 'smplx_pose_valid': smplx_pose_valid,
617
+ 'smplx_shape_valid': float(smplx_shape_valid),
618
+ 'smplx_expr_valid': float(smplx_expr_valid),
619
+ 'is_3D': float(False) if dummy_cord else float(True),
620
+ 'lhand_bbox_valid': lhand_bbox_valid,
621
+ 'rhand_bbox_valid': rhand_bbox_valid, 'face_bbox_valid': face_bbox_valid,
622
+ }
623
+
624
+ return inputs, targets, meta_info
625
+
626
+ # test
627
+ else:
628
+ joint_cam = data['joint_cam']
629
+ if joint_cam is not None:
630
+ dummy_cord = False
631
+ joint_cam = joint_cam - joint_cam[self.joint_set['root_joint_idx'], None, :] # root-relative
632
+ else:
633
+ # dummy cord as joint_cam
634
+ dummy_cord = True
635
+ joint_cam = np.zeros((self.joint_set['joint_num'], 3), dtype=np.float32)
636
+
637
+ joint_img = data['joint_img']
638
+ joint_img = np.concatenate((joint_img[:, :2], joint_cam[:, 2:]), 1) # x, y, depth
639
+ if not dummy_cord:
640
+ joint_img[:, 2] = (joint_img[:, 2] / (self.cfg.model.body_3d_size / 2) + 1) / 2. * self.cfg.model.output_hm_shape[0] # discretize depth
641
+
642
+ joint_img, joint_cam, joint_cam_ra, joint_valid, joint_trunc = process_db_coord(
643
+ joint_img=joint_img,
644
+ joint_cam=joint_cam,
645
+ joint_valid=data['joint_valid'],
646
+ do_flip=do_flip,
647
+ img_shape=img_shape,
648
+ flip_pairs=self.joint_set['flip_pairs'],
649
+ img2bb_trans=img2bb_trans,
650
+ rot=rot,
651
+ src_joints_name=self.joint_set['joints_name'],
652
+ target_joints_name=self.smpl_x.joints_name,
653
+ input_img_shape=self.cfg.model.input_img_shape,
654
+ output_hm_shape=self.cfg.model.output_hm_shape,
655
+ input_body_shape=self.cfg.model.input_body_shape)
656
+
657
+ # smplx coordinates and parameters
658
+ smplx_param = data['smplx_param']
659
+ smplx_cam_trans = np.array(smplx_param['trans']) if 'trans' in smplx_param else None
660
+
661
+ model_type = data['model']
662
+ if model_type == 'smplx':
663
+ smplx_joint_img, smplx_joint_cam, smplx_joint_trunc, smplx_pose, smplx_shape, smplx_expr, \
664
+ smplx_pose_valid, smplx_joint_valid, \
665
+ smplx_expr_valid, smplx_mesh_cam_orig = process_human_model_output(
666
+ human_model_param=smplx_param,
667
+ cam_param=self.cam_param,
668
+ do_flip=do_flip,
669
+ img_shape=img_shape,
670
+ img2bb_trans=img2bb_trans,
671
+ rot=rot,
672
+ human_model_type=model_type,
673
+ joint_img=None if self.cam_param else joint_img,
674
+ body_3d_size=self.cfg.model.body_3d_size,
675
+ hand_3d_size=self.cfg.model.hand_3d_size,
676
+ face_3d_size=self.cfg.model.face_3d_size,
677
+ input_img_shape=self.cfg.model.input_img_shape,
678
+ output_hm_shape=self.cfg.model.output_hm_shape,
679
+ )
680
+ smplx_pose_valid = np.tile(smplx_pose_valid[:, None], (1, 9)).reshape(-1)
681
+
682
+ elif model_type == 'smpl':
683
+ _, _, _, _, _, smplx_mesh_cam_orig = process_human_model_output(
684
+ human_model_param=smplx_param,
685
+ cam_param=self.cam_param,
686
+ do_flip=do_flip,
687
+ img_shape=img_shape,
688
+ img2bb_trans=img2bb_trans,
689
+ rot=rot,
690
+ human_model_type=model_type,
691
+ joint_img=None if self.cam_param else joint_img,
692
+ body_3d_size=self.cfg.model.body_3d_size,
693
+ hand_3d_size=self.cfg.model.hand_3d_size,
694
+ face_3d_size=self.cfg.model.face_3d_size,
695
+ input_img_shape=self.cfg.model.input_img_shape,
696
+ output_hm_shape=self.cfg.model.output_hm_shape,
697
+ )
698
+
699
+ lhand_valid = 1.0
700
+ rhand_valid = 1.0
701
+ # process the hand mesh for mano dataset
702
+ if self.__class__.__name__ in ['FreiHand', 'InterHand', 'BlurHand', 'HanCo']:
703
+ if (data['smplx_param']['lhand_root']==0).all():
704
+ lhand_valid = 0.0
705
+ if (data['smplx_param']['rhand_root']==0).all():
706
+ rhand_valid = 0.0
707
+
708
+ # build smplx but redo the hand rotation with global orientation
709
+
710
+ smplx_pose_rotmat = batch_rodrigues(torch.Tensor(smplx_pose.reshape(-1,3))).reshape(smplx_pose.shape[0], -1)
711
+
712
+ # redo the hand orientation: R_gt x R_inv x hand mesh
713
+ R_gt_l = data['smplx_param']['lhand_root'] if 'lhand_root' in smplx_param else np.zeros((1, 3))
714
+ R_gt_r = data['smplx_param']['rhand_root'] if 'rhand_root' in smplx_param else np.zeros((1, 3))
715
+
716
+ R_gt_l = batch_rodrigues(torch.Tensor(R_gt_l.reshape(-1,3))).reshape(R_gt_l.shape[0], 3, 3)
717
+ R_gt_r = batch_rodrigues(torch.Tensor(R_gt_r.reshape(-1,3))).reshape(R_gt_r.shape[0], 3, 3)
718
+ # import pdb; pdb.set_trace()
719
+
720
+ # get hand mesh with wrong global orientation
721
+ lhand_mesh = smplx_mesh_cam_orig[self.smpl_x.hand_vertex_idx['left_hand'], :]
722
+ rhand_mesh = smplx_mesh_cam_orig[self.smpl_x.hand_vertex_idx['right_hand'], :]
723
+
724
+ # get wrist offset and align hand mesh to pelvis
725
+ lwrist_offset = np.dot(self.smpl_x.J_regressor, smplx_mesh_cam_orig)[self.smpl_x.J_regressor_idx['lwrist'], None, :]
726
+ rwrist_offset = np.dot(self.smpl_x.J_regressor, smplx_mesh_cam_orig)[self.smpl_x.J_regressor_idx['rwrist'], None, :]
727
+ mesh_out_lhand_align = lhand_mesh - lwrist_offset
728
+ mesh_out_rhand_align = rhand_mesh - rwrist_offset
729
+
730
+ # redo the rotation and align to wrist position world->cam
731
+ R_gt_l = np.dot(data['extrinsic_r'], R_gt_l.squeeze())
732
+ R_gt_r = np.dot(data['extrinsic_r'], R_gt_r.squeeze())
733
+
734
+ mesh_global_lhand = np.dot(R_gt_l, mesh_out_lhand_align.T).T #+ lwrist_offset
735
+ mesh_global_rhand = np.dot(R_gt_r, mesh_out_rhand_align.T).T #+ rwrist_offset
736
+
737
+ # replace hand mesh in smplx mesh
738
+ smplx_mesh_cam_orig[self.smpl_x.hand_vertex_idx['left_hand'], :] = mesh_global_lhand
739
+ smplx_mesh_cam_orig[self.smpl_x.hand_vertex_idx['right_hand'], :] = mesh_global_rhand
740
+
741
+ if self.__class__.__name__ in ['ARCTIC'] and (data['vertices3d'] != -1).all():
742
+ smplx_mesh_cam_orig = data['vertices3d']
743
+
744
+ data['joint_cam'][self.smpl_x.joint_part['lhand'], :] = (data['joint_cam'][self.smpl_x.joint_part['lhand'], :] - \
745
+ data['joint_cam'][self.smpl_x.lwrist_idx, None,:]) * lhand_valid# left hand root-relative
746
+ data['joint_cam'][self.smpl_x.joint_part['rhand'], :] = (data['joint_cam'][self.smpl_x.joint_part['rhand'], :] - \
747
+ data['joint_cam'][self.smpl_x.rwrist_idx, None,:]) * rhand_valid
748
+
749
+
750
+ inputs = {'img': img}
751
+ targets = {'smplx_cam_trans' : smplx_cam_trans,
752
+ 'smplx_mesh_cam': smplx_mesh_cam_orig,
753
+ 'joint_cam': data['joint_cam'],}
754
+ meta_info = {'bb2img_trans': bb2img_trans,
755
+ 'gt_smplx_transl':smplx_cam_trans,
756
+ 'lhand_valid': lhand_valid,
757
+ 'rhand_valid': rhand_valid,
758
+ 'focal': focal, 'principal_pt': princpt,
759
+ 'img_id': data['idx']}
760
+
761
+ return inputs, targets, meta_info
762
+
763
+ def process_hand_face_bbox(self, bbox, do_flip, img_shape, img2bb_trans, input_img_shape, output_hm_shape):
764
+ if bbox is None:
765
+ bbox = np.array([0, 0, 1, 1], dtype=np.float32).reshape(2, 2) # dummy value
766
+ bbox_valid = float(False) # dummy value
767
+ else:
768
+ # reshape to top-left (x,y) and bottom-right (x,y)
769
+ bbox = bbox.reshape(2, 2)
770
+
771
+ # flip augmentation
772
+ if do_flip:
773
+ bbox[:, 0] = img_shape[1] - bbox[:, 0] - 1
774
+ bbox[0, 0], bbox[1, 0] = bbox[1, 0].copy(), bbox[0, 0].copy() # xmin <-> xmax swap
775
+
776
+ # make four points of the bbox
777
+ bbox = bbox.reshape(4).tolist()
778
+ xmin, ymin, xmax, ymax = bbox
779
+ bbox = np.array([[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]], dtype=np.float32).reshape(4, 2)
780
+
781
+ # affine transformation (crop, rotation, scale)
782
+ bbox_xy1 = np.concatenate((bbox, np.ones_like(bbox[:, :1])), 1)
783
+ bbox = np.dot(img2bb_trans, bbox_xy1.transpose(1, 0)).transpose(1, 0)[:, :2]
784
+ bbox[:, 0] = bbox[:, 0] / input_img_shape[1] * output_hm_shape[2]
785
+ bbox[:, 1] = bbox[:, 1] / input_img_shape[0] * output_hm_shape[1]
786
+
787
+ # make box a rectangle without rotation
788
+ xmin = np.min(bbox[:, 0])
789
+ xmax = np.max(bbox[:, 0])
790
+ ymin = np.min(bbox[:, 1])
791
+ ymax = np.max(bbox[:, 1])
792
+ bbox = np.array([xmin, ymin, xmax, ymax], dtype=np.float32)
793
+
794
+ bbox_valid = float(True)
795
+ bbox = bbox.reshape(2, 2)
796
+
797
+ return bbox, bbox_valid
798
+
799
+ def evaluate(self, outs, cur_sample_idx=None):
800
+ sample_num = len(outs)
801
+ eval_result = {'pa_mpvpe_all': [], 'pa_mpvpe_l_hand': [], 'pa_mpvpe_r_hand': [], 'pa_mpvpe_hand': [], 'pa_mpvpe_face': [],
802
+ 'mpvpe_all': [], 'mpvpe_l_hand': [], 'mpvpe_r_hand': [], 'mpvpe_hand': [], 'mpvpe_face': [],
803
+ 'pa_mpjpe_body': [], 'pa_mpjpe_l_hand': [], 'pa_mpjpe_r_hand': [], 'pa_mpjpe_hand': [],
804
+ 'mpjpe_body':[], 'mpjpe_l_hand': [], 'mpjpe_r_hand': [], 'mpjpe_hand': [],}
805
+
806
+
807
+ for n in range(sample_num):
808
+ out = outs[n]
809
+ mesh_gt = out['smplx_mesh_cam_pseudo_gt']
810
+ mesh_out = out['smplx_mesh_cam']
811
+
812
+
813
+ if mesh_gt.shape[0] == 6890:
814
+ face = self.smpl.face
815
+
816
+ # root align -> ds (better for pve and mpjpe)
817
+ mesh_out_root_align = mesh_out - np.dot(self.smpl_x.J_regressor, mesh_out)[self.smpl_x.J_regressor_idx['pelvis'], None,
818
+ :] + np.dot(self.smpl.joint_regressor, mesh_gt)[self.smpl.orig_root_joint_idx, None,:]
819
+ mesh_out_root_align = np.matmul(self.downsample_mat, mesh_out_root_align)
820
+
821
+ # PVE from body
822
+ mpvpe_all = np.sqrt(np.sum((mesh_out_root_align - mesh_gt) ** 2, 1)).mean() * 1000
823
+ eval_result['mpvpe_all'].append(mpvpe_all)
824
+ mesh_out_pa_align = rigid_align(mesh_out_root_align, mesh_gt)
825
+ pa_mpvpe_all = np.sqrt(np.sum((mesh_out_pa_align - mesh_gt) ** 2, 1)).mean() * 1000
826
+ eval_result['pa_mpvpe_all'].append(pa_mpvpe_all)
827
+
828
+ # MPJPE from body joints
829
+ joint_gt_body = np.dot(self.smpl.joint_regressor, mesh_gt)[LSP_MAPPIMG, :]
830
+ joint_out_body_root_align = np.dot(self.smpl.joint_regressor, mesh_out_root_align)[LSP_MAPPIMG, :]
831
+ joint_out_body_pa_align = rigid_align(joint_out_body_root_align, joint_gt_body)
832
+
833
+ eval_result['mpjpe_body'].append(
834
+ np.sqrt(np.sum((joint_out_body_root_align - joint_gt_body) ** 2, 1)).mean() * 1000)
835
+ eval_result['pa_mpjpe_body'].append(
836
+ np.sqrt(np.sum((joint_out_body_pa_align - joint_gt_body) ** 2, 1)).mean() * 1000)
837
+
838
+ else:
839
+
840
+ # MPVPE from all vertices
841
+ mesh_out_align = mesh_out - np.dot(self.smpl_x.J_regressor, mesh_out)[self.smpl_x.J_regressor_idx['pelvis'], None,
842
+ :] + np.dot(self.smpl_x.J_regressor, mesh_gt)[self.smpl_x.J_regressor_idx['pelvis'], None, :]
843
+ joint_out_body_root_align = np.dot(self.smpl_x.j14_regressor, mesh_out_align)
844
+
845
+ mpvpe_all = np.sqrt(np.sum((mesh_out_align - mesh_gt) ** 2, 1)).mean() * 1000
846
+ eval_result['mpvpe_all'].append(mpvpe_all)
847
+ mesh_out_align = rigid_align(mesh_out, mesh_gt)
848
+ pa_mpvpe_all = np.sqrt(np.sum((mesh_out_align - mesh_gt) ** 2, 1)).mean() * 1000
849
+ eval_result['pa_mpvpe_all'].append(pa_mpvpe_all)
850
+
851
+
852
+ mesh_gt_lhand = mesh_gt[self.smpl_x.hand_vertex_idx['left_hand'], :] - np.dot(
853
+ self.smpl_x.J_regressor, mesh_gt)[self.smpl_x.J_regressor_idx['lwrist'], None, :]
854
+ mesh_gt_rhand = mesh_gt[self.smpl_x.hand_vertex_idx['right_hand'], :] - np.dot(
855
+ self.smpl_x.J_regressor, mesh_gt)[self.smpl_x.J_regressor_idx['rwrist'], None, :]
856
+
857
+ mesh_out_lhand = mesh_out[self.smpl_x.hand_vertex_idx['left_hand'], :]
858
+ mesh_out_rhand = mesh_out[self.smpl_x.hand_vertex_idx['right_hand'], :]
859
+ mesh_out_lhand_align = mesh_out_lhand - np.dot(self.smpl_x.J_regressor, mesh_out)[
860
+ self.smpl_x.J_regressor_idx['lwrist'], None, :]
861
+ mesh_out_rhand_align = mesh_out_rhand - np.dot(self.smpl_x.J_regressor, mesh_out)[
862
+ self.smpl_x.J_regressor_idx['rwrist'], None, :]
863
+
864
+ if out['lhand_valid']:
865
+ eval_result['mpvpe_l_hand'].append(np.sqrt(
866
+ np.sum((mesh_out_lhand_align - mesh_gt_lhand) ** 2, 1)).mean() * 1000)
867
+ if out['rhand_valid']:
868
+ eval_result['mpvpe_r_hand'].append(np.sqrt(
869
+ np.sum((mesh_out_rhand_align - mesh_gt_rhand) ** 2, 1)).mean() * 1000)
870
+ hand_mpve_all = (np.sqrt(
871
+ np.sum((mesh_out_lhand_align - mesh_gt_lhand) ** 2, 1)).mean() * 1000 * out['lhand_valid'] + np.sqrt(
872
+ np.sum((mesh_out_rhand_align - mesh_gt_rhand) ** 2, 1)).mean() * 1000 * out['rhand_valid']
873
+ ) / (out['lhand_valid'] + out['rhand_valid'])
874
+
875
+ eval_result['mpvpe_hand'].append(hand_mpve_all)
876
+
877
+ mesh_out_lhand_align = rigid_align(mesh_out_lhand, mesh_gt_lhand)
878
+ mesh_out_rhand_align = rigid_align(mesh_out_rhand, mesh_gt_rhand)
879
+
880
+ if out['lhand_valid']:
881
+ eval_result['pa_mpvpe_l_hand'].append(np.sqrt(
882
+ np.sum((mesh_out_lhand_align - mesh_gt_lhand) ** 2, 1)).mean() * 1000)
883
+ if out['rhand_valid']:
884
+ eval_result['pa_mpvpe_r_hand'].append(np.sqrt(
885
+ np.sum((mesh_out_rhand_align - mesh_gt_rhand) ** 2, 1)).mean() * 1000)
886
+
887
+ eval_result['pa_mpvpe_hand'].append((np.sqrt(
888
+ np.sum((mesh_out_lhand_align - mesh_gt_lhand) ** 2, 1)).mean() * 1000 * out['lhand_valid'] + np.sqrt(
889
+ np.sum((mesh_out_rhand_align - mesh_gt_rhand) ** 2, 1)).mean() * 1000 * out['rhand_valid']) /
890
+ (out['lhand_valid'] + out['rhand_valid']))
891
+
892
+ # MPVPE from face vertices
893
+ mesh_gt_face = mesh_gt[self.smpl_x.face_vertex_idx, :]
894
+ mesh_out_face = mesh_out[self.smpl_x.face_vertex_idx, :]
895
+ mesh_out_face_align = mesh_out_face - np.dot(self.smpl_x.J_regressor, mesh_out)[self.smpl_x.J_regressor_idx['neck'],
896
+ None, :] + np.dot(self.smpl_x.J_regressor, mesh_gt)[
897
+ self.smpl_x.J_regressor_idx['neck'], None, :]
898
+ eval_result['mpvpe_face'].append(
899
+ np.sqrt(np.sum((mesh_out_face_align - mesh_gt_face) ** 2, 1)).mean() * 1000)
900
+ mesh_out_face_align = rigid_align(mesh_out_face, mesh_gt_face)
901
+ eval_result['pa_mpvpe_face'].append(
902
+ np.sqrt(np.sum((mesh_out_face_align - mesh_gt_face) ** 2, 1)).mean() * 1000)
903
+
904
+ joint_gt_body = np.dot(self.smpl_x.j14_regressor, mesh_gt)
905
+ joint_out_body = np.dot(self.smpl_x.j14_regressor, mesh_out)
906
+ joint_out_body_align = rigid_align(joint_out_body, joint_gt_body)
907
+
908
+ eval_result['mpjpe_body'].append(
909
+ np.sqrt(np.sum((joint_out_body_root_align - joint_gt_body) ** 2, 1)).mean() * 1000)
910
+ eval_result['pa_mpjpe_body'].append(
911
+ np.sqrt(np.sum((joint_out_body_align - joint_gt_body) ** 2, 1)).mean() * 1000)
912
+
913
+ joint_gt_lhand = np.dot(self.smpl_x.orig_hand_regressor['left'], mesh_gt)[1:]
914
+ joint_gt_rhand = np.dot(self.smpl_x.orig_hand_regressor['right'], mesh_gt)[1:]
915
+
916
+
917
+ joint_out_lhand = np.dot(self.smpl_x.orig_hand_regressor['left'], mesh_out)[1:] - np.dot(self.smpl_x.J_regressor, mesh_out)[
918
+ self.smpl_x.J_regressor_idx['lwrist'], None, :]
919
+
920
+ joint_out_rhand = np.dot(self.smpl_x.orig_hand_regressor['right'], mesh_out)[1:] - np.dot(self.smpl_x.J_regressor, mesh_out)[
921
+ self.smpl_x.J_regressor_idx['rwrist'], None, :]
922
+
923
+
924
+ joint_out_lhand_align = rigid_align(joint_out_lhand, joint_gt_lhand)
925
+ joint_out_rhand_align = rigid_align(joint_out_rhand, joint_gt_rhand)
926
+
927
+ if out['lhand_valid']:
928
+ eval_result['mpjpe_l_hand'].append(np.sqrt(
929
+ np.sum((joint_out_lhand - joint_gt_lhand) ** 2, 1)).mean() * 1000)
930
+ if out['rhand_valid']:
931
+ eval_result['mpjpe_r_hand'].append(np.sqrt(
932
+ np.sum((joint_out_rhand - joint_gt_rhand) ** 2, 1)).mean() * 1000)
933
+
934
+ hand_pa_mpve_all = (np.sqrt(
935
+ np.sum((joint_out_lhand - joint_gt_lhand) ** 2, 1)).mean() * 1000 * out['lhand_valid'] + np.sqrt(
936
+ np.sum((joint_out_rhand - joint_gt_rhand) ** 2, 1)).mean() * 1000 * out['rhand_valid']
937
+ ) / (out['lhand_valid'] + out['rhand_valid'])
938
+
939
+ eval_result['mpjpe_hand'].append(hand_pa_mpve_all)
940
+
941
+ if out['lhand_valid']:
942
+ value = np.sqrt(np.sum((joint_out_lhand_align - joint_gt_lhand) ** 2, 1)).mean() * 1000
943
+
944
+ if value < 100:
945
+ eval_result['pa_mpjpe_l_hand'].append(value)
946
+ if value > 100:
947
+ print("lhand:",value)
948
+ continue
949
+
950
+ if out['rhand_valid']:
951
+ value = np.sqrt(np.sum((joint_out_rhand_align - joint_gt_rhand) ** 2, 1)).mean() * 1000
952
+
953
+ if value < 100:
954
+ eval_result['pa_mpjpe_r_hand'].append(value)
955
+ if value > 100:
956
+ print("rhand:",value)
957
+ continue
958
+
959
+ eval_result['pa_mpjpe_hand'].append((np.sqrt(
960
+ np.sum((joint_out_lhand_align - joint_gt_lhand) ** 2, 1)).mean() * 1000 * out['lhand_valid'] + np.sqrt(
961
+ np.sum((joint_out_rhand_align - joint_gt_rhand) ** 2, 1)).mean() * 1000 * out['rhand_valid']
962
+ ) / (out['lhand_valid'] + out['rhand_valid']))
963
+
964
+
965
+ return eval_result
966
+
967
+ def print_eval_result(self, eval_result):
968
+ print(f'======{self.cfg.data.testset}======')
969
+ print('PA MPVPE (All): %.2f mm' % np.mean(eval_result['pa_mpvpe_all']))
970
+ print('PA MPVPE (L-Hands): %.2f mm' % np.mean(eval_result['pa_mpvpe_l_hand']))
971
+ print('PA MPVPE (R-Hands): %.2f mm' % np.mean(eval_result['pa_mpvpe_r_hand']))
972
+ print('PA MPVPE (Hands): %.2f mm' % np.mean(eval_result['pa_mpvpe_hand']))
973
+ print('PA MPVPE (Face): %.2f mm' % np.mean(eval_result['pa_mpvpe_face']))
974
+ print()
975
+
976
+ print('MPVPE (All): %.2f mm' % np.mean(eval_result['mpvpe_all']))
977
+ print('MPVPE (L-Hands): %.2f mm' % np.mean(eval_result['mpvpe_l_hand']))
978
+ print('MPVPE (R-Hands): %.2f mm' % np.mean(eval_result['mpvpe_r_hand']))
979
+ print('MPVPE (Hands): %.2f mm' % np.mean(eval_result['mpvpe_hand']))
980
+ print('MPVPE (Face): %.2f mm' % np.mean(eval_result['mpvpe_face']))
981
+ print()
982
+
983
+ print('PA MPJPE (Body): %.2f mm' % np.mean(eval_result['pa_mpjpe_body']))
984
+ print('PA MPJPE (L-Hands): %.2f mm' % np.mean(eval_result['pa_mpjpe_l_hand']))
985
+ print('PA MPJPE (R-Hands): %.2f mm' % np.mean(eval_result['pa_mpjpe_r_hand']))
986
+ print('PA MPJPE (Hands): %.2f mm' % np.mean(eval_result['pa_mpjpe_hand']))
987
+ print()
988
+
989
+ print('MPJPE (Body): %.2f mm' % np.mean(eval_result['mpjpe_body']))
990
+ print('MPJPE (L-Hands): %.2f mm' % np.mean(eval_result['mpjpe_l_hand']))
991
+ print('MPJPE (R-Hands): %.2f mm' % np.mean(eval_result['mpjpe_r_hand']))
992
+ print('MPJPE (Hands): %.2f mm' % np.mean(eval_result['mpjpe_hand']))
993
+ print()
994
+
995
+ print(f"{np.mean(eval_result['pa_mpvpe_all'])},{np.mean(eval_result['pa_mpvpe_l_hand'])},{np.mean(eval_result['pa_mpvpe_r_hand'])},{np.mean(eval_result['pa_mpvpe_hand'])},{np.mean(eval_result['pa_mpvpe_face'])},"
996
+ f"{np.mean(eval_result['mpvpe_all'])},{np.mean(eval_result['mpvpe_l_hand'])},{np.mean(eval_result['mpvpe_r_hand'])},{np.mean(eval_result['mpvpe_hand'])},{np.mean(eval_result['mpvpe_face'])},"
997
+ f"{np.mean(eval_result['pa_mpjpe_body'])},{np.mean(eval_result['pa_mpjpe_l_hand'])},{np.mean(eval_result['pa_mpjpe_r_hand'])},{np.mean(eval_result['pa_mpjpe_hand'])}")
998
+ print()
999
+
1000
+
1001
+ f = open(os.path.join(self.cfg.log.result_dir, 'result.txt'), 'w')
1002
+ f.write(f'{self.cfg.data.testset} dataset \n')
1003
+ f.write('PA MPVPE (All): %.2f mm\n' % np.mean(eval_result['pa_mpvpe_all']))
1004
+ f.write('PA MPVPE (L-Hands): %.2f mm' % np.mean(eval_result['pa_mpvpe_l_hand']))
1005
+ f.write('PA MPVPE (R-Hands): %.2f mm' % np.mean(eval_result['pa_mpvpe_r_hand']))
1006
+ f.write('PA MPVPE (Hands): %.2f mm\n' % np.mean(eval_result['pa_mpvpe_hand']))
1007
+ f.write('PA MPVPE (Face): %.2f mm\n' % np.mean(eval_result['pa_mpvpe_face']))
1008
+ f.write('MPVPE (All): %.2f mm\n' % np.mean(eval_result['mpvpe_all']))
1009
+ f.write('MPVPE (L-Hands): %.2f mm' % np.mean(eval_result['mpvpe_l_hand']))
1010
+ f.write('MPVPE (R-Hands): %.2f mm' % np.mean(eval_result['mpvpe_r_hand']))
1011
+ f.write('MPVPE (Hands): %.2f mm' % np.mean(eval_result['mpvpe_hand']))
1012
+ f.write('MPVPE (Face): %.2f mm\n' % np.mean(eval_result['mpvpe_face']))
1013
+ f.write('PA MPJPE (Body): %.2f mm\n' % np.mean(eval_result['pa_mpjpe_body']))
1014
+ f.write('PA MPJPE (L-Hands): %.2f mm' % np.mean(eval_result['pa_mpjpe_l_hand']))
1015
+ f.write('PA MPJPE (R-Hands): %.2f mm' % np.mean(eval_result['pa_mpjpe_r_hand']))
1016
+ f.write('PA MPJPE (Hands): %.2f mm\n' % np.mean(eval_result['pa_mpjpe_hand']))
1017
+ f.write(f"{np.mean(eval_result['pa_mpvpe_all'])},{np.mean(eval_result['pa_mpvpe_l_hand'])},{np.mean(eval_result['pa_mpvpe_r_hand'])},{np.mean(eval_result['pa_mpvpe_hand'])},{np.mean(eval_result['pa_mpvpe_face'])},"
1018
+ f"{np.mean(eval_result['mpvpe_all'])},{np.mean(eval_result['mpvpe_l_hand'])},{np.mean(eval_result['mpvpe_r_hand'])},{np.mean(eval_result['mpvpe_hand'])},{np.mean(eval_result['mpvpe_face'])},"
1019
+ f"{np.mean(eval_result['pa_mpjpe_body'])},{np.mean(eval_result['pa_mpjpe_l_hand'])},{np.mean(eval_result['pa_mpjpe_r_hand'])},{np.mean(eval_result['pa_mpjpe_hand'])}")
1020
+ f.close()
1021
+
1022
+ def decompress_keypoints(self, humandata) -> dict:
1023
+ """If a key contains 'keypoints', and f'{key}_mask' is in self.keys(),
1024
+ invalid zeros will be inserted to the right places and f'{key}_mask'
1025
+ will be unlocked.
1026
+
1027
+ Raises:
1028
+ KeyError:
1029
+ A key containing 'keypoints' has been found
1030
+ but its corresponding mask is missing.
1031
+ """
1032
+ assert bool(humandata['__keypoints_compressed__']) is True
1033
+ key_pairs = []
1034
+ for key in humandata.files:
1035
+ if key not in KPS2D_KEYS + KPS3D_KEYS:
1036
+ continue
1037
+ mask_key = f'{key}_mask'
1038
+ if mask_key in humandata.files:
1039
+ print(f'Decompress {key}...')
1040
+ key_pairs.append([key, mask_key])
1041
+ decompressed_dict = {}
1042
+ for kpt_key, mask_key in key_pairs:
1043
+ mask_array = np.asarray(humandata[mask_key])
1044
+ compressed_kpt = humandata[kpt_key]
1045
+ kpt_array = \
1046
+ self.add_zero_pad(compressed_kpt, mask_array)
1047
+ decompressed_dict[kpt_key] = kpt_array
1048
+ del humandata
1049
+ return decompressed_dict
1050
+
1051
+ def add_zero_pad(self, compressed_array: np.ndarray,
1052
+ mask_array: np.ndarray) -> np.ndarray:
1053
+ """Pad zeros to a compressed keypoints array.
1054
+
1055
+ Args:
1056
+ compressed_array (np.ndarray):
1057
+ A compressed keypoints array.
1058
+ mask_array (np.ndarray):
1059
+ The mask records compression relationship.
1060
+
1061
+ Returns:
1062
+ np.ndarray:
1063
+ A keypoints array in full-size.
1064
+ """
1065
+ if compressed_array.shape[1] == mask_array.shape[0]:
1066
+ print("No need to decompress")
1067
+ return compressed_array
1068
+ else:
1069
+ assert mask_array.sum() == compressed_array.shape[1]
1070
+ data_len, _, dim = compressed_array.shape
1071
+ mask_len = mask_array.shape[0]
1072
+ ret_value = np.zeros(
1073
+ shape=[data_len, mask_len, dim], dtype=compressed_array.dtype)
1074
+ valid_mask_index = np.where(mask_array == 1)[0]
1075
+ ret_value[:, valid_mask_index, :] = compressed_array
1076
+ return ret_value
SMPLest-X/humandata_prep/README.md ADDED
@@ -0,0 +1,64 @@
1
+ Guide to HumanData and View tools
2
+ ========================
3
+
4
+ ## What is HumanData?
5
+
6
+ HumanData is designed to provide a unified format for SMPL/SMPLX datasets to support joint training and evaluation.
7
+
8
+ The project is maintained in MMHuman3D.
9
+ See [detailed info](https://github.com/open-mmlab/mmhuman3d/blob/convertors/docs/human_data.md) for data structure and sample usage.
10
+
11
+ If you want to create your own humandata file, please refer to the sample below and keep a similar structure. It is essentially a big dictionary holding lists (or dicts of lists); any dict with the correct structure works (not necessarily an instance of the `HumanData` class).
12
+
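+ As a rough illustration, a minimal humandata-style dict can be assembled and saved with plain NumPy. This is a sketch based on the keys the loader in this repo expects; the keypoint key names and joint counts below are assumptions and must match your own convention:
+
+ ```
+ import numpy as np
+
+ n = 2  # two hypothetical instances
+ humandata = {
+     'image_path': np.array(['img/0001.png', 'img/0002.png']),  # relative to the image root
+     'bbox_xywh': np.zeros((n, 5)),                              # x, y, w, h, conf
+     'keypoints2d_smplx': np.zeros((n, 144, 3)),                 # key name / joint count are assumptions
+     'keypoints2d_smplx_mask': np.ones(144),
+     'smplx': {'betas': np.zeros((n, 10)), 'body_pose': np.zeros((n, 21, 3)),
+               'global_orient': np.zeros((n, 3)), 'transl': np.zeros((n, 3))},
+     'meta': {'height': np.array([1080, 1080]), 'width': np.array([1920, 1920])},
+     'misc': {'flat_hand_mean': False},
+     '__keypoints_compressed__': False,
+ }
+ np.savez_compressed('my_humandata.npz', **humandata)  # read back with np.load(..., allow_pickle=True)
+ ```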
13
+ ## Sample Visualization Script
14
+
15
+ We provide a simple script to check the annotation and visualize the results. The script will read the annotation from HumanData and render it on the corresponding image using pyrender.
16
+
17
+ ### Download
18
+
19
+ Download sample here: [Hugging Face](https://huggingface.co/waanqii/SMPLest-X/resolve/main/hd_sample_humandata.zip?download=true)
20
+
21
+ ### Extract
22
+ Follow the file structure described on the main page. Extract the archive to the `data` folder; the structure should look like this:
23
+ ```
24
+ ├── data
25
+ │ ├── annot
26
+ │ │ └── hd_10sample.npz # sample annotation
27
+ │ └── img # original data files
28
+ │ └── egocentric_color
29
+ ```
30
+
31
+ ### Environment
32
+ You can usually install pyrender and trimesh directly into your existing environment; this has been tested on many platforms without conflicts.
33
+ The CPU version of PyTorch is also supported.
34
+ ```
35
+ conda create -n hd_vis python=3.9
36
+ conda activate hd_vis
37
+ conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
38
+ pip install pyrender trimesh numpy opencv-python tqdm smplx
39
+ ```
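+
+ If you render on a headless machine (or inside an IDE), you may need to select an offscreen OpenGL backend before running the script; this mirrors the commented-out line in `check.py`:
+ ```
+ export PYOPENGL_PLATFORM=osmesa
+ ```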
40
+
41
+ ### Visualization
42
+ A ready-to-run command for the demo sample:
43
+ ```
44
+ python humandata_prep/check.py \
45
+ --hd_path data/annot/hd_10sample.npz \
46
+ --image_folder data/img \
47
+ --output_folder data/vis_output \
48
+ --body_model_path human_models/human_model_files
49
+ ```
50
+ - Rendered images will be saved in the output folder.
51
+
52
+
53
+ ## Important Points: when visualizing other humandata files
54
+ This section is for those who want to debug or create their own humandata files.
55
+
56
+ - Check that `flat_hand_mean` is correctly set; for humandata it should be specified in `hd['misc']['flat_hand_mean']` and defaults to `False` (see the check snippet after this list)
57
+ - Check `gender`
58
+ - Some datasets provide mesh vertices instead of SMPL/SMPLX parameters; we suggest fitting the mesh to parameters for every instance to keep the visualization consistent. Some of those datasets are:
59
+ - Arctic: They provide `vtemplate` instead of `betas`
60
+ - EHF: They provide mesh files
61
+ - Standalone [SMPLX parameters fitting script](https://github.com/open-mmlab/mmhuman3d/blob/convertors/tools/preprocess/fit_shape2smplx.py)
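+
+ A minimal check for the `flat_hand_mean` flag (a sketch; `hd_10sample.npz` stands in for your own file):
+ ```
+ import numpy as np
+
+ hd = dict(np.load('data/annot/hd_10sample.npz', allow_pickle=True))
+ misc = hd['misc'].item() if 'misc' in hd else {}
+ print('flat_hand_mean:', misc.get('flat_hand_mean', False))  # defaults to False if absent
+ ```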
62
+
63
+
64
+
SMPLest-X/humandata_prep/check.py ADDED
@@ -0,0 +1,298 @@
1
+ import numpy as np
2
+ import random
3
+ import cv2
4
+ import os
5
+ import argparse
6
+ import torch
7
+ import pyrender
8
+ import trimesh
9
+ import smplx
10
+
11
+ from tqdm import tqdm
12
+
13
+ # for visualization and checking purposes; jaw and eye poses are not needed
14
+ smpl_shape = {'betas': (-1, 10), 'transl': (-1, 3), 'global_orient': (-1, 3), 'body_pose': (-1, 69)}
15
+ smplx_shape = {'betas': (-1, 10), 'transl': (-1, 3), 'global_orient': (-1, 3),
16
+ 'body_pose': (-1, 21, 3), 'left_hand_pose': (-1, 15, 3), 'right_hand_pose': (-1, 15, 3)}
17
+
18
+ def get_cam_params(param, idx):
19
+
20
+ '''
21
+ Read camera parameters from humandata
22
+ Input: param (dict loaded from a humandata npz), idx (instance index)
23
+ Output: focal_length, camera_center, R, T (R and T may be None)
24
+ '''
25
+
26
+ R, T = None, None
27
+
28
+ # read cam params
29
+ try:
30
+ focal_length = param['meta'].item()['focal_length'][idx]
31
+ camera_center = param['meta'].item()['principal_point'][idx]
32
+ except TypeError:
33
+ focal_length = param['meta'].item()['focal_length']
34
+ camera_center = param['meta'].item()['princpt']
35
+ try:
36
+ R = param['meta'].item()['R'][idx]
37
+ T = param['meta'].item()['T'][idx]
38
+ except KeyError:
39
+ R = None
40
+ T = None
41
+ except IndexError:
42
+ R = None
43
+ T = None
44
+
45
+ focal_length = np.asarray(focal_length).reshape(-1)
46
+ camera_center = np.asarray(camera_center).reshape(-1)
47
+
48
+ if len(focal_length)==1:
49
+ focal_length = [focal_length, focal_length]
50
+ if len(camera_center)==1:
51
+ camera_center = [camera_center, camera_center]
52
+
53
+ return focal_length, camera_center, R, T
54
+
55
+
56
+ def render_pose(img, body_model_param, body_model, camera, return_mask=False,
57
+ R=None, T=None):
58
+
59
+ # the inverse is the same
60
+ pyrender2opencv = np.array([[1.0, 0, 0, 0],
61
+ [0, -1, 0, 0],
62
+ [0, 0, -1, 0],
63
+ [0, 0, 0, 1]])
64
+
65
+ output = body_model(**body_model_param, return_verts=True)
66
+ faces = body_model.faces
67
+
68
+ vertices = output['vertices'].detach().cpu().numpy().squeeze()
69
+
70
+ # adjust vertices based on R and T
71
+ if R is not None:
72
+ joints = output['joints'].detach().cpu().numpy().squeeze()
73
+ root_joints = joints[0]
74
+ verts_T = np.dot(np.array(R), root_joints) - root_joints + np.array(T)
75
+ vertices = vertices + verts_T
76
+
77
+ # render material
78
+ base_color = (1.0, 193/255, 193/255, 1.0)
79
+ material = pyrender.MetallicRoughnessMaterial(
80
+ metallicFactor=0.3,
81
+ alphaMode='OPAQUE',
82
+ baseColorFactor=base_color)
83
+
84
+ # transfer to trimesh
85
+ body_trimesh = trimesh.Trimesh(vertices, faces, process=False)
86
+ body_mesh = pyrender.Mesh.from_trimesh(body_trimesh, material=material)
87
+
88
+ # prepare camera and light
89
+ light = pyrender.DirectionalLight(color=np.ones(3), intensity=2.0)
90
+ cam_pose = pyrender2opencv @ np.eye(4)
91
+
92
+ # build scene
93
+ scene = pyrender.Scene(bg_color=[0.0, 0.0, 0.0, 0.0],
94
+ ambient_light=(0.3, 0.3, 0.3))
95
+ scene.add(camera, pose=cam_pose)
96
+ scene.add(light, pose=cam_pose)
97
+ scene.add(body_mesh, 'mesh')
98
+
99
+ # render scene
100
+ # os.environ["PYOPENGL_PLATFORM"] = "osmesa" # include this line if use in vscode
101
+ r = pyrender.OffscreenRenderer(viewport_width=img.shape[1],
102
+ viewport_height=img.shape[0],
103
+ point_size=1.0)
104
+
105
+ #
106
+ color, _ = r.render(scene, flags=pyrender.RenderFlags.RGBA)
107
+ # depth = r.render(scene, flags=pyrender.RenderFlags.DEPTH_ONLY)
108
+ # normal, _ = r.render(scene, flags=pyrender.RenderFlags.FACE_NORMALS)
109
+
110
+ color = color.astype(np.float32) / 255.0
111
+ # depth = np.asarray(depth, dtype=np.float32)
112
+ # normal = np.asarray(normal, dtype=np.float32)
113
+
114
+ # set transparency in [0.0, 1.0]
115
+ alpha = 0.8
116
+ valid_mask = (color[:, :, -1] > 0)[:, :, np.newaxis]
117
+ valid_mask = valid_mask * alpha
118
+
119
+ img = img / 255
120
+ output_img = (color[:, :, :] * valid_mask + (1 - valid_mask) * img)
121
+
122
+ img = (output_img * 255).astype(np.uint8)
123
+ img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
124
+
125
+ if return_mask:
126
+ return img, valid_mask, (color * 255).astype(np.uint8)
127
+
128
+ return img
129
+
130
+
131
+ def visualize_humandata(args):
132
+
133
+ '''
134
+ '''
135
+
136
+ # TODO: load from args.path
137
+ param = dict(np.load(args.hd_path, allow_pickle=True))
138
+
139
+ # check for annot and type
140
+ has_smplx, has_smpl, has_gender = False, False, False
141
+ if 'smpl' in param.keys():
142
+ has_smpl = True
143
+ elif 'smplx' in param.keys():
144
+ has_smplx = True
145
+ if 'meta' in param.keys():
146
+ if 'gender' in param['meta'].item().keys():
147
+ has_gender = True
148
+ assert has_smpl or has_smplx, 'No body model annotation found in the dataset'
149
+
150
+ # load params
151
+ if has_smpl:
152
+ body_model_param_smpl = param['smpl'].item()
153
+ if has_smplx:
154
+ body_model_param_smplx = param['smplx'].item()
155
+
156
+ # read smplx only if has both smpl and smplx
157
+ if has_smpl and has_smplx:
158
+ has_smpl = False
159
+
160
+ device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
161
+
162
+ flat_hand_mean = args.flat_hand_mean
163
+ if 'misc' in param.keys():
164
+ if 'flat_hand_mean' in param['misc'].item().keys():
165
+ flat_hand_mean = param['misc'].item()['flat_hand_mean']
166
+
167
+
168
+ # build smpl model TODO: args for model path
169
+ gendered_smpl = {}
170
+ for gender in ['male', 'female', 'neutral']:
171
+ kwargs_smpl = dict(
172
+ gender=gender,
173
+ num_betas=10,
174
+ use_face_contour=True,
175
+ use_pca=False,
176
+ batch_size=1)
177
+ gendered_smpl[gender] = smplx.create(
178
+ args.body_model_path, 'smpl',
179
+ **kwargs_smpl).to(device)
180
+
181
+ # build smplx model TODO: model path
182
+ gendered_smplx = {}
183
+ for gender in ['male', 'female', 'neutral']:
184
+ kwargs_smplx = dict(
185
+ gender=gender,
186
+ num_betas=10,
187
+ use_face_contour=True,
188
+ flat_hand_mean=flat_hand_mean,
189
+ use_pca=False,
190
+ batch_size=1)
191
+ gendered_smplx[gender] = smplx.create(
192
+ args.body_model_path, 'smplx',
193
+ **kwargs_smplx).to(device)
194
+
195
+ # for idx in idx_list:
196
+ sample_size = args.render_num
197
+ if sample_size > len(param['image_path']):
198
+ idxs = range(len(param['image_path']))
199
+ else:
200
+ idxs = random.sample(range(len(param['image_path'])), sample_size)
201
+
202
+ for idx in tqdm(sorted(idxs), desc=f'Processing npz {os.path.basename(args.hd_path)}, sample size: {sample_size}',
203
+ position=0, leave=False):
204
+
205
+ # Load image
206
+ image_p = param['image_path'][idx]
207
+ image_path = os.path.join(args.image_folder, image_p)
208
+
209
+ image = cv2.imread(image_path)
210
+ image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
211
+
212
+ # ---------------------- render single pose ------------------------
213
+ # read cam params
214
+ focal_length, camera_center, R, T = get_cam_params(param, idx)
215
+
216
+ # read gender
217
+ if has_gender:
218
+ try:
219
+ gender = param['meta'].item()['gender'][idx]
220
+ except IndexError:
221
+ gender = 'neutral'
222
+ else:
223
+ gender = 'neutral'
224
+
225
+ # prepare for mesh projection
226
+ camera = pyrender.camera.IntrinsicsCamera(
227
+ fx=focal_length[0], fy=focal_length[1],
228
+ cx=camera_center[0], cy=camera_center[1])
229
+
230
+ if has_smpl:
231
+ intersect_key = list(set(body_model_param_smpl.keys()) & set(smpl_shape.keys()))
232
+ body_model_param_tensor = {key: torch.tensor(
233
+ np.array(body_model_param_smpl[key][idx:idx+1]).reshape(smpl_shape[key]),
234
+ device=device, dtype=torch.float32)
235
+ for key in intersect_key
236
+ if len(body_model_param_smpl[key][idx:idx+1]) > 0}
237
+
238
+ rendered_image = render_pose(img=image,
239
+ body_model_param=body_model_param_tensor,
240
+ body_model=gendered_smpl[gender],
241
+ camera=camera,
242
+ R=R, T=T)
243
+ if has_smplx:
244
+ intersect_key = list(set(body_model_param_smplx.keys()) & set(smplx_shape.keys()))
245
+ body_model_param_tensor = {key: torch.tensor(
246
+ np.array(body_model_param_smplx[key][idx:idx+1]).reshape(smplx_shape[key]),
247
+ device=device, dtype=torch.float32)
248
+ for key in intersect_key
249
+ if len(body_model_param_smplx[key][idx:idx+1]) > 0}
250
+
251
+ rendered_image = render_pose(img=image,
252
+ body_model_param=body_model_param_tensor,
253
+ body_model=gendered_smplx[gender],
254
+ camera=camera,
255
+ R=R, T=T)
256
+
257
+ # ---------------------- render results ----------------------
258
+ os.makedirs(args.output_folder, exist_ok=True)
259
+
260
+ # save image
261
+ out_image_path = os.path.join(args.output_folder,
262
+ f'{os.path.basename(args.hd_path)[:-4]}_{idx}.png')
263
+ # print(f'Saving image to {out_image_path}')
264
+ cv2.imwrite(out_image_path, rendered_image)
265
+
266
+
267
+ if __name__ == '__main__':
268
+
269
+ parser = argparse.ArgumentParser()
270
+ # path args
271
+ parser.add_argument('--hd_path', type=str, required=False,
272
+ help='path to humandata npz file',
273
+ default='/mnt/d/test_area/hd_sample_SMPLestX/hd_10sample.npz')
274
+ parser.add_argument('--image_folder', type=str, required=False,
275
+ help='path to the image base folder',
276
+ default='/mnt/d/test_area/hd_sample_SMPLestX')
277
+ parser.add_argument('--output_folder', type=str, required=False,
278
+ help='path to folder that writes the rendered image',
279
+ default='/mnt/d/test_area/hd_sample_SMPLestX/output')
280
+ # TODO: add default bm path
281
+ parser.add_argument('--body_model_path', type=str, required=False,
282
+ help='path to smpl/smplx models folder, if you follow repo file structure, \
283
+ no need to specify',
284
+ default='/home/weichen/wc_workspace/models/human_model')
285
+
286
+ # render args
287
+ parser.add_argument('--flat_hand_mean', type=bool, required=False,
288
+ help='use flat hand mean for smplx, will try to load from humandata["misc"] \
289
+ if not found, will use the value from args',
290
+ default=False)
291
+ parser.add_argument('--render_num', type=int, required=False,
292
+ help='Randomly select how many instances to render',
293
+ default='10')
294
+
295
+ args = parser.parse_args()
296
+
297
+ visualize_humandata(args)
298
+
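Before rendering, it can be worth confirming that the HumanData archive actually carries the keys `visualize_humandata` checks for (`smpl`/`smplx`, `meta`, `misc`, `image_path`). A minimal inspection sketch; the npz path below is a placeholder, not a file shipped with the repo:

```python
import numpy as np

# placeholder path to a HumanData .npz file
param = dict(np.load('hd_sample.npz', allow_pickle=True))

print(sorted(param.keys()))                # e.g. ['image_path', 'meta', 'misc', 'smplx', ...]
print(len(param['image_path']), 'frames')  # number of annotated images

if 'smplx' in param:
    smplx_params = param['smplx'].item()   # per-frame SMPL-X parameter arrays
    print(sorted(smplx_params.keys()))
```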
SMPLest-X/main/__init__.py ADDED
File without changes
SMPLest-X/main/base.py ADDED
@@ -0,0 +1,234 @@
1
+ import os.path as osp
2
+ import math
3
+ import abc
4
+ from torch.utils.data import DataLoader
5
+ from torch.nn.parallel.data_parallel import DataParallel
6
+ import torch.optim
7
+ import torchvision.transforms as transforms
8
+ from utils.timer import Timer
9
+ from utils.logger import colorlogger
10
+ from datasets.dataset import MultipleDatasets
11
+ import importlib
12
+ from models.SMPLest_X import get_model
13
+
14
+ # ddp
15
+ import torch.cuda
16
+ import torch.distributed as dist
17
+ from torch.utils.data import DistributedSampler
18
+ import torch.utils.data.distributed
19
+ from utils.distribute_utils import (
20
+ get_rank, is_main_process, time_synchronized, get_group_idx, get_process_groups, get_dist_info
21
+ )
22
+
23
+ def dynamic_import(module_name, object_name):
24
+ """Dynamically import a module and access a specific object."""
25
+ module = importlib.import_module(module_name)
26
+ return getattr(module, object_name)
27
+
28
+
29
+ class Base(object):
30
+ __metaclass__ = abc.ABCMeta
31
+
32
+ def __init__(self, cfg, log_name='logs.txt'):
33
+ self.cur_epoch = 0
34
+
35
+ # timer
36
+ self.tot_timer = Timer()
37
+ self.gpu_timer = Timer()
38
+ self.read_timer = Timer()
39
+
40
+ # logger
41
+ self.logger = colorlogger(cfg.log.log_dir, log_name=log_name)
42
+
43
+ @abc.abstractmethod
44
+ def _make_batch_generator(self):
45
+ return
46
+
47
+ @abc.abstractmethod
48
+ def _make_model(self):
49
+ return
50
+
51
+
52
+ class Trainer(Base):
53
+ def __init__(self, cfg, distributed=False, gpu_idx=None):
54
+ super(Trainer, self).__init__(cfg, log_name='train_logs.txt')
55
+ self.distributed = distributed
56
+ self.gpu_idx = gpu_idx
57
+ self.cfg = cfg
58
+
59
+ def get_optimizer(self, model):
60
+ normal_param = []
61
+
62
+ for module in model.module.trainable_modules:
63
+ normal_param += list(module.parameters())
64
+ optim_params = [
65
+ {
66
+ 'params': normal_param,
67
+ 'lr': self.cfg.train.lr
68
+ }
69
+ ]
70
+ optimizer = torch.optim.Adam(optim_params, lr=self.cfg.train.lr)
71
+ return optimizer
72
+
73
+ def save_model(self, state, epoch):
74
+ file_path = osp.join(self.cfg.log.model_dir, f'snapshot_{str(epoch)}.pth.tar')
75
+
76
+ # do not save smplx layer weights
77
+ dump_key = []
78
+ for k in state['network'].keys():
79
+ if 'smplx_layer' in k:
80
+ dump_key.append(k)
81
+ for k in dump_key:
82
+ state['network'].pop(k, None)
83
+
84
+ torch.save(state, file_path)
85
+ self.logger.info(f"Write snapshot into {file_path}")
86
+
87
+ def load_model(self, model, optimizer):
88
+ if self.cfg.model.pretrained_model_path is not None:
89
+ ckpt_path = self.cfg.model.pretrained_model_path
90
+ ckpt = torch.load(ckpt_path, map_location=torch.device('cpu'), weights_only=False) # solve CUDA OOM error in DDP
91
+ model.load_state_dict(ckpt['network'], strict=False)
92
+ model.cuda()
93
+ self.logger.info(f'Load checkpoint from {ckpt_path}')
94
+ torch.cuda.empty_cache()
95
+ if getattr(self.cfg.train, 'start_over', True):
96
+ start_epoch = 0
97
+ else:
98
+ optimizer.load_state_dict(ckpt['optimizer'])
99
+ start_epoch = ckpt['epoch'] + 1
100
+ self.logger.info(f'Load optimizer, start from {start_epoch}')
101
+ else:
102
+ start_epoch = 0
103
+
104
+ return start_epoch, model, optimizer
105
+
106
+ def get_lr(self):
107
+ for g in self.optimizer.param_groups:
108
+ cur_lr = g['lr']
109
+ return cur_lr
110
+
111
+ def _make_batch_generator(self):
112
+ # data load and construct batch generator
113
+ self.logger_info("Creating dataset...")
114
+ trainset_humandata_loader = []
115
+ for humandata_dataset in self.cfg.data.trainset_humandata:
116
+ trainset_humandata_loader.append(dynamic_import(
117
+ f"datasets.{humandata_dataset}", humandata_dataset)(transforms.ToTensor(), "train", self.cfg))
118
+
119
+ data_strategy = getattr(self.cfg.data, 'data_strategy', 'balance')
120
+ if data_strategy == 'concat':
121
+ print("Using [concat] strategy...")
122
+ trainset_loader = MultipleDatasets(trainset_humandata_loader,
123
+ make_same_len=False, verbose=True)
124
+ elif data_strategy == 'balance':
125
+ total_len = getattr(self.cfg.data, 'total_data_len', 'auto')
126
+ print(f"Using [balance] strategy with total_data_len : {total_len}...")
127
+ trainset_loader = MultipleDatasets(trainset_humandata_loader,
128
+ make_same_len=True, total_len=total_len, verbose=True)
129
+
130
+ self.itr_per_epoch = math.ceil(
131
+ len(trainset_loader) / self.cfg.train.num_gpus / self.cfg.train.train_batch_size)
132
+
133
+ if self.distributed:
134
+ self.logger_info(f"Total data length {len(trainset_loader)}.")
135
+ rank, world_size = get_dist_info()
136
+ self.logger_info("Using distributed data sampler.")
137
+
138
+ sampler_train = DistributedSampler(trainset_loader, world_size, rank, shuffle=True)
139
+ self.batch_generator = DataLoader(dataset=trainset_loader, batch_size=self.cfg.train.train_batch_size,
140
+ shuffle=False, num_workers=self.cfg.train.num_thread, sampler=sampler_train,
141
+ pin_memory=True, persistent_workers=True if self.cfg.train.num_thread > 0 else False,
142
+ drop_last=True)
143
+ else:
144
+ self.batch_generator = DataLoader(dataset=trainset_loader,
145
+ batch_size=self.cfg.train.num_gpus * self.cfg.train.train_batch_size,
146
+ shuffle=True, num_workers=self.cfg.train.num_thread,
147
+ pin_memory=True, drop_last=True)
148
+
149
+ def _make_model(self):
150
+ # prepare network
151
+ self.logger_info("Creating graph and optimizer...")
152
+ model = get_model(self.cfg, 'train')
153
+
154
+ if self.distributed:
155
+ self.logger_info("Using distributed data parallel.")
156
+ model.cuda()
157
+ model = torch.nn.parallel.DistributedDataParallel(
158
+ model, device_ids=[self.gpu_idx],
159
+ find_unused_parameters=True)
160
+ else:
161
+ model = DataParallel(model).cuda()
162
+
163
+ optimizer = self.get_optimizer(model)
164
+ scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
165
+ self.cfg.train.end_epoch * self.itr_per_epoch,
166
+ eta_min=getattr(self.cfg.train,'min_lr',1e-6))
167
+
168
+ if self.cfg.train.continue_train:
169
+ start_epoch, model, optimizer = self.load_model(model, optimizer)
170
+ else:
171
+ start_epoch = 0
172
+ model.train()
173
+
174
+ self.scheduler = scheduler
175
+ self.start_epoch = start_epoch
176
+ self.model = model
177
+ self.optimizer = optimizer
178
+
179
+ def logger_info(self, info):
180
+ if self.distributed:
181
+ if is_main_process():
182
+ self.logger.info(info)
183
+ else:
184
+ self.logger.info(info)
185
+
186
+
187
+ class Tester(Base):
188
+ def __init__(self, cfg):
189
+ super(Tester, self).__init__(cfg, log_name='test_logs.txt')
190
+
191
+ self.cfg = cfg
192
+
193
+ def _make_batch_generator(self):
194
+ # data load and construct batch generator
195
+ self.logger.info("Creating dataset...")
196
+ testset_loader = dynamic_import(
197
+ f"datasets.{self.cfg.data.testset}", self.cfg.data.testset)(transforms.ToTensor(), "test", self.cfg)
198
+ batch_generator = DataLoader(dataset=testset_loader, batch_size=self.cfg.test.test_batch_size,
199
+ shuffle=False, num_workers=1, pin_memory=True)
200
+
201
+ self.testset = testset_loader
202
+ self.batch_generator = batch_generator
203
+
204
+ def _make_model(self):
205
+ self.logger.info('Load checkpoint from {}'.format(self.cfg.model.pretrained_model_path))
206
+
207
+ # prepare network
208
+ self.logger.info("Creating graph...")
209
+ model = get_model(self.cfg, 'test')
210
+ model = DataParallel(model).cuda()
211
+
212
+ ckpt = torch.load(self.cfg.model.pretrained_model_path, map_location=torch.device('cpu'), weights_only=False)
213
+
214
+ from collections import OrderedDict
215
+ new_state_dict = OrderedDict()
216
+ for k, v in ckpt['network'].items():
217
+ if 'module' not in k:
218
+ k = 'module.' + k
219
+ k = k.replace('backbone', 'encoder').replace('body_rotation_net', 'body_regressor').replace(
220
+ 'hand_rotation_net', 'hand_regressor')
221
+ new_state_dict[k] = v
222
+ self.logger.warning("Attention: Strict=False is set for checkpoint loading. Please check manually.")
223
+ model.load_state_dict(new_state_dict, strict=False)
224
+ model.cuda()
225
+ model.eval()
226
+
227
+ self.model = model
228
+
229
+ def _evaluate(self, outs, cur_sample_idx):
230
+ eval_result = self.testset.evaluate(outs, cur_sample_idx)
231
+ return eval_result
232
+
233
+ def _print_eval_result(self, eval_result):
234
+ self.testset.print_eval_result(eval_result)
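Datasets are resolved by name through `dynamic_import`, so the training config only needs to list module/class names under `datasets/`. A rough sketch of that lookup step in isolation; the dataset name is illustrative and the import assumes the SMPLest-X root is on `PYTHONPATH`:

```python
import importlib

def dynamic_import(module_name, object_name):
    """Import a module by dotted path and fetch one attribute from it."""
    module = importlib.import_module(module_name)
    return getattr(module, object_name)

# hypothetical entry from cfg.data.trainset_humandata
name = 'SynHand'
dataset_cls = dynamic_import(f'datasets.{name}', name)
# dataset = dataset_cls(transforms.ToTensor(), 'train', cfg)  # as done in _make_batch_generator
```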
SMPLest-X/main/config.py ADDED
@@ -0,0 +1,101 @@
1
+ import os
2
+ import importlib.util
3
+ from pathlib import Path
4
+ import json
5
+
6
+
7
+ class Config(dict):
8
+ """A dictionary that allows dot notation access for configuration settings."""
9
+ def __init__(self, data=None):
10
+ super().__init__()
11
+ if data:
12
+ for key, value in data.items():
13
+ # Set the key-value pair using the key and the converted value
14
+ self[key] = self._convert(value)
15
+
16
+ def _convert(self, value):
17
+ """Recursively convert nested dictionaries to Config."""
18
+ if isinstance(value, dict):
19
+ return Config(value) # Convert all nested dicts to Config
20
+ elif isinstance(value, list):
21
+ return [self._convert(item) for item in value] # Convert items in lists
22
+ elif isinstance(value, Path):
23
+ return str(value) # Convert Path objects to string
24
+ return value
25
+
26
+ def __getattr__(self, item):
27
+ """Allow access to dictionary keys via dot notation."""
28
+ if item in self:
29
+ return self[item]
30
+ raise AttributeError(f"'{self.__class__.__name__}' object has no attribute '{item}'")
31
+
32
+ def __setattr__(self, key, value):
33
+ """Allow setting dictionary keys via dot notation."""
34
+ self[key] = self._convert(value)
35
+
36
+ @classmethod
37
+ def load_config(cls, file_path):
38
+ """Load a Python config file and return it as a Config instance."""
39
+ spec = importlib.util.spec_from_file_location("config_module", file_path)
40
+ config_module = importlib.util.module_from_spec(spec)
41
+ spec.loader.exec_module(config_module)
42
+
43
+ if hasattr(config_module, "config") and isinstance(config_module.config, dict):
44
+ return cls(config_module.config) # Ensure full conversion of nested dicts
45
+ else:
46
+ raise ValueError("The config file does not define a 'config' dictionary.")
47
+
48
+ def update_config(self, new_data):
49
+ """Recursively update Config with new dictionary values."""
50
+ for key, value in new_data.items():
51
+ if isinstance(value, dict) and isinstance(self.get(key), Config):
52
+ self[key].update_config(value) # Recursive update for nested dicts
53
+ else:
54
+ self[key] = self._convert(value) # Convert and assign
55
+
56
+ def dump_config(self, file_path=None):
57
+ """Dump the Config object into a .py file with a Pythonic and readable format."""
58
+ # Ensure the provided path is valid
59
+ if file_path is None:
60
+ file_path = self.log.output_dir + '/config.py'
61
+ else:
62
+ dir_name = os.path.dirname(file_path)
63
+ if dir_name and not os.path.exists(dir_name):
64
+ os.makedirs(dir_name)
65
+
66
+ # Convert the Config instance into a regular dictionary
67
+ def config_to_dict(config):
68
+ """Recursively convert a Config instance into a regular dictionary."""
69
+ if isinstance(config, Config):
70
+ return {key: config_to_dict(value) if isinstance(value, Config) else value
71
+ for key, value in config.items()}
72
+ return config
73
+
74
+ config_dict = config_to_dict(self)
75
+
76
+ # Write the config dictionary to a .py file in a formatted, readable way
77
+ with open(file_path, 'w') as f:
78
+ # Use json.dumps to pretty-print the dictionary with indentation and spaces
79
+ f.write("config = ")
80
+ f.write(json.dumps(config_dict, indent=4)) # Pretty print with indentation
81
+ f.write("\n")
82
+
83
+ print(f"Config has been saved to {file_path}")
84
+
85
+ def prepare_log(self):
86
+
87
+ def make_folder(folder):
88
+ if not os.path.exists(folder):
89
+ os.makedirs(folder)
90
+
91
+ if self.log.output_dir is not None:
92
+ make_folder(self.log.output_dir)
93
+ if self.log.model_dir is not None:
94
+ make_folder(self.log.model_dir)
95
+ if self.log.log_dir is not None:
96
+ make_folder(self.log.log_dir)
97
+ if self.log.result_dir is not None:
98
+ make_folder(self.log.result_dir)
99
+
100
+
101
+
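A short sketch of the intended round trip through `Config`; the config path, the `train.lr` key, and the override values are placeholders, and the import assumes the repo root is on `PYTHONPATH`:

```python
from main.config import Config

cfg = Config.load_config('configs/config_base.py')  # the file must define a `config` dict
print(cfg.train.lr)                                  # dot access into nested dicts

cfg.update_config({'train': {'lr': 5e-5},            # recursive override
                   'log': {'exp_name': 'debug_run'}})
cfg.dump_config('outputs/debug_run/config.py')       # re-serialized as `config = { ... }`
```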
SMPLest-X/main/constants.py ADDED
@@ -0,0 +1,37 @@
1
+ import numpy as np
2
+
3
+ # keypoints3d_cam with root-align has higher priority, followed by old version key keypoints3d
4
+ # when there is keypoints3d_smplx, use this rather than keypoints3d_original
5
+ KPS2D_KEYS = ['keypoints2d', 'keypoints2d_smplx', 'keypoints2d_smpl', 'keypoints2d_original']
6
+ KPS3D_KEYS = ['keypoints3d_cam', 'keypoints3d', 'keypoints3d_smplx','keypoints3d_smpl' ,'keypoints3d_original']
7
+
8
+ HANDS_MEAN_R = np.array([ 0.11167871, -0.04289218, 0.41644183, 0.10881133, 0.06598568,
9
+ 0.75622 , -0.09639297, 0.09091566, 0.18845929, -0.11809504,
10
+ -0.05094385, 0.5295845 , -0.14369841, -0.0552417 , 0.7048571 ,
11
+ -0.01918292, 0.09233685, 0.3379135 , -0.45703298, 0.19628395,
12
+ 0.6254575 , -0.21465237, 0.06599829, 0.50689423, -0.36972436,
13
+ 0.06034463, 0.07949023, -0.1418697 , 0.08585263, 0.63552827,
14
+ -0.3033416 , 0.05788098, 0.6313892 , -0.17612089, 0.13209307,
15
+ 0.37335458, 0.8509643 , -0.27692273, 0.09154807, -0.49983943,
16
+ -0.02655647, -0.05288088, 0.5355592 , -0.04596104, 0.27735803]).reshape(15, -1)
17
+ HANDS_MEAN_L = np.array([ 0.11167871, 0.04289218, -0.41644183, 0.10881133, -0.06598568,
18
+ -0.75622 , -0.09639297, -0.09091566, -0.18845929, -0.11809504,
19
+ 0.05094385, -0.5295845 , -0.14369841, 0.0552417 , -0.7048571 ,
20
+ -0.01918292, -0.09233685, -0.3379135 , -0.45703298, -0.19628395,
21
+ -0.6254575 , -0.21465237, -0.06599829, -0.50689423, -0.36972436,
22
+ -0.06034463, -0.07949023, -0.1418697 , -0.08585263, -0.63552827,
23
+ -0.3033416 , -0.05788098, -0.6313892 , -0.17612089, -0.13209307,
24
+ -0.37335458, 0.8509643 , 0.27692273, -0.09154807, -0.49983943,
25
+ 0.02655647, 0.05288088, 0.5355592 , 0.04596104, -0.27735803]).reshape(15, -1)
26
+
27
+ # same mapping for 144->137 and 190->137
28
+ SMPLX_137_MAPPING = [
29
+ 0, 1, 2, 4, 5, 7, 8, 12, 16, 17, 18, 19, 20, 21, 60, 61, 62, 63, 64, 65, 59, 58, 57, 56, 55, 37, 38, 39, 66,
30
+ 25, 26, 27, 67, 28, 29, 30, 68, 34, 35, 36, 69, 31, 32, 33, 70, 52, 53, 54, 71, 40, 41, 42, 72, 43, 44, 45,
31
+ 73, 49, 50, 51, 74, 46, 47, 48, 75, 22, 15, 56, 57, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
32
+ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113,
33
+ 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135,
34
+ 136, 137, 138, 139, 140, 141, 142, 143]
35
+
36
+ # smplx to lsp body keypoints mapping
37
+ LSP_MAPPIMG = [1, 2, 4, 5, 7, 8, 12, 15, 16, 17, 18, 19, 20, 21]
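Since `SMPLX_137_MAPPING` is just an index list, collapsing a full 144-joint SMPL-X keypoint array to the 137-keypoint convention is a single fancy-indexing step. A minimal sketch with random stand-in keypoints (the import assumes the SMPLest-X root is on `PYTHONPATH`):

```python
import numpy as np
from main.constants import SMPLX_137_MAPPING

kps3d_full = np.random.rand(144, 3)         # stand-in for a 144-joint keypoints3d array
kps3d_137 = kps3d_full[SMPLX_137_MAPPING]   # select/reorder into the 137-keypoint layout
print(kps3d_137.shape)                      # (137, 3)
```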
SMPLest-X/main/inference.py ADDED
@@ -0,0 +1,188 @@
1
+ import os
2
+ import sys
3
+ PROJECT_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
4
+ sys.path.insert(0, PROJECT_ROOT)
5
+ import os.path as osp
6
+ import argparse
7
+ import numpy as np
8
+ import torchvision.transforms as transforms
9
+ import torch.backends.cudnn as cudnn
10
+ import torch
11
+ import cv2
12
+ import datetime
13
+ from tqdm import tqdm
14
+ from pathlib import Path
15
+ from human_models.human_models import SMPLX
16
+ from ultralytics import YOLO
17
+ from main.base import Tester
18
+ from main.config import Config
19
+ from utils.data_utils import load_img, process_bbox, generate_patch_image
20
+ # from utils.visualization_utils import render_mesh
21
+ from utils.inference_utils import non_max_suppression
22
+ import pickle
23
+
24
+ def parse_args():
25
+ parser = argparse.ArgumentParser()
26
+ parser.add_argument('--num_gpus', type=int, dest='num_gpus')
27
+ parser.add_argument('--file_name', type=str, default='test')
28
+ parser.add_argument('--ckpt_name', type=str, default='smplest_x_h')
29
+ parser.add_argument('--ckpt_path', type=str, default='model_dump')
30
+ parser.add_argument('--start', type=str, default=1)
31
+ parser.add_argument('--end', type=str, default=1)
32
+ parser.add_argument('--multi_person', action='store_true')
33
+ args = parser.parse_args()
34
+ return args
35
+
36
+ def main():
37
+ args = parse_args()
38
+ cudnn.benchmark = True
39
+
40
+ # init config
41
+ time_str = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')
42
+ root_dir = Path(__file__).resolve().parent.parent
43
+ config_path = osp.join(args.ckpt_path, 'config_base.py')
44
+ cfg = Config.load_config(config_path)
45
+ img_folder = osp.join(root_dir, 'demo', 'input_frames', args.file_name)
46
+ # output_folder = osp.join(root_dir, 'demo', 'output_pkls', args.file_name)
47
+ output_pkls_folder = osp.join(root_dir, 'demo', 'output_pkls')
48
+ # os.makedirs(output_folder, exist_ok=True)
49
+ exp_name = f'inference_{args.file_name}_{args.ckpt_name}_{time_str}'
50
+
51
+ new_config = {
52
+ "log":{
53
+ 'exp_name': exp_name,
54
+ 'log_dir': osp.join(root_dir, 'outputs', exp_name, 'log'),
55
+ }
56
+ }
57
+ cfg.update_config(new_config)
58
+ cfg.prepare_log()
59
+
60
+ # init human models
61
+ smpl_x = SMPLX(cfg.model.human_model_path)
62
+
63
+ # init tester
64
+ demoer = Tester(cfg)
65
+ demoer.logger.info(f"Using 1 GPU.")
66
+ demoer.logger.info(f'Inference [{args.file_name}] with [{cfg.model.pretrained_model_path}].')
67
+ demoer._make_model()
68
+
69
+ # init detector
70
+ bbox_model = getattr(cfg.inference.detection, "model_path",
71
+ '/mnt/shared-storage-user/mllm/zangyuhang/pmx/pretrained_weight/yolo/yolo26l.pt')
72
+ detector = YOLO(bbox_model)
73
+
74
+ start = int(args.start)
75
+ end = int(args.end) + 1
76
+
77
+ results = []
78
+ for frame in tqdm(range(start, end)):
79
+
80
+ # prepare input image
81
+ img_path = osp.join(img_folder, f'{int(frame):06d}.jpg')
82
+
83
+ transform = transforms.ToTensor()
84
+ original_img = load_img(img_path)
85
+ vis_img = original_img.copy()
86
+ original_img_height, original_img_width = original_img.shape[:2]
87
+
88
+ # detection, xyxy
89
+ yolo_bbox = detector.predict(original_img,
90
+ device='cuda',
91
+ classes=0,
92
+ conf=cfg.inference.detection.conf,
93
+ save=cfg.inference.detection.save,
94
+ verbose=cfg.inference.detection.verbose
95
+ )[0].boxes.xyxy.detach().cpu().numpy()
96
+
97
+ if len(yolo_bbox)<1:
98
+ # save original image if no bbox
99
+ num_bbox = 0
100
+ elif not args.multi_person:
101
+ # only select the largest bbox
102
+ num_bbox = 1
103
+ # yolo_bbox = yolo_bbox[0]
104
+ else:
105
+ # keep bbox by NMS with iou_thr
106
+ yolo_bbox = non_max_suppression(yolo_bbox, cfg.inference.detection.iou_thr)
107
+ num_bbox = len(yolo_bbox)
108
+
109
+ # loop all detected bboxes
110
+ for bbox_id in range(num_bbox):
111
+ yolo_bbox_xywh = np.zeros((4))
112
+ yolo_bbox_xywh[0] = yolo_bbox[bbox_id][0]
113
+ yolo_bbox_xywh[1] = yolo_bbox[bbox_id][1]
114
+ yolo_bbox_xywh[2] = abs(yolo_bbox[bbox_id][2] - yolo_bbox[bbox_id][0])
115
+ yolo_bbox_xywh[3] = abs(yolo_bbox[bbox_id][3] - yolo_bbox[bbox_id][1])
116
+
117
+ # xywh
118
+ bbox = process_bbox(bbox=yolo_bbox_xywh,
119
+ img_width=original_img_width,
120
+ img_height=original_img_height,
121
+ input_img_shape=cfg.model.input_img_shape,
122
+ ratio=getattr(cfg.data, "bbox_ratio", 1.25))
123
+ img, _, _ = generate_patch_image(cvimg=original_img,
124
+ bbox=bbox,
125
+ scale=1.0,
126
+ rot=0.0,
127
+ do_flip=False,
128
+ out_shape=cfg.model.input_img_shape)
129
+
130
+ img = transform(img.astype(np.float32))/255
131
+ img = img.cuda()[None,:,:,:]
132
+ inputs = {'img': img}
133
+ targets = {}
134
+ meta_info = {}
135
+
136
+ # mesh recovery
137
+ with torch.no_grad():
138
+ out = demoer.model(inputs, targets, meta_info, 'test')
139
+
140
+ mesh = out['smplx_mesh_cam'].detach().cpu().numpy()[0]
141
+
142
+ result = {
143
+ "frame_id": frame,
144
+ "bbox_xyxy": yolo_bbox[bbox_id].copy(),
145
+ "smplx_mesh_cam": out['smplx_mesh_cam'].detach().cpu().numpy()[0],
146
+ }
147
+
148
+ # optional outputs (saved only if present)
149
+ for k in [
150
+ 'smplx_joint_cam',
151
+ 'smplx_pose',
152
+ 'smplx_shape',
153
+ 'smplx_expr'
154
+ ]:
155
+ if k in out:
156
+ result[k] = out[k].detach().cpu().numpy()[0]
157
+
158
+ results.append(result)
159
+
160
+ # render mesh
161
+ # focal = [cfg.model.focal[0] / cfg.model.input_body_shape[1] * bbox[2],
162
+ # cfg.model.focal[1] / cfg.model.input_body_shape[0] * bbox[3]]
163
+ # princpt = [cfg.model.princpt[0] / cfg.model.input_body_shape[1] * bbox[2] + bbox[0],
164
+ # cfg.model.princpt[1] / cfg.model.input_body_shape[0] * bbox[3] + bbox[1]]
165
+
166
+ # draw the bbox on img
167
+ # vis_img = cv2.rectangle(vis_img, (int(yolo_bbox[bbox_id][0]), int(yolo_bbox[bbox_id][1])),
168
+ # (int(yolo_bbox[bbox_id][2]), int(yolo_bbox[bbox_id][3])), (0, 255, 0), 1)
169
+ # draw mesh
170
+ # vis_img = None #render_mesh(vis_img, mesh, smpl_x.face, {'focal': focal, 'princpt': princpt}, mesh_as_vertices=False)
171
+
172
+ # save rendered image
173
+ # frame_name = os.path.basename(img_path)
174
+ # cv2.imwrite(os.path.join(output_folder, frame_name), vis_img[:, :, ::-1])
175
+
176
+ # save as pkl
177
+ pkl_path = os.path.join(
178
+ output_pkls_folder,
179
+ f'{args.file_name}.pkl'
180
+ )
181
+
182
+ with open(pkl_path, 'wb') as f:
183
+ pickle.dump(results, f)
184
+
185
+ print(f"✅ Saved results to {pkl_path}")
186
+
187
+ if __name__ == "__main__":
188
+ main()
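The per-frame dictionaries written above can be read back directly; the pickle path below mirrors the `demo/output_pkls/<file_name>.pkl` layout used in `main()` and the file name is a placeholder:

```python
import pickle

with open('demo/output_pkls/test.pkl', 'rb') as f:   # placeholder file name
    results = pickle.load(f)

for r in results[:3]:
    # each entry holds the frame id, the detected bbox and the SMPL-X mesh in camera space
    print(r['frame_id'], r['bbox_xyxy'], r['smplx_mesh_cam'].shape)
```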
SMPLest-X/main/test.py ADDED
@@ -0,0 +1,107 @@
1
+ import argparse
2
+ import torch
3
+ import torch.backends.cudnn as cudnn
4
+ from main.config import Config
5
+ import os.path as osp
6
+ import datetime
7
+ from pathlib import Path
8
+ from main.base import Tester
9
+ from human_models.human_models import SMPL, SMPLX
10
+ from tqdm import tqdm
11
+
12
+ def parse_args():
13
+ parser = argparse.ArgumentParser()
14
+ parser.add_argument('--num_gpus', type=int, dest='num_gpus')
15
+ parser.add_argument('--exp_name', type=str, default='output/test')
16
+ parser.add_argument('--result_path', type=str, default='output/test')
17
+ parser.add_argument('--ckpt_idx', type=int, default=0)
18
+ parser.add_argument('--test_batch_size', type=int, default=64)
19
+ parser.add_argument('--testset', type=str, default='EHF')
20
+ parser.add_argument('--use_cache', action='store_true')
21
+ args = parser.parse_args()
22
+ return args
23
+
24
+ def main():
25
+ args = parse_args()
26
+ cudnn.benchmark = True
27
+
28
+ # init config
29
+ time_str = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')
30
+ root_dir = Path(__file__).resolve().parent.parent
31
+ config_path = osp.join('./outputs',args.result_path, 'code', 'config_base.py')
32
+ cfg = Config.load_config(config_path)
33
+ checkpoint_path = osp.join('./outputs', args.result_path, 'model_dump', f'snapshot_{int(args.ckpt_idx)}.pth.tar')
34
+ exp_name = f'{args.exp_name}_ep{int(args.ckpt_idx)}_{time_str}'
35
+
36
+ if args.testset in ['AGORA_test', 'BEDLAM_test']:
37
+ print(f'Test on {args.testset} set...')
38
+
39
+ new_config = {
40
+ "data": {
41
+ "testset": str(args.testset),
42
+ "use_cache": args.use_cache,
43
+ },
44
+ "test":{
45
+ "test_batch_size": int(args.test_batch_size),
46
+ },
47
+ "model": {
48
+ "pretrained_model_path": checkpoint_path,
49
+ },
50
+ "log":{
51
+ 'exp_name': exp_name,
52
+ 'output_dir': osp.join(root_dir, 'outputs', exp_name),
53
+ 'model_dir': osp.join(root_dir, 'outputs', exp_name, 'model_dump'),
54
+ 'log_dir': osp.join(root_dir, 'outputs', exp_name, 'log'),
55
+ 'result_dir': osp.join(root_dir, 'outputs', exp_name, 'result'),
56
+ }
57
+ }
58
+
59
+ cfg.update_config(new_config)
60
+ cfg.prepare_log()
61
+ cfg.dump_config()
62
+
63
+ # init human models
64
+ smpl = SMPL(cfg.model.human_model_path)
65
+ smpl_x = SMPLX(cfg.model.human_model_path)
66
+
67
+ # init tester
68
+ tester = Tester(cfg)
69
+ tester.logger.info(f"Using 1 GPU with bs={cfg.test.test_batch_size} per GPU.")
70
+ tester.logger.info(f'Testing [{checkpoint_path}] on datasets [{cfg.data.testset}]')
71
+
72
+ tester._make_batch_generator()
73
+ tester._make_model()
74
+
75
+ eval_result = {}
76
+ cur_sample_idx = 0
77
+ for itr, (inputs, targets, meta_info) in enumerate(tqdm(tester.batch_generator)):
78
+
79
+ with torch.no_grad():
80
+ model_out = tester.model(inputs, targets, meta_info, 'test')
81
+
82
+ batch_size = model_out['img'].shape[0]
83
+
84
+ out = {}
85
+ for k, v in model_out.items():
86
+ if isinstance(v, torch.Tensor):
87
+ out[k] = v.cpu().numpy()
88
+ elif isinstance(v, list):
89
+ out[k] = v
90
+ else:
91
+ raise ValueError('Undefined type in out. Key: {}; Type: {}.'.format(k, type(v)))
92
+
93
+ out = [{k: v[bid] for k, v in out.items()} for bid in range(batch_size)]
94
+
95
+ # evaluate
96
+ cur_eval_result = tester._evaluate(out, cur_sample_idx)
97
+ for k, v in cur_eval_result.items():
98
+ if k in eval_result:
99
+ eval_result[k] += v
100
+ else:
101
+ eval_result[k] = v
102
+ cur_sample_idx += len(out)
103
+
104
+ tester._print_eval_result(eval_result)
105
+
106
+ if __name__ == "__main__":
107
+ main()
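The batched model outputs are split into one dictionary per sample before evaluation; that unbatching pattern can be reproduced in isolation. A small sketch with dummy arrays standing in for the model output (shapes are stand-ins):

```python
import numpy as np

# stand-in for a model output dict of batched arrays (batch size 4)
model_out = {'smplx_mesh_cam': np.zeros((4, 10475, 3)),
             'cam_trans': np.zeros((4, 3))}

batch_size = model_out['smplx_mesh_cam'].shape[0]
out = [{k: v[bid] for k, v in model_out.items()} for bid in range(batch_size)]
print(len(out), out[0]['smplx_mesh_cam'].shape)   # 4 (10475, 3)
```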
SMPLest-X/main/train.py ADDED
@@ -0,0 +1,138 @@
1
+ import argparse
2
+ import torch.backends.cudnn as cudnn
3
+ from main.config import Config
4
+ import os.path as osp
5
+ import os
6
+ import datetime
7
+ from pathlib import Path
8
+ import torch.distributed as dist
9
+ from utils.distribute_utils import init_distributed_mode, \
10
+ is_main_process, set_seed, get_dist_info
11
+ from main.base import Trainer
12
+ from human_models.human_models import SMPL, SMPLX
13
+
14
+ def parse_args():
15
+ parser = argparse.ArgumentParser()
16
+ parser.add_argument('--local_rank', type=int, dest='num_gpus')
17
+ parser.add_argument('--num_gpus', type=int, dest='num_gpus')
18
+ parser.add_argument('--master_port', type=int, dest='master_port')
19
+ parser.add_argument('--exp_name', type=str, default='output/test')
20
+ parser.add_argument('--config', type=str, default='./config/config_base.py')
21
+ args = parser.parse_args()
22
+
23
+ return args
24
+
25
+ def main():
26
+ args = parse_args()
27
+ set_seed(2023)
28
+ cudnn.benchmark = True
29
+
30
+ # process config
31
+ time_str = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')
32
+ root_dir = Path(__file__).resolve().parent.parent
33
+ config_path = osp.join('./configs', args.config) # TODO: move config folder outside main
34
+ cfg = Config.load_config(config_path)
35
+ new_config = {
36
+ "train": {
37
+ "num_gpus": int(args.num_gpus),
38
+ },
39
+ "log":{
40
+ 'exp_name': f'{args.exp_name}_{time_str}',
41
+ 'output_dir': osp.join(root_dir, 'outputs', f'{args.exp_name}_{time_str}'),
42
+ 'model_dir': osp.join(root_dir, 'outputs', f'{args.exp_name}_{time_str}', 'model_dump'),
43
+ 'log_dir': osp.join(root_dir, 'outputs', f'{args.exp_name}_{time_str}', 'log'),
44
+ 'result_dir': osp.join(root_dir, 'outputs', f'{args.exp_name}_{time_str}', 'result'),
45
+ }
46
+ }
47
+ cfg.update_config(new_config)
48
+ cfg.prepare_log()
49
+ cfg.dump_config()
50
+
51
+ # init ddp
52
+ distributed, gpu_idx = init_distributed_mode(args.master_port)
53
+
54
+ # init human models
55
+ smpl = SMPL(cfg.model.human_model_path)
56
+ smpl_x = SMPLX(cfg.model.human_model_path)
57
+
58
+ # init trainer
59
+ trainer = Trainer(cfg, distributed, gpu_idx)
60
+ trainer.logger_info(f"Using {cfg.train.num_gpus} GPUs with bs={cfg.train.train_batch_size} per GPU.")
61
+ trainer.logger_info(f'Training with datasets: {cfg.data.trainset_humandata}')
62
+
63
+ trainer._make_batch_generator()
64
+ trainer._make_model()
65
+
66
+ for epoch in range(trainer.start_epoch, cfg.train.end_epoch):
67
+ trainer.tot_timer.tic()
68
+ trainer.read_timer.tic()
69
+
70
+ # ddp, align random seed between devices
71
+ trainer.batch_generator.sampler.set_epoch(epoch)
72
+
73
+ for itr, (inputs, targets, meta_info) in enumerate(trainer.batch_generator):
74
+ trainer.read_timer.toc()
75
+ trainer.gpu_timer.tic()
76
+
77
+ # forward
78
+ trainer.optimizer.zero_grad()
79
+ loss = trainer.model(inputs, targets, meta_info, 'train')
80
+ loss_mean = {k: v.mean() for k, v in loss.items()}
81
+ loss_sum = sum(v for k, v in loss_mean.items())
82
+
83
+ # backward
84
+ loss_sum.backward()
85
+ trainer.optimizer.step()
86
+ trainer.scheduler.step()
87
+
88
+ trainer.gpu_timer.toc()
89
+
90
+ if (itr + 1) % cfg.train.print_iters == 0:
91
+ # loss of all ranks
92
+ rank, world_size = get_dist_info()
93
+ loss_print = loss_mean.copy()
94
+ for k in loss_print:
95
+ dist.all_reduce(loss_print[k])
96
+
97
+ total_loss = 0
98
+ for k in loss_print:
99
+ loss_print[k] = loss_print[k] / world_size
100
+ total_loss += loss_print[k]
101
+ loss_print['total'] = total_loss
102
+
103
+ screen = [
104
+ 'Epoch %d/%d itr %d/%d:' % (epoch, cfg.train.end_epoch, itr, trainer.itr_per_epoch),
105
+ 'lr: %g' % (trainer.get_lr()),
106
+ 'speed: %.2f(%.2fs r%.2f)s/itr' % (
107
+ trainer.tot_timer.average_time, trainer.gpu_timer.average_time,
108
+ trainer.read_timer.average_time),
109
+ '%.2fh/epoch' % (trainer.tot_timer.average_time / 3600. * trainer.itr_per_epoch),
110
+ ]
111
+ screen += ['%s: %.4f' % ('loss_' + k, v.detach()) for k, v in loss_print.items()]
112
+ trainer.logger_info(' '.join(screen))
113
+
114
+ trainer.tot_timer.toc()
115
+ trainer.tot_timer.tic()
116
+ trainer.read_timer.tic()
117
+
118
+ # save model ddp, save model.module on rank 0 only
119
+ save_epoch = getattr(cfg.train, 'save_epoch', 5)
120
+ previous_saved_epoch = None
121
+ remove_previous = getattr(cfg.train, 'remove_checkpoint', False)
122
+ if is_main_process() and (epoch % save_epoch == 0 or epoch == cfg.train.end_epoch - 1):
123
+ trainer.save_model({
124
+ 'epoch': epoch,
125
+ 'network': trainer.model.state_dict(),
126
+ 'optimizer': trainer.optimizer.state_dict(),
127
+ }, epoch)
128
+
129
+ # remove previous
130
+ if previous_saved_epoch is not None and remove_previous:
131
+ to_remove = osp.join(cfg.log.model_dir, f'snapshot_{str(previous_saved_epoch)}.pth.tar')
132
+ os.remove(to_remove)
133
+ previous_saved_epoch = epoch
134
+
135
+ dist.barrier()
136
+
137
+ if __name__ == "__main__":
138
+ main()
SMPLest-X/requirements.txt ADDED
@@ -0,0 +1,13 @@
1
+ numpy==1.23.1
2
+ smplx==0.1.28
3
+ tqdm==4.67.1
4
+ opencv-python==4.11.0.86
5
+ chumpy==0.70
6
+ trimesh==4.6.2
7
+ pyrender==0.1.45
8
+ matplotlib==3.7.5
9
+ json_tricks==3.17.3
10
+ einops==0.8.1
11
+ timm==1.0.14
12
+ ultralytics==8.3.75
13
+ pyopengl
SMPLest-X/requirements_py310.txt ADDED
@@ -0,0 +1,14 @@
1
+ numpy>=1.23.1,<2.0
2
+ smplx>=0.1.28
3
+ tqdm>=4.67.1
4
+ opencv-python
5
+ chumpy>=0.70
6
+ trimesh>=4.6.2
7
+ pyrender>=0.1.45
8
+ matplotlib>=3.7.5
9
+ json_tricks>=3.17.3
10
+ einops>=0.8.1
11
+ timm>=1.0.14
12
+ ultralytics>=8.3.75
13
+ scipy
14
+ pandas
SMPLest-X/utils/distribute_utils.py ADDED
@@ -0,0 +1,171 @@
1
+ import os
2
+ import os.path as osp
3
+ import pickle
4
+ import shutil
5
+ import tempfile
6
+ import time
7
+ import torch
8
+ import torch.distributed as dist
9
+ import random
10
+ import numpy as np
11
+
12
+
13
+ def get_dist_info():
14
+ """
15
+ Get the rank and world size in the current distributed training setup.
16
+
17
+ Returns:
18
+ tuple: (rank, world_size)
19
+ rank: int, the rank of the current process.
20
+ world_size: int, the total number of processes in the group.
21
+ """
22
+ if dist.is_available() and dist.is_initialized():
23
+ rank = dist.get_rank()
24
+ world_size = dist.get_world_size()
25
+ else:
26
+ rank = 0
27
+ world_size = 1
28
+ return rank, world_size
29
+
30
+ def set_seed(seed):
31
+ random.seed(seed)
32
+ np.random.seed(seed)
33
+ torch.manual_seed(seed)
34
+ torch.cuda.manual_seed_all(seed)
35
+
36
+
37
+ def time_synchronized():
38
+ torch.cuda.synchronize() if torch.cuda.is_available() else None
39
+ return time.time()
40
+
41
+
42
+ def setup_for_distributed(is_master):
43
+ """This function disables printing when not in master process."""
44
+ import builtins as __builtin__
45
+ builtin_print = __builtin__.print
46
+
47
+ def print(*args, **kwargs):
48
+ force = kwargs.pop('force', False)
49
+ if is_master or force:
50
+ builtin_print(*args, **kwargs)
51
+
52
+ __builtin__.print = print
53
+
54
+
55
+ def init_distributed_mode(port = None, master_port=29500):
56
+ """Initialize slurm distributed training environment.
57
+
58
+ If argument ``port`` is not specified, then the master port will be system
59
+ environment variable ``MASTER_PORT``. If ``MASTER_PORT`` is not in system
60
+ environment variable, then a default port ``29500`` will be used.
61
+
62
+ Args:
63
+ port (int, optional): Master port. Defaults to None.
65
+ """
66
+ # import pdb; pdb.set_trace()
67
+ dist_backend = 'nccl'
68
+ rank = int(os.environ['RANK'])
69
+ num_gpus = torch.cuda.device_count()
70
+ torch.cuda.set_device(rank % num_gpus)
71
+
72
+ dist.init_process_group(backend=dist_backend)
73
+ distributed = True
74
+ gpu_idx = rank % num_gpus
75
+
76
+ return distributed, gpu_idx
77
+
78
+
79
+ def is_dist_avail_and_initialized():
80
+ if not dist.is_available():
81
+ return False
82
+ if not dist.is_initialized():
83
+ return False
84
+ return True
85
+
86
+
87
+ def get_world_size():
88
+ if not is_dist_avail_and_initialized():
89
+ return 1
90
+ return dist.get_world_size()
91
+
92
+
93
+ def get_rank():
94
+ if not is_dist_avail_and_initialized():
95
+ return 0
96
+ return dist.get_rank()
97
+
98
+ def get_process_groups():
99
+ world_size = int(os.environ['WORLD_SIZE'])
100
+ ranks = list(range(world_size))
101
+ num_gpus = torch.cuda.device_count()
102
+ num_nodes = world_size // num_gpus
103
+ if world_size % num_gpus != 0:
104
+ raise NotImplementedError('Not implemented for node not fully used.')
105
+
106
+ groups = []
107
+ for node_idx in range(num_nodes):
108
+ groups.append(ranks[node_idx*num_gpus : (node_idx+1)*num_gpus])
109
+ process_groups = [torch.distributed.new_group(group) for group in groups]
110
+
111
+ return process_groups
112
+
113
+ def get_group_idx():
114
+ num_gpus = torch.cuda.device_count()
115
+ proc_id = get_rank()
116
+ group_idx = proc_id // num_gpus
117
+
118
+ return group_idx
119
+
120
+
121
+ def is_main_process():
122
+ return get_rank() == 0
123
+
124
+ def cleanup():
125
+ dist.destroy_process_group()
126
+
127
+ def all_gather(data):
128
+ """
129
+ Run all_gather on arbitrary picklable data (not necessarily tensors)
130
+ Args:
131
+ data:
132
+ Any picklable object
133
+ Returns:
134
+ data_list(list):
135
+ List of data gathered from each rank
136
+ """
137
+ world_size = get_world_size()
138
+ if world_size == 1:
139
+ return [data]
140
+
141
+ # serialized to a Tensor
142
+ buffer = pickle.dumps(data)
143
+ storage = torch.ByteStorage.from_buffer(buffer)
144
+ tensor = torch.ByteTensor(storage).to('cuda')
145
+
146
+ # obtain Tensor size of each rank
147
+ local_size = torch.tensor([tensor.numel()], device='cuda')
148
+ size_list = [torch.tensor([0], device='cuda') for _ in range(world_size)]
149
+ dist.all_gather(size_list, local_size)
150
+ size_list = [int(size.item()) for size in size_list]
151
+ max_size = max(size_list)
152
+
153
+ # receiving Tensor from all ranks
154
+ # we pad the tensor because torch all_gather does not support
155
+ # gathering tensors of different shapes
156
+ tensor_list = []
157
+ for _ in size_list:
158
+ tensor_list.append(
159
+ torch.empty((max_size, ), dtype=torch.uint8, device='cuda'))
160
+ if local_size != max_size:
161
+ padding = torch.empty(
162
+ size=(max_size - local_size, ), dtype=torch.uint8, device='cuda')
163
+ tensor = torch.cat((tensor, padding), dim=0)
164
+ dist.all_gather(tensor_list, tensor)
165
+
166
+ data_list = []
167
+ for size, tensor in zip(size_list, tensor_list):
168
+ buffer = tensor.cpu().numpy().tobytes()[:size]
169
+ data_list.append(pickle.loads(buffer))
170
+
171
+ return data_list
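`all_gather` serializes arbitrary picklable objects to byte tensors and pads them to a common length, and it simply returns `[data]` when not running distributed, so the same call works in both modes. A minimal sketch (the payload is illustrative; a multi-GPU run assumes the process group is already initialized and the repo root is on `PYTHONPATH`):

```python
from utils.distribute_utils import all_gather, get_dist_info

rank, world_size = get_dist_info()
local_result = {'rank': rank, 'num_samples': 128}   # any picklable per-rank payload
gathered = all_gather(local_result)                 # list with one entry per rank

if rank == 0:
    total = sum(r['num_samples'] for r in gathered)
    print(f'{world_size} rank(s), {total} samples in total')
```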
SMPLest-X/utils/timer.py ADDED
@@ -0,0 +1,31 @@
1
+ import time
2
+
3
+ class Timer(object):
4
+ """A simple timer."""
5
+ def __init__(self):
6
+ self.total_time = 0.
7
+ self.calls = 0
8
+ self.start_time = 0.
9
+ self.diff = 0.
10
+ self.average_time = 0.
11
+ self.warm_up = 0
12
+
13
+ def tic(self):
14
+ # using time.time instead of time.clock because time.clock
15
+ # does not normalize for multithreading
16
+ self.start_time = time.time()
17
+
18
+ def toc(self, average=True):
19
+ self.diff = time.time() - self.start_time
20
+ if self.warm_up < 10:
21
+ self.warm_up += 1
22
+ return self.diff
23
+ else:
24
+ self.total_time += self.diff
25
+ self.calls += 1
26
+ self.average_time = self.total_time / self.calls
27
+
28
+ if average:
29
+ return self.average_time
30
+ else:
31
+ return self.diff
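Typical use of `Timer` brackets a unit of work with `tic()`/`toc()`; note that the first 10 calls are treated as warm-up and are excluded from the running average. A minimal sketch (the sleep stands in for an iteration; the import assumes the repo root is on `PYTHONPATH`):

```python
import time
from utils.timer import Timer

timer = Timer()
for _ in range(20):
    timer.tic()
    time.sleep(0.01)        # stand-in for one training iteration
    timer.toc()

print(f'average iteration time: {timer.average_time:.4f}s')
```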
SMPLest-X/utils/transforms.py ADDED
@@ -0,0 +1,366 @@
1
+ import torch
2
+ import numpy as np
3
+ from torch.nn import functional as F
4
+ from einops.einops import rearrange
5
+
6
+
7
+ def cam2pixel(cam_coord, f, c):
8
+ x = cam_coord[:, 0] / cam_coord[:, 2] * f[0] + c[0]
9
+ y = cam_coord[:, 1] / cam_coord[:, 2] * f[1] + c[1]
10
+ z = cam_coord[:, 2]
11
+ return np.stack((x, y, z), 1)
12
+
13
+
14
+ def pixel2cam(pixel_coord, f, c):
15
+ x = (pixel_coord[:, 0] - c[0]) / f[0] * pixel_coord[:, 2]
16
+ y = (pixel_coord[:, 1] - c[1]) / f[1] * pixel_coord[:, 2]
17
+ z = pixel_coord[:, 2]
18
+ return np.stack((x, y, z), 1)
19
+
20
+
21
+ def world2cam(world_coord, R, t):
22
+ cam_coord = np.dot(R, world_coord.transpose(1, 0)).transpose(1, 0) + t.reshape(1, 3)
23
+ return cam_coord
24
+
25
+
26
+ def cam2world(cam_coord, R, t):
27
+ world_coord = np.dot(np.linalg.inv(R), (cam_coord - t.reshape(1, 3)).transpose(1, 0)).transpose(1, 0)
28
+ return world_coord
29
+
30
+
31
+ def rigid_transform_3D(A, B):
32
+ n, dim = A.shape
33
+ centroid_A = np.mean(A, axis=0)
34
+ centroid_B = np.mean(B, axis=0)
35
+ H = np.dot(np.transpose(A - centroid_A), B - centroid_B) / n
36
+ U, s, V = np.linalg.svd(H)
37
+ R = np.dot(np.transpose(V), np.transpose(U))
38
+ if np.linalg.det(R) < 0:
39
+ s[-1] = -s[-1]
40
+ V[2] = -V[2]
41
+ R = np.dot(np.transpose(V), np.transpose(U))
42
+
43
+ varP = np.var(A, axis=0).sum()
44
+ c = 1 / varP * np.sum(s)
45
+
46
+ t = -np.dot(c * R, np.transpose(centroid_A)) + np.transpose(centroid_B)
47
+ return c, R, t
48
+
49
+
50
+ def rigid_align(A, B):
51
+ c, R, t = rigid_transform_3D(A, B)
52
+ A2 = np.transpose(np.dot(c * R, np.transpose(A))) + t
53
+ return A2
54
+
55
+
56
+ def transform_joint_to_other_db(src_joint, src_name, dst_name):
57
+ src_joint_num = len(src_name)
58
+ dst_joint_num = len(dst_name)
59
+
60
+ new_joint = np.zeros(((dst_joint_num,) + src_joint.shape[1:]), dtype=np.float32)
61
+ for src_idx in range(len(src_name)):
62
+ name = src_name[src_idx]
63
+ if name in dst_name:
64
+ dst_idx = dst_name.index(name)
65
+ new_joint[dst_idx] = src_joint[src_idx]
66
+
67
+ return new_joint
68
+
69
+
70
+ def rotation_matrix_to_angle_axis(rotation_matrix):
71
+ """Convert 3x4 rotation matrix to Rodrigues vector
72
+
73
+ Args:
74
+ rotation_matrix (Tensor): rotation matrix.
75
+
76
+ Returns:
77
+ Tensor: Rodrigues vector transformation.
78
+
79
+ Shape:
80
+ - Input: :math:`(N, 3, 4)`
81
+ - Output: :math:`(N, 3)`
82
+
83
+ Example:
84
+ >>> input = torch.rand(2, 3, 4) # Nx4x4
85
+ >>> output = tgm.rotation_matrix_to_angle_axis(input) # Nx3
86
+ """
87
+ # todo add check that matrix is a valid rotation matrix
88
+ quaternion = rotation_matrix_to_quaternion(rotation_matrix)
89
+ return quaternion_to_angle_axis(quaternion)
90
+
91
+ def quaternion_to_angle_axis(quaternion: torch.Tensor) -> torch.Tensor:
92
+ """Convert quaternion vector to angle axis of rotation.
93
+
94
+ Adapted from ceres C++ library: ceres-solver/include/ceres/rotation.h
95
+
96
+ Args:
97
+ quaternion (torch.Tensor): tensor with quaternions.
98
+
99
+ Return:
100
+ torch.Tensor: tensor with angle axis of rotation.
101
+
102
+ Shape:
103
+ - Input: :math:`(*, 4)` where `*` means, any number of dimensions
104
+ - Output: :math:`(*, 3)`
105
+
106
+ Example:
107
+ >>> quaternion = torch.rand(2, 4) # Nx4
108
+ >>> angle_axis = tgm.quaternion_to_angle_axis(quaternion) # Nx3
109
+ """
110
+ if not torch.is_tensor(quaternion):
111
+ raise TypeError("Input type is not a torch.Tensor. Got {}".format(
112
+ type(quaternion)))
113
+
114
+ if not quaternion.shape[-1] == 4:
115
+ raise ValueError("Input must be a tensor of shape Nx4 or 4. Got {}"
116
+ .format(quaternion.shape))
117
+ # unpack input and compute conversion
118
+ q1: torch.Tensor = quaternion[..., 1]
119
+ q2: torch.Tensor = quaternion[..., 2]
120
+ q3: torch.Tensor = quaternion[..., 3]
121
+ sin_squared_theta: torch.Tensor = q1 * q1 + q2 * q2 + q3 * q3
122
+
123
+ sin_theta: torch.Tensor = torch.sqrt(sin_squared_theta)
124
+ cos_theta: torch.Tensor = quaternion[..., 0]
125
+ two_theta: torch.Tensor = 2.0 * torch.where(
126
+ cos_theta < 0.0,
127
+ torch.atan2(-sin_theta, -cos_theta),
128
+ torch.atan2(sin_theta, cos_theta))
129
+
130
+ k_pos: torch.Tensor = two_theta / sin_theta
131
+ k_neg: torch.Tensor = 2.0 * torch.ones_like(sin_theta)
132
+ k: torch.Tensor = torch.where(sin_squared_theta > 0.0, k_pos, k_neg)
133
+
134
+ angle_axis: torch.Tensor = torch.zeros_like(quaternion)[..., :3]
135
+ angle_axis[..., 0] += q1 * k
136
+ angle_axis[..., 1] += q2 * k
137
+ angle_axis[..., 2] += q3 * k
138
+ return angle_axis
139
+
140
+ def rotation_matrix_to_quaternion(rotation_matrix, eps=1e-6):
141
+ """Convert 3x4 rotation matrix to 4d quaternion vector
142
+
143
+ This algorithm is based on algorithm described in
144
+ https://github.com/KieranWynn/pyquaternion/blob/master/pyquaternion/quaternion.py#L201
145
+
146
+ Args:
147
+ rotation_matrix (Tensor): the rotation matrix to convert.
148
+
149
+ Return:
150
+ Tensor: the rotation in quaternion
151
+
152
+ Shape:
153
+ - Input: :math:`(N, 3, 4)`
154
+ - Output: :math:`(N, 4)`
155
+
156
+ Example:
157
+ >>> input = torch.rand(4, 3, 4) # Nx3x4
158
+ >>> output = tgm.rotation_matrix_to_quaternion(input) # Nx4
159
+ """
160
+ if not torch.is_tensor(rotation_matrix):
161
+ raise TypeError("Input type is not a torch.Tensor. Got {}".format(
162
+ type(rotation_matrix)))
163
+
164
+ input_shape = rotation_matrix.shape
165
+ if len(input_shape) == 2:
166
+ rotation_matrix = rotation_matrix.unsqueeze(0)
167
+
168
+ if len(rotation_matrix.shape) > 3:
169
+ raise ValueError(
170
+ "Input size must be a three dimensional tensor. Got {}".format(
171
+ rotation_matrix.shape))
172
+ if not rotation_matrix.shape[-2:] == (3, 4):
173
+ raise ValueError(
174
+ "Input size must be a N x 3 x 4 tensor. Got {}".format(
175
+ rotation_matrix.shape))
176
+
177
+ rmat_t = torch.transpose(rotation_matrix, 1, 2)
178
+
179
+ mask_d2 = rmat_t[:, 2, 2] < eps
180
+
181
+ mask_d0_d1 = rmat_t[:, 0, 0] > rmat_t[:, 1, 1]
182
+ mask_d0_nd1 = rmat_t[:, 0, 0] < -rmat_t[:, 1, 1]
183
+
184
+ t0 = 1 + rmat_t[:, 0, 0] - rmat_t[:, 1, 1] - rmat_t[:, 2, 2]
185
+ q0 = torch.stack([rmat_t[:, 1, 2] - rmat_t[:, 2, 1],
186
+ t0, rmat_t[:, 0, 1] + rmat_t[:, 1, 0],
187
+ rmat_t[:, 2, 0] + rmat_t[:, 0, 2]], -1)
188
+ t0_rep = t0.repeat(4, 1).t()
189
+
190
+ t1 = 1 - rmat_t[:, 0, 0] + rmat_t[:, 1, 1] - rmat_t[:, 2, 2]
191
+ q1 = torch.stack([rmat_t[:, 2, 0] - rmat_t[:, 0, 2],
192
+ rmat_t[:, 0, 1] + rmat_t[:, 1, 0],
193
+ t1, rmat_t[:, 1, 2] + rmat_t[:, 2, 1]], -1)
194
+ t1_rep = t1.repeat(4, 1).t()
195
+
196
+ t2 = 1 - rmat_t[:, 0, 0] - rmat_t[:, 1, 1] + rmat_t[:, 2, 2]
197
+ q2 = torch.stack([rmat_t[:, 0, 1] - rmat_t[:, 1, 0],
198
+ rmat_t[:, 2, 0] + rmat_t[:, 0, 2],
199
+ rmat_t[:, 1, 2] + rmat_t[:, 2, 1], t2], -1)
200
+ t2_rep = t2.repeat(4, 1).t()
201
+
202
+ t3 = 1 + rmat_t[:, 0, 0] + rmat_t[:, 1, 1] + rmat_t[:, 2, 2]
203
+ q3 = torch.stack([t3, rmat_t[:, 1, 2] - rmat_t[:, 2, 1],
204
+ rmat_t[:, 2, 0] - rmat_t[:, 0, 2],
205
+ rmat_t[:, 0, 1] - rmat_t[:, 1, 0]], -1)
206
+ t3_rep = t3.repeat(4, 1).t()
207
+
208
+ mask_c0 = mask_d2.float() * mask_d0_d1.float()
209
+ mask_c1 = mask_d2.float() * (1 - mask_d0_d1.float())
210
+ mask_c2 = (1 - mask_d2.float()) * mask_d0_nd1.float()
211
+ mask_c3 = (1 - mask_d2.float()) * (1 - mask_d0_nd1.float())
212
+ mask_c0 = mask_c0.view(-1, 1).type_as(q0)
213
+ mask_c1 = mask_c1.view(-1, 1).type_as(q1)
214
+ mask_c2 = mask_c2.view(-1, 1).type_as(q2)
215
+ mask_c3 = mask_c3.view(-1, 1).type_as(q3)
216
+
217
+ q = q0 * mask_c0 + q1 * mask_c1 + q2 * mask_c2 + q3 * mask_c3
218
+ q /= torch.sqrt(t0_rep * mask_c0 + t1_rep * mask_c1 + # noqa
219
+ t2_rep * mask_c2 + t3_rep * mask_c3)
220
+ q *= 0.5
221
+
222
+ if len(input_shape) == 2:
223
+ q = q.squeeze(0)
224
+ return q
225
+
226
+ def rot6d_to_axis_angle(x):
227
+ batch_size = x.shape[0]
228
+
229
+ x = x.view(-1, 3, 2)
230
+ a1 = x[:, :, 0]
231
+ a2 = x[:, :, 1]
232
+ b1 = F.normalize(a1)
233
+ b2 = F.normalize(a2 - torch.einsum('bi,bi->b', b1, a2).unsqueeze(-1) * b1)
234
+ b3 = torch.cross(b1, b2)
235
+ rot_mat = torch.stack((b1, b2, b3), dim=-1) # 3x3 rotation matrix
236
+
237
+ rot_mat = torch.cat([rot_mat, torch.zeros((batch_size, 3, 1)).cuda().float()], 2) # 3x4 rotation matrix
238
+ axis_angle = rotation_matrix_to_angle_axis(rot_mat).reshape(-1, 3) # axis-angle
239
+ axis_angle[torch.isnan(axis_angle)] = 0.0
240
+ return axis_angle
241
+
242
+ def rot6d_to_rotmat(x):
243
+ """Convert 6D rotation representation to 3x3 rotation matrix.
244
+ Based on Zhou et al., "On the Continuity of Rotation Representations in Neural Networks", CVPR 2019
245
+ Input:
246
+ (B,6) Batch of 6-D rotation representations
247
+ Output:
248
+ (B,3,3) Batch of corresponding rotation matrices
249
+ """
250
+ if x.shape[-1] == 6:
251
+ batch_size = x.shape[0]
252
+ if len(x.shape) == 3:
253
+ num = x.shape[1]
254
+ x = rearrange(x, 'b n d -> (b n) d', d=6)
255
+ else:
256
+ num = 1
257
+ x = rearrange(x, 'b (k l) -> b k l', k=3, l=2)
258
+ # x = x.view(-1,3,2)
259
+ a1 = x[:, :, 0]
260
+ a2 = x[:, :, 1]
261
+ b1 = F.normalize(a1)
262
+ b2 = F.normalize(a2 - torch.einsum('bi,bi->b', b1, a2).unsqueeze(-1) * b1)
263
+ b3 = torch.cross(b1, b2, dim=-1)
264
+
265
+ mat = torch.stack((b1, b2, b3), dim=-1)
266
+ if num > 1:
267
+ mat = rearrange(mat, '(b n) h w-> b n h w', b=batch_size, n=num, h=3, w=3)
268
+ else:
269
+ x = x.view(-1,3,2)
270
+ a1 = x[:, :, 0]
271
+ a2 = x[:, :, 1]
272
+ b1 = F.normalize(a1)
273
+ b2 = F.normalize(a2 - torch.einsum('bi,bi->b', b1, a2).unsqueeze(-1) * b1)
274
+ b3 = torch.cross(b1, b2, dim=-1)
275
+ mat = torch.stack((b1, b2, b3), dim=-1)
276
+ return mat
277
+
278
+ def batch_rodrigues(theta):
279
+ """Convert axis-angle representation to rotation matrix.
280
+ Args:
281
+ theta: size = [B, 3]
282
+ Returns:
283
+ Rotation matrix corresponding to the quaternion -- size = [B, 3, 3]
284
+ """
285
+ l1norm = torch.norm(theta + 1e-8, p = 2, dim = 1)
286
+ angle = torch.unsqueeze(l1norm, -1)
287
+ normalized = torch.div(theta, angle)
288
+ angle = angle * 0.5
289
+ v_cos = torch.cos(angle)
290
+ v_sin = torch.sin(angle)
291
+ quat = torch.cat([v_cos, v_sin * normalized], dim = 1)
292
+ return quat_to_rotmat(quat)
293
+
294
+ def quat_to_rotmat(quat):
295
+ """Convert quaternion coefficients to rotation matrix.
296
+ Args:
297
+ quat: size = [B, 4] 4 <===>(w, x, y, z)
298
+ Returns:
299
+ Rotation matrix corresponding to the quaternion -- size = [B, 3, 3]
300
+ """
301
+ norm_quat = quat
302
+ norm_quat = norm_quat/norm_quat.norm(p=2, dim=1, keepdim=True)
303
+ w, x, y, z = norm_quat[:,0], norm_quat[:,1], norm_quat[:,2], norm_quat[:,3]
304
+
305
+ B = quat.size(0)
306
+
307
+ w2, x2, y2, z2 = w.pow(2), x.pow(2), y.pow(2), z.pow(2)
308
+ wx, wy, wz = w*x, w*y, w*z
309
+ xy, xz, yz = x*y, x*z, y*z
310
+
311
+ rotMat = torch.stack([w2 + x2 - y2 - z2, 2*xy - 2*wz, 2*wy + 2*xz,
312
+ 2*wz + 2*xy, w2 - x2 + y2 - z2, 2*yz - 2*wx,
313
+ 2*xz - 2*wy, 2*wx + 2*yz, w2 - x2 - y2 + z2], dim=1).view(B, 3, 3)
314
+ return rotMat
315
+
316
+ def sample_joint_features(img_feat, joint_xy):
317
+ height, width = img_feat.shape[2:]
318
+ x = joint_xy[:, :, 0] / (width - 1) * 2 - 1
319
+ y = joint_xy[:, :, 1] / (height - 1) * 2 - 1
320
+ grid = torch.stack((x, y), 2)[:, :, None, :]
321
+ img_feat = F.grid_sample(img_feat, grid, align_corners=True)[:, :, :, 0] # batch_size, channel_dim, joint_num
322
+ img_feat = img_feat.permute(0, 2, 1).contiguous() # batch_size, joint_num, channel_dim
323
+ return img_feat
324
+
325
+
326
+ def soft_argmax_2d(heatmap2d):
327
+ batch_size = heatmap2d.shape[0]
328
+ height, width = heatmap2d.shape[2:]
329
+ heatmap2d = heatmap2d.reshape((batch_size, -1, height * width))
330
+ heatmap2d = F.softmax(heatmap2d, 2)
331
+ heatmap2d = heatmap2d.reshape((batch_size, -1, height, width))
332
+
333
+ accu_x = heatmap2d.sum(dim=(2))
334
+ accu_y = heatmap2d.sum(dim=(3))
335
+
336
+ accu_x = accu_x * torch.arange(width).float().cuda()[None, None, :]
337
+ accu_y = accu_y * torch.arange(height).float().cuda()[None, None, :]
338
+
339
+ accu_x = accu_x.sum(dim=2, keepdim=True)
340
+ accu_y = accu_y.sum(dim=2, keepdim=True)
341
+
342
+ coord_out = torch.cat((accu_x, accu_y), dim=2)
343
+ return coord_out
344
+
345
+
346
+ def soft_argmax_3d(heatmap3d):
347
+ batch_size = heatmap3d.shape[0]
348
+ depth, height, width = heatmap3d.shape[2:]
349
+ heatmap3d = heatmap3d.reshape((batch_size, -1, depth * height * width))
350
+ heatmap3d = F.softmax(heatmap3d, 2)
351
+ heatmap3d = heatmap3d.reshape((batch_size, -1, depth, height, width))
352
+
353
+ accu_x = heatmap3d.sum(dim=(2, 3))
354
+ accu_y = heatmap3d.sum(dim=(2, 4))
355
+ accu_z = heatmap3d.sum(dim=(3, 4))
356
+
357
+ accu_x = accu_x * torch.arange(width).float().cuda()[None, None, :]
358
+ accu_y = accu_y * torch.arange(height).float().cuda()[None, None, :]
359
+ accu_z = accu_z * torch.arange(depth).float().cuda()[None, None, :]
360
+
361
+ accu_x = accu_x.sum(dim=2, keepdim=True)
362
+ accu_y = accu_y.sum(dim=2, keepdim=True)
363
+ accu_z = accu_z.sum(dim=2, keepdim=True)
364
+
365
+ coord_out = torch.cat((accu_x, accu_y, accu_z), dim=2)
366
+ return coord_out
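`cam2pixel` and `pixel2cam` are inverse pinhole projections, which makes a quick round-trip check easy; the intrinsics below are made up for illustration, and the import assumes the SMPLest-X root is on `PYTHONPATH`:

```python
import numpy as np
from utils.transforms import cam2pixel, pixel2cam

f = (1500.0, 1500.0)                       # hypothetical focal lengths (fx, fy)
c = (960.0, 540.0)                         # hypothetical principal point (cx, cy)

cam_coord = np.array([[0.1, -0.2, 2.5],
                      [0.3,  0.1, 3.0]])   # 3D points in camera space (metres)

pix = cam2pixel(cam_coord, f, c)           # columns: x_px, y_px, depth
back = pixel2cam(pix, f, c)
print(np.allclose(back, cam_coord))        # True
```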
WiLoR/.DS_Store ADDED
Binary file (6.15 kB).
 
WiLoR/README.md ADDED
@@ -0,0 +1,93 @@
1
+ <div align="center">
2
+
3
+ # WiLoR: End-to-end 3D hand localization and reconstruction in-the-wild
4
+
5
+ [Rolandos Alexandros Potamias](https://rolpotamias.github.io)<sup>1</sup> &emsp; [Jinglei Zhang]()<sup>2</sup> &emsp; [Jiankang Deng](https://jiankangdeng.github.io/)<sup>1</sup> &emsp; [Stefanos Zafeiriou](https://www.imperial.ac.uk/people/s.zafeiriou)<sup>1</sup>
6
+
7
+ <sup>1</sup>Imperial College London, UK <br>
8
+ <sup>2</sup>Shanghai Jiao Tong University, China
9
+
10
+ <font color="blue"><strong>CVPR 2025</strong></font>
11
+
12
+ <a href='https://rolpotamias.github.io/WiLoR/'><img src='https://img.shields.io/badge/Project-Page-blue'></a>
13
+ <a href='https://arxiv.org/abs/2409.12259'><img src='https://img.shields.io/badge/Paper-arXiv-red'></a>
14
+ <a href='https://huggingface.co/spaces/rolpotamias/WiLoR'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-green'></a>
15
+ <a href='https://colab.research.google.com/drive/1bNnYFECmJbbvCNZAKtQcxJGxf0DZppsB?usp=sharing'><img src='https://colab.research.google.com/assets/colab-badge.svg'></a>
16
+ </div>
17
+
18
+ <div align="center">
19
+
20
+ [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wilor-end-to-end-3d-hand-localization-and/3d-hand-pose-estimation-on-freihand)](https://paperswithcode.com/sota/3d-hand-pose-estimation-on-freihand?p=wilor-end-to-end-3d-hand-localization-and)
21
+ [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wilor-end-to-end-3d-hand-localization-and/3d-hand-pose-estimation-on-ho-3d)](https://paperswithcode.com/sota/3d-hand-pose-estimation-on-ho-3d?p=wilor-end-to-end-3d-hand-localization-and)
22
+
23
+ </div>
24
+
25
+ This is the official implementation of **[WiLoR](https://rolpotamias.github.io/WiLoR/)**, a state-of-the-art hand localization and reconstruction model:
26
+
27
+ ![teaser](assets/teaser.png)
28
+
29
+ ## Installation
30
+ ### [Update] Quick Installation
31
+ Thanks to [@warmshao](https://github.com/warmshao), WiLoR can now be installed with a single pip command:
32
+ ```
33
+ pip install git+https://github.com/warmshao/WiLoR-mini
34
+ ```
35
+ Please head to [WiLoR-mini](https://github.com/warmshao/WiLoR-mini) for additional details.
36
+
37
+ **Note:** the above code is a simplified version of WiLoR and is intended for demo use only.
38
+ If you wish to use WiLoR for other tasks, it is suggested to follow the original installation instructions below:
39
+ ### Original Installation
40
+ ```
41
+ git clone --recursive https://github.com/rolpotamias/WiLoR.git
42
+ cd WiLoR
43
+ ```
44
+
45
+ The code has been tested with PyTorch 2.0.0 and CUDA 11.7. It is suggested to use an anaconda environment to install the required dependencies:
46
+ ```bash
47
+ conda create --name wilor python=3.10
48
+ conda activate wilor
49
+
50
+ pip install torch torchvision --index-url https://download.pytorch.org/whl/cu117
51
+ # Install requirements
52
+ pip install -r requirements.txt
53
+ ```
54
+ Download the pretrained models using:
55
+ ```bash
56
+ wget https://huggingface.co/spaces/rolpotamias/WiLoR/resolve/main/pretrained_models/detector.pt -P ./pretrained_models/
57
+ wget https://huggingface.co/spaces/rolpotamias/WiLoR/resolve/main/pretrained_models/wilor_final.ckpt -P ./pretrained_models/
58
+ ```
59
+ It is also required to download the MANO model from the [MANO website](https://mano.is.tue.mpg.de).
60
+ Create an account by clicking Sign Up and download the models (mano_v*_*.zip). Unzip and place the right hand model `MANO_RIGHT.pkl` under the `mano_data/` folder.
61
+ Note that MANO model falls under the [MANO license](https://mano.is.tue.mpg.de/license.html).
62
+ ## Demo
63
+ ```bash
64
+ python demo.py --img_folder demo_img --out_folder demo_out --save_mesh
65
+ ```
66
+ ## Start a local gradio demo
67
+ You can start a local demo for inference by running:
68
+ ```bash
69
+ python gradio_demo.py
70
+ ```
71
+ ## WHIM Dataset
72
+ To download the WHIM dataset, please follow the instructions [here](./whim/Dataset_instructions.md).
73
+
74
+ ## Acknowledgements
75
+ Parts of the code are taken or adapted from the following repos:
76
+ - [HaMeR](https://github.com/geopavlakos/hamer/)
77
+ - [Ultralytics](https://github.com/ultralytics/ultralytics)
78
+
79
+ ## License
80
+ WiLoR models fall under the [CC-BY-NC-ND License](./license.txt). This repository also depends on the [Ultralytics library](https://github.com/ultralytics/ultralytics) and the [MANO model](https://mano.is.tue.mpg.de/license.html), which fall under their own licenses. By using this repository, you must also comply with the terms of these external licenses.
81
+ ## Citing
82
+ If you find WiLoR useful for your research, please consider citing our paper:
83
+
84
+ ```bibtex
85
+ @misc{potamias2024wilor,
86
+ title={WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild},
87
+ author={Rolandos Alexandros Potamias and Jinglei Zhang and Jiankang Deng and Stefanos Zafeiriou},
88
+ year={2024},
89
+ eprint={2409.12259},
90
+ archivePrefix={arXiv},
91
+ primaryClass={cs.CV}
92
+ }
93
+ ```
WiLoR/demo.py ADDED
@@ -0,0 +1,139 @@
1
+ from pathlib import Path
2
+ import torch
3
+ import argparse
4
+ import os
5
+ import cv2
6
+ import numpy as np
7
+ import joblib # used to save results
8
+ from typing import Dict, Optional
9
+
10
+ from wilor.models import WiLoR, load_wilor
11
+ from wilor.utils import recursive_to
12
+ from wilor.datasets.vitdet_dataset import ViTDetDataset
13
+ from wilor.utils.renderer import cam_crop_to_full # keep only this purely mathematical helper
14
+ # The Renderer import is removed to avoid pulling in OpenGL
15
+ # from wilor.utils.renderer import Renderer
16
+ from ultralytics import YOLO
17
+
18
+ def main():
19
+ parser = argparse.ArgumentParser(description='WiLoR demo code (No Render)')
20
+ parser.add_argument('--img_folder', type=str, default=r'D:\SMPL-X_pose_extraction\demo\inputs', help='Folder with input images')
21
+ parser.add_argument('--out_folder', type=str, default=r'D:\SMPL-X_pose_extraction\demo\wilor_outputs', help='Output folder to save prediction results')
22
+ parser.add_argument('--rescale_factor', type=float, default=2.0, help='Factor for padding the bbox')
23
+ parser.add_argument('--file_type', nargs='+', default=['*.jpg', '*.png', '*.jpeg'], help='List of file extensions to consider')
24
+
25
+ args = parser.parse_args()
26
+
27
+ # 1. Load Checkpoints
28
+ print("Loading models...")
29
+ model, model_cfg = load_wilor(checkpoint_path=r'D:\SMPL-X_pose_extraction\pretrained_weight\wilor\wilor_final.ckpt', cfg_path='./pretrained_models/model_config.yaml')
30
+ detector = YOLO(r'D:\SMPL-X_pose_extraction\pretrained_weight\wilor\detector.pt')
31
+
32
+ # 2. Setup Device (No Renderer init here)
33
+ device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
34
+ model = model.to(device)
35
+ detector = detector.to(device)
36
+ model.eval()
37
+
38
+ # Make output directory
39
+ os.makedirs(args.out_folder, exist_ok=True)
40
+
41
+ # Get images
42
+ img_paths = [img for end in args.file_type for img in Path(args.img_folder).glob(end)]
43
+ print(f"Found {len(img_paths)} images.")
44
+
45
+ # Iterate over images
46
+ for img_path in img_paths:
47
+ print(f"Processing {img_path.name}...")
48
+ img_cv2 = cv2.imread(str(img_path))
49
+
50
+ # Detect hands
51
+ detections = detector(img_cv2, conf=0.3, verbose=False)[0]
52
+ bboxes = []
53
+ is_right = []
54
+ for det in detections:
55
+ Bbox = det.boxes.data.cpu().detach().squeeze().numpy()
56
+ is_right.append(det.boxes.cls.cpu().detach().squeeze().item())
57
+ bboxes.append(Bbox[:4].tolist())
58
+
59
+ if len(bboxes) == 0:
60
+ print(f"No hands detected in {img_path.name}")
61
+ continue
62
+
63
+ boxes = np.stack(bboxes)
64
+ right = np.stack(is_right)
65
+
66
+ # Create Dataset & Loader
67
+ dataset = ViTDetDataset(model_cfg, img_cv2, boxes, right, rescale_factor=args.rescale_factor)
68
+ dataloader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=False, num_workers=0)
69
+
70
+ results_list = []
71
+
72
+ # Inference Loop
73
+ for batch in dataloader:
74
+ batch = recursive_to(batch, device)
75
+
76
+ with torch.no_grad():
77
+ out = model(batch)
78
+
79
+ # Post-process Camera Parameters
80
+ multiplier = (2*batch['right']-1)
81
+ pred_cam = out['pred_cam']
82
+ pred_cam[:,1] = multiplier*pred_cam[:,1]
83
+
84
+ box_center = batch["box_center"].float()
85
+ box_size = batch["box_size"].float()
86
+ img_size = batch["img_size"].float()
87
+
88
+ # Calculate focal length & full image camera translation
89
+ scaled_focal_length = model_cfg.EXTRA.FOCAL_LENGTH / model_cfg.MODEL.IMAGE_SIZE * img_size.max()
90
+ pred_cam_t_full = cam_crop_to_full(pred_cam, box_center, box_size, img_size, scaled_focal_length).detach().cpu().numpy()
91
+
92
+ # Collect Results
93
+ batch_size_curr = batch['img'].shape[0]
94
+ for n in range(batch_size_curr):
95
+ verts = out['pred_vertices'][n].detach().cpu().numpy()
96
+ joints = out['pred_keypoints_3d'][n].detach().cpu().numpy()
97
+
98
+ # Correct orientation for left hands
99
+ is_right_curr = batch['right'][n].cpu().numpy()
100
+ verts[:, 0] = (2 * is_right_curr - 1) * verts[:, 0]
101
+ joints[:, 0] = (2 * is_right_curr - 1) * joints[:, 0]
102
+
103
+ cam_t = pred_cam_t_full[n]
104
+
105
+ # Store data needed for later visualization
106
+ hand_data = {
107
+ 'vertices': verts, # [778, 3] mesh vertices
108
+ 'joints_3d': joints, # [21, 3] 3D joints
109
+ 'cam_t': cam_t, # [3] Camera translation
110
+ 'focal_length': scaled_focal_length.cpu().item(),
111
+ 'is_right': int(is_right_curr), # 1 for right, 0 for left
112
+ 'img_res': img_size[n].cpu().numpy(),
113
+ 'faces': model.mano.faces # MANO faces indices
114
+ }
115
+ results_list.append(hand_data)
116
+
117
+ # Save results to disk (PKL file)
118
+ if len(results_list) > 0:
119
+ img_fn, _ = os.path.splitext(os.path.basename(img_path))
120
+ save_path = os.path.join(args.out_folder, f'{img_fn}_results.pkl')
121
+ joblib.dump(results_list, save_path)
122
+ print(f"Saved results to {save_path}")
123
+
124
+ def project_full_img(points, cam_trans, focal_length, img_res):
125
+ # This helper is kept in case you also want to store the 2D projected points
126
+ camera_center = [img_res[0] / 2., img_res[1] / 2.]
127
+ K = torch.eye(3)
128
+ K[0,0] = focal_length
129
+ K[1,1] = focal_length
130
+ K[0,2] = camera_center[0]
131
+ K[1,2] = camera_center[1]
132
+ points = points + cam_trans
133
+ points = points / points[..., -1:]
134
+
135
+ V_2d = (K @ points.T).T
136
+ return V_2d[..., :-1]
137
+
138
+ if __name__ == '__main__':
139
+ main()
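A short sketch of consuming the `*_results.pkl` files written above (the path is a placeholder). It reloads the per-hand dictionaries and reuses the `project_full_img` helper from this script to obtain 2D keypoints in full-image pixel coordinates:

```python
import joblib
import torch

hands = joblib.load('demo_out/example_results.pkl')   # list of per-hand dicts
for hand in hands:
    joints = torch.from_numpy(hand['joints_3d']).float()   # (21, 3) 3D joints
    cam_t = torch.from_numpy(hand['cam_t']).float()         # (3,) camera translation
    joints_2d = project_full_img(joints, cam_t,
                                 hand['focal_length'], hand['img_res'])  # (21, 2) pixels
```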
WiLoR/demo.sh ADDED
@@ -0,0 +1,2 @@
1
+ $env:PYOPENGL_PLATFORM = "wgl"
2
+ python demo.py
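The `$env:` line above is PowerShell syntax, so this helper is presumably meant to be run on Windows despite the `.sh` extension. A rough Python-side sketch of the same idea (mirroring how `gradio_demo.py` picks its PyOpenGL backend) is to set the variable before anything imports pyrender:

```python
import os

# Must run before pyrender / OpenGL is imported anywhere in the process.
os.environ["PYOPENGL_PLATFORM"] = "wgl"   # "egl" is the usual choice on headless Linux
```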
WiLoR/download_videos.py ADDED
@@ -0,0 +1,58 @@
1
+ import os
2
+ import json
3
+ import numpy as np
4
+ import argparse
5
+ import cv2  # needed for the frame-extraction step below
+ from pytubefix import YouTube
6
+
7
+ parser = argparse.ArgumentParser()
8
+
9
+ parser.add_argument("--root", type=str, help="Directory of WiLoR")
10
+ parser.add_argument("--mode", type=str, choices=['train', 'test'], default= 'train', help="Train/Test set")
11
+
12
+ args = parser.parse_args()
13
+
14
+ with open(os.path.join(args.root, f'./whim/{args.mode}_video_ids.json')) as f:
15
+ video_dict = json.load(f)
16
+
17
+ Video_IDs = video_dict.keys()
18
+ failed_IDs = []
19
+ os.makedirs(os.path.join(args.root, 'Videos'), exist_ok=True)
20
+
21
+ for Video_ID in Video_IDs:
22
+ res = video_dict[Video_ID]['res'][0]
23
+ try:
24
+ YouTube('https://youtu.be/'+Video_ID).streams.filter(only_video=True,
25
+ file_extension='mp4',
26
+ res =f'{res}p'
27
+ ).order_by('resolution').desc().first().download(
28
+ output_path=os.path.join(args.root, 'Videos') ,
29
+ filename = Video_ID +'.mp4')
30
+ except:
31
+ print(f'Failed {Video_ID}')
32
+ failed_IDs.append(Video_ID)
33
+ continue
34
+
35
+
36
+ cap = cv2.VideoCapture(os.path.join(args.root, 'Videos', Video_ID + '.mp4'))
37
+ if (cap.isOpened()== False):
38
+ print(f"Error opening video stream {os.path.join(args.root, 'Videos', Video_ID + '.mp4')}")
39
+
40
+ VIDEO_LEN = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
41
+ length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
42
+ fps = cap.get(cv2.CAP_PROP_FPS)
43
+
44
+ fps_org = video_dict[Video_ID]['fps']
45
+ fps_rate = round(fps / fps_org)
46
+
47
+ all_frames = os.listdir(os.path.join(args.root, 'WHIM', args.mode, 'anno', Video_ID))
48
+
49
+ for frame in all_frames:
50
+ frame_gt = int(frame[:-4])
51
+ frame_idx = (frame_gt * fps_rate)
52
+
53
+ cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
54
+ ret, img_cv2 = cap.read()
55
+
56
+ cv2.imwrite(os.path.join(args.root, 'WHIM', args.mode, 'anno', Video_ID, frame +'.jpg' ), img_cv2.astype(np.float32))
57
+
58
+ np.save(os.path.join(args.root, 'failed_videos.npy'), failed_IDs)
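The frame-index mapping in the loop above rescales annotated frame indices to the frame rate of the downloaded stream. A worked example with illustrative numbers (not dataset values):

```python
fps, fps_org = 30.0, 10.0          # downloaded stream fps vs. annotation fps
fps_rate = round(fps / fps_org)    # -> 3
frame_gt = 40                      # annotated frame index
frame_idx = frame_gt * fps_rate    # -> 120: frame to seek via CAP_PROP_POS_FRAMES
```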
WiLoR/gradio_demo.py ADDED
@@ -0,0 +1,192 @@
1
+ import os
2
+ import sys
3
+ os.environ["PYOPENGL_PLATFORM"] = "egl"
4
+ os.environ["MESA_GL_VERSION_OVERRIDE"] = "4.1"
5
+ # os.system('pip install /home/user/app/pyrender')
6
+ # sys.path.append('/home/user/app/pyrender')
7
+
8
+ import gradio as gr
9
+ #import spaces
10
+ import cv2
11
+ import numpy as np
12
+ import torch
13
+ from ultralytics import YOLO
14
+ from pathlib import Path
15
+ import argparse
16
+ import json
17
+ from typing import Dict, Optional
18
+
19
+ from wilor.models import WiLoR, load_wilor
20
+ from wilor.utils import recursive_to
21
+ from wilor.datasets.vitdet_dataset import ViTDetDataset, DEFAULT_MEAN, DEFAULT_STD
22
+ from wilor.utils.renderer import Renderer, cam_crop_to_full
23
+ device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
24
+
25
+ LIGHT_PURPLE=(0.25098039, 0.274117647, 0.65882353)
26
+
27
+ model, model_cfg = load_wilor(checkpoint_path = './pretrained_models/wilor_final.ckpt' , cfg_path= './pretrained_models/model_config.yaml')
28
+ # Setup the renderer
29
+ renderer = Renderer(model_cfg, faces=model.mano.faces)
30
+ model = model.to(device)
31
+ model.eval()
32
+
33
+ detector = YOLO(f'./pretrained_models/detector.pt').to(device)
34
+
35
+ def render_reconstruction(image, conf, IoU_threshold=0.3):
36
+ input_img, num_dets, reconstructions = run_wilow_model(image, conf, IoU_threshold=IoU_threshold)
37
+ if num_dets> 0:
38
+ # Render front view
39
+
40
+ misc_args = dict(
41
+ mesh_base_color=LIGHT_PURPLE,
42
+ scene_bg_color=(1, 1, 1),
43
+ focal_length=reconstructions['focal'],
44
+ )
45
+
46
+ cam_view = renderer.render_rgba_multiple(reconstructions['verts'],
47
+ cam_t=reconstructions['cam_t'],
48
+ render_res=reconstructions['img_size'],
49
+ is_right=reconstructions['right'], **misc_args)
50
+
51
+ # Overlay image
52
+
53
+ input_img = np.concatenate([input_img, np.ones_like(input_img[:,:,:1])], axis=2) # Add alpha channel
54
+ input_img_overlay = input_img[:,:,:3] * (1-cam_view[:,:,3:]) + cam_view[:,:,:3] * cam_view[:,:,3:]
55
+
56
+ return input_img_overlay, f'{num_dets} hands detected'
57
+ else:
58
+ return input_img, f'{num_dets} hands detected'
59
+
60
+ #@spaces.GPU()
61
+ def run_wilow_model(image, conf, IoU_threshold=0.5):
62
+ img_cv2 = image[...,::-1]
63
+ img_vis = image.copy()
64
+
65
+ detections = detector(img_cv2, conf=conf, verbose=False, iou=IoU_threshold)[0]
66
+
67
+ bboxes = []
68
+ is_right = []
69
+ for det in detections:
70
+ Bbox = det.boxes.data.cpu().detach().squeeze().numpy()
71
+ Conf = det.boxes.conf.data.cpu().detach()[0].numpy().reshape(-1).astype(np.float16)
72
+ Side = det.boxes.cls.data.cpu().detach()
73
+ #Bbox[:2] -= np.int32(0.1 * Bbox[:2])
74
+ #Bbox[2:] += np.int32(0.1 * Bbox[ 2:])
75
+ is_right.append(det.boxes.cls.cpu().detach().squeeze().item())
76
+ bboxes.append(Bbox[:4].tolist())
77
+
78
+ color = (255*0.208, 255*0.647 ,255*0.603 ) if Side==0. else (255*1, 255*0.78039, 255*0.2353)
79
+ label = f'L - {Conf[0]:.3f}' if Side==0 else f'R - {Conf[0]:.3f}'
80
+
81
+ cv2.rectangle(img_vis, (int(Bbox[0]), int(Bbox[1])), (int(Bbox[2]), int(Bbox[3])), color , 3)
82
+ (w, h), _ = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.6, 1)
83
+ cv2.rectangle(img_vis, (int(Bbox[0]), int(Bbox[1]) - 20), (int(Bbox[0]) + w, int(Bbox[1])), color, -1)
84
+ cv2.putText(img_vis, label, (int(Bbox[0]), int(Bbox[1]) - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0,0,0), 2)
85
+
86
+ if len(bboxes) != 0:
87
+ boxes = np.stack(bboxes)
88
+ right = np.stack(is_right)
89
+ dataset = ViTDetDataset(model_cfg, img_cv2, boxes, right, rescale_factor=2.0 )
90
+ dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=False, num_workers=0)
91
+
92
+ all_verts = []
93
+ all_cam_t = []
94
+ all_right = []
95
+ all_joints= []
96
+
97
+ for batch in dataloader:
98
+ batch = recursive_to(batch, device)
99
+
100
+ with torch.no_grad():
101
+ out = model(batch)
102
+
103
+ multiplier = (2*batch['right']-1)
104
+ pred_cam = out['pred_cam']
105
+ pred_cam[:,1] = multiplier*pred_cam[:,1]
106
+ box_center = batch["box_center"].float()
107
+ box_size = batch["box_size"].float()
108
+ img_size = batch["img_size"].float()
109
+ scaled_focal_length = model_cfg.EXTRA.FOCAL_LENGTH / model_cfg.MODEL.IMAGE_SIZE * img_size.max()
110
+ pred_cam_t_full = cam_crop_to_full(pred_cam, box_center, box_size, img_size, scaled_focal_length).detach().cpu().numpy()
111
+
112
+
113
+ batch_size = batch['img'].shape[0]
114
+ for n in range(batch_size):
115
+
116
+ verts = out['pred_vertices'][n].detach().cpu().numpy()
117
+ joints = out['pred_keypoints_3d'][n].detach().cpu().numpy()
118
+
119
+ is_right = batch['right'][n].cpu().numpy()
120
+ verts[:,0] = (2*is_right-1)*verts[:,0]
121
+ joints[:,0] = (2*is_right-1)*joints[:,0]
122
+
123
+ cam_t = pred_cam_t_full[n]
124
+
125
+ all_verts.append(verts)
126
+ all_cam_t.append(cam_t)
127
+ all_right.append(is_right)
128
+ all_joints.append(joints)
129
+
130
+ reconstructions = {'verts': all_verts, 'cam_t': all_cam_t, 'right': all_right, 'img_size': img_size[n], 'focal': scaled_focal_length}
131
+ return img_vis.astype(np.float32)/255.0, len(detections), reconstructions
132
+ else:
133
+ return img_vis.astype(np.float32)/255.0, len(detections), None
134
+
135
+
136
+
137
+ header = ('''
138
+ <div class="embed_hidden" style="text-align: center;">
139
+ <h1> <b>WiLoR</b>: End-to-end 3D hand localization and reconstruction in-the-wild</h1>
140
+ <h3>
141
+ <a href="https://rolpotamias.github.io" target="_blank" rel="noopener noreferrer">Rolandos Alexandros Potamias</a><sup>1</sup>,
142
+ <a href="" target="_blank" rel="noopener noreferrer">Jinglei Zhang</a><sup>2</sup>,
143
+ <br>
144
+ <a href="https://jiankangdeng.github.io/" target="_blank" rel="noopener noreferrer">Jiankang Deng</a><sup>1</sup>,
145
+ <a href="https://wp.doc.ic.ac.uk/szafeiri/" target="_blank" rel="noopener noreferrer">Stefanos Zafeiriou</a><sup>1</sup>
146
+ </h3>
147
+ <h3>
148
+ <sup>1</sup>Imperial College London;
149
+ <sup>2</sup>Shanghai Jiao Tong University
150
+ </h3>
151
+ </div>
152
+ <div style="display:flex; gap: 0.3rem; justify-content: center; align-items: center;" align="center">
153
+ <a href=''><img src='https://img.shields.io/badge/Arxiv-......-A42C25?style=flat&logo=arXiv&logoColor=A42C25'></a>
154
+ <a href='https://rolpotamias.github.io/pdfs/WiLoR.pdf'><img src='https://img.shields.io/badge/Paper-PDF-yellow?style=flat&logo=arXiv&logoColor=yellow'></a>
155
+ <a href='https://rolpotamias.github.io/WiLoR/'><img src='https://img.shields.io/badge/Project-Page-%23df5b46?style=flat&logo=Google%20chrome&logoColor=%23df5b46'></a>
156
+ <a href='https://github.com/rolpotamias/WiLoR'><img src='https://img.shields.io/badge/GitHub-Code-black?style=flat&logo=github&logoColor=white'></a>
157
+ ''')
158
+
159
+
160
+ with gr.Blocks(title="WiLoR: End-to-end 3D hand localization and reconstruction in-the-wild", css=".gradio-container") as demo:
161
+
162
+ gr.Markdown(header)
163
+
164
+ with gr.Row():
165
+ with gr.Column():
166
+ input_image = gr.Image(label="Input image", type="numpy")
167
+ threshold = gr.Slider(value=0.3, minimum=0.05, maximum=0.95, step=0.05, label='Detection Confidence Threshold')
168
+ #nms = gr.Slider(value=0.5, minimum=0.05, maximum=0.95, step=0.05, label='IoU NMS Threshold')
169
+ submit = gr.Button("Submit", variant="primary")
170
+
171
+
172
+ with gr.Column():
173
+ reconstruction = gr.Image(label="Reconstructions", type="numpy")
174
+ hands_detected = gr.Textbox(label="Hands Detected")
175
+
176
+ submit.click(fn=render_reconstruction, inputs=[input_image, threshold], outputs=[reconstruction, hands_detected])
177
+
178
+ with gr.Row():
179
+ example_images = gr.Examples([
180
+
181
+ ['./demo_img/test1.jpg'],
182
+ ['./demo_img/test2.png'],
183
+ ['./demo_img/test3.jpg'],
184
+ ['./demo_img/test4.jpg'],
185
+ ['./demo_img/test5.jpeg'],
186
+ ['./demo_img/test6.jpg'],
187
+ ['./demo_img/test7.jpg'],
188
+ ['./demo_img/test8.jpg'],
189
+ ],
190
+ inputs=input_image)
191
+
192
+ demo.launch()
WiLoR/license.txt ADDED
@@ -0,0 +1,402 @@
1
+ Attribution-NonCommercial-NoDerivatives 4.0 International
2
+
3
+ =======================================================================
4
+
5
+ Creative Commons Corporation ("Creative Commons") is not a law firm and
6
+ does not provide legal services or legal advice. Distribution of
7
+ Creative Commons public licenses does not create a lawyer-client or
8
+ other relationship. Creative Commons makes its licenses and related
9
+ information available on an "as-is" basis. Creative Commons gives no
10
+ warranties regarding its licenses, any material licensed under their
11
+ terms and conditions, or any related information. Creative Commons
12
+ disclaims all liability for damages resulting from their use to the
13
+ fullest extent possible.
14
+
15
+ Using Creative Commons Public Licenses
16
+
17
+ Creative Commons public licenses provide a standard set of terms and
18
+ conditions that creators and other rights holders may use to share
19
+ original works of authorship and other material subject to copyright
20
+ and certain other rights specified in the public license below. The
21
+ following considerations are for informational purposes only, are not
22
+ exhaustive, and do not form part of our licenses.
23
+
24
+ Considerations for licensors: Our public licenses are
25
+ intended for use by those authorized to give the public
26
+ permission to use material in ways otherwise restricted by
27
+ copyright and certain other rights. Our licenses are
28
+ irrevocable. Licensors should read and understand the terms
29
+ and conditions of the license they choose before applying it.
30
+ Licensors should also secure all rights necessary before
31
+ applying our licenses so that the public can reuse the
32
+ material as expected. Licensors should clearly mark any
33
+ material not subject to the license. This includes other CC-
34
+ licensed material, or material used under an exception or
35
+ limitation to copyright. More considerations for licensors:
36
+ wiki.creativecommons.org/Considerations_for_licensors
37
+
38
+ Considerations for the public: By using one of our public
39
+ licenses, a licensor grants the public permission to use the
40
+ licensed material under specified terms and conditions. If
41
+ the licensor's permission is not necessary for any reason--for
42
+ example, because of any applicable exception or limitation to
43
+ copyright--then that use is not regulated by the license. Our
44
+ licenses grant only permissions under copyright and certain
45
+ other rights that a licensor has authority to grant. Use of
46
+ the licensed material may still be restricted for other
47
+ reasons, including because others have copyright or other
48
+ rights in the material. A licensor may make special requests,
49
+ such as asking that all changes be marked or described.
50
+ Although not required by our licenses, you are encouraged to
51
+ respect those requests where reasonable. More considerations
52
+ for the public:
53
+ wiki.creativecommons.org/Considerations_for_licensees
54
+
55
+ =======================================================================
56
+
57
+ Creative Commons Attribution-NonCommercial-NoDerivatives 4.0
58
+ International Public License
59
+
60
+ By exercising the Licensed Rights (defined below), You accept and agree
61
+ to be bound by the terms and conditions of this Creative Commons
62
+ Attribution-NonCommercial-NoDerivatives 4.0 International Public
63
+ License ("Public License"). To the extent this Public License may be
64
+ interpreted as a contract, You are granted the Licensed Rights in
65
+ consideration of Your acceptance of these terms and conditions, and the
66
+ Licensor grants You such rights in consideration of benefits the
67
+ Licensor receives from making the Licensed Material available under
68
+ these terms and conditions.
69
+
70
+
71
+ Section 1 -- Definitions.
72
+
73
+ a. Adapted Material means material subject to Copyright and Similar
74
+ Rights that is derived from or based upon the Licensed Material
75
+ and in which the Licensed Material is translated, altered,
76
+ arranged, transformed, or otherwise modified in a manner requiring
77
+ permission under the Copyright and Similar Rights held by the
78
+ Licensor. For purposes of this Public License, where the Licensed
79
+ Material is a musical work, performance, or sound recording,
80
+ Adapted Material is always produced where the Licensed Material is
81
+ synched in timed relation with a moving image.
82
+
83
+ b. Copyright and Similar Rights means copyright and/or similar rights
84
+ closely related to copyright including, without limitation,
85
+ performance, broadcast, sound recording, and Sui Generis Database
86
+ Rights, without regard to how the rights are labeled or
87
+ categorized. For purposes of this Public License, the rights
88
+ specified in Section 2(b)(1)-(2) are not Copyright and Similar
89
+ Rights.
90
+
91
+ c. Effective Technological Measures means those measures that, in the
92
+ absence of proper authority, may not be circumvented under laws
93
+ fulfilling obligations under Article 11 of the WIPO Copyright
94
+ Treaty adopted on December 20, 1996, and/or similar international
95
+ agreements.
96
+
97
+ d. Exceptions and Limitations means fair use, fair dealing, and/or
98
+ any other exception or limitation to Copyright and Similar Rights
99
+ that applies to Your use of the Licensed Material.
100
+
101
+ e. Licensed Material means the artistic or literary work, database,
102
+ or other material to which the Licensor applied this Public
103
+ License.
104
+
105
+ f. Licensed Rights means the rights granted to You subject to the
106
+ terms and conditions of this Public License, which are limited to
107
+ all Copyright and Similar Rights that apply to Your use of the
108
+ Licensed Material and that the Licensor has authority to license.
109
+
110
+ g. Licensor means the individual(s) or entity(ies) granting rights
111
+ under this Public License.
112
+
113
+ h. NonCommercial means not primarily intended for or directed towards
114
+ commercial advantage or monetary compensation. For purposes of
115
+ this Public License, the exchange of the Licensed Material for
116
+ other material subject to Copyright and Similar Rights by digital
117
+ file-sharing or similar means is NonCommercial provided there is
118
+ no payment of monetary compensation in connection with the
119
+ exchange.
120
+
121
+ i. Share means to provide material to the public by any means or
122
+ process that requires permission under the Licensed Rights, such
123
+ as reproduction, public display, public performance, distribution,
124
+ dissemination, communication, or importation, and to make material
125
+ available to the public including in ways that members of the
126
+ public may access the material from a place and at a time
127
+ individually chosen by them.
128
+
129
+ j. Sui Generis Database Rights means rights other than copyright
130
+ resulting from Directive 96/9/EC of the European Parliament and of
131
+ the Council of 11 March 1996 on the legal protection of databases,
132
+ as amended and/or succeeded, as well as other essentially
133
+ equivalent rights anywhere in the world.
134
+
135
+ k. You means the individual or entity exercising the Licensed Rights
136
+ under this Public License. Your has a corresponding meaning.
137
+
138
+
139
+ Section 2 -- Scope.
140
+
141
+ a. License grant.
142
+
143
+ 1. Subject to the terms and conditions of this Public License,
144
+ the Licensor hereby grants You a worldwide, royalty-free,
145
+ non-sublicensable, non-exclusive, irrevocable license to
146
+ exercise the Licensed Rights in the Licensed Material to:
147
+
148
+ a. reproduce and Share the Licensed Material, in whole or
149
+ in part, for NonCommercial purposes only; and
150
+
151
+ b. produce and reproduce, but not Share, Adapted Material
152
+ for NonCommercial purposes only.
153
+
154
+ 2. Exceptions and Limitations. For the avoidance of doubt, where
155
+ Exceptions and Limitations apply to Your use, this Public
156
+ License does not apply, and You do not need to comply with
157
+ its terms and conditions.
158
+
159
+ 3. Term. The term of this Public License is specified in Section
160
+ 6(a).
161
+
162
+ 4. Media and formats; technical modifications allowed. The
163
+ Licensor authorizes You to exercise the Licensed Rights in
164
+ all media and formats whether now known or hereafter created,
165
+ and to make technical modifications necessary to do so. The
166
+ Licensor waives and/or agrees not to assert any right or
167
+ authority to forbid You from making technical modifications
168
+ necessary to exercise the Licensed Rights, including
169
+ technical modifications necessary to circumvent Effective
170
+ Technological Measures. For purposes of this Public License,
171
+ simply making modifications authorized by this Section 2(a)
172
+ (4) never produces Adapted Material.
173
+
174
+ 5. Downstream recipients.
175
+
176
+ a. Offer from the Licensor -- Licensed Material. Every
177
+ recipient of the Licensed Material automatically
178
+ receives an offer from the Licensor to exercise the
179
+ Licensed Rights under the terms and conditions of this
180
+ Public License.
181
+
182
+ b. No downstream restrictions. You may not offer or impose
183
+ any additional or different terms or conditions on, or
184
+ apply any Effective Technological Measures to, the
185
+ Licensed Material if doing so restricts exercise of the
186
+ Licensed Rights by any recipient of the Licensed
187
+ Material.
188
+
189
+ 6. No endorsement. Nothing in this Public License constitutes or
190
+ may be construed as permission to assert or imply that You
191
+ are, or that Your use of the Licensed Material is, connected
192
+ with, or sponsored, endorsed, or granted official status by,
193
+ the Licensor or others designated to receive attribution as
194
+ provided in Section 3(a)(1)(A)(i).
195
+
196
+ b. Other rights.
197
+
198
+ 1. Moral rights, such as the right of integrity, are not
199
+ licensed under this Public License, nor are publicity,
200
+ privacy, and/or other similar personality rights; however, to
201
+ the extent possible, the Licensor waives and/or agrees not to
202
+ assert any such rights held by the Licensor to the limited
203
+ extent necessary to allow You to exercise the Licensed
204
+ Rights, but not otherwise.
205
+
206
+ 2. Patent and trademark rights are not licensed under this
207
+ Public License.
208
+
209
+ 3. To the extent possible, the Licensor waives any right to
210
+ collect royalties from You for the exercise of the Licensed
211
+ Rights, whether directly or through a collecting society
212
+ under any voluntary or waivable statutory or compulsory
213
+ licensing scheme. In all other cases the Licensor expressly
214
+ reserves any right to collect such royalties, including when
215
+ the Licensed Material is used other than for NonCommercial
216
+ purposes.
217
+
218
+
219
+ Section 3 -- License Conditions.
220
+
221
+ Your exercise of the Licensed Rights is expressly made subject to the
222
+ following conditions.
223
+
224
+ a. Attribution.
225
+
226
+ 1. If You Share the Licensed Material, You must:
227
+
228
+ a. retain the following if it is supplied by the Licensor
229
+ with the Licensed Material:
230
+
231
+ i. identification of the creator(s) of the Licensed
232
+ Material and any others designated to receive
233
+ attribution, in any reasonable manner requested by
234
+ the Licensor (including by pseudonym if
235
+ designated);
236
+
237
+ ii. a copyright notice;
238
+
239
+ iii. a notice that refers to this Public License;
240
+
241
+ iv. a notice that refers to the disclaimer of
242
+ warranties;
243
+
244
+ v. a URI or hyperlink to the Licensed Material to the
245
+ extent reasonably practicable;
246
+
247
+ b. indicate if You modified the Licensed Material and
248
+ retain an indication of any previous modifications; and
249
+
250
+ c. indicate the Licensed Material is licensed under this
251
+ Public License, and include the text of, or the URI or
252
+ hyperlink to, this Public License.
253
+
254
+ For the avoidance of doubt, You do not have permission under
255
+ this Public License to Share Adapted Material.
256
+
257
+ 2. You may satisfy the conditions in Section 3(a)(1) in any
258
+ reasonable manner based on the medium, means, and context in
259
+ which You Share the Licensed Material. For example, it may be
260
+ reasonable to satisfy the conditions by providing a URI or
261
+ hyperlink to a resource that includes the required
262
+ information.
263
+
264
+ 3. If requested by the Licensor, You must remove any of the
265
+ information required by Section 3(a)(1)(A) to the extent
266
+ reasonably practicable.
267
+
268
+
269
+ Section 4 -- Sui Generis Database Rights.
270
+
271
+ Where the Licensed Rights include Sui Generis Database Rights that
272
+ apply to Your use of the Licensed Material:
273
+
274
+ a. for the avoidance of doubt, Section 2(a)(1) grants You the right
275
+ to extract, reuse, reproduce, and Share all or a substantial
276
+ portion of the contents of the database for NonCommercial purposes
277
+ only and provided You do not Share Adapted Material;
278
+
279
+ b. if You include all or a substantial portion of the database
280
+ contents in a database in which You have Sui Generis Database
281
+ Rights, then the database in which You have Sui Generis Database
282
+ Rights (but not its individual contents) is Adapted Material; and
283
+
284
+ c. You must comply with the conditions in Section 3(a) if You Share
285
+ all or a substantial portion of the contents of the database.
286
+
287
+ For the avoidance of doubt, this Section 4 supplements and does not
288
+ replace Your obligations under this Public License where the Licensed
289
+ Rights include other Copyright and Similar Rights.
290
+
291
+
292
+ Section 5 -- Disclaimer of Warranties and Limitation of Liability.
293
+
294
+ a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
295
+ EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
296
+ AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
297
+ ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
298
+ IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
299
+ WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
300
+ PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
301
+ ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
302
+ KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
303
+ ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
304
+
305
+ b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
306
+ TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
307
+ NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
308
+ INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
309
+ COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
310
+ USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
311
+ ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
312
+ DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
313
+ IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
314
+
315
+ c. The disclaimer of warranties and limitation of liability provided
316
+ above shall be interpreted in a manner that, to the extent
317
+ possible, most closely approximates an absolute disclaimer and
318
+ waiver of all liability.
319
+
320
+
321
+ Section 6 -- Term and Termination.
322
+
323
+ a. This Public License applies for the term of the Copyright and
324
+ Similar Rights licensed here. However, if You fail to comply with
325
+ this Public License, then Your rights under this Public License
326
+ terminate automatically.
327
+
328
+ b. Where Your right to use the Licensed Material has terminated under
329
+ Section 6(a), it reinstates:
330
+
331
+ 1. automatically as of the date the violation is cured, provided
332
+ it is cured within 30 days of Your discovery of the
333
+ violation; or
334
+
335
+ 2. upon express reinstatement by the Licensor.
336
+
337
+ For the avoidance of doubt, this Section 6(b) does not affect any
338
+ right the Licensor may have to seek remedies for Your violations
339
+ of this Public License.
340
+
341
+ c. For the avoidance of doubt, the Licensor may also offer the
342
+ Licensed Material under separate terms or conditions or stop
343
+ distributing the Licensed Material at any time; however, doing so
344
+ will not terminate this Public License.
345
+
346
+ d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
347
+ License.
348
+
349
+
350
+ Section 7 -- Other Terms and Conditions.
351
+
352
+ a. The Licensor shall not be bound by any additional or different
353
+ terms or conditions communicated by You unless expressly agreed.
354
+
355
+ b. Any arrangements, understandings, or agreements regarding the
356
+ Licensed Material not stated herein are separate from and
357
+ independent of the terms and conditions of this Public License.
358
+
359
+
360
+ Section 8 -- Interpretation.
361
+
362
+ a. For the avoidance of doubt, this Public License does not, and
363
+ shall not be interpreted to, reduce, limit, restrict, or impose
364
+ conditions on any use of the Licensed Material that could lawfully
365
+ be made without permission under this Public License.
366
+
367
+ b. To the extent possible, if any provision of this Public License is
368
+ deemed unenforceable, it shall be automatically reformed to the
369
+ minimum extent necessary to make it enforceable. If the provision
370
+ cannot be reformed, it shall be severed from this Public License
371
+ without affecting the enforceability of the remaining terms and
372
+ conditions.
373
+
374
+ c. No term or condition of this Public License will be waived and no
375
+ failure to comply consented to unless expressly agreed to by the
376
+ Licensor.
377
+
378
+ d. Nothing in this Public License constitutes or may be interpreted
379
+ as a limitation upon, or waiver of, any privileges and immunities
380
+ that apply to the Licensor or You, including from the legal
381
+ processes of any jurisdiction or authority.
382
+
383
+ =======================================================================
384
+
385
+ Creative Commons is not a party to its public
386
+ licenses. Notwithstanding, Creative Commons may elect to apply one of
387
+ its public licenses to material it publishes and in those instances
388
+ will be considered the “Licensor.” The text of the Creative Commons
389
+ public licenses is dedicated to the public domain under the CC0 Public
390
+ Domain Dedication. Except for the limited purpose of indicating that
391
+ material is shared under a Creative Commons public license or as
392
+ otherwise permitted by the Creative Commons policies published at
393
+ creativecommons.org/policies, Creative Commons does not authorize the
394
+ use of the trademark "Creative Commons" or any other trademark or logo
395
+ of Creative Commons without its prior written consent including,
396
+ without limitation, in connection with any unauthorized modifications
397
+ to any of its public licenses or any other arrangements,
398
+ understandings, or agreements concerning use of licensed material. For
399
+ the avoidance of doubt, this paragraph does not form part of the
400
+ public licenses.
401
+
402
+ Creative Commons may be contacted at creativecommons.org.
WiLoR/requirements.txt ADDED
@@ -0,0 +1,20 @@
1
+ numpy
2
+ opencv-python
3
+ pyrender
4
+ pytorch-lightning
5
+ scikit-image
6
+ smplx==0.1.28
7
+ yacs
8
+ chumpy @ git+https://github.com/mattloper/chumpy
9
+ timm
10
+ einops
11
+ xtcocotools
12
+ pandas
13
+ hydra-core
14
+ hydra-submitit-launcher
15
+ hydra-colorlog
16
+ pyrootutils
17
+ rich
18
+ webdataset
19
+ gradio
20
+ ultralytics==8.1.34
WiLoR/requirements_my.txt ADDED
@@ -0,0 +1,11 @@
1
+ pytorch-lightning
2
+ scikit-image
3
+ yacs
4
+ xtcocotools
5
+ hydra-core
6
+ hydra-submitit-launcher
7
+ hydra-colorlog
8
+ pyrootutils
9
+ rich
10
+ webdataset
11
+ gradio
__init__.py ADDED
@@ -0,0 +1,11 @@
1
+
2
+ def deprecated_api_warning(name, cls=None):
3
+ def decorator(func):
4
+ return func
5
+ return decorator
6
+
7
+
8
+ def deprecated_api_warning(name, cls=None):
9
+ def decorator(func):
10
+ return func
11
+ return decorator
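Both definitions above are identical no-op stubs (the second simply shadows the first): the decorator hands the wrapped function back unchanged, so code written against a real `deprecated_api_warning` API can import this one without effect. A quick sketch of the call pattern (the argument is illustrative):

```python
@deprecated_api_warning('old_kwarg')
def f(x):
    return x

assert f(1) == 1   # behaves exactly as if undecorated
```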
convert_img_to_videos.py ADDED
@@ -0,0 +1,90 @@
1
+ import os
2
+ import cv2
3
+ from pathlib import Path
4
+ from tqdm import tqdm
5
+
6
+ def create_video_from_images(image_folder, output_video_path, fps=30):
7
+ """
8
+ Create a video from a folder of images
9
+
10
+ Args:
11
+ image_folder: Path to folder containing images
12
+ output_video_path: Path where video will be saved
13
+ fps: Frames per second for the output video
14
+ """
15
+ # Get all image files and sort them
16
+ image_files = sorted([f for f in os.listdir(image_folder) if f.endswith(('.jpg', '.png', '.jpeg'))])
17
+
18
+ if not image_files:
19
+ print(f"No images found in {image_folder}")
20
+ return False
21
+
22
+ # Read first image to get dimensions
23
+ first_image_path = os.path.join(image_folder, image_files[0])
24
+ first_frame = cv2.imread(first_image_path)
25
+
26
+ if first_frame is None:
27
+ print(f"Could not read {first_image_path}")
28
+ return False
29
+
30
+ height, width, channels = first_frame.shape
31
+
32
+ # Define the codec and create VideoWriter object
33
+ fourcc = cv2.VideoWriter_fourcc(*'mp4v') # or 'XVID' for .avi
34
+ out = cv2.VideoWriter(output_video_path, fourcc, fps, (width, height))
35
+
36
+ # Write each frame
37
+ for image_file in image_files:
38
+ image_path = os.path.join(image_folder, image_file)
39
+ frame = cv2.imread(image_path)
40
+
41
+ if frame is not None:
42
+ out.write(frame)
43
+ else:
44
+ print(f"Warning: Could not read {image_path}")
45
+
46
+ # Release the video writer
47
+ out.release()
48
+ print(f"Video saved to {output_video_path}")
49
+ return True
50
+
51
+ def convert_all_folders_to_videos(input_base_dir, output_base_dir, fps=30):
52
+ """
53
+ Convert all image folders to videos
54
+
55
+ Args:
56
+ input_base_dir: Base directory containing image folders
57
+ output_base_dir: Base directory where videos will be saved
58
+ fps: Frames per second for output videos
59
+ """
60
+ input_path = Path(input_base_dir)
61
+ output_path = Path(output_base_dir)
62
+
63
+ # Create output directory if it doesn't exist
64
+ output_path.mkdir(parents=True, exist_ok=True)
65
+
66
+ # Get all subdirectories in the input directory
67
+ folders = [f for f in input_path.iterdir() if f.is_dir()]
68
+
69
+ print(f"Found {len(folders)} folders to convert")
70
+
71
+ # Process each folder
72
+ for folder in tqdm(folders, desc="Converting folders to videos"):
73
+ folder_name = folder.name
74
+ output_video_path = output_path / f"{folder_name}.mp4"
75
+
76
+ print(f"\nProcessing: {folder_name}")
77
+ create_video_from_images(str(folder), str(output_video_path), fps=fps)
78
+
79
+ print(f"\n✓ All videos saved to {output_base_dir}")
80
+
81
+ if __name__ == "__main__":
82
+ # Set your paths
83
+ input_base_dir = "/mnt/shared-storage-user/mllm/zangyuhang/pmx/SLUDatasets/CSL_Daily/rgb_format/frames_512x512"
84
+ output_base_dir = "/mnt/shared-storage-user/mllm/zangyuhang/pmx/SLUDatasets/CSL_Daily/rgb_format/videos_512x512_30fps"
85
+
86
+ fps = 30
87
+
88
+ print("Starting conversion...")
89
+ convert_all_folders_to_videos(input_base_dir, output_base_dir, fps=fps)
90
+ print("Done!")
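`create_video_from_images` can also be called directly on a single folder without the batch wrapper; a minimal sketch with placeholder paths:

```python
create_video_from_images("frames/S000001", "S000001.mp4", fps=30)
```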
corrupted_videos.log ADDED
@@ -0,0 +1,7 @@
1
+ 2 weeks for cslnews
2
+
3
+ /mnt/shared-storage-user/mllm/zangyuhang/pmx/SLUDatasets/CSL_News/rgb_format/Common-Concerns_20220416_22737-23037_691030.mp4
4
+ /mnt/shared-storage-user/mllm/zangyuhang/pmx/SLUDatasets/CSL_News/rgb_format/Common-Concerns_20210613_37012-37137_564952.mp4
5
+ /mnt/shared-storage-user/mllm/zangyuhang/pmx/SLUDatasets/CSL_News/rgb_format/Common-Concerns_20240121_23537-23987_637834.mp4
6
+
7
+ 1 hr for csldaily
corrupted_videos_csl_news.log ADDED
@@ -0,0 +1,7 @@
1
+ 2 weeks for csl news
2
+
3
+
4
+
5
+ /mnt/shared-storage-user/mllm/zangyuhang/pmx/SLUDatasets/CSL_News/rgb_format/Common-Concerns_20220416_22737-23037_691030.mp4
6
+ /mnt/shared-storage-user/mllm/zangyuhang/pmx/SLUDatasets/CSL_News/rgb_format/Common-Concerns_20210613_37012-37137_564952.mp4
7
+ /mnt/shared-storage-user/mllm/zangyuhang/pmx/SLUDatasets/CSL_News/rgb_format/Common-Concerns_20240121_23537-23987_637834.mp4
extract_smplx_20260212_165824.log ADDED
The diff for this file is too large to render. See raw diff
 
extract_smplx_20260212_165911_gpu_monitor.log ADDED
The diff for this file is too large to render. See raw diff
 
extract_smplx_20260213_144424.log ADDED
The diff for this file is too large to render. See raw diff
 
extract_smplx_20260213_144424_gpu_monitor.log ADDED
The diff for this file is too large to render. See raw diff
 
extract_smplx_pose.py ADDED
@@ -0,0 +1,657 @@
1
+ import warnings
2
+ import os
3
+ import sys
4
+ import argparse
5
+
6
+ import torch
7
+ import cv2
8
+ import pickle
9
+ import smplx
10
+ import numpy as np
11
+ import time
12
+ import queue
13
+ import threading
14
+ import torch.nn.functional as F
15
+ from tqdm import tqdm
16
+ from torchvision import transforms
17
+ from ultralytics import YOLO
18
+ from accelerate import Accelerator
19
+ from accelerate.utils import set_seed
20
+ from concurrent.futures import ThreadPoolExecutor
21
+ import decord
22
+ from decord import VideoReader, gpu
23
+
24
+ torch.set_float32_matmul_precision('high')
25
+ torch.backends.cuda.matmul.allow_tf32 = True
26
+ torch.backends.cudnn.allow_tf32 = True
27
+ torch.backends.cudnn.benchmark = True
28
+ torch._inductor.config.triton.cudagraph_skip_dynamic_graphs = True
29
+ warnings.filterwarnings("ignore")
31
+
32
+ import logging
33
+ logging.getLogger("torch.utils._sympy.interp").setLevel(logging.ERROR)
34
+ logging.getLogger("torch._inductor.utils").setLevel(logging.ERROR)
35
+
36
+ PROJECT_ROOT = os.path.abspath(r"/mnt/shared-storage-user/mllm/zangyuhang/pmx/SMPL-X_pose_extraction")
37
+ SMPLEST_X_PATH = os.path.join(PROJECT_ROOT, 'SMPLest-X')
38
+ WILOR_PATH = os.path.join(PROJECT_ROOT, 'WiLoR')
39
+ DEPTH_ANYTHING_PATH = os.path.join(PROJECT_ROOT, 'Depth-Anything-V2')
40
+ MODEL_PATH = os.path.join(PROJECT_ROOT, "pretrained_weight", "smpl_models")
41
+
42
+ for p in [SMPLEST_X_PATH, WILOR_PATH, DEPTH_ANYTHING_PATH, PROJECT_ROOT]:
43
+ if p not in sys.path: sys.path.insert(0, p)
44
+
45
+ for attr in ['int', 'float', 'bool', 'complex', 'object', 'unicode', 'str']:
46
+ if not hasattr(np, attr): setattr(np, attr, eval(attr) if attr != 'unicode' else str)
47
+
48
+ from main.config import Config as SmplestConfig
49
+ from main.base import Tester as SmplestTester
50
+ from human_models.human_models import SMPLX
51
+ from utils.data_utils import process_bbox, generate_patch_image
52
+ from wilor.models import load_wilor
53
+ from wilor.utils import recursive_to
54
+ from wilor.datasets.vitdet_dataset import ViTDetDataset
55
+ from depth_anything_v2.dpt import DepthAnythingV2
56
+ from basicsr.archs.rrdbnet_arch import RRDBNet
57
+ from realesrgan.utils import RealESRGANer
58
+
59
+ class FramePrefetcher:
60
+ def __init__(self, video_path, device_id=0, buffer_size=128):
61
+ try:
62
+ self.vr = VideoReader(video_path, ctx=gpu(device_id))
63
+ except:
64
+ self.vr = VideoReader(video_path, ctx=decord.cpu(0))
65
+
66
+ self.total_frames = len(self.vr)
67
+ self.current_idx = 0
68
+ self.buffer_size = buffer_size
69
+ self.queue = queue.Queue(maxsize=buffer_size)
70
+ self.stopped = False
71
+
72
+ def start(self):
73
+ t = threading.Thread(target=self._update, args=())
74
+ t.daemon = True
75
+ t.start()
76
+ return self
77
+
78
+ def _update(self):
79
+ while not self.stopped:
80
+ if self.current_idx >= self.total_frames:
81
+ self.stopped = True
82
+ break
83
+
84
+ if not self.queue.full():
85
+ end_idx = min(self.current_idx + 16, self.total_frames)
86
+ frames = self.vr.get_batch(range(self.current_idx, end_idx)).asnumpy()
87
+
88
+ for i in range(frames.shape[0]):
89
+ frame = cv2.cvtColor(frames[i], cv2.COLOR_RGB2BGR)
90
+ self.queue.put(frame)
91
+
92
+ self.current_idx = end_idx
93
+ else:
94
+ time.sleep(0.005)
95
+
96
+ def get_batch(self, batch_size):
97
+ batch = []
98
+ for _ in range(batch_size):
99
+ try:
100
+ frame = self.queue.get(timeout=0.05)
101
+ batch.append(frame)
102
+ except queue.Empty:
103
+ break
104
+ return batch
105
+
106
+ def is_running(self):
107
+ return not (self.stopped and self.queue.empty())
108
+
109
+ def stop(self):
110
+ self.stopped = True
111
+
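A usage sketch for the prefetcher above (the video path is a placeholder). The background thread decodes ahead into the queue, so the consumer only blocks briefly in `get_batch`:

```python
prefetcher = FramePrefetcher("video.mp4", device_id=0, buffer_size=128).start()
while prefetcher.is_running():
    frames = prefetcher.get_batch(16)   # list of BGR uint8 frames; may be shorter near EOF
    if not frames:
        continue
    # ... run detection / pose models on the batch ...
prefetcher.stop()
```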
112
+ class GlobalSilence:
113
+ def __enter__(self):
114
+ self.stdout_fd = sys.stdout.fileno()
115
+ self.saved_stdout_fd = os.dup(self.stdout_fd)
116
+ os.dup2(os.open(os.devnull, os.O_WRONLY), self.stdout_fd)
117
+ def __exit__(self, type, value, traceback):
118
+ os.dup2(self.saved_stdout_fd, self.stdout_fd)
119
+ os.close(self.saved_stdout_fd)
120
+
121
+ class SMPLXPoseExtractor:
122
+ def __init__(self, args, accelerator):
123
+ self.args = args
124
+ self.accelerator = accelerator
125
+ self.device = accelerator.device
126
+ self.global_pbar = None
127
+ self.files_done = 0
128
+ self.my_total = 0
129
+ self.start_time = 0
130
+ self.pool = ThreadPoolExecutor(max_workers=self.args.num_workers)
131
+
132
+ if self.accelerator.is_main_process:
133
+ print(f"Initializing SMPL-X Pose Extractor on {self.device}...")
134
+
135
+ # 1. Load Real-ESRGAN
136
+ if self.args.apply_sr:
137
+ model_esrgan = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=4)
138
+ self.upsampler = RealESRGANer(scale=4, model_path=args.real_esrgan_ckpt, model=model_esrgan, tile=1024, tile_pad=10, pre_pad=0, half=True, device=self.device)
139
+ if self.accelerator.is_main_process: print("[1/6] Real-ESRGAN loaded.")
140
+ else:
141
+ self.upsampler = None
142
+ if self.accelerator.is_main_process: print("[1/6] Real-ESRGAN skipped.")
143
+
144
+ # 2. Load Depth Anything V2
145
+ if self.args.opt_depth:
146
+ self.depth_model = DepthAnythingV2(encoder='vitl', features=256, out_channels=[256, 512, 1024, 1024])
147
+ self.depth_model.load_state_dict(torch.load(args.depth_anything_v2_ckpt, map_location='cpu'))
148
+ self.depth_model = self.depth_model.to(self.device).eval()
149
+ if hasattr(torch, 'compile'):
150
+ self.depth_model = torch.compile(self.depth_model, mode="reduce-overhead")
151
+ if self.accelerator.is_main_process: print("[2/6] Compiled Depth-Anything-V2 loaded for optimization.")
152
+ else:
153
+ if self.accelerator.is_main_process: print("[2/6] Compiled Depth-Anything-V2 loaded for optimization.")
154
+ else:
155
+ self.depth_model = None
156
+ if self.accelerator.is_main_process: print("[2/6] Depth optimization skipped.")
157
+
158
+ # 3. Load YOLO Detectors
159
+ self.detector = YOLO(args.yolo_ckpt)
160
+ self.hand_detector = YOLO(args.hand_detector_ckpt)
161
+ if self.accelerator.is_main_process: print("[3/6] YOLO (Body & Hand) Detectors loaded successfully.")
162
+
163
+ # 4. SMPLest
164
+ self.smplest_cfg = SmplestConfig.load_config(os.path.join(PROJECT_ROOT, "pretrained_weight", "smplest-x", "config_base.py"))
165
+ log_base = os.path.join(PROJECT_ROOT, "smplest_logs")
166
+ if self.accelerator.is_main_process: os.makedirs(log_base, exist_ok=True)
167
+ self.accelerator.wait_for_everyone()
168
+ self.smplest_cfg.log.log_dir = os.path.join(log_base, f"rank_{self.accelerator.process_index}")
169
+ os.makedirs(self.smplest_cfg.log.log_dir, exist_ok=True)
170
+
171
+ _tmp_wrapper = SMPLX(MODEL_PATH)
172
+ self.smplx_model = _tmp_wrapper.layer['neutral'].to(self.device)
173
+ self.smplest_tester = SmplestTester(self.smplest_cfg)
174
+ self.smplest_tester._make_model()
175
+ self.smplest_model = self.smplest_tester.model.to(self.device)
176
+ if isinstance(self.smplest_model, torch.nn.DataParallel): self.smplest_model = self.smplest_model.module
177
+ self.smplest_model = self.smplest_model.to(self.device).eval()
178
+ if hasattr(torch, 'compile'):
179
+ if self.accelerator.is_main_process: print("[4/6] Compiled SMPLest-X model loaded successfully.")
180
+ self.smplest_model = torch.compile(self.smplest_model, mode="reduce-overhead")
181
+ else:
182
+ if self.accelerator.is_main_process: print("[4/6] SMPLest-X model loaded successfully.")
183
+
184
+ # 5. WiLoR
185
+ self.wilor_model, self.wilor_cfg = load_wilor(args.wilor_ckpt, cfg_path=os.path.join(PROJECT_ROOT, 'WiLoR', 'pretrained_models', 'model_config.yaml'))
186
+ self.wilor_model = self.wilor_model.to(self.device).eval()
187
+ self.transform = transforms.ToTensor()
188
+ if hasattr(torch, 'compile'):
189
+ self.wilor_model = torch.compile(self.wilor_model, mode="reduce-overhead")
190
+ if self.accelerator.is_main_process: print("[5/6] Compiled WiLoR Hand model loaded successfully.")
191
+ else:
192
+ if self.accelerator.is_main_process: print("[5/6] WiLoR Hand model loaded successfully.")
193
+
194
+ # 6. Optimization Layer
195
+ self.smplx_opt = smplx.create(MODEL_PATH, model_type='smplx', gender='neutral', use_pca=False, batch_size=1).to(self.device)
196
+ self.smpl_mean_r = self.smplx_model.right_hand_mean.detach().to(torch.float32).cpu().numpy().flatten()
197
+ self.smpl_mean_l = self.smplx_model.left_hand_mean.detach().to(torch.float32).cpu().numpy().flatten()
198
+ if self.accelerator.is_main_process: print("[6/6] SMPL-X Optimization Layer initialized.\n")
199
+
200
+ def _matrix_to_axis_angle(self, matrix_tensor):
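+ # cv2.Rodrigues turns each 3x3 rotation matrix into its 3-D axis-angle vector; the per-joint vectors are concatenated into one flat (N*3,) array.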
201
+ mats = matrix_tensor.detach().to(torch.float32).cpu().numpy()
202
+ if mats.ndim == 2: mats = mats[None]
203
+ return np.concatenate([cv2.Rodrigues(mats[i])[0].flatten() for i in range(mats.shape[0])])
204
+
205
+ def _batch_refine_fingers(self, hand_poses, hand_bboxes, depths_gpu, joints_2d_all):
206
+ N_hp = hand_poses.shape[0]
207
+ N_jt = joints_2d_all.shape[0]
208
+
209
+ # Align and remove padding
210
+ N = N_jt
211
+ if N == 0: return hand_poses
212
+ if N_hp != N:
213
+ hand_poses = hand_poses[:N]
214
+ hand_bboxes = hand_bboxes[:N]
215
+
216
+ # 1. Get BBox
217
+ x1, y1 = hand_bboxes[:, 0:1], hand_bboxes[:, 1:2]
218
+ w_crop, h_crop = hand_bboxes[:, 2:3] - x1, hand_bboxes[:, 3:4] - y1
219
+
220
+ # 2. Get 2d
221
+ joints_img = joints_2d_all.clone()
222
+ joints_img[:, :, 0] = joints_img[:, :, 0] * w_crop + x1
223
+ joints_img[:, :, 1] = joints_img[:, :, 1] * h_crop + y1
224
+
225
+ # 3. Get depth
226
+ img_h, img_w = depths_gpu.shape[-2:]
227
+ u = joints_img[:, :, 0].long().clamp(0, img_w - 1)
228
+ v = joints_img[:, :, 1].long().clamp(0, img_h - 1)
229
+
230
+ hand_idx = torch.arange(N, device=hand_poses.device).view(-1, 1)
231
+ depth_values = depths_gpu[hand_idx, v, u]
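+ # Advanced indexing: depth_values[i, j] is the depth sampled at joint j's pixel location for hand i.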
232
+
233
+ # 4. compute to refine Z
234
+ palm_indices = [0, 5, 9, 13, 17]
235
+ palm_depth = depth_values[:, palm_indices].mean(dim=1, keepdim=True)
236
+ finger_depths = depth_values[:, 5:20]
237
+ z_mod = ((finger_depths - palm_depth) / (palm_depth + 1e-6)).clamp(-0.2, 0.2) + 1.0
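+ # Heuristic refinement: each finger joint's pose parameters get scaled by its depth offset from the palm, clamped to +/-20% (multiplier in [0.8, 1.2]), rather than being re-solved geometrically.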
238
+
239
+ D = hand_poses.shape[1] // 15
240
+ refined_pose = hand_poses.view(N, 15, D).clone()
241
+ refined_pose *= z_mod.unsqueeze(-1)
242
+
243
+ return refined_pose.view(N, -1)
244
+
245
+ def _extract_body_patch(self, i, img, body_res):
246
+ if not body_res.boxes:
247
+ return i, None
248
+ h_img, w_img = img.shape[:2]
249
+ bbox = body_res.boxes.xyxy[0].to(torch.float32).cpu().numpy()
250
+ proc_bbox = process_bbox(np.array([bbox[0], bbox[1], bbox[2]-bbox[0], bbox[3]-bbox[1]]),
251
+ w_img, h_img, self.smplest_cfg.model.input_img_shape)
252
+ patch_img, _, _ = generate_patch_image(img, proc_bbox, 1.0, 0.0, False, self.smplest_cfg.model.input_img_shape)
253
+ p_tensor = self.transform(cv2.cvtColor(patch_img, cv2.COLOR_BGR2RGB).astype(np.float32)) / 255.0
254
+ return i, p_tensor
255
+
256
+ def _extract_hand_patches(self, i, img, hand_res):
257
+ patches = []
258
+ if not hand_res.boxes:
259
+ return i, patches
260
+ for box in hand_res.boxes:
261
+ is_right = box.cls.cpu().item()
262
+ bbox = box.xyxy[0].to(torch.float32).cpu().numpy()
263
+ patch_tensor = self._get_wilor_patch(img, bbox, is_right)
264
+ patches.append({
265
+ 'tensor': patch_tensor,
266
+ 'meta': {'batch_idx': i, 'is_right': is_right, 'bbox': bbox}
267
+ })
268
+ return i, patches
269
+
270
+ def _get_wilor_patch(self, img, bbox, is_right):
271
+ h_img, w_img = img.shape[:2]
272
+ x1, y1, x2, y2 = bbox
273
+
274
+ # 1. Compute the center point and the original width and height
275
+ center = np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])
276
+ width = x2 - x1
277
+ height = y2 - y1
278
+
279
+ # 2. Rescale factor (WiLoR / ViTDet typically uses 2.0)
280
+ # This mimics the behavior of ViTDetDataset
281
+ rescale_factor = 2.0
282
+ side = max(width, height) * rescale_factor
283
+
284
+ # 3. Compute the cropping boundaries
285
+ # Ensure the crop is square and stays within image bounds
286
+ new_x1 = max(0, int(center[0] - side / 2.0))
287
+ new_y1 = max(0, int(center[1] - side / 2.0))
288
+ new_x2 = min(w_img, int(center[0] + side / 2.0))
289
+ new_y2 = min(h_img, int(center[1] + side / 2.0))
290
+
291
+ # 4. Crop the image and pad it to a square
292
+ patch = img[new_y1:new_y2, new_x1:new_x2]
293
+
294
+ # If the cropped patch is not square (e.g., near image borders), pad it
295
+ ph, pw = patch.shape[:2]
296
+ if ph != pw:
297
+ max_side = max(ph, pw)
298
+ # Create a black background
299
+ tmp_patch = np.zeros((max_side, max_side, 3), dtype=np.uint8)
300
+ # Paste the patch into the center
301
+ start_y = (max_side - ph) // 2
302
+ start_x = (max_side - pw) // 2
303
+ tmp_patch[start_y:start_y+ph, start_x:start_x+pw] = patch
304
+ patch = tmp_patch
305
+
306
+ # 5. Resize to the model input resolution
307
+ # (assumed to be 224x224; adjust according to wilor_cfg)
308
+ input_size = self.wilor_cfg.MODEL.IMAGE_SIZE
309
+ patch_rgb = cv2.cvtColor(patch, cv2.COLOR_BGR2RGB)
310
+ patch_resized = cv2.resize(patch_rgb, (input_size, input_size), interpolation=cv2.INTER_LINEAR)
311
+
312
+ patch_tensor = torch.from_numpy(patch_resized).float().permute(2, 0, 1) / 255.0
313
+
314
+ return patch_tensor
315
+
316
+ def pad_to_fixed_buckets(self, tensor, buckets=[32, 64, 128, 256, 512]):
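+ # Pad the batch up to the next fixed bucket (or the next multiple of 8 beyond the largest bucket), presumably so torch.compile only ever sees a handful of static batch shapes instead of recompiling for every hand/body count.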
317
+ n = tensor.shape[0]
318
+ target_n = n
319
+ for b in buckets:
320
+ if n <= b:
321
+ target_n = b
322
+ break
323
+ else:
324
+ target_n = ((n + 7) // 8) * 8
325
+
326
+ if target_n == n:
327
+ return tensor, n
328
+
329
+ pad_size = target_n - n
330
+ padding = torch.zeros((pad_size, *tensor.shape[1:]), device=tensor.device, dtype=tensor.dtype)
331
+ return torch.cat([tensor, padding], dim=0), n
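+ # Example: 70 hand crops are padded to 128; the returned original count (70) is used afterwards to slice the padding back off the model outputs.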
332
+
333
+ def _process_batch(self, batch_imgs):
334
+ batch_results_data = []
335
+ with torch.cuda.amp.autocast(dtype=torch.bfloat16):
336
+ # =========================================================
337
+ # STAGE 0: Image enhancement and depth preprocessing
338
+ # =========================================================
339
+ if self.args.apply_sr and self.upsampler:
340
+ processed_imgs = []
341
+ with GlobalSilence():
342
+ for img in batch_imgs:
343
+ sr_img, _ = self.upsampler.enhance(img, outscale=3); processed_imgs.append(sr_img)
344
+ else:
345
+ processed_imgs = batch_imgs
347
+
348
+ imgs_np = np.stack(processed_imgs)
349
+ imgs_gpu = torch.from_numpy(imgs_np).to(self.device, non_blocking=True).float() / 255.0
350
+ imgs_gpu = imgs_gpu.permute(0, 3, 1, 2) # [B, 3, H, W]
351
+ h_orig, w_orig = imgs_gpu.shape[2:]
352
+
353
+ new_h = ((h_orig + 223) // 224) * 224
354
+ new_w = ((w_orig + 223) // 224) * 224
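+ # Round H/W up to multiples of 224 so the ViT patch grid divides the input evenly (Depth-Anything-V2 uses 14-px patches and 224 = 16 * 14).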
355
+ if h_orig != new_h or w_orig != new_w:
356
+ imgs_gpu = F.interpolate(imgs_gpu, size=(new_h, new_w), mode='bilinear', align_corners=False)
357
+
358
+ batch_depths_np = None
359
+ if self.args.opt_depth:
360
+ depth_input = imgs_gpu.to(memory_format=torch.contiguous_format).contiguous()
361
+ depths = self.depth_model(depth_input)
362
+ if depths.shape[-2:] != (h_orig, w_orig):
363
+ depths = F.interpolate(depths.unsqueeze(1), size=(h_orig, w_orig), mode='bilinear').squeeze(1)
364
+ batch_depths_np = depths.detach().cpu().numpy()
365
+
366
+ # =========================================================
367
+ # STAGE 1: Batched object detection and patch collection
368
+ # =========================================================
369
+ body_results = self.detector.predict(processed_imgs, device=self.device, conf=0.5, classes=0, verbose=False)
370
+ hand_results = self.hand_detector.predict(processed_imgs, device=self.device, conf=0.3, verbose=False)
371
+
372
+ # =========================================================
373
+ # STAGE 2: Aggregated SMPLest batch inference (key optimization)
374
+ # =========================================================
375
+ # 1. Use thread pool to process body patches in parallel
376
+ body_tasks = [self.pool.submit(self._extract_body_patch, i, processed_imgs[i], body_results[i])
377
+ for i in range(len(processed_imgs))]
378
+
379
+ # 2. Use thread pool to process hand patches in parallel
380
+ hand_tasks = [self.pool.submit(self._extract_hand_patches, i, processed_imgs[i], hand_results[i])
381
+ for i in range(len(processed_imgs))]
382
+
383
+ # 3. Collect body results
384
+ smpl_patch_tensors, original_to_agg_idx = [], {}
385
+ for task in body_tasks:
386
+ i, p_tensor = task.result()
387
+ if p_tensor is not None:
388
+ original_to_agg_idx[i] = len(smpl_patch_tensors)
389
+ smpl_patch_tensors.append(p_tensor)
390
+
391
+ # 4. Collect hand results
392
+ all_hand_patches, hand_meta_map = [], []
393
+ for task in hand_tasks:
394
+ _, patches = task.result()
395
+ for p_data in patches:
396
+ all_hand_patches.append(p_data['tensor'])
397
+ hand_meta_map.append(p_data['meta'])
398
+
399
+ batch_out_smpl = None
400
+ if smpl_patch_tensors:
401
+ agg_smpl_tensor = torch.stack(smpl_patch_tensors).to(self.device, dtype=torch.bfloat16)
402
+ padded_tensor, actual_num = self.pad_to_fixed_buckets(agg_smpl_tensor)
403
+ with torch.no_grad():
404
+ raw_smpl = self.smplest_model({'img': padded_tensor}, {}, {}, 'test')
405
+ batch_out_smpl = {k: v[:actual_num] if isinstance(v, torch.Tensor) else v for k, v in raw_smpl.items()}
406
+
407
+ wilor_out_all = None
408
+ if all_hand_patches:
409
+ agg_hand_tensor = torch.stack(all_hand_patches).to(self.device, dtype=torch.bfloat16)
410
+ padded_hand, actual_hand_num = self.pad_to_fixed_buckets(agg_hand_tensor)
411
+ with torch.no_grad():
412
+ raw_wilor = self.wilor_model({'img': padded_hand})
413
+ wilor_out_all = {}
414
+ for k, v in raw_wilor.items():
415
+ if isinstance(v, torch.Tensor):
416
+ wilor_out_all[k] = v[:actual_hand_num].clone()
417
+ elif isinstance(v, dict):
418
+ wilor_out_all[k] = {
419
+ nk: nv[:actual_hand_num].clone() if isinstance(nv, torch.Tensor) else nv
420
+ for nk, nv in v.items()
421
+ }
422
+ else:
423
+ wilor_out_all[k] = v
424
+
425
+ # =========================================================
426
+ # STAGE 3: Depth-based refinement
427
+ # =========================================================
428
+ if wilor_out_all and self.args.opt_depth and len(hand_meta_map) > 0:
429
+
430
+ hand_params = wilor_out_all['pred_mano_params']
431
+ all_hp = hand_params['hand_pose']
432
+ orig_shape_suffix = all_hp.shape[1:]
433
+
434
+ # The current number of hands
435
+ N = all_hp.shape[0]
436
+
437
+ # Map every detected hand back to the depth map of its source frame.
+ # pred_keypoints_2d is indexed per hand patch while `depths` is indexed per frame,
+ # so hands are gathered with their frame indices rather than a compacted index.
+ frame_indices = torch.tensor([m['batch_idx'] for m in hand_meta_map], device=self.device)
+
+ # For safety, ensure index count does not exceed hp count
+ frame_indices = frame_indices[:N]
+
+ all_joints = wilor_out_all['pred_keypoints_2d'][:N]
+ target_depth_tensors = depths[frame_indices]
448
+
449
+ refined_poses = self._batch_refine_fingers(
450
+ all_hp.reshape(N, -1),
451
+ torch.tensor([m['bbox'] for m in hand_meta_map], device=self.device)[:N],
452
+ target_depth_tensors,
453
+ all_joints
454
+ )
455
+
456
+ wilor_out_all['pred_mano_params']['hand_pose'] = refined_poses.reshape(N, *orig_shape_suffix)
457
+
458
+ # =========================================================
459
+ # STAGE 4: Result assembly
460
+ # =========================================================
461
+ # Build a fast lookup table: frame_idx -> list of hand results
462
+ frame_to_hands = [[] for _ in range(len(processed_imgs))]
463
+ if wilor_out_all:
464
+ for idx, meta in enumerate(hand_meta_map):
465
+ hp_aa = self._matrix_to_axis_angle(wilor_out_all['pred_mano_params']['hand_pose'][idx])
466
+
467
+ frame_to_hands[meta['batch_idx']].append({
468
+ 'is_right': meta['is_right'],
469
+ 'hp_aa': hp_aa
470
+ })
471
+
472
+ if batch_out_smpl is not None:
473
+ batch_out_smpl = {k: v.detach().to(torch.float32).cpu() if isinstance(v, torch.Tensor) else v
474
+ for k, v in batch_out_smpl.items()}
475
+
476
+ if wilor_out_all is not None:
477
+ wilor_out_all = {k: v.detach().to(torch.float32).cpu() if isinstance(v, torch.Tensor) else v
478
+ for k, v in wilor_out_all.items()}
479
+
480
+ for i in range(len(processed_imgs)):
481
+ img = processed_imgs[i]
482
+ if not body_results[i].boxes:
483
+ batch_results_data.append(None); continue
484
+
485
+ idx = original_to_agg_idx[i]
486
+
487
+ res_base = {
488
+ 'body_pose': batch_out_smpl['smplx_body_pose'][idx].numpy().flatten()[None],
489
+ 'root_pose': batch_out_smpl['smplx_root_pose'][idx].numpy().flatten()[None],
490
+ 'shape': batch_out_smpl['smplx_shape'][idx].numpy().flatten()[None],
491
+ 'expr': batch_out_smpl['smplx_expr'][idx].numpy().flatten()[None],
492
+ 'trans': (batch_out_smpl['smplx_trans'][idx] if 'smplx_trans' in batch_out_smpl
493
+ else batch_out_smpl['cam_trans'][idx]).numpy().flatten()[None],
494
+ 'jaw_pose': batch_out_smpl['smplx_jaw_pose'][idx].numpy().flatten()[None]
495
+ if 'smplx_jaw_pose' in batch_out_smpl else np.zeros((1, 3)),
496
+ }
497
+ pred_l = batch_out_smpl['smplx_lhand_pose'][idx].numpy().flatten()
498
+ pred_r = batch_out_smpl['smplx_rhand_pose'][idx].numpy().flatten()
499
+ lhand_p, rhand_p = None, None
500
+ for h_res in frame_to_hands[i]:
501
+ if h_res['is_right'] == 1:
502
+ rhand_p = h_res['hp_aa'] - self.smpl_mean_r
503
+ else:
504
+ lf = h_res['hp_aa'].reshape(-1, 3).copy(); lf[:, 1:3] *= -1
505
+ lhand_p = lf.flatten() - self.smpl_mean_l
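+ # WiLoR's right-hand convention is mirrored to the left hand by negating the y/z axis-angle components; subtracting the SMPL-X hand means keeps both hands in the same mean-relative convention as the SMPLest-X predictions they replace (inferred from the smpl_mean_l/r usage above).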
506
+
507
+ res_base['lhand_pose'] = (lhand_p if lhand_p is not None else pred_l)[None]
508
+ res_base['rhand_pose'] = (rhand_p if rhand_p is not None else pred_r)[None]
509
+ batch_results_data.append(res_base)
510
+
511
+ return batch_results_data
512
+
513
+ def process_single_item(self, input_path, output_root_dir):
514
+ item_name = os.path.splitext(os.path.basename(input_path.rstrip(os.sep)))[0]
515
+ final_pkl_path = os.path.join(output_root_dir, f"{item_name}.pkl")
516
+
517
+ def update_pbar():
518
+ if self.args.progress == 'overall' and self.global_pbar:
519
+ elapsed_min = (time.time() - self.start_time) / 60
520
+ self.global_pbar.set_description(f"R{self.accelerator.process_index} | Done:{self.files_done}/{self.my_total} | {elapsed_min:.1f}m")
521
+ self.global_pbar.update(1)
522
+
523
+ if os.path.exists(final_pkl_path):
524
+ self.files_done += 1; update_pbar(); return
525
+
526
+ device_id = self.device.index if self.device.index is not None else 0
527
+ try:
528
+ prefetcher = FramePrefetcher(input_path, device_id=device_id).start()
529
+ except RuntimeError as e:
530
+ print(f"Skip broken vidoe: {input_path}")
531
+ with open("corrupted_videos.log", "a") as f:
532
+ f.write(f"{input_path}\n")
533
+ return
534
+ total_frames = prefetcher.total_frames
535
+ pbar = tqdm(total=total_frames, desc=f"R{self.accelerator.process_index} | {item_name}",
536
+ disable=(not self.accelerator.is_main_process or self.args.progress == 'overall'))
537
+
538
+ all_frames_data = []
539
+ while prefetcher.is_running():
540
+ batch = prefetcher.get_batch(self.args.batch_size)
541
+ if not batch: break
542
+ batch_res = self._process_batch(batch)
543
+ for res in batch_res:
544
+ if res:
545
+ all_frames_data.append(np.concatenate([
546
+ res['root_pose'].flatten(), res['body_pose'].flatten(),
547
+ res['lhand_pose'].flatten(), res['rhand_pose'].flatten(),
548
+ res['jaw_pose'].flatten(), res['shape'].flatten(), res['expr'].flatten()
549
+ ]))
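+ # Per-frame vector layout: 3 (root) + 63 (body) + 45 (lhand) + 45 (rhand) + 3 (jaw) + 10 (shape) + 10 (expr) = 179 dims, matching the 179-dim statistics computed in main().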
550
+ if pbar: pbar.update(1)
551
+ # torch.cuda.empty_cache()
552
+
553
+ if all_frames_data:
554
+ with open(final_pkl_path, 'wb') as f:
555
+ pickle.dump(np.stack(all_frames_data, axis=0).astype(np.float32), f)
556
+
557
+ prefetcher.stop(); pbar.close()
558
+ self.files_done += 1; update_pbar()
559
+
560
+ def main(args):
561
+ accelerator = Accelerator(); set_seed(42)
562
+ input_items = sorted([os.path.join(args.input_path, f) for f in os.listdir(args.input_path) if f.endswith(('.mp4', '.mov', '.avi'))])
563
+ to_process = [it for it in input_items if not os.path.exists(os.path.join(args.output_path, f"{os.path.splitext(os.path.basename(it))[0]}.pkl"))]
564
+ if accelerator.is_main_process:
565
+ if not os.path.exists(args.output_path):
566
+ os.makedirs(args.output_path, exist_ok=True)
567
+ print(f"Created output directory: {args.output_path}")
568
+
569
+ print("\n" + "="*50)
570
+ print("Distributed Accelerator:")
571
+ print(f" - num_processes (GPU): {accelerator.num_processes}")
572
+ print(f" - mixed_precision: {accelerator.mixed_precision} (Native BF16)")
573
+ print(f" - TF32 Precision: Enabled (Hopper Optimized)")
574
+ print(f" - Apply SR: {args.apply_sr}")
575
+ print("="*50 + "\n")
576
+ print("="*50 + "\n")
577
+ print(f"Dataset Status: Total={len(input_items)} | To Process={len(to_process)} | Done={len(input_items)-len(to_process)}")
578
+ print(f"Batch Settings: Size={args.batch_size} | Mode={args.progress}")
579
+ print(f"Num workers: Size={args.num_workers}")
580
+
581
+ print("-" * 50 + "\n")
582
+
583
+ with accelerator.split_between_processes(to_process) as my_items:
584
+ extractor = SMPLXPoseExtractor(args, accelerator)
585
+ extractor.my_total = len(my_items); extractor.start_time = time.time()
586
+ if args.progress == 'overall' and accelerator.is_main_process:
587
+ extractor.global_pbar = tqdm(total=len(my_items), position=0)
588
+ for item in my_items:
589
+ extractor.process_single_item(item, args.output_path)
590
+ if extractor.global_pbar: extractor.global_pbar.close()
591
+
592
+ accelerator.wait_for_everyone()
593
+ if accelerator.is_main_process:
594
+ print("\n" + "="*50)
595
+ print("All Processing Done! Calculating final statistics (Incremental Mode)...")
596
+
597
+ pkl_files = [f for f in os.listdir(args.output_path)
598
+ if f.endswith('.pkl') and os.path.isfile(os.path.join(args.output_path, f))
599
+ and not any(x in f.lower() for x in ['wilor', 'smplest', 'final'])]
600
+
601
+ count = 0
602
+ sum_x = np.zeros(179, dtype=np.float64)
603
+ sum_x2 = np.zeros(179, dtype=np.float64)
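+ # Incremental statistics: accumulate running sums of x and x^2 so that mean = sum_x / N and var = sum_x2 / N - mean^2 can be computed without loading all frames at once.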
604
+
605
+ for p in tqdm(pkl_files, desc="Processing Statistics"):
606
+ pkl_path = os.path.join(args.output_path, p)
607
+ try:
608
+ with open(pkl_path, 'rb') as f:
609
+ data = pickle.load(f)
610
+
611
+ if isinstance(data, np.ndarray) and data.shape[-1] == 179:
612
+ sum_x += np.sum(data, axis=0)
613
+ sum_x2 += np.sum(data**2, axis=0)
614
+ count += data.shape[0]
615
+
616
+ if count == 0:
617
+ print(f"\nDebug: File={p}, Type={type(data)}, Shape={getattr(data, 'shape', 'No Shape')}")
618
+
619
+ except Exception as e:
620
+ print(f"Warning: Could not load {pkl_path}, error: {e}")
621
+
622
+ if count > 0:
623
+ mean = sum_x / count
624
+ var = (sum_x2 / count) - (mean ** 2)
625
+ std = np.sqrt(np.maximum(var, 1e-8))
626
+
627
+ stats_dir = os.path.join(args.output_path, 'stats')
628
+ os.makedirs(stats_dir, exist_ok=True)
629
+
630
+ dataset_name = args.dataset if hasattr(args, 'dataset') else "csl_news"
631
+ torch.save(torch.from_numpy(mean.astype(np.float32)), os.path.join(stats_dir, f"{dataset_name.lower()}_mean.pt"))
632
+ torch.save(torch.from_numpy(std.astype(np.float32)), os.path.join(stats_dir, f"{dataset_name.lower()}_std.pt"))
633
+
634
+ print(f"Success: Processed {count} frames from {len(pkl_files)} files.")
635
+ print(f"Statistics saved in: {stats_dir}")
636
+ else:
637
+ print("Error: No valid pose data found for statistics.")
638
+
639
+ if torch.distributed.is_initialized():
640
+ torch.distributed.destroy_process_group()
641
+
642
+ if __name__ == "__main__":
643
+ parser = argparse.ArgumentParser()
644
+ parser.add_argument('--input_path', type=str, required=True)
645
+ parser.add_argument('--output_path', type=str, required=True)
646
+ parser.add_argument('--progress', type=str, choices=['each', 'overall'], default='overall')
647
+ parser.add_argument('--num_workers', type=int)
648
+ parser.add_argument('--batch_size', type=int)
649
+
650
+ parser.add_argument('--apply_sr', action='store_true', default=False)
651
+ parser.add_argument('--opt_depth', action='store_true', default=False)
652
+ parser.add_argument('--yolo_ckpt', type=str, default=r'/mnt/shared-storage-user/mllm/zangyuhang/pmx/SMPL-X_pose_extraction/pretrained_weight/yolo/yolo26l.pt')
653
+ parser.add_argument('--hand_detector_ckpt', type=str, default=r'/mnt/shared-storage-user/mllm/zangyuhang/pmx/SMPL-X_pose_extraction/pretrained_weight/wilor/detector.pt')
654
+ parser.add_argument('--wilor_ckpt', type=str, default=r'/mnt/shared-storage-user/mllm/zangyuhang/pmx/SMPL-X_pose_extraction/pretrained_weight/wilor/wilor_final.ckpt')
655
+ parser.add_argument('--real_esrgan_ckpt', type=str, default=r'/mnt/shared-storage-user/mllm/zangyuhang/pmx/SMPL-X_pose_extraction/pretrained_weight/realesrgan/RealESRGAN_x4plus.pth')
656
+ parser.add_argument('--depth_anything_v2_ckpt', type=str, default=r'/mnt/shared-storage-user/mllm/zangyuhang/pmx/SMPL-X_pose_extraction/pretrained_weight/depth_anything-v2/depth_anything_v2_vitl.pth')
657
+ main(parser.parse_args())
extract_smplx_pose.sh ADDED
@@ -0,0 +1,27 @@
1
+ #!/bin/bash
2
+
3
+ export TZ=Asia/Shanghai
4
+
5
+ LOG_TIME=$(date +%Y%m%d_%H%M%S)
6
+ CUR_DIR="/mnt/shared-storage-user/mllm/zangyuhang/pmx/SMPL-X_pose_extraction"
7
+ cd $CUR_DIR
8
+
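+ # Background GPU monitor: append a timestamped nvidia-smi snapshot to its own log every 10 seconds for the duration of the run.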
9
+ nohup stdbuf -oL bash -c "while true; do date; nvidia-smi; sleep 10; done" \
10
+ > "${CUR_DIR}/extract_smplx_${LOG_TIME}_gpu_monitor.log" 2>&1 &
11
+
12
+ PY310_BIN="/mnt/shared-storage-user/mllm/zangyuhang/pmx/envs/py310/bin/python3.10"
13
+ export PYTHONPATH="${CUR_DIR}:${CUR_DIR}/SMPLest-X:${CUR_DIR}/WiLoR:${CUR_DIR}/Depth-Anything-V2:$PYTHONPATH"
14
+ export OPENCV_FOR_THREADS_NUM=8
15
+
16
+ $PY310_BIN -u -m accelerate.commands.launch \
17
+ --same_network \
18
+ --num_processes 4 \
19
+ --num_machines 1 \
20
+ --mixed_precision bf16 \
21
+ --dynamo_backend no \
22
+ --num_cpu_threads_per_process 24 \
23
+ extract_smplx_pose.py \
24
+ --batch_size 147456 \
25
+ --num_workers 64 \
26
+ --input_path "/mnt/shared-storage-user/mllm/zangyuhang/pmx/SLUDatasets/CSL_Daily/rgb_format/videos_512x512_30fps" \
27
+ --output_path "/mnt/shared-storage-user/mllm/zangyuhang/pmx/SLGDatasets/CSL_News/new_merged_poses"
log/extract_smplx_20260211_195012.log ADDED
The diff for this file is too large to render. See raw diff
 
log/extract_smplx_20260212_034356.log ADDED
The diff for this file is too large to render. See raw diff
 
pretrained_weight/.DS_Store ADDED
Binary file (6.15 kB). View file