dqj5182 committed
Commit 5732928 · 1 Parent(s): 6db77af
This view is limited to 50 files because it contains too many changes. See raw diff
Files changed (50)
  1. LICENSE +399 -0
  2. README.md +201 -13
  3. data/MOW/__pycache__/dataset.cpython-38.pyc +0 -0
  4. data/MOW/dataset.py +156 -0
  5. data/dataset.py +40 -0
  6. demo.py +122 -0
  7. demo_video.py +132 -0
  8. lib/core/__pycache__/config.cpython-38.pyc +0 -0
  9. lib/core/__pycache__/logger.cpython-38.pyc +0 -0
  10. lib/core/config.py +93 -0
  11. lib/core/logger.py +55 -0
  12. lib/models/__pycache__/model.cpython-38.pyc +0 -0
  13. lib/models/backbone/__pycache__/backbone_hamer_style.cpython-38.pyc +0 -0
  14. lib/models/backbone/__pycache__/resnet.cpython-38.pyc +0 -0
  15. lib/models/backbone/__pycache__/vit.cpython-38.pyc +0 -0
  16. lib/models/backbone/backbone_hamer_style.py +273 -0
  17. lib/models/backbone/fpn.py +282 -0
  18. lib/models/backbone/hrnet.py +518 -0
  19. lib/models/backbone/resnet.py +95 -0
  20. lib/models/backbone/vit.py +33 -0
  21. lib/models/decoder/__pycache__/decoder_hamer_style.cpython-38.pyc +0 -0
  22. lib/models/decoder/decoder_hamer_style.py +637 -0
  23. lib/models/model.py +100 -0
  24. lib/utils/__pycache__/contact_utils.cpython-38.pyc +0 -0
  25. lib/utils/__pycache__/eval_utils.cpython-38.pyc +0 -0
  26. lib/utils/__pycache__/func_utils.cpython-38.pyc +0 -0
  27. lib/utils/__pycache__/human_models.cpython-38.pyc +0 -0
  28. lib/utils/__pycache__/log_utils.cpython-38.pyc +0 -0
  29. lib/utils/__pycache__/mano_utils.cpython-38.pyc +0 -0
  30. lib/utils/__pycache__/mesh_utils.cpython-38.pyc +0 -0
  31. lib/utils/__pycache__/preprocessing.cpython-38.pyc +0 -0
  32. lib/utils/__pycache__/train_utils.cpython-38.pyc +0 -0
  33. lib/utils/__pycache__/transforms.cpython-38.pyc +0 -0
  34. lib/utils/__pycache__/vis_utils.cpython-38.pyc +0 -0
  35. lib/utils/contact_utils.py +55 -0
  36. lib/utils/demo_utils.py +105 -0
  37. lib/utils/eval_utils.py +50 -0
  38. lib/utils/func_utils.py +65 -0
  39. lib/utils/human_models.py +49 -0
  40. lib/utils/log_utils.py +12 -0
  41. lib/utils/mano_utils.py +136 -0
  42. lib/utils/mesh_utils.py +74 -0
  43. lib/utils/preprocessing.py +330 -0
  44. lib/utils/smplx/LICENSE +58 -0
  45. lib/utils/smplx/README.md +186 -0
  46. lib/utils/smplx/examples/demo.py +180 -0
  47. lib/utils/smplx/examples/demo_layers.py +181 -0
  48. lib/utils/smplx/examples/vis_flame_vertices.py +92 -0
  49. lib/utils/smplx/examples/vis_mano_vertices.py +99 -0
  50. lib/utils/smplx/setup.py +79 -0
LICENSE ADDED
@@ -0,0 +1,399 @@
1
+ Attribution-NonCommercial 4.0 International
2
+
3
+ =======================================================================
4
+
5
+ Creative Commons Corporation ("Creative Commons") is not a law firm and
6
+ does not provide legal services or legal advice. Distribution of
7
+ Creative Commons public licenses does not create a lawyer-client or
8
+ other relationship. Creative Commons makes its licenses and related
9
+ information available on an "as-is" basis. Creative Commons gives no
10
+ warranties regarding its licenses, any material licensed under their
11
+ terms and conditions, or any related information. Creative Commons
12
+ disclaims all liability for damages resulting from their use to the
13
+ fullest extent possible.
14
+
15
+ Using Creative Commons Public Licenses
16
+
17
+ Creative Commons public licenses provide a standard set of terms and
18
+ conditions that creators and other rights holders may use to share
19
+ original works of authorship and other material subject to copyright
20
+ and certain other rights specified in the public license below. The
21
+ following considerations are for informational purposes only, are not
22
+ exhaustive, and do not form part of our licenses.
23
+
24
+ Considerations for licensors: Our public licenses are
25
+ intended for use by those authorized to give the public
26
+ permission to use material in ways otherwise restricted by
27
+ copyright and certain other rights. Our licenses are
28
+ irrevocable. Licensors should read and understand the terms
29
+ and conditions of the license they choose before applying it.
30
+ Licensors should also secure all rights necessary before
31
+ applying our licenses so that the public can reuse the
32
+ material as expected. Licensors should clearly mark any
33
+ material not subject to the license. This includes other CC-
34
+ licensed material, or material used under an exception or
35
+ limitation to copyright. More considerations for licensors:
36
+ wiki.creativecommons.org/Considerations_for_licensors
37
+
38
+ Considerations for the public: By using one of our public
39
+ licenses, a licensor grants the public permission to use the
40
+ licensed material under specified terms and conditions. If
41
+ the licensor's permission is not necessary for any reason--for
42
+ example, because of any applicable exception or limitation to
43
+ copyright--then that use is not regulated by the license. Our
44
+ licenses grant only permissions under copyright and certain
45
+ other rights that a licensor has authority to grant. Use of
46
+ the licensed material may still be restricted for other
47
+ reasons, including because others have copyright or other
48
+ rights in the material. A licensor may make special requests,
49
+ such as asking that all changes be marked or described.
50
+ Although not required by our licenses, you are encouraged to
51
+ respect those requests where reasonable. More_considerations
52
+ for the public:
53
+ wiki.creativecommons.org/Considerations_for_licensees
54
+
55
+ =======================================================================
56
+
57
+ Creative Commons Attribution-NonCommercial 4.0 International Public
58
+ License
59
+
60
+ By exercising the Licensed Rights (defined below), You accept and agree
61
+ to be bound by the terms and conditions of this Creative Commons
62
+ Attribution-NonCommercial 4.0 International Public License ("Public
63
+ License"). To the extent this Public License may be interpreted as a
64
+ contract, You are granted the Licensed Rights in consideration of Your
65
+ acceptance of these terms and conditions, and the Licensor grants You
66
+ such rights in consideration of benefits the Licensor receives from
67
+ making the Licensed Material available under these terms and
68
+ conditions.
69
+
70
+ Section 1 -- Definitions.
71
+
72
+ a. Adapted Material means material subject to Copyright and Similar
73
+ Rights that is derived from or based upon the Licensed Material
74
+ and in which the Licensed Material is translated, altered,
75
+ arranged, transformed, or otherwise modified in a manner requiring
76
+ permission under the Copyright and Similar Rights held by the
77
+ Licensor. For purposes of this Public License, where the Licensed
78
+ Material is a musical work, performance, or sound recording,
79
+ Adapted Material is always produced where the Licensed Material is
80
+ synched in timed relation with a moving image.
81
+
82
+ b. Adapter's License means the license You apply to Your Copyright
83
+ and Similar Rights in Your contributions to Adapted Material in
84
+ accordance with the terms and conditions of this Public License.
85
+
86
+ c. Copyright and Similar Rights means copyright and/or similar rights
87
+ closely related to copyright including, without limitation,
88
+ performance, broadcast, sound recording, and Sui Generis Database
89
+ Rights, without regard to how the rights are labeled or
90
+ categorized. For purposes of this Public License, the rights
91
+ specified in Section 2(b)(1)-(2) are not Copyright and Similar
92
+ Rights.
93
+ d. Effective Technological Measures means those measures that, in the
94
+ absence of proper authority, may not be circumvented under laws
95
+ fulfilling obligations under Article 11 of the WIPO Copyright
96
+ Treaty adopted on December 20, 1996, and/or similar international
97
+ agreements.
98
+
99
+ e. Exceptions and Limitations means fair use, fair dealing, and/or
100
+ any other exception or limitation to Copyright and Similar Rights
101
+ that applies to Your use of the Licensed Material.
102
+
103
+ f. Licensed Material means the artistic or literary work, database,
104
+ or other material to which the Licensor applied this Public
105
+ License.
106
+
107
+ g. Licensed Rights means the rights granted to You subject to the
108
+ terms and conditions of this Public License, which are limited to
109
+ all Copyright and Similar Rights that apply to Your use of the
110
+ Licensed Material and that the Licensor has authority to license.
111
+
112
+ h. Licensor means the individual(s) or entity(ies) granting rights
113
+ under this Public License.
114
+
115
+ i. NonCommercial means not primarily intended for or directed towards
116
+ commercial advantage or monetary compensation. For purposes of
117
+ this Public License, the exchange of the Licensed Material for
118
+ other material subject to Copyright and Similar Rights by digital
119
+ file-sharing or similar means is NonCommercial provided there is
120
+ no payment of monetary compensation in connection with the
121
+ exchange.
122
+
123
+ j. Share means to provide material to the public by any means or
124
+ process that requires permission under the Licensed Rights, such
125
+ as reproduction, public display, public performance, distribution,
126
+ dissemination, communication, or importation, and to make material
127
+ available to the public including in ways that members of the
128
+ public may access the material from a place and at a time
129
+ individually chosen by them.
130
+
131
+ k. Sui Generis Database Rights means rights other than copyright
132
+ resulting from Directive 96/9/EC of the European Parliament and of
133
+ the Council of 11 March 1996 on the legal protection of databases,
134
+ as amended and/or succeeded, as well as other essentially
135
+ equivalent rights anywhere in the world.
136
+
137
+ l. You means the individual or entity exercising the Licensed Rights
138
+ under this Public License. Your has a corresponding meaning.
139
+
140
+ Section 2 -- Scope.
141
+
142
+ a. License grant.
143
+
144
+ 1. Subject to the terms and conditions of this Public License,
145
+ the Licensor hereby grants You a worldwide, royalty-free,
146
+ non-sublicensable, non-exclusive, irrevocable license to
147
+ exercise the Licensed Rights in the Licensed Material to:
148
+
149
+ a. reproduce and Share the Licensed Material, in whole or
150
+ in part, for NonCommercial purposes only; and
151
+
152
+ b. produce, reproduce, and Share Adapted Material for
153
+ NonCommercial purposes only.
154
+
155
+ 2. Exceptions and Limitations. For the avoidance of doubt, where
156
+ Exceptions and Limitations apply to Your use, this Public
157
+ License does not apply, and You do not need to comply with
158
+ its terms and conditions.
159
+
160
+ 3. Term. The term of this Public License is specified in Section
161
+ 6(a).
162
+
163
+ 4. Media and formats; technical modifications allowed. The
164
+ Licensor authorizes You to exercise the Licensed Rights in
165
+ all media and formats whether now known or hereafter created,
166
+ and to make technical modifications necessary to do so. The
167
+ Licensor waives and/or agrees not to assert any right or
168
+ authority to forbid You from making technical modifications
169
+ necessary to exercise the Licensed Rights, including
170
+ technical modifications necessary to circumvent Effective
171
+ Technological Measures. For purposes of this Public License,
172
+ simply making modifications authorized by this Section 2(a)
173
+ (4) never produces Adapted Material.
174
+
175
+ 5. Downstream recipients.
176
+
177
+ a. Offer from the Licensor -- Licensed Material. Every
178
+ recipient of the Licensed Material automatically
179
+ receives an offer from the Licensor to exercise the
180
+ Licensed Rights under the terms and conditions of this
181
+ Public License.
182
+
183
+ b. No downstream restrictions. You may not offer or impose
184
+ any additional or different terms or conditions on, or
185
+ apply any Effective Technological Measures to, the
186
+ Licensed Material if doing so restricts exercise of the
187
+ Licensed Rights by any recipient of the Licensed
188
+ Material.
189
+
190
+ 6. No endorsement. Nothing in this Public License constitutes or
191
+ may be construed as permission to assert or imply that You
192
+ are, or that Your use of the Licensed Material is, connected
193
+ with, or sponsored, endorsed, or granted official status by,
194
+ the Licensor or others designated to receive attribution as
195
+ provided in Section 3(a)(1)(A)(i).
196
+
197
+ b. Other rights.
198
+
199
+ 1. Moral rights, such as the right of integrity, are not
200
+ licensed under this Public License, nor are publicity,
201
+ privacy, and/or other similar personality rights; however, to
202
+ the extent possible, the Licensor waives and/or agrees not to
203
+ assert any such rights held by the Licensor to the limited
204
+ extent necessary to allow You to exercise the Licensed
205
+ Rights, but not otherwise.
206
+
207
+ 2. Patent and trademark rights are not licensed under this
208
+ Public License.
209
+
210
+ 3. To the extent possible, the Licensor waives any right to
211
+ collect royalties from You for the exercise of the Licensed
212
+ Rights, whether directly or through a collecting society
213
+ under any voluntary or waivable statutory or compulsory
214
+ licensing scheme. In all other cases the Licensor expressly
215
+ reserves any right to collect such royalties, including when
216
+ the Licensed Material is used other than for NonCommercial
217
+ purposes.
218
+
219
+ Section 3 -- License Conditions.
220
+
221
+ Your exercise of the Licensed Rights is expressly made subject to the
222
+ following conditions.
223
+
224
+ a. Attribution.
225
+
226
+ 1. If You Share the Licensed Material (including in modified
227
+ form), You must:
228
+
229
+ a. retain the following if it is supplied by the Licensor
230
+ with the Licensed Material:
231
+
232
+ i. identification of the creator(s) of the Licensed
233
+ Material and any others designated to receive
234
+ attribution, in any reasonable manner requested by
235
+ the Licensor (including by pseudonym if
236
+ designated);
237
+
238
+ ii. a copyright notice;
239
+
240
+ iii. a notice that refers to this Public License;
241
+
242
+ iv. a notice that refers to the disclaimer of
243
+ warranties;
244
+
245
+ v. a URI or hyperlink to the Licensed Material to the
246
+ extent reasonably practicable;
247
+
248
+ b. indicate if You modified the Licensed Material and
249
+ retain an indication of any previous modifications; and
250
+
251
+ c. indicate the Licensed Material is licensed under this
252
+ Public License, and include the text of, or the URI or
253
+ hyperlink to, this Public License.
254
+
255
+ 2. You may satisfy the conditions in Section 3(a)(1) in any
256
+ reasonable manner based on the medium, means, and context in
257
+ which You Share the Licensed Material. For example, it may be
258
+ reasonable to satisfy the conditions by providing a URI or
259
+ hyperlink to a resource that includes the required
260
+ information.
261
+
262
+ 3. If requested by the Licensor, You must remove any of the
263
+ information required by Section 3(a)(1)(A) to the extent
264
+ reasonably practicable.
265
+
266
+ 4. If You Share Adapted Material You produce, the Adapter's
267
+ License You apply must not prevent recipients of the Adapted
268
+ Material from complying with this Public License.
269
+
270
+ Section 4 -- Sui Generis Database Rights.
271
+
272
+ Where the Licensed Rights include Sui Generis Database Rights that
273
+ apply to Your use of the Licensed Material:
274
+
275
+ a. for the avoidance of doubt, Section 2(a)(1) grants You the right
276
+ to extract, reuse, reproduce, and Share all or a substantial
277
+ portion of the contents of the database for NonCommercial purposes
278
+ only;
279
+
280
+ b. if You include all or a substantial portion of the database
281
+ contents in a database in which You have Sui Generis Database
282
+ Rights, then the database in which You have Sui Generis Database
283
+ Rights (but not its individual contents) is Adapted Material; and
284
+
285
+ c. You must comply with the conditions in Section 3(a) if You Share
286
+ all or a substantial portion of the contents of the database.
287
+
288
+ For the avoidance of doubt, this Section 4 supplements and does not
289
+ replace Your obligations under this Public License where the Licensed
290
+ Rights include other Copyright and Similar Rights.
291
+
292
+ Section 5 -- Disclaimer of Warranties and Limitation of Liability.
293
+
294
+ a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
295
+ EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
296
+ AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
297
+ ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
298
+ IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
299
+ WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
300
+ PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
301
+ ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
302
+ KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
303
+ ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
304
+
305
+ b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
306
+ TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
307
+ NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
308
+ INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
309
+ COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
310
+ USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
311
+ ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
312
+ DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
313
+ IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
314
+
315
+ c. The disclaimer of warranties and limitation of liability provided
316
+ above shall be interpreted in a manner that, to the extent
317
+ possible, most closely approximates an absolute disclaimer and
318
+ waiver of all liability.
319
+
320
+ Section 6 -- Term and Termination.
321
+
322
+ a. This Public License applies for the term of the Copyright and
323
+ Similar Rights licensed here. However, if You fail to comply with
324
+ this Public License, then Your rights under this Public License
325
+ terminate automatically.
326
+
327
+ b. Where Your right to use the Licensed Material has terminated under
328
+ Section 6(a), it reinstates:
329
+
330
+ 1. automatically as of the date the violation is cured, provided
331
+ it is cured within 30 days of Your discovery of the
332
+ violation; or
333
+
334
+ 2. upon express reinstatement by the Licensor.
335
+
336
+ For the avoidance of doubt, this Section 6(b) does not affect any
337
+ right the Licensor may have to seek remedies for Your violations
338
+ of this Public License.
339
+
340
+ c. For the avoidance of doubt, the Licensor may also offer the
341
+ Licensed Material under separate terms or conditions or stop
342
+ distributing the Licensed Material at any time; however, doing so
343
+ will not terminate this Public License.
344
+
345
+ d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
346
+ License.
347
+
348
+ Section 7 -- Other Terms and Conditions.
349
+
350
+ a. The Licensor shall not be bound by any additional or different
351
+ terms or conditions communicated by You unless expressly agreed.
352
+
353
+ b. Any arrangements, understandings, or agreements regarding the
354
+ Licensed Material not stated herein are separate from and
355
+ independent of the terms and conditions of this Public License.
356
+
357
+ Section 8 -- Interpretation.
358
+
359
+ a. For the avoidance of doubt, this Public License does not, and
360
+ shall not be interpreted to, reduce, limit, restrict, or impose
361
+ conditions on any use of the Licensed Material that could lawfully
362
+ be made without permission under this Public License.
363
+
364
+ b. To the extent possible, if any provision of this Public License is
365
+ deemed unenforceable, it shall be automatically reformed to the
366
+ minimum extent necessary to make it enforceable. If the provision
367
+ cannot be reformed, it shall be severed from this Public License
368
+ without affecting the enforceability of the remaining terms and
369
+ conditions.
370
+
371
+ c. No term or condition of this Public License will be waived and no
372
+ failure to comply consented to unless expressly agreed to by the
373
+ Licensor.
374
+
375
+ d. Nothing in this Public License constitutes or may be interpreted
376
+ as a limitation upon, or waiver of, any privileges and immunities
377
+ that apply to the Licensor or You, including from the legal
378
+ processes of any jurisdiction or authority.
379
+
380
+ =======================================================================
381
+
382
+ Creative Commons is not a party to its public
383
+ licenses. Notwithstanding, Creative Commons may elect to apply one of
384
+ its public licenses to material it publishes and in those instances
385
+ will be considered the “Licensor.” The text of the Creative Commons
386
+ public licenses is dedicated to the public domain under the CC0 Public
387
+ Domain Dedication. Except for the limited purpose of indicating that
388
+ material is shared under a Creative Commons public license or as
389
+ otherwise permitted by the Creative Commons policies published at
390
+ creativecommons.org/policies, Creative Commons does not authorize the
391
+ use of the trademark "Creative Commons" or any other trademark or logo
392
+ of Creative Commons without its prior written consent including,
393
+ without limitation, in connection with any unauthorized modifications
394
+ to any of its public licenses or any other arrangements,
395
+ understandings, or agreements concerning use of licensed material. For
396
+ the avoidance of doubt, this paragraph does not form part of the
397
+ public licenses.
398
+
399
+ Creative Commons may be contacted at creativecommons.org.
README.md CHANGED
@@ -1,13 +1,201 @@
1
- ---
2
- title: HACO
3
- emoji: 🦀
4
- colorFrom: yellow
5
- colorTo: yellow
6
- sdk: gradio
7
- sdk_version: 5.29.1
8
- app_file: app.py
9
- pinned: false
10
- license: cc-by-nc-sa-4.0
11
- ---
12
-
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
+ <div align="center">
2
+
3
+ # HACO: Learning Dense Hand Contact Estimation <br> from Imbalanced Data
4
+
5
+ <b>[Daniel Sungho Jung](https://dqj5182.github.io/)</b>, <b>[Kyoung Mu Lee](https://cv.snu.ac.kr/index.php/~kmlee/)</b>
6
+
7
+ <p align="center">
8
+ <img src="asset/logo_cvlab.png" height=55>
9
+ </p>
10
+
11
+ <b>Seoul National University</b>
12
+
13
+ <a>![Python 3.8+](https://img.shields.io/badge/Python-3.8%2B-brightgreen.svg)</a>
14
+ <a href="https://pytorch.org/get-started/locally/"><img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-ee4c2c?logo=pytorch&logoColor=white"></a>
15
+ [![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)
16
+ <a href='https://haco-release.github.io/'><img src='https://img.shields.io/badge/Project_Page-HACO-green' alt='Project Page'></a>
17
+ <a href="https://arxiv.org/pdf/2505.11152"><img src='https://img.shields.io/badge/Paper-HACO-blue' alt='Paper PDF'></a>
18
+ <a href="https://arxiv.org/abs/2505.11152"><img src='https://img.shields.io/badge/arXiv-HACO-red' alt='Paper PDF'></a>
19
+
20
+
21
+ <h2>ArXiv 2025</h2>
22
+
23
+ <img src="./asset/teaser.png" alt="Logo" width="75%">
24
+
25
+ </div>
26
+
27
+ _**HACO** is a framework for **dense hand contact estimation** that addresses **class and spatial imbalance issues** in training on large-scale datasets. Based on **14 datasets** that span **hand-object**, **hand-hand**, **hand-scene**, and **hand-body interaction**, we build a powerful model that learns dense hand contact in diverse scenarios._
28
+
29
+
30
+
31
+ ## Installation
32
+ * We recommend using an [Anaconda](https://www.anaconda.com/) virtual environment with Python >= 3.8.0 and PyTorch >= 1.11.0. Our latest HACO model is tested with Python 3.8.20, PyTorch 1.11.0, and CUDA 11.3.
33
+ * Set up the environment.
34
+ ```
35
+ # Initialize conda environment
36
+ conda create -n haco python=3.8 -y
37
+ conda activate haco
38
+
39
+ # Install PyTorch
40
+ conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
41
+
42
+ # Install all remaining packages
43
+ pip install -r requirements.txt
44
+ ```
45
+ * Download our checkpoints from [OneDrive](https://1drv.ms/u/c/bf7e2a9a100f1dba/Ef18aU5ItbFDgW1sSv3P0l0BGTzN6PlsCnm0q5ecpTWIfQ?e=Y40qsN).
46
+
47
+
48
+
49
+ ## Quick demo (Image)
50
+ To run HACO on demo images using the [Mediapipe](https://ai.google.dev/edge/mediapipe/solutions/guide) hand detector, please run:
51
+ ```
52
+ python demo.py --backbone {BACKBONE_TYPE} --checkpoint {CKPT_PATH} --input_path {INPUT_PATH}
53
+ ```
54
+
55
+ For example,
56
+ ```
57
+ # ViT-H (Default, HaMeR initialized) backbone
58
+ python demo.py --backbone hamer --checkpoint release_checkpoint/haco_final_hamer_checkpoint.ckpt --input_path asset/example_images
59
+
60
+ # ViT-B (ImageNet initialized) backbone
61
+ python demo.py --backbone vit-b-16 --checkpoint release_checkpoint/haco_final_vit_b_checkpoint.ckpt --input_path asset/example_images
62
+ ```
63
+
64
+ > Note: The demo includes post-processing to reduce noise in small or sparse contact areas.
65
+
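The filtering is done by `remove_small_contact_components` from `lib/utils/demo_utils.py`; in `demo.py` (below) it is called on the MANO mesh with `min_size=20`. As a rough, illustrative sketch of such a filter — the exact criterion and signature are assumptions, not the repository implementation — one can drop contact blobs that span fewer than `min_size` connected vertices:

```
# Illustrative sketch only: remove per-vertex contact components smaller
# than `min_size`, using mesh connectivity from the face list.
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

def drop_small_contact_components(contact_mask, faces, min_size=20):
    """contact_mask: (V,) bool per-vertex contact; faces: (F, 3) vertex indices."""
    contact_mask = contact_mask.astype(bool)
    num_v = contact_mask.shape[0]
    faces = np.asarray(faces, dtype=np.int64)
    # Undirected edges between vertices that share a face
    edges = np.concatenate([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]], axis=0)
    # Keep only edges whose endpoints are both in contact
    edges = edges[contact_mask[edges[:, 0]] & contact_mask[edges[:, 1]]]
    adj = coo_matrix((np.ones(len(edges)), (edges[:, 0], edges[:, 1])), shape=(num_v, num_v))
    _, labels = connected_components(adj, directed=False)
    cleaned = contact_mask.copy()
    for lbl in np.unique(labels[contact_mask]):
        component = (labels == lbl) & contact_mask
        if component.sum() < min_size:  # too few connected vertices: treat as noise
            cleaned[component] = False
    return cleaned
```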
66
+ ## Quick demo (Video)
67
+ Before running the demo, please download the example videos from [OneDrive](https://1drv.ms/u/c/bf7e2a9a100f1dba/ERsk_D-EubxBi1Usu2bW2hABwy9nxzRxAHutXDxmv85TLw?e=rIjOI7) and save them to `asset/example_videos`.<br>
68
+
69
+ To run HACO on demo videos using the [Mediapipe](https://ai.google.dev/edge/mediapipe/solutions/guide) hand detector, please run:
70
+ ```
71
+ python demo_video.py --backbone {BACKBONE_TYPE} --checkpoint {CKPT_PATH} --input_path {INPUT_PATH}
72
+ ```
73
+
74
+ For example,
75
+ ```
76
+ # ViT-H (Default, HaMeR initialized) backbone
77
+ python demo_video.py --backbone hamer --checkpoint release_checkpoint/haco_final_hamer_checkpoint.ckpt --input_path asset/example_videos
78
+
79
+ # ViT-B (ImageNet initialized) backbone
80
+ python demo_video.py --backbone vit-b-16 --checkpoint release_checkpoint/haco_final_vit_b_checkpoint.ckpt --input_path asset/example_videos
81
+ ```
82
+
83
+ > Note: The demo includes post-processing for both spatial smoothing of small contact areas and temporal smoothing across frames to ensure stable contact predictions and hand detections.
84
+
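In `demo_video.py` (below), the per-frame bounding box and contact mask are blended with `smooth_bbox` and `smooth_contact_mask` from `lib/utils/demo_utils.py`, both called with `alpha=0.8`, which suggests an exponential moving average. A minimal sketch of the contact part under that assumption (the helper name and blending rule here are not the repository implementation):

```
# Assumed behavior (illustrative only): exponentially average per-vertex
# contact scores across frames, then threshold at 0.5 as demo_video.py does.
import numpy as np

def smooth_contact(prev, raw, alpha=0.8):
    """prev: (V,) running average or None; raw: (V,) bool contact of the current frame."""
    raw = raw.astype(np.float32)
    if prev is None:
        return raw
    return alpha * prev + (1.0 - alpha) * raw

# Per-frame loop (mirroring demo_video.py):
#   smoothed = smooth_contact(smoothed, raw_contact)
#   contact_mask = smoothed > 0.5
```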
85
+
86
+ ## Data
87
+ The `data` and `release_checkpoint` directories need to follow the structure below.
88
+ ```
89
+ ${ROOT}
90
+ |-- data
91
+ | |-- base_data
92
+ | | |-- demo_data
93
+ | | | |-- hand_landmarker.task
94
+ | | |-- human_models
95
+ | | | |-- mano
96
+ | | | | |-- MANO_LEFT.pkl
97
+ | | | | |-- MANO_RIGHT.pkl
98
+ | | | | |-- V_regressor_84.npy
99
+ | | | | |-- V_regressor_336.npy
100
+ | | |-- pretrained_models
101
+ | | | |-- hamer
102
+ | | | |-- handoccnet
103
+ | | | |-- hrnet
104
+ | | | |-- pose2pose
105
+ | |-- MOW
106
+ | | |-- data
107
+ | | | |-- images
108
+ | | | |-- masks
109
+ | | | |-- models
110
+ | | | |-- poses.json
111
+ | | | |-- watertight_models
112
+ | | |-- preprocessed_data
113
+ | | | |-- test
114
+ | | | | |-- contact_data
115
+ | | |-- splits
116
+ | | |-- dataset.py
117
+ |-- release_checkpoint
118
+ ```
119
+ * Download base_data from [OneDrive](https://1drv.ms/u/c/bf7e2a9a100f1dba/EUmlgxCPqwpEvIhma80VZsoBnHrIPXzbsmJzoQpP-saj-A?e=fSxPEi).
120
+ * Download [MOW](https://zhec.github.io/rhoi/) data from GitHub ([images](https://github.com/ZheC/MOW), [models](https://github.com/ZheC/MOW), [poses.json](https://github.com/ZheC/MOW)) and OneDrive ([masks](https://1drv.ms/u/c/bf7e2a9a100f1dba/Ef2YhwccS4tPt1WrAAP4-iMBjcaSUgawDMnf_HDpqoTeNw?e=eQYJ4e), [watertight_models](https://1drv.ms/u/c/bf7e2a9a100f1dba/EW5YXeXtk3NBnX9PcvJtGIABj_9c1FW2RdrcppDgRzqHhg?e=ryUqCf), [preprocessed_data](https://1drv.ms/u/c/bf7e2a9a100f1dba/ESkqLhHk9gFHo4HH2uA9akABgYuS2wLgWfr4YJMRmagezQ?e=DoGFso), [splits](https://1drv.ms/u/c/bf7e2a9a100f1dba/EW60jCPiuNNOjkmCUdqlBbEBact_Ums22dwBoQoFMkUV6w?e=2lxpJd)). For GitHub data, you can directly download them by running:
121
+ ```
122
+ bash scripts/download_official_mow.sh
123
+ ```
124
+ * Download initial checkpoints by running:
125
+ ```
126
+ bash scripts/download_initial_checkpoints.sh
127
+ ```
128
+
129
+ ## Running HACO
130
+ ### Train
131
+ TBA by June 2025.
132
+
133
+ ### Test
134
+ To evaluate HACO on the [MOW](https://github.com/ZheC/MOW) dataset, please run:
135
+ ```
136
+ python test.py --backbone {BACKBONE_TYPE} --checkpoint {CKPT_PATH}
137
+ ```
138
+
139
+ For example,
140
+ ```
141
+ # ViT-H (Default, HaMeR initialized) backbone
142
+ python test.py --backbone hamer --checkpoint release_checkpoint/haco_final_hamer_checkpoint.ckpt
143
+
144
+ # ViT-L (ImageNet initialized) backbone
145
+ python test.py --backbone vit-l-16 --checkpoint release_checkpoint/haco_final_vit_l_checkpoint.ckpt
146
+
147
+ # ViT-B (ImageNet initialized) backbone
148
+ python test.py --backbone vit-b-16 --checkpoint release_checkpoint/haco_final_vit_b_checkpoint.ckpt
149
+
150
+ # ViT-S (ImageNet initialized) backbone
151
+ python test.py --backbone vit-s-16 --checkpoint release_checkpoint/haco_final_vit_s_checkpoint.ckpt
152
+
153
+ # FPN (HandOccNet initialized) backbone
154
+ python test.py --backbone handoccnet --checkpoint release_checkpoint/haco_final_handoccnet_checkpoint.ckpt
155
+
156
+ # HRNet-W48 (ImageNet initialized) backbone
157
+ python test.py --backbone hrnet-w48 --checkpoint release_checkpoint/haco_final_hrnet_w48_checkpoint.ckpt
158
+
159
+ # HRNet-W32 (ImageNet initialized) backbone
160
+ python test.py --backbone hrnet-w32 --checkpoint release_checkpoint/haco_final_hrnet_w32_checkpoint.ckpt
161
+
162
+ # ResNet-152 (ImageNet initialized) backbone
163
+ python test.py --backbone resnet-152 --checkpoint release_checkpoint/haco_final_resnet_152_checkpoint.ckpt
164
+
165
+ # ResNet-101 (ImageNet initialized) backbone
166
+ python test.py --backbone resnet-101 --checkpoint release_checkpoint/haco_final_resnet_101_checkpoint.ckpt
167
+
168
+ # ResNet-50 (ImageNet initialized) backbone
169
+ python test.py --backbone resnet-50 --checkpoint release_checkpoint/haco_final_resnet_50_checkpoint.ckpt
170
+
171
+ # ResNet-34 (ImageNet initialized) backbone
172
+ python test.py --backbone resnet-34 --checkpoint release_checkpoint/haco_final_resnet_34_checkpoint.ckpt
173
+
174
+ # ResNet-18 (ImageNet initialized) backbone
175
+ python test.py --backbone resnet-18 --checkpoint release_checkpoint/haco_final_resnet_18_checkpoint.ckpt
176
+ ```
177
+
178
+
179
+
180
+ ## Technical Q&A
181
+ * `ImportError: cannot import name 'bool' from 'numpy'`: comment out the line `from numpy import bool, int, float, complex, object, unicode, str, nan, inf` in the package file that raises the error.
182
+ * `AttributeError` on `np.int`: `np.int` was a deprecated alias for the builtin `int` and has been removed from NumPy. Replace `np.int` with `int`, or with `np.int64`/`np.int32` when you need an explicit precision; see [this issue](https://github.com/scikit-optimize/scikit-optimize/issues/1171) and the example below.
183
+
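A minimal before/after example of that replacement (illustrative, not tied to any specific file in this repository):

```
import numpy as np

x = 3.7
value = int(x)                               # instead of the removed np.int(x)
arr = np.array([1.0, 2.0]).astype(np.int64)  # use an explicit width when precision matters
```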
184
+
185
+ ## Acknowledgement
186
+ We thank:
187
+ * [DECO](https://openaccess.thecvf.com/content/ICCV2023/papers/Tripathi_DECO_Dense_Estimation_of_3D_Human-Scene_Contact_In_The_Wild_ICCV_2023_paper.pdf) for human-scene contact estimation.
188
+ * [CB Loss](https://openaccess.thecvf.com/content_CVPR_2019/papers/Cui_Class-Balanced_Loss_Based_on_Effective_Number_of_Samples_CVPR_2019_paper.pdf) for inspiration on VCB Loss.
189
+ * [HaMeR](https://openaccess.thecvf.com/content/CVPR2024/papers/Pavlakos_Reconstructing_Hands_in_3D_with_Transformers_CVPR_2024_paper.pdf) for Transformer-based regression architecture.
190
+
191
+
192
+
193
+ ## Reference
194
+ ```
195
+ @article{jung2025haco,
196
+ title = {Learning Dense Hand Contact Estimation from Imbalanced Data},
197
+ author = {Jung, Daniel Sungho and Lee, Kyoung Mu},
198
+ journal = {arXiv preprint arXiv:2505.11152},
199
+ year = {2025}
200
+ }
201
+ ```
data/MOW/__pycache__/dataset.cpython-38.pyc ADDED
Binary file (6.14 kB). View file
 
data/MOW/dataset.py ADDED
@@ -0,0 +1,156 @@
1
+ import os
2
+ import cv2
3
+ import json
4
+ import trimesh
5
+ import numpy as np
6
+ import point_cloud_utils as pcu
7
+
8
+ import torch
9
+ from torch.utils.data import Dataset
10
+ from torchvision.transforms import Normalize
11
+
12
+ from lib.core.config import cfg
13
+ from lib.utils.preprocessing import augmentation_contact, process_human_model_output_orig, mask2bbox
14
+ from lib.utils.func_utils import load_img
15
+ from lib.utils.mesh_utils import center_vertices, load_obj_nr
16
+ from lib.utils.contact_utils import get_ho_contact_and_offset
17
+ from lib.utils.human_models import mano
18
+
19
+
20
+
21
+ class MOW(Dataset):
22
+ def __init__(self, transform, data_split):
23
+ super(MOW, self).__init__()
24
+ self.__dict__.update(locals())
25
+
26
+ self.transform = transform
27
+ dataset_name = 'mow'
28
+
29
+ self.data_split = data_split
30
+ self.root_path = root_path = 'data/MOW'
31
+
32
+ self.data_dir = os.path.join(self.root_path, 'data')
33
+ self.split_dir = os.path.join(self.root_path, 'splits') # This inherits IHOI
34
+ self.watertight_obj_model_dir = os.path.join(self.data_dir, 'watertight_models')
35
+ os.makedirs(self.watertight_obj_model_dir, exist_ok=True)
36
+
37
+ with open(os.path.join(self.data_dir, 'poses.json'), 'r') as f:
38
+ annos = json.load(f)
39
+
40
+ self.db = {}
41
+ for anno in annos:
42
+ self.db[anno['image_id']] = anno
43
+ del annos
44
+
45
+ self.split = {'train': np.load('data/MOW/splits/mow_train.npy').tolist(), 'test': np.load('data/MOW/splits/mow_test.npy').tolist()}
46
+ self.length = len(self.split[data_split])
47
+
48
+ self.use_preprocessed_data = True
49
+ self.use_preprocessed_watertight_mesh = True
50
+ self.contact_data_path = os.path.join(root_path, 'preprocessed_data', data_split, 'contact_data')
51
+ os.makedirs(self.contact_data_path, exist_ok=True)
52
+
53
+ def __len__(self):
54
+ return self.length
55
+
56
+ def __getitem__(self, index):
57
+ sample_id = self.split[self.data_split][index]
58
+ ann = self.db[sample_id]
59
+ image_id = ann['image_id']
60
+
61
+ img_path = os.path.join(self.data_dir, 'images', f'{image_id}.jpg')
62
+ orig_img = load_img(img_path)
63
+
64
+ mask_ho_path = os.path.join(self.data_dir, 'masks/both', f'{image_id}.jpg')
65
+ mask_ho = (cv2.imread(mask_ho_path) > 128)[:, :, 0]
66
+ bbox_ho = mask2bbox(mask_ho, expansion_factor=cfg.DATASET.ho_bbox_expand_ratio)
67
+
68
+
69
+ ############################### PROCESS CROP AND AUGMENTATION ################################
70
+ # Crop image
71
+ img, img2bb_trans, bb2img_trans, rot, do_flip, color_scale = augmentation_contact(orig_img.copy(), bbox_ho, self.data_split, enforce_flip=False)
72
+ crop_img = img.copy()
73
+
74
+ # Transform for 3D HMR
75
+ if ('resnet' in cfg.MODEL.backbone_type or 'hrnet' in cfg.MODEL.backbone_type or 'handoccnet' in cfg.MODEL.backbone_type):
76
+ img = self.transform(img.astype(np.float32)/255.0)
77
+ elif (cfg.MODEL.backbone_type in ['hamer']) or ('vit' in cfg.MODEL.backbone_type):
78
+ normalize_img = Normalize(mean=cfg.MODEL.img_mean, std=cfg.MODEL.img_std)
79
+ img = img.transpose(2, 0, 1) / 255.0
80
+ img = normalize_img(torch.from_numpy(img)).float()
81
+ else:
82
+ raise NotImplementedError
83
+ ############################### PROCESS CROP AND AUGMENTATION ################################
84
+
85
+
86
+ mano_valid = np.ones((1), dtype=np.float32)
87
+
88
+
89
+ if not self.use_preprocessed_data:
90
+ hand_t = ann['hand_t']
91
+ hand_pose = ann['hand_pose']
92
+ hand_R = ann['hand_R']
93
+ hand_s = ann['hand_s']
94
+ hand_trans = ann['trans']
95
+
96
+ obj_instance = ann['obj_url'].split('/')[-1].split('.obj')[0]
97
+ obj_rest_mesh_path = os.path.join(self.data_dir, 'models', f'{obj_instance}.obj')
98
+ obj_R = np.array(ann['R']).reshape(3, 3)
99
+ obj_t = np.array(ann['t']).reshape((1, 3))
100
+ obj_s = np.array(ann['s'], dtype=np.float32)
101
+ obj_name = ann['obj_name']
102
+
103
+ mano_param = {'pose': np.array(hand_pose), 'shape': np.zeros(1), 'trans': np.array(hand_trans), 'hand_type': 'right'}
104
+ mano_mesh_cam, mano_joint_cam, mano_pose, mano_shape, mano_trans = process_human_model_output_orig(mano_param, {}) # mano_mesh_cam matches output.vertices in the official MOW code
105
+
106
+ mano_mesh_cam = (mano_mesh_cam @ np.array(hand_R).reshape(3, 3))
107
+ mano_mesh_cam += np.array(hand_t)[:, None].transpose(1, 0)
108
+ mano_mesh_cam *= np.array(hand_s) # mano_mesh_cam matches hand.vertices in the official MOW code
109
+ hand_mesh = trimesh.Trimesh(mano_mesh_cam, mano.watertight_face['right'])
110
+
111
+ obj_rest_verts, obj_rest_faces = load_obj_nr(obj_rest_mesh_path)
112
+ obj_rest_verts, obj_rest_faces = obj_rest_verts.detach().cpu().numpy(), obj_rest_faces.detach().cpu().numpy()
113
+ obj_rest_mesh = trimesh.Trimesh(obj_rest_verts, obj_rest_faces)
114
+
115
+ # Make object mesh watertight
116
+ watertight_obj_model_path = os.path.join(self.watertight_obj_model_dir, f'{obj_instance}.obj')
117
+
118
+ if self.use_preprocessed_watertight_mesh and os.path.exists(watertight_obj_model_path):
119
+ mesh_obj_watertight = trimesh.load(watertight_obj_model_path)
120
+
121
+ # post-process
122
+ trimesh.repair.fix_normals(mesh_obj_watertight)
123
+ trimesh.repair.fix_inversion(mesh_obj_watertight)
124
+ trimesh.repair.fill_holes(mesh_obj_watertight)
125
+
126
+ obj_rest_mesh = mesh_obj_watertight
127
+ else:
128
+ print('Building new watertight mesh!!!!')
129
+ resolution = 50_000
130
+ obj_rest_mesh.vertices, obj_rest_mesh.faces = pcu.make_mesh_watertight(obj_rest_mesh.vertices, obj_rest_mesh.faces, resolution)
131
+ if not os.path.exists(watertight_obj_model_path):
132
+ _ = obj_rest_mesh.export(watertight_obj_model_path)
133
+
134
+ obj_rest_verts, obj_rest_faces = center_vertices(obj_rest_mesh.vertices, obj_rest_mesh.faces)
135
+ obj_verts = np.dot(obj_rest_verts, obj_R)
136
+ obj_verts += obj_t
137
+ obj_verts *= obj_s
138
+ obj_mesh = trimesh.Trimesh(obj_verts, obj_rest_faces)
139
+
140
+ # Contact data
141
+ contact_h, obj_coord_c, contact_valid, inter_coord_valid = get_ho_contact_and_offset(hand_mesh, obj_mesh, cfg.MODEL.c_thres_in_the_wild)
142
+ contact_h = contact_h.astype(np.float32)
143
+ contact_data = dict(contact_h=contact_h)
144
+
145
+ if True:  # always cache the freshly computed contact labels to contact_data_path
146
+ np.save(os.path.join(self.contact_data_path, f'{sample_id}.npy'), contact_h)
147
+ else:
148
+ contact_h = np.load(os.path.join(self.contact_data_path, f'{sample_id}.npy')).astype(np.float32)
149
+ contact_data = dict(contact_h=contact_h)
150
+
151
+
152
+ input_data = dict(image=img)
153
+ targets_data = dict(contact_data=contact_data)
154
+ meta_info = dict(sample_id=sample_id, orig_img=orig_img, mano_valid=mano_valid)
155
+
156
+ return dict(input_data=input_data, targets_data=targets_data, meta_info=meta_info)
data/dataset.py ADDED
@@ -0,0 +1,40 @@
1
+ import random
2
+ import numpy as np
3
+ from torch.utils.data.dataset import Dataset
4
+
5
+
6
+ class MultipleDatasets(Dataset):
7
+ def __init__(self, dbs, make_same_len=True):
8
+ self.dbs = dbs
9
+ self.db_num = len(self.dbs)
10
+ self.max_db_data_num = max([len(db) for db in dbs])
11
+ self.db_len_cumsum = np.cumsum([len(db) for db in dbs])
12
+ self.make_same_len = make_same_len
13
+
14
+ def __len__(self):
15
+ # all dbs have the same length
16
+ if self.make_same_len:
17
+ return self.max_db_data_num * self.db_num
18
+ # each db has different length
19
+ else:
20
+ return sum([len(db) for db in self.dbs])
21
+
22
+ def __getitem__(self, index):
23
+ if self.make_same_len:
24
+ db_idx = index // self.max_db_data_num
25
+ data_idx = index % self.max_db_data_num
26
+ if data_idx >= len(self.dbs[db_idx]) * (self.max_db_data_num // len(self.dbs[db_idx])): # last batch: random sampling
27
+ data_idx = random.randint(0,len(self.dbs[db_idx])-1)
28
+ else: # before last batch: use modular
29
+ data_idx = data_idx % len(self.dbs[db_idx])
30
+ else:
31
+ for i in range(self.db_num):
32
+ if index < self.db_len_cumsum[i]:
33
+ db_idx = i
34
+ break
35
+ if db_idx == 0:
36
+ data_idx = index
37
+ else:
38
+ data_idx = index - self.db_len_cumsum[db_idx-1]
39
+
40
+ return self.dbs[db_idx][data_idx]
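`MultipleDatasets` merges several datasets into one index space; with `make_same_len=True`, every dataset is drawn `max_db_data_num` times per epoch, so smaller datasets are effectively oversampled to match the largest one. A toy usage sketch (the `Toy` dataset and the sizes are illustrative assumptions, not part of this commit):

```
# Illustrative only: two toy datasets of different sizes sampled evenly.
from torch.utils.data import Dataset, DataLoader
from data.dataset import MultipleDatasets

class Toy(Dataset):
    def __init__(self, n, tag):
        self.n, self.tag = n, tag
    def __len__(self):
        return self.n
    def __getitem__(self, i):
        return {'tag': self.tag, 'idx': i}

merged = MultipleDatasets([Toy(100, 'a'), Toy(10, 'b')], make_same_len=True)
print(len(merged))  # 200: each dataset contributes max_db_data_num (=100) samples per epoch
loader = DataLoader(merged, batch_size=4, shuffle=True, num_workers=0)
```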
demo.py ADDED
@@ -0,0 +1,122 @@
1
+ import os
2
+ import cv2
3
+ import torch
4
+ import argparse
5
+ import numpy as np
6
+ from tqdm import tqdm
7
+
8
+ import mediapipe as mp
9
+ from mediapipe.tasks.python import vision
10
+ from mediapipe.tasks.python import BaseOptions
11
+
12
+ from lib.core.config import cfg, update_config
13
+ from lib.models.model import HACO
14
+ from lib.utils.human_models import mano
15
+ from lib.utils.contact_utils import get_contact_thres
16
+ from lib.utils.vis_utils import ContactRenderer, draw_landmarks_on_image
17
+ from lib.utils.preprocessing import augmentation_contact
18
+ from lib.utils.demo_utils import remove_small_contact_components
19
+
20
+
21
+ parser = argparse.ArgumentParser(description='Demo HACO')
22
+ parser.add_argument('--backbone', type=str, default='hamer', choices=['hamer', 'vit-l-16', 'vit-b-16', 'vit-s-16', 'handoccnet', 'hrnet-w48', 'hrnet-w32', 'resnet-152', 'resnet-101', 'resnet-50', 'resnet-34', 'resnet-18'], help='backbone model')
23
+ parser.add_argument('--checkpoint', type=str, default='', help='model path for demo')
24
+ parser.add_argument('--input_path', type=str, default='asset/example_images', help='image path for demo')
25
+ args = parser.parse_args()
26
+
27
+
28
+ # Set device as CUDA
29
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
30
+
31
+
32
+ # Initialize directories
33
+ experiment_dir = 'experiments_demo_image'
34
+
35
+
36
+ # Load config
37
+ update_config(backbone_type=args.backbone, exp_dir=experiment_dir)
38
+
39
+
40
+ # Initialize renderer
41
+ contact_renderer = ContactRenderer()
42
+
43
+
44
+ # Load demo images
45
+ input_dir = args.input_path
46
+ images = [f for f in os.listdir(input_dir) if f.lower().endswith(('.png', '.jpg', '.jpeg'))]
47
+
48
+
49
+ # Initialize MediaPipe HandLandmarker
50
+ base_options = BaseOptions(model_asset_path=cfg.MODEL.hand_landmarker_path)
51
+ hand_options = vision.HandLandmarkerOptions(base_options=base_options, num_hands=2)
52
+ detector = vision.HandLandmarker.create_from_options(hand_options)
53
+
54
+
55
+ ############# Model #############
56
+ model = HACO().to(device)
57
+ model.eval()
58
+ ############# Model #############
59
+
60
+
61
+ # Load model checkpoint if provided
62
+ if args.checkpoint:
63
+ checkpoint = torch.load(args.checkpoint, map_location=device)
64
+ model.load_state_dict(checkpoint['state_dict'])
65
+
66
+
67
+ ############################### Demo Loop ###############################
68
+ for i, frame_name in tqdm(enumerate(images), total=len(images)):
69
+ print(f"Processing: {frame_name}")
70
+
71
+ # Load and convert image
72
+ frame_path = os.path.join(input_dir, frame_name)
73
+ frame = cv2.imread(frame_path)
74
+ orig_img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
75
+ frame_name_base = os.path.splitext(frame_name)[0]
76
+
77
+ # Hand landmark detection
78
+ mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=orig_img.copy())
79
+ detection_result = detector.detect(mp_image)
80
+ annotated_image, right_hand_bbox = draw_landmarks_on_image(orig_img.copy(), detection_result)
81
+
82
+ if right_hand_bbox is None:
83
+ print(f"Skipping {frame_name} - no hand detected.")
84
+ continue
85
+
86
+ print(f"Frame {i}: Right hand bbox: {right_hand_bbox}")
87
+
88
+ # Image preprocessing
89
+ crop_img, img2bb_trans, bb2img_trans, rot, do_flip, color_scale = augmentation_contact(orig_img.copy(), right_hand_bbox, 'test', enforce_flip=False)
90
+
91
+ # Convert to model input format
92
+ if args.backbone in ['handoccnet'] or 'resnet' in cfg.MODEL.backbone_type or 'hrnet' in cfg.MODEL.backbone_type:
93
+ from torchvision import transforms
94
+ img_tensor = transforms.ToTensor()(crop_img.astype(np.float32) / 255.0)
95
+ elif args.backbone in ['hamer'] or 'vit' in cfg.MODEL.backbone_type:
96
+ from torchvision.transforms import Normalize
97
+ normalize = Normalize(mean=cfg.MODEL.img_mean, std=cfg.MODEL.img_std)
98
+ img_tensor = crop_img.transpose(2, 0, 1) / 255.0
99
+ img_tensor = normalize(torch.from_numpy(img_tensor)).float()
100
+ else:
101
+ raise NotImplementedError(f"Unsupported backbone: {args.backbone}")
102
+
103
+ ############# Run model #############
104
+ with torch.no_grad():
105
+ outputs = model({'input': {'image': img_tensor[None].to(device)}}, mode="test")
106
+ ############# Run model #############
107
+
108
+ # Save result
109
+ os.makedirs('outputs', exist_ok=True)
110
+ os.makedirs('outputs/detection', exist_ok=True)
111
+ os.makedirs('outputs/crop_img', exist_ok=True)
112
+ os.makedirs('outputs/contact', exist_ok=True)
113
+
114
+ cv2.imwrite(f'outputs/detection/{frame_name_base}.png', cv2.cvtColor(annotated_image, cv2.COLOR_RGB2BGR))
115
+ cv2.imwrite(f'outputs/crop_img/{frame_name_base}.png', crop_img[..., ::-1])
116
+
117
+ eval_thres = get_contact_thres(args.backbone)
118
+ contact_mask = (outputs['contact_out'][0] > eval_thres).detach().cpu().numpy()
119
+ contact_mask = remove_small_contact_components(contact_mask, faces=mano.watertight_face['right'], min_size=20)
120
+ contact_rendered = contact_renderer.render_contact(crop_img[..., ::-1], contact_mask)
121
+ cv2.imwrite(f'outputs/contact/{frame_name_base}.png', contact_rendered)
122
+ ############################### Demo Loop ###############################
demo_video.py ADDED
@@ -0,0 +1,132 @@
1
+ import os
2
+ import cv2
3
+ import torch
4
+ import argparse
5
+ import numpy as np
6
+ from tqdm import tqdm
7
+
8
+ import mediapipe as mp
9
+ from mediapipe.tasks.python import vision
10
+ from mediapipe.tasks.python import BaseOptions
11
+
12
+ from lib.core.config import cfg, update_config
13
+ from lib.models.model import HACO
14
+ from lib.utils.human_models import mano
15
+ from lib.utils.contact_utils import get_contact_thres
16
+ from lib.utils.vis_utils import ContactRenderer, draw_landmarks_on_image
17
+ from lib.utils.preprocessing import augmentation_contact
18
+ from lib.utils.demo_utils import smooth_bbox, smooth_contact_mask, remove_small_contact_components, initialize_video_writer, extract_frames_with_hand, find_longest_continuous_segment
19
+
20
+
21
+ parser = argparse.ArgumentParser(description='Demo HACO')
22
+ parser.add_argument('--backbone', type=str, default='hamer', choices=['hamer', 'vit-l-16', 'vit-b-16', 'vit-s-16', 'handoccnet', 'hrnet-w48', 'hrnet-w32', 'resnet-152', 'resnet-101', 'resnet-50', 'resnet-34', 'resnet-18'], help='backbone model')
23
+ parser.add_argument('--checkpoint', type=str, default='', help='model path for demo')
24
+ parser.add_argument('--input_path', type=str, default='asset/example_videos', help='video path for demo')
25
+ args = parser.parse_args()
26
+
27
+
28
+ # Set device as CUDA
29
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
30
+
31
+
32
+ # Initialize directories
33
+ experiment_dir = 'experiments_demo_video'
34
+
35
+
36
+ # Load config
37
+ update_config(backbone_type=args.backbone, exp_dir=experiment_dir)
38
+
39
+
40
+ # Initialize renderer
41
+ contact_renderer = ContactRenderer()
42
+
43
+
44
+ # Load demo videos
45
+ input_dir = args.input_path
46
+ video_files = [f for f in os.listdir(input_dir) if f.lower().endswith(('.mp4', '.avi', '.mov'))]
47
+
48
+
49
+ # Initialize MediaPipe HandLandmarker
50
+ base_options = BaseOptions(model_asset_path=cfg.MODEL.hand_landmarker_path)
51
+ hand_options = vision.HandLandmarkerOptions(base_options=base_options, num_hands=2)
52
+ detector = vision.HandLandmarker.create_from_options(hand_options)
53
+
54
+
55
+ ############# Model #############
56
+ model = HACO().to(device)
57
+ model.eval()
58
+ ############# Model #############
59
+
60
+
61
+ # Load model checkpoint if provided
62
+ if args.checkpoint:
63
+ checkpoint = torch.load(args.checkpoint, map_location=device)
64
+ model.load_state_dict(checkpoint['state_dict'])
65
+
66
+
67
+ ############################### Demo Loop ###############################
68
+ for i, video_name in tqdm(enumerate(video_files), total=len(video_files)):
69
+ print(f"Processing: {video_name}")
70
+
71
+ # Organize input and output path
72
+ video_path = os.path.join(input_dir, video_name)
73
+ os.makedirs("outputs_video", exist_ok=True)
74
+ output_path = os.path.join("outputs_video", f"{os.path.splitext(video_name)[0]}_out.mp4")
75
+
76
+ # Load and convert video
77
+ cap = cv2.VideoCapture(video_path)
78
+ fps = cap.get(cv2.CAP_PROP_FPS)
79
+ fps = 30 if fps == 0 or np.isnan(fps) else fps
80
+
81
+ # Extract meaningful video segment
82
+ frames_with_hand = extract_frames_with_hand(cap, detector)
83
+ longest_segment = find_longest_continuous_segment(frames_with_hand)
84
+
85
+ if not longest_segment:
86
+ print(f"No hand detected in any continuous segment for {video_name}")
87
+ continue
88
+
89
+ writer = None
90
+ smoothed_bbox = None
91
+ smoothed_contact = None
92
+
93
+ for _, frame, bbox in longest_segment:
94
+ # Image preprocessing
95
+ smoothed_bbox = smooth_bbox(smoothed_bbox, bbox, alpha=0.8)
96
+ orig_img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
97
+ crop_img, img2bb_trans, bb2img_trans, rot, do_flip, color_scale = augmentation_contact(orig_img.copy(), smoothed_bbox, 'test', enforce_flip=False, bkg_color='white')
98
+
99
+ # Convert to model input format
100
+ if args.backbone in ['handoccnet'] or 'resnet' in cfg.MODEL.backbone_type or 'hrnet' in cfg.MODEL.backbone_type:
101
+ from torchvision import transforms
102
+ img_tensor = transforms.ToTensor()(crop_img.astype(np.float32) / 255.0)
103
+ elif args.backbone in ['hamer'] or 'vit' in cfg.MODEL.backbone_type:
104
+ from torchvision.transforms import Normalize
105
+ normalize = Normalize(mean=cfg.MODEL.img_mean, std=cfg.MODEL.img_std)
106
+ img_tensor = crop_img.transpose(2, 0, 1) / 255.0
107
+ img_tensor = normalize(torch.from_numpy(img_tensor)).float()
108
+ else:
109
+ raise NotImplementedError(f"Unsupported backbone: {args.backbone}")
110
+
111
+ ############# Run model #############
112
+ with torch.no_grad():
113
+ outputs = model({'input': {'image': img_tensor[None].to(device)}}, mode="test")
114
+ ############# Run model #############
115
+
116
+ # Save result
117
+ eval_thres = get_contact_thres(args.backbone)
118
+ raw_contact = (outputs['contact_out'][0] > eval_thres).detach().cpu().numpy()
119
+ smoothed_contact = smooth_contact_mask(smoothed_contact, raw_contact, alpha=0.8)
120
+ contact_mask = smoothed_contact > 0.5
121
+ contact_mask = remove_small_contact_components(contact_mask, faces=mano.watertight_face['right'], min_size=20)
122
+ contact_rendered = contact_renderer.render_contact(crop_img, contact_mask, mode='demo')
123
+
124
+ if writer is None:
125
+ ch, cw = contact_rendered.shape[:2]
126
+ writer = initialize_video_writer(output_path, fps, (cw, ch))
127
+
128
+ writer.write(cv2.cvtColor(contact_rendered, cv2.COLOR_RGB2BGR))
129
+
130
+ if writer:
131
+ writer.release()
132
+ ############################### Demo Loop ###############################
lib/core/__pycache__/config.cpython-38.pyc ADDED
Binary file (3.07 kB). View file
 
lib/core/__pycache__/logger.cpython-38.pyc ADDED
Binary file (2.03 kB). View file
 
lib/core/config.py ADDED
@@ -0,0 +1,93 @@
1
+ import os
2
+ import torch
3
+ import numpy as np
4
+ from easydict import EasyDict as edict
5
+
6
+ from lib.core.logger import ColorLogger
7
+ from lib.utils.log_utils import init_dirs
8
+
9
+
10
+ cfg = edict()
11
+
12
+
13
+ """ Dataset """
14
+ cfg.DATASET = edict()
15
+ cfg.DATASET.train_name = ['ObMan', 'DexYCB', 'HO3D', 'MOW', 'H2O3D', 'HOI4D', 'H2O', 'ARCTIC', 'InterHand26M', 'HIC', 'PROX', 'RICH', 'Decaf', 'Hi4D']
16
+ cfg.DATASET.test_name = 'MOW' # ONLY TEST ONE DATASET AT A TIME
17
+ cfg.DATASET.workers = 2
18
+ cfg.DATASET.random_seed = 314
19
+ cfg.DATASET.ho_bbox_expand_ratio = 1.3
20
+ cfg.DATASET.hand_bbox_expand_ratio = 1.3
21
+ cfg.DATASET.ho_big_bbox_expand_ratio = 2.0
22
+ cfg.DATASET.hand_scene_bbox_expand_ratio = 2.5
23
+ cfg.DATASET.obj_bbox_expand_ratio = 1.5
24
+
25
+
26
+ """ Model - HMR """
27
+ cfg.MODEL = edict()
28
+ cfg.MODEL.seed = 314
29
+ cfg.MODEL.input_img_shape = (256, 256)
30
+ cfg.MODEL.img_mean = (0.485, 0.456, 0.406)
31
+ cfg.MODEL.img_std = (0.229, 0.224, 0.225)
32
+ # MANO
33
+ cfg.MODEL.human_model_path = 'data/base_data/human_models'
34
+ # Contact
35
+ cfg.MODEL.contact_means_path = 'data/base_data/contact_data/dexycb/contact_means_dexycb.npy'
36
+ # Backbone
37
+ cfg.MODEL.backbone_type = ''
38
+ cfg.MODEL.hamer_backbone_pretrained_path = 'data/base_data/pretrained_models/hamer/hamer.ckpt'
39
+ cfg.MODEL.hrnet_w32_backbone_config_path = 'data/base_data/pretrained_models/hrnet/cls_hrnet_w32_sgd_lr5e-2_wd1e-4_bs32_x100.yaml'
40
+ cfg.MODEL.hrnet_w32_backbone_pretrained_path = 'data/base_data/pretrained_models/hrnet/hrnet_w32-36af842e.pth'
41
+ cfg.MODEL.hrnet_w48_backbone_config_path = 'data/base_data/pretrained_models/hrnet/cls_hrnet_w48_sgd_lr5e-2_wd1e-4_bs32_x100.yaml'
42
+ cfg.MODEL.hrnet_w48_backbone_pretrained_path = 'data/base_data/pretrained_models/hrnet/hrnet_w48-8ef0771d.pth'
43
+ cfg.MODEL.handoccnet_backbone_pretrained_path = 'data/base_data/pretrained_models/handoccnet/snapshot_demo.pth.tar'
44
+ # Multi-level joint regressor
45
+ cfg.MODEL.V_regressor_336_path = 'data/base_data/human_models/mano/V_regressor_336.npy'
46
+ cfg.MODEL.V_regressor_84_path = 'data/base_data/human_models/mano/V_regressor_84.npy'
47
+ # Hand Detector
48
+ cfg.MODEL.hand_landmarker_path = 'data/base_data/demo_data/hand_landmarker.task'
49
+
50
+
51
+ """ Train Detail """
52
+ cfg.TRAIN = edict()
53
+ cfg.TRAIN.batch = 24
54
+ cfg.TRAIN.epoch = 10
55
+ cfg.TRAIN.lr = 1e-5
56
+ cfg.TRAIN.weight_decay = 0.0001
57
+ cfg.TRAIN.milestones = (5, 10)
58
+ cfg.TRAIN.step_size = 10
59
+ cfg.TRAIN.gamma = 0.9
60
+ cfg.TRAIN.betas = (0.9, 0.95)
61
+ cfg.TRAIN.print_freq = 5
62
+
63
+ cfg.TRAIN.loss_weight = 1.0
64
+
65
+
66
+ """ Test Detail """
67
+ cfg.TEST = edict()
68
+ cfg.TEST.batch = 1
69
+
70
+
71
+ """ CAMERA """
72
+ cfg.CAMERA = edict()
73
+
74
+ np.random.seed(cfg.DATASET.random_seed)
75
+ torch.manual_seed(cfg.DATASET.random_seed)
76
+ torch.backends.cudnn.benchmark = True
77
+ logger = None
78
+
79
+
80
+ def update_config(backbone_type='', exp_dir='', ckpt_path=''):
81
+ if backbone_type == '':
82
+ backbone_type = 'hamer'
83
+ cfg.MODEL.backbone_type = backbone_type
84
+
85
+ global logger
86
+ log_dir = os.path.join(exp_dir, 'log')
87
+ try:
88
+ init_dirs([log_dir])
89
+ logger = ColorLogger(log_dir)
90
+ logger.info("Logger initialized successfully!")
91
+ except Exception as e:
92
+ print(f"Failed to initialize logger: {e}")
93
+ logger = None
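A minimal usage sketch for the config module above (the experiment directory below is a placeholder; update_config falls back to the 'hamer' backbone when none is given):

    from lib.core.config import cfg, update_config

    update_config(backbone_type='hamer', exp_dir='output/exp_demo')
    print(cfg.MODEL.backbone_type)    # 'hamer'
    print(cfg.MODEL.input_img_shape)  # (256, 256)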
lib/core/logger.py ADDED
@@ -0,0 +1,55 @@
1
+ import logging
2
+ import os.path as osp
3
+ import warnings
4
+
5
+
6
+ warnings.filterwarnings("ignore")
7
+
8
+ OK = '\033[92m'
9
+ WARNING = '\033[93m'
10
+ FAIL = '\033[91m'
11
+ END = '\033[0m'
12
+
13
+ PINK = '\033[95m'
14
+ BLUE = '\033[94m'
15
+ GREEN = OK
16
+ RED = FAIL
17
+ WHITE = END
18
+ YELLOW = WARNING
19
+
20
+
21
+ class ColorLogger():
22
+ def __init__(self, log_dir, log_name='log.txt'):
23
+ # set log
24
+ self._logger = logging.getLogger(log_name)
25
+ self._logger.setLevel(logging.INFO)
26
+ log_file = osp.join(log_dir, log_name)
27
+ file_log = logging.FileHandler(log_file, mode='a')
28
+ file_log.setLevel(logging.INFO)
29
+ console_log = logging.StreamHandler()
30
+ console_log.setLevel(logging.INFO)
31
+ file_formatter = logging.Formatter(
32
+ "%(asctime)s %(message)s",
33
+ "%m-%d %H:%M:%S")
34
+ console_formatter = logging.Formatter(
35
+ "{}%(asctime)s{} %(message)s".format(GREEN, END),
36
+ "%m-%d %H:%M:%S")
37
+ file_log.setFormatter(file_formatter)
38
+ console_log.setFormatter(console_formatter)
39
+ self._logger.addHandler(file_log)
40
+ self._logger.addHandler(console_log)
41
+
42
+ def debug(self, msg):
43
+ self._logger.debug(str(msg))
44
+
45
+ def info(self, msg):
46
+ self._logger.info(str(msg))
47
+
48
+ def warning(self, msg):
49
+ self._logger.warning(WARNING + 'WRN: ' + str(msg) + END)
50
+
51
+ def critical(self, msg):
52
+ self._logger.critical(RED + 'CRI: ' + str(msg) + END)
53
+
54
+ def error(self, msg):
55
+ self._logger.error(RED + 'ERR: ' + str(msg) + END)
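A minimal usage sketch, assuming the log directory already exists (update_config in lib/core/config.py creates it via init_dirs):

    from lib.core.logger import ColorLogger

    logger = ColorLogger('output/exp_demo/log')
    logger.info('training started')            # green timestamp on console, plain text in log.txt
    logger.warning('skipped a corrupt sample')  # printed in yellow with a 'WRN:' prefix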
lib/models/__pycache__/model.cpython-38.pyc ADDED
Binary file (3.75 kB).
 
lib/models/backbone/__pycache__/backbone_hamer_style.cpython-38.pyc ADDED
Binary file (9.19 kB).
 
lib/models/backbone/__pycache__/resnet.cpython-38.pyc ADDED
Binary file (3.02 kB).
 
lib/models/backbone/__pycache__/vit.cpython-38.pyc ADDED
Binary file (1.33 kB).
 
lib/models/backbone/backbone_hamer_style.py ADDED
@@ -0,0 +1,273 @@
1
+ import torch
2
+ import torch.nn as nn
3
+ import torch.nn.functional as F
4
+ import torch.utils.checkpoint as checkpoint
5
+
6
+ import numpy as np
7
+ from functools import partial
8
+ from timm.layers import drop_path, to_2tuple, trunc_normal_
9
+
10
+
11
+ # This module is from HaMeR (https://github.com/geopavlakos/hamer). Initial configurations follow the cfg of their final model.
12
+ class ViT_HaMeR(nn.Module):
13
+ def __init__(self,
14
+ img_size=(256, 192), patch_size=16, in_chans=3, num_classes=80, embed_dim=1280, depth=32,
15
+ num_heads=16, mlp_ratio=4., qkv_bias=True, qk_scale=None, drop_rate=0., attn_drop_rate=0.,
16
+ drop_path_rate=0.55, hybrid_backbone=None, norm_layer=None, use_checkpoint=False,
17
+ frozen_stages=-1, ratio=1, last_norm=True,
18
+ patch_padding='pad', freeze_attn=False, freeze_ffn=False,
19
+ ):
20
+ # Protect mutable default arguments
21
+ super(ViT_HaMeR, self).__init__()
22
+ norm_layer = norm_layer or partial(nn.LayerNorm, eps=1e-6)
23
+ self.num_classes = num_classes
24
+ self.num_features = self.embed_dim = embed_dim # num_features for consistency with other models
25
+ self.frozen_stages = frozen_stages
26
+ self.use_checkpoint = use_checkpoint
27
+ self.patch_padding = patch_padding
28
+ self.freeze_attn = freeze_attn
29
+ self.freeze_ffn = freeze_ffn
30
+ self.depth = depth
31
+
32
+ if hybrid_backbone is not None:
33
+ self.patch_embed = HybridEmbed(
34
+ hybrid_backbone, img_size=img_size, in_chans=in_chans, embed_dim=embed_dim)
35
+ else:
36
+ self.patch_embed = PatchEmbed(
37
+ img_size=img_size, patch_size=patch_size, in_chans=in_chans, embed_dim=embed_dim, ratio=ratio)
38
+ num_patches = self.patch_embed.num_patches
39
+
40
+         # since the pretrained model has a class token
41
+ self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
42
+
43
+ dpr = [x.item() for x in torch.linspace(0, drop_path_rate, depth)] # stochastic depth decay rule
44
+
45
+ self.blocks = nn.ModuleList([
46
+ Block(
47
+ dim=embed_dim, num_heads=num_heads, mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale,
48
+ drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer,
49
+ )
50
+ for i in range(depth)])
51
+
52
+ self.last_norm = norm_layer(embed_dim) if last_norm else nn.Identity()
53
+
54
+ if self.pos_embed is not None:
55
+ trunc_normal_(self.pos_embed, std=.02)
56
+
57
+ self._freeze_stages()
58
+
59
+ def _freeze_stages(self):
60
+ """Freeze parameters."""
61
+ if self.frozen_stages >= 0:
62
+ self.patch_embed.eval()
63
+ for param in self.patch_embed.parameters():
64
+ param.requires_grad = False
65
+
66
+ for i in range(1, self.frozen_stages + 1):
67
+ m = self.blocks[i]
68
+ m.eval()
69
+ for param in m.parameters():
70
+ param.requires_grad = False
71
+
72
+ if self.freeze_attn:
73
+ for i in range(0, self.depth):
74
+ m = self.blocks[i]
75
+ m.attn.eval()
76
+ m.norm1.eval()
77
+ for param in m.attn.parameters():
78
+ param.requires_grad = False
79
+ for param in m.norm1.parameters():
80
+ param.requires_grad = False
81
+
82
+ if self.freeze_ffn:
83
+ self.pos_embed.requires_grad = False
84
+ self.patch_embed.eval()
85
+ for param in self.patch_embed.parameters():
86
+ param.requires_grad = False
87
+ for i in range(0, self.depth):
88
+ m = self.blocks[i]
89
+ m.mlp.eval()
90
+ m.norm2.eval()
91
+ for param in m.mlp.parameters():
92
+ param.requires_grad = False
93
+ for param in m.norm2.parameters():
94
+ param.requires_grad = False
95
+
96
+ def init_weights(self):
97
+ """Initialize the weights in backbone.
98
+ Args:
99
+ pretrained (str, optional): Path to pre-trained weights.
100
+ Defaults to None.
101
+ """
102
+ def _init_weights(m):
103
+ if isinstance(m, nn.Linear):
104
+ trunc_normal_(m.weight, std=.02)
105
+ if isinstance(m, nn.Linear) and m.bias is not None:
106
+ nn.init.constant_(m.bias, 0)
107
+ elif isinstance(m, nn.LayerNorm):
108
+ nn.init.constant_(m.bias, 0)
109
+ nn.init.constant_(m.weight, 1.0)
110
+
111
+ self.apply(_init_weights)
112
+
113
+ def get_num_layers(self):
114
+ return len(self.blocks)
115
+
116
+ @torch.jit.ignore
117
+ def no_weight_decay(self):
118
+ return {'pos_embed', 'cls_token'}
119
+
120
+ def forward_features(self, x):
121
+ B, C, H, W = x.shape
122
+ x, (Hp, Wp) = self.patch_embed(x)
123
+
124
+ if self.pos_embed is not None:
125
+ # fit for multiple GPU training
126
+ # since the first element for pos embed (sin-cos manner) is zero, it will cause no difference
127
+ x = x + self.pos_embed[:, 1:] + self.pos_embed[:, :1]
128
+
129
+ for blk in self.blocks:
130
+ if self.use_checkpoint:
131
+ x = checkpoint.checkpoint(blk, x)
132
+ else:
133
+ x = blk(x)
134
+
135
+ x = self.last_norm(x)
136
+
137
+ xp = x.permute(0, 2, 1).reshape(B, -1, Hp, Wp).contiguous()
138
+
139
+ return xp
140
+
141
+ def forward(self, x):
142
+         x = x[:,:,:,32:-32] # Crop the 256x256 input to its central 256x192 region inside the backbone rather than in model.py (follows the HaMeR model code)
143
+ x = self.forward_features(x)
144
+ return x
145
+
146
+ def train(self, mode=True):
147
+ """Convert the model into training mode."""
148
+ super().train(mode)
149
+ self._freeze_stages()
150
+
151
+
152
+
153
+ class Attention_for_vit(nn.Module):
154
+ def __init__(
155
+ self, dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0.,
156
+ proj_drop=0., attn_head_dim=None,):
157
+ super().__init__()
158
+ self.num_heads = num_heads
159
+ head_dim = dim // num_heads
160
+ self.dim = dim
161
+
162
+ if attn_head_dim is not None:
163
+ head_dim = attn_head_dim
164
+ all_head_dim = head_dim * self.num_heads
165
+
166
+ self.scale = qk_scale or head_dim ** -0.5
167
+
168
+ self.qkv = nn.Linear(dim, all_head_dim * 3, bias=qkv_bias)
169
+
170
+ self.attn_drop = nn.Dropout(attn_drop)
171
+ self.proj = nn.Linear(all_head_dim, dim)
172
+ self.proj_drop = nn.Dropout(proj_drop)
173
+
174
+ def forward(self, x):
175
+ B, N, C = x.shape
176
+ qkv = self.qkv(x)
177
+ qkv = qkv.reshape(B, N, 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
178
+ q, k, v = qkv[0], qkv[1], qkv[2] # make torchscript happy (cannot use tensor as tuple)
179
+
180
+ q = q * self.scale
181
+ attn = (q @ k.transpose(-2, -1))
182
+
183
+ attn = attn.softmax(dim=-1)
184
+ attn = self.attn_drop(attn)
185
+
186
+ x = (attn @ v).transpose(1, 2).reshape(B, N, -1)
187
+ x = self.proj(x)
188
+ x = self.proj_drop(x)
189
+
190
+ return x
191
+
192
+
193
+ class Mlp(nn.Module):
194
+ def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.):
195
+ super().__init__()
196
+ out_features = out_features or in_features
197
+ hidden_features = hidden_features or in_features
198
+ self.fc1 = nn.Linear(in_features, hidden_features)
199
+ self.act = act_layer()
200
+ self.fc2 = nn.Linear(hidden_features, out_features)
201
+ self.drop = nn.Dropout(drop)
202
+
203
+ def forward(self, x):
204
+ x = self.fc1(x)
205
+ x = self.act(x)
206
+ x = self.fc2(x)
207
+ x = self.drop(x)
208
+ return x
209
+
210
+
211
+ class DropPath(nn.Module):
212
+ """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
213
+ """
214
+ def __init__(self, drop_prob=None):
215
+ super(DropPath, self).__init__()
216
+ self.drop_prob = drop_prob
217
+
218
+ def forward(self, x):
219
+ return drop_path(x, self.drop_prob, self.training)
220
+
221
+ def extra_repr(self):
222
+ return 'p={}'.format(self.drop_prob)
223
+
224
+
225
+ class Block(nn.Module):
226
+ def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_scale=None,
227
+ drop=0., attn_drop=0., drop_path=0., act_layer=nn.GELU,
228
+ norm_layer=nn.LayerNorm, attn_head_dim=None
229
+ ):
230
+ super().__init__()
231
+
232
+ self.norm1 = norm_layer(dim)
233
+ self.attn = Attention_for_vit(
234
+ dim, num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale,
235
+ attn_drop=attn_drop, proj_drop=drop, attn_head_dim=attn_head_dim
236
+ )
237
+
238
+ # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here
239
+ self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
240
+ self.norm2 = norm_layer(dim)
241
+ mlp_hidden_dim = int(dim * mlp_ratio)
242
+ self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)
243
+
244
+ def forward(self, x):
245
+ x = x + self.drop_path(self.attn(self.norm1(x)))
246
+ x = x + self.drop_path(self.mlp(self.norm2(x)))
247
+ return x
248
+
249
+
250
+
251
+ class PatchEmbed(nn.Module):
252
+ """ Image to Patch Embedding
253
+ """
254
+ def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768, ratio=1):
255
+ super().__init__()
256
+ img_size = to_2tuple(img_size)
257
+ patch_size = to_2tuple(patch_size)
258
+ num_patches = (img_size[1] // patch_size[1]) * (img_size[0] // patch_size[0]) * (ratio ** 2)
259
+ self.patch_shape = (int(img_size[0] // patch_size[0] * ratio), int(img_size[1] // patch_size[1] * ratio))
260
+ self.origin_patch_shape = (int(img_size[0] // patch_size[0]), int(img_size[1] // patch_size[1]))
261
+ self.img_size = img_size
262
+ self.patch_size = patch_size
263
+ self.num_patches = num_patches
264
+
265
+ self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=(patch_size[0] // ratio), padding=4 + 2 * (ratio//2-1))
266
+
267
+ def forward(self, x, **kwargs):
268
+ B, C, H, W = x.shape
269
+ x = self.proj(x)
270
+ Hp, Wp = x.shape[2], x.shape[3]
271
+
272
+ x = x.flatten(2).transpose(1, 2)
273
+ return x, (Hp, Wp)
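A quick shape check for the backbone above. The defaults match HaMeR's ViT-H configuration (embed_dim=1280, depth=32); the sketch shrinks the depth only to keep it light, and assumes the 256x256 crops from cfg.MODEL.input_img_shape:

    import torch

    vit = ViT_HaMeR(img_size=(256, 192), patch_size=16, embed_dim=1280, depth=2, num_heads=16)
    feat = vit(torch.randn(1, 3, 256, 256))  # the center 256x192 region is cropped internally
    print(feat.shape)  # torch.Size([1, 1280, 16, 12]) -- one token per 16x16 patch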
lib/models/backbone/fpn.py ADDED
@@ -0,0 +1,282 @@
1
+ # This code is from HandOccNet (https://github.com/namepllet/HandOccNet)
2
+ import torch
3
+ import torch.nn as nn
4
+ import torch.nn.functional as F
5
+ import torch.utils.model_zoo as model_zoo
6
+
7
+
8
+ class BasicConv(nn.Module):
9
+ def __init__(self, in_planes, out_planes, kernel_size, stride=1, padding=0, dilation=1, groups=1, relu=True, bn=True, bias=False):
10
+ super(BasicConv, self).__init__()
11
+ self.out_channels = out_planes
12
+ self.conv = nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=padding, dilation=dilation, groups=groups, bias=bias)
13
+ self.bn = nn.BatchNorm2d(out_planes,eps=1e-5, momentum=0.01, affine=True) if bn else None
14
+ self.relu = nn.ReLU() if relu else None
15
+
16
+ def forward(self, x):
17
+ x = self.conv(x)
18
+ if self.bn is not None:
19
+ x = self.bn(x)
20
+ if self.relu is not None:
21
+ x = self.relu(x)
22
+ return x
23
+
24
+ class Flatten(nn.Module):
25
+ def forward(self, x):
26
+ return x.view(x.size(0), -1)
27
+
28
+ class ChannelGate(nn.Module):
29
+ def __init__(self, gate_channels, reduction_ratio=16, pool_types=['avg', 'max']):
30
+ super(ChannelGate, self).__init__()
31
+ self.gate_channels = gate_channels
32
+ self.mlp = nn.Sequential(
33
+ Flatten(),
34
+ nn.Linear(gate_channels, gate_channels // reduction_ratio),
35
+ nn.ReLU(),
36
+ nn.Linear(gate_channels // reduction_ratio, gate_channels)
37
+ )
38
+ self.pool_types = pool_types
39
+ def forward(self, x):
40
+ channel_att_sum = None
41
+ for pool_type in self.pool_types:
42
+ if pool_type=='avg':
43
+ avg_pool = F.avg_pool2d( x, (x.size(2), x.size(3)), stride=(x.size(2), x.size(3)))
44
+ channel_att_raw = self.mlp( avg_pool )
45
+ elif pool_type=='max':
46
+ max_pool = F.max_pool2d( x, (x.size(2), x.size(3)), stride=(x.size(2), x.size(3)))
47
+ channel_att_raw = self.mlp( max_pool )
48
+ elif pool_type=='lp':
49
+ lp_pool = F.lp_pool2d( x, 2, (x.size(2), x.size(3)), stride=(x.size(2), x.size(3)))
50
+ channel_att_raw = self.mlp( lp_pool )
51
+ elif pool_type=='lse':
52
+ # LSE pool only
53
+ lse_pool = logsumexp_2d(x)
54
+ channel_att_raw = self.mlp( lse_pool )
55
+
56
+ if channel_att_sum is None:
57
+ channel_att_sum = channel_att_raw
58
+ else:
59
+ channel_att_sum = channel_att_sum + channel_att_raw
60
+
61
+ scale = F.sigmoid( channel_att_sum ).unsqueeze(2).unsqueeze(3).expand_as(x)
62
+ return x * scale
63
+
64
+ def logsumexp_2d(tensor):
65
+ tensor_flatten = tensor.view(tensor.size(0), tensor.size(1), -1)
66
+ s, _ = torch.max(tensor_flatten, dim=2, keepdim=True)
67
+ outputs = s + (tensor_flatten - s).exp().sum(dim=2, keepdim=True).log()
68
+ return outputs
69
+
70
+ class ChannelPool(nn.Module):
71
+ def forward(self, x):
72
+ return torch.cat( (torch.max(x,1)[0].unsqueeze(1), torch.mean(x,1).unsqueeze(1)), dim=1 )
73
+
74
+ class SpatialGate(nn.Module):
75
+ def __init__(self):
76
+ super(SpatialGate, self).__init__()
77
+ kernel_size = 7
78
+ self.compress = ChannelPool()
79
+ self.spatial = BasicConv(2, 1, kernel_size, stride=1, padding=(kernel_size-1) // 2, relu=False)
80
+ def forward(self, x):
81
+ x_compress = self.compress(x)
82
+ x_out = self.spatial(x_compress)
83
+ scale = F.sigmoid(x_out) # broadcasting
84
+ return x*scale, x*(1-scale)
85
+
86
+
87
+ class FPN(nn.Module):
88
+ def __init__(self, pretrained=True):
89
+ super(FPN, self).__init__()
90
+ self.in_planes = 64
91
+
92
+ resnet = resnet50(pretrained=pretrained)
93
+
94
+ self.toplayer = nn.Conv2d(2048, 256, kernel_size=1, stride=1, padding=0) # Reduce channels
95
+
96
+ self.layer0 = nn.Sequential(resnet.conv1, resnet.bn1, resnet.leakyrelu, resnet.maxpool)
97
+ self.layer1 = nn.Sequential(resnet.layer1)
98
+ self.layer2 = nn.Sequential(resnet.layer2)
99
+ self.layer3 = nn.Sequential(resnet.layer3)
100
+ self.layer4 = nn.Sequential(resnet.layer4)
101
+
102
+ # Smooth layers
103
+ #self.smooth1 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)
104
+ self.smooth2 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)
105
+ self.smooth3 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)
106
+
107
+ # Lateral layers
108
+ self.latlayer1 = nn.Conv2d(1024, 256, kernel_size=1, stride=1, padding=0)
109
+ self.latlayer2 = nn.Conv2d( 512, 256, kernel_size=1, stride=1, padding=0)
110
+ self.latlayer3 = nn.Conv2d( 256, 256, kernel_size=1, stride=1, padding=0)
111
+
112
+ # Attention Module
113
+ self.attention_module = SpatialGate()
114
+
115
+ self.pool = nn.AvgPool2d(2, stride=2)
116
+
117
+ def _upsample_add(self, x, y):
118
+ _, _, H, W = y.size()
119
+ return F.interpolate(x, size=(H,W), mode='bilinear', align_corners=False) + y
120
+
121
+ def forward(self, x):
122
+ # Bottom-up
123
+ c1 = self.layer0(x)
124
+ c2 = self.layer1(c1)
125
+ c3 = self.layer2(c2)
126
+ c4 = self.layer3(c3)
127
+ c5 = self.layer4(c4)
128
+ # Top-down
129
+ p5 = self.toplayer(c5)
130
+ p4 = self._upsample_add(p5, self.latlayer1(c4))
131
+ p3 = self._upsample_add(p4, self.latlayer2(c3))
132
+ p2 = self._upsample_add(p3, self.latlayer3(c2))
133
+ # Smooth
134
+ #p4 = self.smooth1(p4)
135
+ p3 = self.smooth2(p3)
136
+ p2 = self.smooth3(p2)
137
+
138
+ # Attention
139
+ p2 = self.pool(p2)
140
+ primary_feats, secondary_feats = self.attention_module(p2)
141
+
142
+ return primary_feats #, secondary_feats
143
+
144
+
145
+ class ResNet(nn.Module):
146
+ def __init__(self, block, layers, num_classes=1000):
147
+ self.inplanes = 64
148
+ super(ResNet, self).__init__()
149
+ self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
150
+ self.bn1 = nn.BatchNorm2d(64)
151
+ self.leakyrelu = nn.LeakyReLU(inplace=True)
152
+ self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
153
+ self.layer1 = self._make_layer(block, 64, layers[0])
154
+ self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
155
+ self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
156
+ self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
157
+ self.avgpool = nn.AvgPool2d(7, stride=1)
158
+ self.fc = nn.Linear(512 * block.expansion, num_classes)
159
+
160
+ for m in self.modules():
161
+ if isinstance(m, nn.Conv2d):
162
+ nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="leaky_relu")
163
+ elif isinstance(m, nn.BatchNorm2d):
164
+ nn.init.constant_(m.weight, 1)
165
+ nn.init.constant_(m.bias, 0)
166
+
167
+ def _make_layer(self, block, planes, blocks, stride=1):
168
+ downsample = None
169
+ if stride != 1 or self.inplanes != planes * block.expansion:
170
+ downsample = nn.Sequential(
171
+ nn.Conv2d(self.inplanes, planes * block.expansion,
172
+ kernel_size=1, stride=stride, bias=False),
173
+ nn.BatchNorm2d(planes * block.expansion))
174
+ layers = []
175
+ layers.append(block(self.inplanes, planes, stride, downsample))
176
+ self.inplanes = planes * block.expansion
177
+ for i in range(1, blocks):
178
+ layers.append(block(self.inplanes, planes))
179
+
180
+ return nn.Sequential(*layers)
181
+
182
+ def forward(self, x):
183
+ x = self.conv1(x)
184
+ x = self.bn1(x)
185
+ x = self.leakyrelu(x)
186
+ x = self.maxpool(x)
187
+
188
+ x = self.layer1(x)
189
+ x = self.layer2(x)
190
+ x = self.layer3(x)
191
+ x = self.layer4(x)
192
+
193
+ x = x.mean(3).mean(2)
194
+ x = x.view(x.size(0), -1)
195
+ x = self.fc(x)
196
+ return x
197
+
198
+
199
+ def resnet50(pretrained=False, **kwargs):
200
+     """Constructs a ResNet-50 encoder."""
201
+ model = ResNet(Bottleneck, [3, 4, 6, 3], **kwargs)
202
+ if pretrained:
203
+ model.load_state_dict(model_zoo.load_url("https://download.pytorch.org/models/resnet50-19c8e357.pth"))
204
+ return model
205
+
206
+
207
+ def conv3x3(in_planes, out_planes, stride=1):
208
+ return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False)
209
+
210
+
211
+ class BasicBlock(nn.Module):
212
+ expansion = 1
213
+
214
+ def __init__(self, inplanes, planes, stride=1, downsample=None):
215
+ super(BasicBlock, self).__init__()
216
+ self.conv1 = conv3x3(inplanes, planes, stride)
217
+ self.bn1 = nn.BatchNorm2d(planes)
218
+ self.leakyrelu = nn.LeakyReLU(inplace=True)
219
+ self.conv2 = conv3x3(planes, planes)
220
+ self.bn2 = nn.BatchNorm2d(planes)
221
+ self.downsample = downsample
222
+ self.stride = stride
223
+
224
+ def forward(self, x):
225
+ residual = x
226
+
227
+ out = self.conv1(x)
228
+ out = self.bn1(out)
229
+ out = self.leakyrelu(out)
230
+
231
+ out = self.conv2(out)
232
+ out = self.bn2(out)
233
+
234
+ if self.downsample is not None:
235
+ residual = self.downsample(x)
236
+
237
+ out += residual
238
+ out = self.leakyrelu(out)
239
+
240
+ return out
241
+
242
+
243
+ class Bottleneck(nn.Module):
244
+ expansion = 4
245
+
246
+ def __init__(self, inplanes, planes, stride=1, downsample=None):
247
+ super(Bottleneck, self).__init__()
248
+ self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
249
+ self.bn1 = nn.BatchNorm2d(planes)
250
+ self.conv2 = nn.Conv2d(
251
+ planes, planes, kernel_size=3, stride=stride, padding=1, bias=False
252
+ )
253
+ self.bn2 = nn.BatchNorm2d(planes)
254
+ self.conv3 = nn.Conv2d(
255
+ planes, planes * self.expansion, kernel_size=1, bias=False
256
+ )
257
+ self.bn3 = nn.BatchNorm2d(planes * self.expansion)
258
+ self.leakyrelu = nn.LeakyReLU(inplace=True)
259
+ self.downsample = downsample
260
+ self.stride = stride
261
+
262
+ def forward(self, x):
263
+ residual = x
264
+
265
+ out = self.conv1(x)
266
+ out = self.bn1(out)
267
+ out = self.leakyrelu(out)
268
+
269
+ out = self.conv2(out)
270
+ out = self.bn2(out)
271
+ out = self.leakyrelu(out)
272
+
273
+ out = self.conv3(out)
274
+ out = self.bn3(out)
275
+
276
+ if self.downsample is not None:
277
+ residual = self.downsample(x)
278
+
279
+ out += residual
280
+ out = self.leakyrelu(out)
281
+
282
+ return out
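A quick shape check for the FPN above, assuming the 256x256 crops used elsewhere in this repo (pretrained=False skips the ResNet-50 download):

    import torch

    fpn = FPN(pretrained=False)
    feats = fpn(torch.randn(1, 3, 256, 256))
    print(feats.shape)  # torch.Size([1, 256, 32, 32]): p2 at 1/4 scale, then average-pooled by 2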
lib/models/backbone/hrnet.py ADDED
@@ -0,0 +1,518 @@
1
+ # ------------------------------------------------------------------------------
2
+ # Copyright (c) Microsoft
3
+ # Licensed under the MIT License.
4
+ # Written by Bin Xiao (Bin.Xiao@microsoft.com)
5
+ # Modified by Ke Sun (sunk@mail.ustc.edu.cn)
6
+ # Modified by Kevin Lin (keli@microsoft.com)
7
+ # ------------------------------------------------------------------------------
8
+
9
+ from __future__ import absolute_import
10
+ from __future__ import division
11
+ from __future__ import print_function
12
+
13
+ import os
14
+ import logging
15
+ import numpy as np
16
+
17
+ import torch
18
+ import torch.nn as nn
19
+ import torch._utils
20
+ import torch.nn.functional as F
21
+ BN_MOMENTUM = 0.1
22
+ logger = logging.getLogger(__name__)
23
+
24
+
25
+ def conv3x3(in_planes, out_planes, stride=1):
26
+ """3x3 convolution with padding"""
27
+ return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
28
+ padding=1, bias=False)
29
+
30
+
31
+ class BasicBlock(nn.Module):
32
+ expansion = 1
33
+
34
+ def __init__(self, inplanes, planes, stride=1, downsample=None):
35
+ super(BasicBlock, self).__init__()
36
+ self.conv1 = conv3x3(inplanes, planes, stride)
37
+ self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
38
+ self.relu = nn.ReLU(inplace=True)
39
+ self.conv2 = conv3x3(planes, planes)
40
+ self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
41
+ self.downsample = downsample
42
+ self.stride = stride
43
+
44
+ def forward(self, x):
45
+ residual = x
46
+
47
+ out = self.conv1(x)
48
+ out = self.bn1(out)
49
+ out = self.relu(out)
50
+
51
+ out = self.conv2(out)
52
+ out = self.bn2(out)
53
+
54
+ if self.downsample is not None:
55
+ residual = self.downsample(x)
56
+
57
+ out += residual
58
+ out = self.relu(out)
59
+
60
+ return out
61
+
62
+
63
+ class Bottleneck(nn.Module):
64
+ expansion = 4
65
+
66
+ def __init__(self, inplanes, planes, stride=1, downsample=None):
67
+ super(Bottleneck, self).__init__()
68
+ self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
69
+ self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
70
+ self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
71
+ padding=1, bias=False)
72
+ self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
73
+ self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1,
74
+ bias=False)
75
+ self.bn3 = nn.BatchNorm2d(planes * self.expansion,
76
+ momentum=BN_MOMENTUM)
77
+ self.relu = nn.ReLU(inplace=True)
78
+ self.downsample = downsample
79
+ self.stride = stride
80
+
81
+ def forward(self, x):
82
+ residual = x
83
+
84
+ out = self.conv1(x)
85
+ out = self.bn1(out)
86
+ out = self.relu(out)
87
+
88
+ out = self.conv2(out)
89
+ out = self.bn2(out)
90
+ out = self.relu(out)
91
+
92
+ out = self.conv3(out)
93
+ out = self.bn3(out)
94
+
95
+ if self.downsample is not None:
96
+ residual = self.downsample(x)
97
+
98
+ out += residual
99
+ out = self.relu(out)
100
+
101
+ return out
102
+
103
+
104
+ class HighResolutionModule(nn.Module):
105
+ def __init__(self, num_branches, blocks, num_blocks, num_inchannels,
106
+ num_channels, fuse_method, multi_scale_output=True):
107
+ super(HighResolutionModule, self).__init__()
108
+ self._check_branches(
109
+ num_branches, blocks, num_blocks, num_inchannels, num_channels)
110
+
111
+ self.num_inchannels = num_inchannels
112
+ self.fuse_method = fuse_method
113
+ self.num_branches = num_branches
114
+
115
+ self.multi_scale_output = multi_scale_output
116
+
117
+ self.branches = self._make_branches(
118
+ num_branches, blocks, num_blocks, num_channels)
119
+ self.fuse_layers = self._make_fuse_layers()
120
+ self.relu = nn.ReLU(False)
121
+
122
+ def _check_branches(self, num_branches, blocks, num_blocks,
123
+ num_inchannels, num_channels):
124
+ if num_branches != len(num_blocks):
125
+ error_msg = 'NUM_BRANCHES({}) <> NUM_BLOCKS({})'.format(
126
+ num_branches, len(num_blocks))
127
+ logger.error(error_msg)
128
+ raise ValueError(error_msg)
129
+
130
+ if num_branches != len(num_channels):
131
+ error_msg = 'NUM_BRANCHES({}) <> NUM_CHANNELS({})'.format(
132
+ num_branches, len(num_channels))
133
+ logger.error(error_msg)
134
+ raise ValueError(error_msg)
135
+
136
+ if num_branches != len(num_inchannels):
137
+ error_msg = 'NUM_BRANCHES({}) <> NUM_INCHANNELS({})'.format(
138
+ num_branches, len(num_inchannels))
139
+ logger.error(error_msg)
140
+ raise ValueError(error_msg)
141
+
142
+ def _make_one_branch(self, branch_index, block, num_blocks, num_channels,
143
+ stride=1):
144
+ downsample = None
145
+ if stride != 1 or \
146
+ self.num_inchannels[branch_index] != num_channels[branch_index] * block.expansion:
147
+ downsample = nn.Sequential(
148
+ nn.Conv2d(self.num_inchannels[branch_index],
149
+ num_channels[branch_index] * block.expansion,
150
+ kernel_size=1, stride=stride, bias=False),
151
+ nn.BatchNorm2d(num_channels[branch_index] * block.expansion,
152
+ momentum=BN_MOMENTUM),
153
+ )
154
+
155
+ layers = []
156
+ layers.append(block(self.num_inchannels[branch_index],
157
+ num_channels[branch_index], stride, downsample))
158
+ self.num_inchannels[branch_index] = \
159
+ num_channels[branch_index] * block.expansion
160
+ for i in range(1, num_blocks[branch_index]):
161
+ layers.append(block(self.num_inchannels[branch_index],
162
+ num_channels[branch_index]))
163
+
164
+ return nn.Sequential(*layers)
165
+
166
+ def _make_branches(self, num_branches, block, num_blocks, num_channels):
167
+ branches = []
168
+
169
+ for i in range(num_branches):
170
+ branches.append(
171
+ self._make_one_branch(i, block, num_blocks, num_channels))
172
+
173
+ return nn.ModuleList(branches)
174
+
175
+ def _make_fuse_layers(self):
176
+ if self.num_branches == 1:
177
+ return None
178
+
179
+ num_branches = self.num_branches
180
+ num_inchannels = self.num_inchannels
181
+ fuse_layers = []
182
+ for i in range(num_branches if self.multi_scale_output else 1):
183
+ fuse_layer = []
184
+ for j in range(num_branches):
185
+ if j > i:
186
+ fuse_layer.append(nn.Sequential(
187
+ nn.Conv2d(num_inchannels[j],
188
+ num_inchannels[i],
189
+ 1,
190
+ 1,
191
+ 0,
192
+ bias=False),
193
+ nn.BatchNorm2d(num_inchannels[i],
194
+ momentum=BN_MOMENTUM),
195
+ nn.Upsample(scale_factor=2**(j-i), mode='nearest')))
196
+ elif j == i:
197
+ fuse_layer.append(None)
198
+ else:
199
+ conv3x3s = []
200
+ for k in range(i-j):
201
+ if k == i - j - 1:
202
+ num_outchannels_conv3x3 = num_inchannels[i]
203
+ conv3x3s.append(nn.Sequential(
204
+ nn.Conv2d(num_inchannels[j],
205
+ num_outchannels_conv3x3,
206
+ 3, 2, 1, bias=False),
207
+ nn.BatchNorm2d(num_outchannels_conv3x3,
208
+ momentum=BN_MOMENTUM)))
209
+ else:
210
+ num_outchannels_conv3x3 = num_inchannels[j]
211
+ conv3x3s.append(nn.Sequential(
212
+ nn.Conv2d(num_inchannels[j],
213
+ num_outchannels_conv3x3,
214
+ 3, 2, 1, bias=False),
215
+ nn.BatchNorm2d(num_outchannels_conv3x3,
216
+ momentum=BN_MOMENTUM),
217
+ nn.ReLU(False)))
218
+ fuse_layer.append(nn.Sequential(*conv3x3s))
219
+ fuse_layers.append(nn.ModuleList(fuse_layer))
220
+
221
+ return nn.ModuleList(fuse_layers)
222
+
223
+ def get_num_inchannels(self):
224
+ return self.num_inchannels
225
+
226
+ def forward(self, x):
227
+ if self.num_branches == 1:
228
+ return [self.branches[0](x[0])]
229
+
230
+ for i in range(self.num_branches):
231
+ x[i] = self.branches[i](x[i])
232
+
233
+ x_fuse = []
234
+ for i in range(len(self.fuse_layers)):
235
+ y = x[0] if i == 0 else self.fuse_layers[i][0](x[0])
236
+ for j in range(1, self.num_branches):
237
+ if i == j:
238
+ y = y + x[j]
239
+ else:
240
+ y = y + self.fuse_layers[i][j](x[j])
241
+ x_fuse.append(self.relu(y))
242
+
243
+ return x_fuse
244
+
245
+
246
+ blocks_dict = {
247
+ 'BASIC': BasicBlock,
248
+ 'BOTTLENECK': Bottleneck
249
+ }
250
+
251
+
252
+ class HighResolutionNet(nn.Module):
253
+ def __init__(self, cfg, **kwargs):
254
+ super(HighResolutionNet, self).__init__()
255
+
256
+ self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1,
257
+ bias=False)
258
+ self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)
259
+ self.conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1,
260
+ bias=False)
261
+ self.bn2 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)
262
+ self.relu = nn.ReLU(inplace=True)
263
+
264
+ self.stage1_cfg = cfg['MODEL']['EXTRA']['STAGE1']
265
+ num_channels = self.stage1_cfg['NUM_CHANNELS'][0]
266
+ block = blocks_dict[self.stage1_cfg['BLOCK']]
267
+ num_blocks = self.stage1_cfg['NUM_BLOCKS'][0]
268
+ self.layer1 = self._make_layer(block, 64, num_channels, num_blocks)
269
+ stage1_out_channel = block.expansion*num_channels
270
+
271
+ self.stage2_cfg = cfg['MODEL']['EXTRA']['STAGE2']
272
+ num_channels = self.stage2_cfg['NUM_CHANNELS']
273
+ block = blocks_dict[self.stage2_cfg['BLOCK']]
274
+ num_channels = [
275
+ num_channels[i] * block.expansion for i in range(len(num_channels))]
276
+ self.transition1 = self._make_transition_layer(
277
+ [stage1_out_channel], num_channels)
278
+ self.stage2, pre_stage_channels = self._make_stage(
279
+ self.stage2_cfg, num_channels)
280
+
281
+ self.stage3_cfg = cfg['MODEL']['EXTRA']['STAGE3']
282
+ num_channels = self.stage3_cfg['NUM_CHANNELS']
283
+ block = blocks_dict[self.stage3_cfg['BLOCK']]
284
+ num_channels = [
285
+ num_channels[i] * block.expansion for i in range(len(num_channels))]
286
+ self.transition2 = self._make_transition_layer(
287
+ pre_stage_channels, num_channels)
288
+ self.stage3, pre_stage_channels = self._make_stage(
289
+ self.stage3_cfg, num_channels)
290
+
291
+ self.stage4_cfg = cfg['MODEL']['EXTRA']['STAGE4']
292
+ num_channels = self.stage4_cfg['NUM_CHANNELS']
293
+ block = blocks_dict[self.stage4_cfg['BLOCK']]
294
+ num_channels = [
295
+ num_channels[i] * block.expansion for i in range(len(num_channels))]
296
+ self.transition3 = self._make_transition_layer(
297
+ pre_stage_channels, num_channels)
298
+ self.stage4, pre_stage_channels = self._make_stage(
299
+ self.stage4_cfg, num_channels, multi_scale_output=True)
300
+
301
+ # Classification Head
302
+ self.incre_modules, self.downsamp_modules, \
303
+ self.final_layer = self._make_head(pre_stage_channels)
304
+
305
+ self.classifier = nn.Linear(2048, 1000)
306
+
307
+ def _make_head(self, pre_stage_channels):
308
+ head_block = Bottleneck
309
+ head_channels = [32, 64, 128, 256]
310
+
311
+ # Increasing the #channels on each resolution
312
+ # from C, 2C, 4C, 8C to 128, 256, 512, 1024
313
+ incre_modules = []
314
+ for i, channels in enumerate(pre_stage_channels):
315
+ incre_module = self._make_layer(head_block,
316
+ channels,
317
+ head_channels[i],
318
+ 1,
319
+ stride=1)
320
+ incre_modules.append(incre_module)
321
+ incre_modules = nn.ModuleList(incre_modules)
322
+
323
+ # downsampling modules
324
+ downsamp_modules = []
325
+ for i in range(len(pre_stage_channels)-1):
326
+ in_channels = head_channels[i] * head_block.expansion
327
+ out_channels = head_channels[i+1] * head_block.expansion
328
+
329
+ downsamp_module = nn.Sequential(
330
+ nn.Conv2d(in_channels=in_channels,
331
+ out_channels=out_channels,
332
+ kernel_size=3,
333
+ stride=2,
334
+ padding=1),
335
+ nn.BatchNorm2d(out_channels, momentum=BN_MOMENTUM),
336
+ nn.ReLU(inplace=True)
337
+ )
338
+
339
+ downsamp_modules.append(downsamp_module)
340
+ downsamp_modules = nn.ModuleList(downsamp_modules)
341
+
342
+ final_layer = nn.Sequential(
343
+ nn.Conv2d(
344
+ in_channels=head_channels[3] * head_block.expansion,
345
+ out_channels=2048,
346
+ kernel_size=1,
347
+ stride=1,
348
+ padding=0
349
+ ),
350
+ nn.BatchNorm2d(2048, momentum=BN_MOMENTUM),
351
+ nn.ReLU(inplace=True)
352
+ )
353
+
354
+ return incre_modules, downsamp_modules, final_layer
355
+
356
+ def _make_transition_layer(
357
+ self, num_channels_pre_layer, num_channels_cur_layer):
358
+ num_branches_cur = len(num_channels_cur_layer)
359
+ num_branches_pre = len(num_channels_pre_layer)
360
+
361
+ transition_layers = []
362
+ for i in range(num_branches_cur):
363
+ if i < num_branches_pre:
364
+ if num_channels_cur_layer[i] != num_channels_pre_layer[i]:
365
+ transition_layers.append(nn.Sequential(
366
+ nn.Conv2d(num_channels_pre_layer[i],
367
+ num_channels_cur_layer[i],
368
+ 3,
369
+ 1,
370
+ 1,
371
+ bias=False),
372
+ nn.BatchNorm2d(
373
+ num_channels_cur_layer[i], momentum=BN_MOMENTUM),
374
+ nn.ReLU(inplace=True)))
375
+ else:
376
+ transition_layers.append(None)
377
+ else:
378
+ conv3x3s = []
379
+ for j in range(i+1-num_branches_pre):
380
+ inchannels = num_channels_pre_layer[-1]
381
+ outchannels = num_channels_cur_layer[i] \
382
+ if j == i-num_branches_pre else inchannels
383
+ conv3x3s.append(nn.Sequential(
384
+ nn.Conv2d(
385
+ inchannels, outchannels, 3, 2, 1, bias=False),
386
+ nn.BatchNorm2d(outchannels, momentum=BN_MOMENTUM),
387
+ nn.ReLU(inplace=True)))
388
+ transition_layers.append(nn.Sequential(*conv3x3s))
389
+
390
+ return nn.ModuleList(transition_layers)
391
+
392
+ def _make_layer(self, block, inplanes, planes, blocks, stride=1):
393
+ downsample = None
394
+ if stride != 1 or inplanes != planes * block.expansion:
395
+ downsample = nn.Sequential(
396
+ nn.Conv2d(inplanes, planes * block.expansion,
397
+ kernel_size=1, stride=stride, bias=False),
398
+ nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM),
399
+ )
400
+
401
+ layers = []
402
+ layers.append(block(inplanes, planes, stride, downsample))
403
+ inplanes = planes * block.expansion
404
+ for i in range(1, blocks):
405
+ layers.append(block(inplanes, planes))
406
+
407
+ return nn.Sequential(*layers)
408
+
409
+ def _make_stage(self, layer_config, num_inchannels,
410
+ multi_scale_output=True):
411
+ num_modules = layer_config['NUM_MODULES']
412
+ num_branches = layer_config['NUM_BRANCHES']
413
+ num_blocks = layer_config['NUM_BLOCKS']
414
+ num_channels = layer_config['NUM_CHANNELS']
415
+ block = blocks_dict[layer_config['BLOCK']]
416
+ fuse_method = layer_config['FUSE_METHOD']
417
+
418
+ modules = []
419
+ for i in range(num_modules):
420
+             # multi_scale_output is only used in the last module
421
+ if not multi_scale_output and i == num_modules - 1:
422
+ reset_multi_scale_output = False
423
+ else:
424
+ reset_multi_scale_output = True
425
+
426
+ modules.append(
427
+ HighResolutionModule(num_branches,
428
+ block,
429
+ num_blocks,
430
+ num_inchannels,
431
+ num_channels,
432
+ fuse_method,
433
+ reset_multi_scale_output)
434
+ )
435
+ num_inchannels = modules[-1].get_num_inchannels()
436
+
437
+ return nn.Sequential(*modules), num_inchannels
438
+
439
+ def forward(self, x):
440
+ x = self.conv1(x)
441
+ x = self.bn1(x)
442
+ x = self.relu(x)
443
+ x = self.conv2(x)
444
+ x = self.bn2(x)
445
+ x = self.relu(x)
446
+ x = self.layer1(x)
447
+
448
+ x_list = []
449
+ for i in range(self.stage2_cfg['NUM_BRANCHES']):
450
+ if self.transition1[i] is not None:
451
+ x_list.append(self.transition1[i](x))
452
+ else:
453
+ x_list.append(x)
454
+ y_list = self.stage2(x_list)
455
+
456
+ x_list = []
457
+ for i in range(self.stage3_cfg['NUM_BRANCHES']):
458
+ if self.transition2[i] is not None:
459
+ x_list.append(self.transition2[i](y_list[-1]))
460
+ else:
461
+ x_list.append(y_list[i])
462
+ y_list = self.stage3(x_list)
463
+
464
+ x_list = []
465
+ for i in range(self.stage4_cfg['NUM_BRANCHES']):
466
+ if self.transition3[i] is not None:
467
+ x_list.append(self.transition3[i](y_list[-1]))
468
+ else:
469
+ x_list.append(y_list[i])
470
+ y_list = self.stage4(x_list)
471
+
472
+ # Classification Head
473
+ y = self.incre_modules[0](y_list[0])
474
+ for i in range(len(self.downsamp_modules)):
475
+ y = self.incre_modules[i+1](y_list[i+1]) + \
476
+ self.downsamp_modules[i](y)
477
+
478
+ y = self.final_layer(y)
479
+
480
+ # if torch._C._get_tracing_state():
481
+ # y = y.flatten(start_dim=2).mean(dim=2)
482
+ # else:
483
+ # y = F.avg_pool2d(y, kernel_size=y.size()
484
+ # [2:]).view(y.size(0), -1)
485
+
486
+ # y = self.classifier(y)
487
+
488
+ return y
489
+
490
+ def init_weights(self, pretrained='',):
491
+ logger.info('=> init weights from normal distribution')
492
+ for m in self.modules():
493
+ if isinstance(m, nn.Conv2d):
494
+ nn.init.kaiming_normal_(
495
+ m.weight, mode='fan_out', nonlinearity='relu')
496
+ elif isinstance(m, nn.BatchNorm2d):
497
+ nn.init.constant_(m.weight, 1)
498
+ nn.init.constant_(m.bias, 0)
499
+ if os.path.isfile(pretrained):
500
+ pretrained_dict = torch.load(pretrained)
501
+ logger.info('=> loading pretrained model {}'.format(pretrained))
502
+ print('=> loading pretrained model {}'.format(pretrained))
503
+ model_dict = self.state_dict()
504
+ pretrained_dict = {k: v for k, v in pretrained_dict.items()
505
+ if k in model_dict.keys()}
506
+ # for k, _ in pretrained_dict.items():
507
+ # logger.info(
508
+ # '=> loading {} pretrained model {}'.format(k, pretrained))
509
+ # print('=> loading {} pretrained model {}'.format(k, pretrained))
510
+ model_dict.update(pretrained_dict)
511
+ self.load_state_dict(model_dict)
512
+ # code.interact(local=locals())
513
+
514
+
515
+ def get_cls_net(config, pretrained, **kwargs):
516
+ model = HighResolutionNet(config, **kwargs)
517
+ model.init_weights(pretrained=pretrained)
518
+ return model
lib/models/backbone/resnet.py ADDED
@@ -0,0 +1,95 @@
1
+ # This code is from Hand4Whole (https://github.com/mks0601/Hand4Whole_RELEASE/blob/main/common/nets/resnet.py)
2
+ import torch
3
+ import torch.nn as nn
4
+ from torchvision.models.resnet import BasicBlock, Bottleneck
5
+
6
+
7
+ class ResNetBackbone(nn.Module):
8
+ def __init__(self, resnet_type):
9
+
10
+ resnet_spec = {18: (BasicBlock, [2, 2, 2, 2], [64, 64, 128, 256, 512], 'resnet18'),
11
+ 34: (BasicBlock, [3, 4, 6, 3], [64, 64, 128, 256, 512], 'resnet34'),
12
+ 50: (Bottleneck, [3, 4, 6, 3], [64, 256, 512, 1024, 2048], 'resnet50'),
13
+ 101: (Bottleneck, [3, 4, 23, 3], [64, 256, 512, 1024, 2048], 'resnet101'),
14
+ 152: (Bottleneck, [3, 8, 36, 3], [64, 256, 512, 1024, 2048], 'resnet152')}
15
+ block, layers, channels, name = resnet_spec[resnet_type]
16
+
17
+ self.name = name
18
+ self.inplanes = 64
19
+ super(ResNetBackbone, self).__init__()
20
+ self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
21
+ bias=False)
22
+ self.bn1 = nn.BatchNorm2d(64)
23
+ self.relu = nn.ReLU(inplace=True)
24
+ self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
25
+
26
+ self.layer1 = self._make_layer(block, 64, layers[0])
27
+ self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
28
+ self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
29
+ self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
30
+
31
+ for m in self.modules():
32
+ if isinstance(m, nn.Conv2d):
33
+ nn.init.normal_(m.weight, mean=0, std=0.001)
34
+ elif isinstance(m, nn.BatchNorm2d):
35
+ nn.init.constant_(m.weight, 1)
36
+ nn.init.constant_(m.bias, 0)
37
+
38
+ def _make_layer(self, block, planes, blocks, stride=1):
39
+ downsample = None
40
+ if stride != 1 or self.inplanes != planes * block.expansion:
41
+ downsample = nn.Sequential(
42
+ nn.Conv2d(self.inplanes, planes * block.expansion,
43
+ kernel_size=1, stride=stride, bias=False),
44
+ nn.BatchNorm2d(planes * block.expansion),
45
+ )
46
+
47
+ layers = []
48
+ layers.append(block(self.inplanes, planes, stride, downsample))
49
+ self.inplanes = planes * block.expansion
50
+ for i in range(1, blocks):
51
+ layers.append(block(self.inplanes, planes))
52
+
53
+ return nn.Sequential(*layers)
54
+
55
+ def forward(self, x):
56
+ x = self.conv1(x)
57
+ x = self.bn1(x)
58
+ x = self.relu(x)
59
+ x = self.maxpool(x)
60
+
61
+ x = self.layer1(x)
62
+ x = self.layer2(x)
63
+ x = self.layer3(x)
64
+ x = self.layer4(x)
65
+ return x
66
+
67
+ def init_weights(self):
68
+ import torchvision.models as models
69
+
70
+ if self.name == 'resnet18':
71
+ org_resnet = models.resnet18(pretrained=True)
72
+ elif self.name == 'resnet34':
73
+ org_resnet = models.resnet34(pretrained=True)
74
+ elif self.name == 'resnet50':
75
+ org_resnet = models.resnet50(pretrained=True)
76
+ elif self.name == 'resnet101':
77
+ org_resnet = models.resnet101(pretrained=True)
78
+ elif self.name == 'resnet152':
79
+ org_resnet = models.resnet152(pretrained=True)
80
+ else:
81
+ raise ValueError(f"Unsupported model name: {self.name}")
82
+
83
+ # Drop the original fully connected layer
84
+ org_resnet.fc = None # Or you can set it to nn.Identity()
85
+
86
+ # If you're loading weights manually, extract the state_dict
87
+ org_resnet_state = org_resnet.state_dict()
88
+
89
+ # Remove FC layer weights to avoid mismatch
90
+ org_resnet_state.pop('fc.weight', None)
91
+ org_resnet_state.pop('fc.bias', None)
92
+
93
+ # Load into your model
94
+ self.load_state_dict(org_resnet_state, strict=False)
95
+ print("Initialized ResNet from torchvision with pretrained=True")
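A minimal usage sketch for the backbone above (init_weights() pulls the ImageNet weights from torchvision and can be skipped for a pure shape check):

    import torch

    backbone = ResNetBackbone(50)
    backbone.init_weights()
    feat = backbone(torch.randn(1, 3, 256, 256))
    print(feat.shape)  # torch.Size([1, 2048, 8, 8])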
lib/models/backbone/vit.py ADDED
@@ -0,0 +1,33 @@
1
+ import timm
2
+ import torch.nn as nn
3
+
4
+
5
+ class ViTBackbone(nn.Module):
6
+ def __init__(self, model_name='vit_base_patch16_224', pretrained=True, return_cls=False):
7
+ """
8
+ Args:
9
+ model_name (str): 'vit_base_patch16_224' or 'vit_large_patch16_224'
10
+ pretrained (bool): load pretrained weights from timm
11
+ return_cls (bool): if True, return CLS token instead of patch tokens
12
+ """
13
+ super().__init__()
14
+ self.return_cls = return_cls
15
+
16
+ # Load model with no classification head
17
+ self.vit = timm.create_model(model_name, pretrained=pretrained, num_classes=0)
18
+
19
+ # Get dimensions
20
+ self.embed_dim = self.vit.embed_dim # 768 for B/16, 1024 for L/16
21
+ self.patch_size = self.vit.patch_embed.patch_size
22
+
23
+ def forward(self, x):
24
+         # forward_features returns CLS + patch tokens: [B, 1 + N, D]
25
+ x = self.vit.forward_features(x)
26
+
27
+ if self.return_cls:
28
+ return x[:, 0] # [B, D] – CLS token
29
+ else:
30
+ patch_tokens = x[:, 1:] # [B, N, D]
31
+ B, N, D = patch_tokens.shape
32
+ H = W = int(N ** 0.5)
33
+             return patch_tokens.transpose(1, 2).reshape(B, D, H, W)  # [B, D, H, W] feature map
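A minimal usage sketch, assuming timm's 'vit_base_patch16_224' definition and a 224x224 crop (pretrained=False skips the weight download):

    import torch

    backbone = ViTBackbone('vit_base_patch16_224', pretrained=False)
    feat = backbone(torch.randn(1, 3, 224, 224))
    print(feat.shape)  # torch.Size([1, 768, 14, 14]): 14x14 patch grid, embed_dim 768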
lib/models/decoder/__pycache__/decoder_hamer_style.cpython-38.pyc ADDED
Binary file (20.1 kB).
 
lib/models/decoder/decoder_hamer_style.py ADDED
@@ -0,0 +1,637 @@
1
+ import pickle
2
+ import numpy as np
3
+ from einops import rearrange
4
+ from inspect import isfunction
5
+ from typing import Callable, Optional
6
+
7
+ import torch
8
+ import torch.nn as nn
9
+ import torch.nn.functional as F
10
+
11
+ import smplx
12
+ from smplx.lbs import vertices2joints
13
+ from smplx.utils import MANOOutput, to_tensor
14
+ from smplx.vertex_ids import vertex_ids
15
+
16
+ from lib.core.config import cfg
17
+ from lib.utils.human_models import mano
18
+
19
+
20
+ V_regressor_336 = np.load(cfg.MODEL.V_regressor_336_path)
21
+ V_regressor_84 = np.load(cfg.MODEL.V_regressor_84_path)
22
+
23
+
24
+ # This function is from HaMeR (https://github.com/geopavlakos/hamer).
25
+ def exists(val):
26
+ return val is not None
27
+
28
+
29
+ # This function is from HaMeR (https://github.com/geopavlakos/hamer).
30
+ def default(val, d):
31
+ if exists(val):
32
+ return val
33
+ return d() if isfunction(d) else d
34
+
35
+
36
+ # This class is from HaMeR (https://github.com/geopavlakos/hamer).
37
+ class Attention(nn.Module):
38
+ def __init__(self, dim, heads=8, dim_head=64, dropout=0.0):
39
+ super().__init__()
40
+ inner_dim = dim_head * heads
41
+ project_out = not (heads == 1 and dim_head == dim)
42
+
43
+ self.heads = heads
44
+ self.scale = dim_head**-0.5
45
+
46
+ self.attend = nn.Softmax(dim=-1)
47
+ self.dropout = nn.Dropout(dropout)
48
+
49
+ self.to_qkv = nn.Linear(dim, inner_dim * 3, bias=False)
50
+
51
+ self.to_out = (
52
+ nn.Sequential(nn.Linear(inner_dim, dim), nn.Dropout(dropout))
53
+ if project_out
54
+ else nn.Identity()
55
+ )
56
+
57
+ def forward(self, x):
58
+ qkv = self.to_qkv(x).chunk(3, dim=-1)
59
+ q, k, v = map(lambda t: rearrange(t, "b n (h d) -> b h n d", h=self.heads), qkv)
60
+
61
+ dots = torch.matmul(q, k.transpose(-1, -2)) * self.scale
62
+
63
+ attn = self.attend(dots)
64
+ attn = self.dropout(attn)
65
+
66
+ out = torch.matmul(attn, v)
67
+ out = rearrange(out, "b h n d -> b n (h d)")
68
+ return self.to_out(out)
69
+
70
+
71
+ # This class is from HaMeR (https://github.com/geopavlakos/hamer).
72
+ class CrossAttention(nn.Module):
73
+ def __init__(self, dim, context_dim=None, heads=8, dim_head=64, dropout=0.0):
74
+ super().__init__()
75
+ inner_dim = dim_head * heads
76
+ project_out = not (heads == 1 and dim_head == dim)
77
+
78
+ self.heads = heads
79
+ self.scale = dim_head**-0.5
80
+
81
+ self.attend = nn.Softmax(dim=-1)
82
+ self.dropout = nn.Dropout(dropout)
83
+
84
+ context_dim = default(context_dim, dim)
85
+ self.to_kv = nn.Linear(context_dim, inner_dim * 2, bias=False)
86
+ self.to_q = nn.Linear(dim, inner_dim, bias=False)
87
+
88
+ self.to_out = (
89
+ nn.Sequential(nn.Linear(inner_dim, dim), nn.Dropout(dropout))
90
+ if project_out
91
+ else nn.Identity()
92
+ )
93
+
94
+ def forward(self, x, context=None):
95
+ context = default(context, x)
96
+ k, v = self.to_kv(context).chunk(2, dim=-1)
97
+ q = self.to_q(x)
98
+ q, k, v = map(lambda t: rearrange(t, "b n (h d) -> b h n d", h=self.heads), [q, k, v])
99
+
100
+ dots = torch.matmul(q, k.transpose(-1, -2)) * self.scale
101
+
102
+ attn = self.attend(dots)
103
+ attn = self.dropout(attn)
104
+
105
+ out = torch.matmul(attn, v)
106
+ out = rearrange(out, "b h n d -> b n (h d)")
107
+ return self.to_out(out)
108
+
109
+
110
+ # This class is from HaMeR (https://github.com/geopavlakos/hamer).
111
+ class FeedForward(nn.Module):
112
+ def __init__(self, dim, hidden_dim, dropout=0.0):
113
+ super().__init__()
114
+ self.net = nn.Sequential(
115
+ nn.Linear(dim, hidden_dim),
116
+ nn.GELU(),
117
+ nn.Dropout(dropout),
118
+ nn.Linear(hidden_dim, dim),
119
+ nn.Dropout(dropout),
120
+ )
121
+
122
+ def forward(self, x):
123
+ return self.net(x)
124
+
125
+
126
+ # This class is from HaMeR (https://github.com/geopavlakos/hamer).
127
+ class Transformer(nn.Module):
128
+ def __init__(
129
+ self,
130
+ dim: int,
131
+ depth: int,
132
+ heads: int,
133
+ dim_head: int,
134
+ mlp_dim: int,
135
+ dropout: float = 0.0,
136
+ norm: str = "layer",
137
+ norm_cond_dim: int = -1,
138
+ ):
139
+ super().__init__()
140
+ self.layers = nn.ModuleList([])
141
+ for _ in range(depth):
142
+ sa = Attention(dim, heads=heads, dim_head=dim_head, dropout=dropout)
143
+ ff = FeedForward(dim, mlp_dim, dropout=dropout)
144
+ self.layers.append(
145
+ nn.ModuleList(
146
+ [
147
+ PreNorm(dim, sa, norm=norm, norm_cond_dim=norm_cond_dim),
148
+ PreNorm(dim, ff, norm=norm, norm_cond_dim=norm_cond_dim),
149
+ ]
150
+ )
151
+ )
152
+
153
+ def forward(self, x: torch.Tensor, *args):
154
+ for attn, ff in self.layers:
155
+ x = attn(x, *args) + x
156
+ x = ff(x, *args) + x
157
+ return x
158
+
159
+
160
+ class AdaptiveLayerNorm1D(torch.nn.Module):
161
+ def __init__(self, data_dim: int, norm_cond_dim: int):
162
+ super().__init__()
163
+ if data_dim <= 0:
164
+ raise ValueError(f"data_dim must be positive, but got {data_dim}")
165
+ if norm_cond_dim <= 0:
166
+ raise ValueError(f"norm_cond_dim must be positive, but got {norm_cond_dim}")
167
+ self.norm = torch.nn.LayerNorm(
168
+ data_dim
169
+ ) # TODO: Check if elementwise_affine=True is correct
170
+ self.linear = torch.nn.Linear(norm_cond_dim, 2 * data_dim)
171
+ torch.nn.init.zeros_(self.linear.weight)
172
+ torch.nn.init.zeros_(self.linear.bias)
173
+
174
+ def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
175
+ # x: (batch, ..., data_dim)
176
+ # t: (batch, norm_cond_dim)
177
+ # return: (batch, data_dim)
178
+ x = self.norm(x)
179
+ alpha, beta = self.linear(t).chunk(2, dim=-1)
180
+
181
+ # Add singleton dimensions to alpha and beta
182
+ if x.dim() > 2:
183
+ alpha = alpha.view(alpha.shape[0], *([1] * (x.dim() - 2)), alpha.shape[1])
184
+ beta = beta.view(beta.shape[0], *([1] * (x.dim() - 2)), beta.shape[1])
185
+
186
+ return x * (1 + alpha) + beta
187
+
188
+
189
+ def normalization_layer(norm: Optional[str], dim: int, norm_cond_dim: int = -1):
190
+ if norm == "batch":
191
+ return torch.nn.BatchNorm1d(dim)
192
+ elif norm == "layer":
193
+ return torch.nn.LayerNorm(dim)
194
+ elif norm == "ada":
195
+ assert norm_cond_dim > 0, f"norm_cond_dim must be positive, got {norm_cond_dim}"
196
+ return AdaptiveLayerNorm1D(dim, norm_cond_dim)
197
+ elif norm is None:
198
+ return torch.nn.Identity()
199
+ else:
200
+ raise ValueError(f"Unknown norm: {norm}")
201
+
202
+
203
+ class PreNorm(nn.Module):
204
+ def __init__(self, dim: int, fn: Callable, norm: str = "layer", norm_cond_dim: int = -1):
205
+ super().__init__()
206
+ self.norm = normalization_layer(norm, dim, norm_cond_dim)
207
+ self.fn = fn
208
+
209
+ def forward(self, x: torch.Tensor, *args, **kwargs):
210
+ if isinstance(self.norm, AdaptiveLayerNorm1D):
211
+ return self.fn(self.norm(x, *args), **kwargs)
212
+ else:
213
+ return self.fn(self.norm(x), **kwargs)
214
+
215
+
216
+ # This class is from HaMeR (https://github.com/geopavlakos/hamer).
217
+ class TransformerCrossAttn(nn.Module):
218
+ def __init__(
219
+ self,
220
+ dim: int,
221
+ depth: int,
222
+ heads: int,
223
+ dim_head: int,
224
+ mlp_dim: int,
225
+ dropout: float = 0.0,
226
+ norm: str = "layer",
227
+ norm_cond_dim: int = -1,
228
+ context_dim: Optional[int] = None,
229
+ ):
230
+ super().__init__()
231
+ self.layers = nn.ModuleList([])
232
+ for _ in range(depth):
233
+ sa = Attention(dim, heads=heads, dim_head=dim_head, dropout=dropout)
234
+ ca = CrossAttention(
235
+ dim, context_dim=context_dim, heads=heads, dim_head=dim_head, dropout=dropout
236
+ )
237
+ ff = FeedForward(dim, mlp_dim, dropout=dropout)
238
+ self.layers.append(
239
+ nn.ModuleList(
240
+ [
241
+ PreNorm(dim, sa, norm=norm, norm_cond_dim=norm_cond_dim),
242
+ PreNorm(dim, ca, norm=norm, norm_cond_dim=norm_cond_dim),
243
+ PreNorm(dim, ff, norm=norm, norm_cond_dim=norm_cond_dim),
244
+ ]
245
+ )
246
+ )
247
+
248
+ def forward(self, x: torch.Tensor, *args, context=None, context_list=None):
249
+ if context_list is None:
250
+ context_list = [context] * len(self.layers)
251
+ if len(context_list) != len(self.layers):
252
+ raise ValueError(f"len(context_list) != len(self.layers) ({len(context_list)} != {len(self.layers)})")
253
+
254
+ for i, (self_attn, cross_attn, ff) in enumerate(self.layers):
255
+ x = self_attn(x, *args) + x
256
+ x = cross_attn(x, *args, context=context_list[i]) + x
257
+ x = ff(x, *args) + x
258
+ return x
259
+
260
+
261
+ # This class is from HaMeR (https://github.com/geopavlakos/hamer).
262
+ class DropTokenDropout(nn.Module):
263
+ def __init__(self, p: float = 0.1):
264
+ super().__init__()
265
+ if p < 0 or p > 1:
266
+ raise ValueError(
267
+ "dropout probability has to be between 0 and 1, " "but got {}".format(p)
268
+ )
269
+ self.p = p
270
+
271
+ def forward(self, x: torch.Tensor):
272
+ # x: (batch_size, seq_len, dim)
273
+ if self.training and self.p > 0:
274
+ zero_mask = torch.full_like(x[0, :, 0], self.p).bernoulli().bool()
275
+ # TODO: permutation idx for each batch using torch.argsort
276
+ if zero_mask.any():
277
+ x = x[:, ~zero_mask, :]
278
+ return x
279
+
280
+
281
+ # This class is from HaMeR (https://github.com/geopavlakos/hamer).
282
+ class ZeroTokenDropout(nn.Module):
283
+ def __init__(self, p: float = 0.1):
284
+ super().__init__()
285
+ if p < 0 or p > 1:
286
+ raise ValueError(
287
+ "dropout probability has to be between 0 and 1, " "but got {}".format(p)
288
+ )
289
+ self.p = p
290
+
291
+ def forward(self, x: torch.Tensor):
292
+ # x: (batch_size, seq_len, dim)
293
+ if self.training and self.p > 0:
294
+ zero_mask = torch.full_like(x[:, :, 0], self.p).bernoulli().bool()
295
+ # Zero-out the masked tokens
296
+ x[zero_mask, :] = 0
297
+ return x
298
+
299
+
300
+ # This class is from HaMeR (https://github.com/geopavlakos/hamer).
301
+ class TransformerDecoder(nn.Module):
302
+ def __init__(
303
+ self,
304
+ num_tokens: int,
305
+ token_dim: int,
306
+ dim: int,
307
+ depth: int,
308
+ heads: int,
309
+ mlp_dim: int,
310
+ dim_head: int = 64,
311
+ dropout: float = 0.0,
312
+ emb_dropout: float = 0.0,
313
+ emb_dropout_type: str = 'drop',
314
+ norm: str = "layer",
315
+ norm_cond_dim: int = -1,
316
+ context_dim: Optional[int] = None,
317
+ skip_token_embedding: bool = False,
318
+ ):
319
+ super().__init__()
320
+ if not skip_token_embedding:
321
+ self.to_token_embedding = nn.Linear(token_dim, dim)
322
+ else:
323
+ self.to_token_embedding = nn.Identity()
324
+ if token_dim != dim:
325
+ raise ValueError(
326
+ f"token_dim ({token_dim}) != dim ({dim}) when skip_token_embedding is True"
327
+ )
328
+
329
+ self.pos_embedding = nn.Parameter(torch.randn(1, num_tokens, dim))
330
+ if emb_dropout_type == "drop":
331
+ self.dropout = DropTokenDropout(emb_dropout)
332
+ elif emb_dropout_type == "zero":
333
+ self.dropout = ZeroTokenDropout(emb_dropout)
334
+ elif emb_dropout_type == "normal":
335
+ self.dropout = nn.Dropout(emb_dropout)
336
+
337
+ self.transformer = TransformerCrossAttn(
338
+ dim,
339
+ depth,
340
+ heads,
341
+ dim_head,
342
+ mlp_dim,
343
+ dropout,
344
+ norm=norm,
345
+ norm_cond_dim=norm_cond_dim,
346
+ context_dim=context_dim,
347
+ )
348
+
349
+ def forward(self, inp: torch.Tensor, *args, context=None, context_list=None):
350
+ x = self.to_token_embedding(inp)
351
+ b, n, _ = x.shape
352
+
353
+ x = self.dropout(x)
354
+ x += self.pos_embedding[:, :n]
355
+
356
+ x = self.transformer(x, *args, context=context, context_list=context_list)
357
+ return x
358
+
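A hypothetical usage sketch of the decoder above (not part of the commit): a single input token cross-attends over a flattened (H*W, C) backbone feature map, which is how the heads further below use it.

dec = TransformerDecoder(num_tokens=1, token_dim=1, dim=1024, depth=6, heads=8,
                         mlp_dim=1024, dim_head=64, context_dim=1280)
token = torch.zeros(2, 1, 1)           # zero input token
context = torch.randn(2, 192, 1280)    # flattened 16x12 ViT feature map
out = dec(token, context=context)      # (2, 1, 1024)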
359
+
360
+ def rot6d_to_rotmat(x: torch.Tensor) -> torch.Tensor:
361
+ """
362
+ Convert 6D rotation representation to 3x3 rotation matrix.
363
+ Based on Zhou et al., "On the Continuity of Rotation Representations in Neural Networks", CVPR 2019
364
+ Args:
365
+ x (torch.Tensor): (B,6) Batch of 6-D rotation representations.
366
+ Returns:
367
+ torch.Tensor: Batch of corresponding rotation matrices with shape (B,3,3).
368
+ """
369
+ x = x.reshape(-1,2,3).permute(0, 2, 1).contiguous()
370
+ a1 = x[:, :, 0]
371
+ a2 = x[:, :, 1]
372
+ b1 = F.normalize(a1)
373
+ b2 = F.normalize(a2 - torch.einsum('bi,bi->b', b1, a2).unsqueeze(-1) * b1)
374
+ b3 = torch.cross(b1, b2)
375
+ return torch.stack((b1, b2, b3), dim=-1)
376
+
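A quick sanity check for the 6-D representation above (illustrative only): the recovered matrices should be orthonormal with determinant +1.

x6d = torch.randn(4, 6)
R = rot6d_to_rotmat(x6d)                                # (4, 3, 3)
eye = torch.eye(3).expand(4, 3, 3)
assert torch.allclose(R @ R.transpose(1, 2), eye, atol=1e-5)   # orthonormal
assert torch.allclose(torch.det(R), torch.ones(4), atol=1e-5)  # proper rotation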
377
+
378
+ def aa_to_rotmat(theta: torch.Tensor):
379
+ """
380
+ Convert axis-angle representation to rotation matrix.
381
+ Works by first converting it to a quaternion.
382
+ Args:
383
+ theta (torch.Tensor): Tensor of shape (B, 3) containing axis-angle representations.
384
+ Returns:
385
+ torch.Tensor: Corresponding rotation matrices with shape (B, 3, 3).
386
+ """
387
+ norm = torch.norm(theta + 1e-8, p = 2, dim = 1)
388
+ angle = torch.unsqueeze(norm, -1)
389
+ normalized = torch.div(theta, angle)
390
+ angle = angle * 0.5
391
+ v_cos = torch.cos(angle)
392
+ v_sin = torch.sin(angle)
393
+ quat = torch.cat([v_cos, v_sin * normalized], dim = 1)
394
+ return quat_to_rotmat(quat)
395
+
396
+
397
+ class MANO(smplx.MANOLayer):
398
+ def __init__(self, *args, joint_regressor_extra: Optional[str] = None, **kwargs):
399
+ """
400
+ Extension of the official MANO implementation to support more joints.
401
+ Args:
402
+ Same as MANOLayer.
403
+ joint_regressor_extra (str): Path to extra joint regressor.
404
+ """
405
+ super(MANO, self).__init__(*args, **kwargs)
406
+ mano_to_openpose = [0, 13, 14, 15, 16, 1, 2, 3, 17, 4, 5, 6, 18, 10, 11, 12, 19, 7, 8, 9, 20]
407
+
408
+ #2, 3, 5, 4, 1
409
+ if joint_regressor_extra is not None:
410
+ self.register_buffer('joint_regressor_extra', torch.tensor(pickle.load(open(joint_regressor_extra, 'rb'), encoding='latin1'), dtype=torch.float32))
411
+ self.register_buffer('extra_joints_idxs', to_tensor(list(vertex_ids['mano'].values()), dtype=torch.long))
412
+ self.register_buffer('joint_map', torch.tensor(mano_to_openpose, dtype=torch.long))
413
+
414
+ def forward(self, *args, **kwargs) -> MANOOutput:
415
+ """
416
+ Run forward pass. Same as MANO and also append an extra set of joints if joint_regressor_extra is specified.
417
+ """
418
+ mano_output = super(MANO, self).forward(*args, **kwargs)
419
+ extra_joints = torch.index_select(mano_output.vertices, 1, self.extra_joints_idxs)
420
+ joints = torch.cat([mano_output.joints, extra_joints], dim=1)
421
+ joints = joints[:, self.joint_map, :]
422
+ if hasattr(self, 'joint_regressor_extra'):
423
+ extra_joints = vertices2joints(self.joint_regressor_extra, mano_output.vertices)
424
+ joints = torch.cat([joints, extra_joints], dim=1)
425
+ mano_output.joints = joints
426
+ return mano_output
427
+
428
+
429
+ class MANOTransformerDecoderHead(nn.Module):
430
+ """ Cross-attention based MANO Transformer decoder
431
+ """
432
+
433
+ def __init__(self):
434
+ super().__init__()
435
+ # self.cfg = cfg
436
+ self.joint_rep_type = '6d' #cfg.MODEL.MANO_HEAD.get('JOINT_REP', '6d')
437
+ self.joint_rep_dim = {'6d': 6, 'aa': 3}[self.joint_rep_type]
438
+ npose = self.joint_rep_dim * (cfg.MODEL.hamer_mano_num_hand_joints + 1)
439
+ self.npose = npose
440
+ self.input_is_mean_shape = False #cfg.MODEL.MANO_HEAD.get('TRANSFORMER_INPUT', 'zero') == 'mean_shape'
441
+ transformer_args = dict(
442
+ num_tokens=1,
443
+ token_dim=1,
444
+ dim=1024,
445
+ )
446
+ if cfg.MODEL.backbone_type in ['resnet-50', 'resnet-101', 'resnet-152', 'hrnet-w32', 'hrnet-w48']:
447
+ context_dim = 2048
448
+ elif cfg.MODEL.backbone_type in ['vit-l-16']:
449
+ context_dim = 1024
450
+ elif cfg.MODEL.backbone_type in ['vit-b-16']:
451
+ context_dim = 768
452
+ elif cfg.MODEL.backbone_type in ['resnet-18', 'resnet-34']:
453
+ context_dim = 512
454
+ elif cfg.MODEL.backbone_type in ['vit-s-16']:
455
+ context_dim = 384
456
+ elif cfg.MODEL.backbone_type in ['handoccnet']:
457
+ context_dim = 256
458
+ else:
459
+ context_dim = 1280
460
+
461
+ # transformer_args = (transformer_args | {'context_dim': 1280, 'depth': 6, 'dim_head': 64, 'dropout': 0.0, 'emb_dropout': 0.0, 'heads': 8, 'mlp_dim': 1024, 'norm': 'layer'})
462
+ transformer_args = {**transformer_args, 'context_dim': context_dim, 'depth': 6, 'dim_head': 64, 'dropout': 0.0, 'emb_dropout': 0.0, 'heads': 8, 'mlp_dim': 1024, 'norm': 'layer'}
463
+ self.transformer = TransformerDecoder(
464
+ **transformer_args
465
+ )
466
+ dim=transformer_args['dim']
467
+ self.decpose = nn.Linear(dim, npose)
468
+ self.decshape = nn.Linear(dim, 10)
469
+ self.deccam = nn.Linear(dim, 3)
470
+
471
+ mean_params = np.load(cfg.MODEL.hamer_mano_mean_params)
472
+ init_hand_pose = torch.from_numpy(mean_params['pose'].astype(np.float32)).unsqueeze(0)
473
+ init_betas = torch.from_numpy(mean_params['shape'].astype('float32')).unsqueeze(0)
474
+ init_cam = torch.from_numpy(mean_params['cam'].astype(np.float32)).unsqueeze(0)
475
+ self.register_buffer('init_hand_pose', init_hand_pose)
476
+ self.register_buffer('init_betas', init_betas)
477
+ self.register_buffer('init_cam', init_cam)
478
+
479
+ def forward(self, x, **kwargs):
480
+ batch_size = x.shape[0]
481
+ # vit pretrained backbone is channel-first. Change to token-first
482
+ x = rearrange(x, 'b c h w -> b (h w) c')
483
+
484
+ init_hand_pose = self.init_hand_pose.expand(batch_size, -1)
485
+ init_betas = self.init_betas.expand(batch_size, -1)
486
+ init_cam = self.init_cam.expand(batch_size, -1)
487
+
488
+ # TODO: Convert init_hand_pose to aa rep if needed
489
+ if self.joint_rep_type == 'aa':
490
+ raise NotImplementedError
491
+
492
+ pred_hand_pose = init_hand_pose
493
+ pred_betas = init_betas
494
+ pred_cam = init_cam
495
+ pred_hand_pose_list = []
496
+ pred_betas_list = []
497
+ pred_cam_list = []
498
+
499
+ # Input token to transformer is zero token
500
+ if self.input_is_mean_shape:
501
+ token = torch.cat([pred_hand_pose, pred_betas, pred_cam], dim=1)[:,None,:]
502
+ else:
503
+ token = torch.zeros(batch_size, 1, 1).to(x.device)
504
+
505
+ # Pass through transformer
506
+ token_out = self.transformer(token, context=x)
507
+ token_out = token_out.squeeze(1) # (B, C)
508
+
509
+ # Readout from token_out
510
+ pred_hand_pose = self.decpose(token_out) + pred_hand_pose
511
+ pred_betas = self.decshape(token_out) + pred_betas
512
+ pred_cam = self.deccam(token_out) + pred_cam
513
+ pred_hand_pose_list.append(pred_hand_pose)
514
+ pred_betas_list.append(pred_betas)
515
+ pred_cam_list.append(pred_cam)
516
+
517
+ # Convert self.joint_rep_type -> rotmat
518
+ joint_conversion_fn = {
519
+ '6d': rot6d_to_rotmat,
520
+ 'aa': lambda x: aa_to_rotmat(x.view(-1, 3).contiguous())
521
+ }[self.joint_rep_type]
522
+
523
+ pred_mano_params_list = {}
524
+ pred_mano_params_list['hand_pose'] = torch.cat([joint_conversion_fn(pbp).view(batch_size, -1, 3, 3)[:, 1:, :, :] for pbp in pred_hand_pose_list], dim=0)
525
+ pred_mano_params_list['betas'] = torch.cat(pred_betas_list, dim=0)
526
+ pred_mano_params_list['cam'] = torch.cat(pred_cam_list, dim=0)
527
+ pred_hand_pose = joint_conversion_fn(pred_hand_pose).view(batch_size, cfg.MODEL.hamer_mano_num_hand_joints+1, 3, 3)
528
+
529
+ pred_mano_params = {'global_orient': pred_hand_pose[:, [0]],
530
+ 'hand_pose': pred_hand_pose[:, 1:],
531
+ 'betas': pred_betas}
532
+ return pred_mano_params, pred_cam, pred_mano_params_list
533
+
534
+
535
+ def perspective_projection(points: torch.Tensor,
536
+ translation: torch.Tensor,
537
+ focal_length: torch.Tensor,
538
+ camera_center: Optional[torch.Tensor] = None,
539
+ rotation: Optional[torch.Tensor] = None) -> torch.Tensor:
540
+ """
541
+ Computes the perspective projection of a set of 3D points.
542
+ Args:
543
+ points (torch.Tensor): Tensor of shape (B, N, 3) containing the input 3D points.
544
+ translation (torch.Tensor): Tensor of shape (B, 3) containing the 3D camera translation.
545
+ focal_length (torch.Tensor): Tensor of shape (B, 2) containing the focal length in pixels.
546
+ camera_center (torch.Tensor): Tensor of shape (B, 2) containing the camera center in pixels.
547
+ rotation (torch.Tensor): Tensor of shape (B, 3, 3) containing the camera rotation.
548
+ Returns:
549
+ torch.Tensor: Tensor of shape (B, N, 2) containing the projection of the input points.
550
+ """
551
+ batch_size = points.shape[0]
552
+ if rotation is None:
553
+ rotation = torch.eye(3, device=points.device, dtype=points.dtype).unsqueeze(0).expand(batch_size, -1, -1)
554
+ if camera_center is None:
555
+ camera_center = torch.zeros(batch_size, 2, device=points.device, dtype=points.dtype)
556
+ # Populate intrinsic camera matrix K.
557
+ K = torch.zeros([batch_size, 3, 3], device=points.device, dtype=points.dtype)
558
+ K[:,0,0] = focal_length[:,0]
559
+ K[:,1,1] = focal_length[:,1]
560
+ K[:,2,2] = 1.
561
+ K[:,:-1, -1] = camera_center
562
+
563
+ # Transform points
564
+ points = torch.einsum('bij,bkj->bki', rotation, points)
565
+ points = points + translation.unsqueeze(1)
566
+
567
+ # Apply perspective distortion
568
+ projected_points = points / points[:,:,-1].unsqueeze(-1)
569
+
570
+ # Apply camera intrinsics
571
+ projected_points = torch.einsum('bij,bkj->bki', K, projected_points)
572
+
573
+ return projected_points[:, :, :-1]
574
+
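A small worked example of the pinhole model above (illustrative only): a point on the optical axis projects exactly onto the principal point.

points = torch.zeros(1, 1, 3)                      # one point at the camera-frame origin
translation = torch.tensor([[0.0, 0.0, 5.0]])      # moved 5 units in front of the camera
focal = torch.tensor([[1000.0, 1000.0]])
center = torch.tensor([[112.0, 112.0]])
uv = perspective_projection(points, translation, focal, camera_center=center)
# uv -> tensor([[[112., 112.]]]) since the point lies on the optical axis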
575
+
576
+ # This module is modified from MANOTransformerDecoderHead of HaMeR (https://github.com/geopavlakos/hamer). All cfg are directly initialized.
577
+ class ContactTransformerDecoderHead(nn.Module):
578
+ """ Cross-attention based MANO Transformer decoder
579
+ """
580
+ def __init__(self):
581
+ super().__init__()
582
+ transformer_args = dict(
583
+ num_tokens=1,
584
+ token_dim=1,
585
+ dim=1024,
586
+ )
587
+ if cfg.MODEL.backbone_type in ['resnet-50', 'resnet-101', 'resnet-152', 'hrnet-w32', 'hrnet-w48']:
588
+ context_dim = 2048
589
+ elif cfg.MODEL.backbone_type in ['vit-l-16']:
590
+ context_dim = 1024
591
+ elif cfg.MODEL.backbone_type in ['vit-b-16']:
592
+ context_dim = 768
593
+ elif cfg.MODEL.backbone_type in ['resnet-18', 'resnet-34']:
594
+ context_dim = 512
595
+ elif cfg.MODEL.backbone_type in ['vit-s-16']:
596
+ context_dim = 384
597
+ elif cfg.MODEL.backbone_type in ['handoccnet']:
598
+ context_dim = 256
599
+ else:
600
+ context_dim = 1280
601
+ MANO_HEAD_TRANSFORMER_DECODER_CONFIG = {'depth': 6, 'heads': 8, 'mlp_dim': 1024, 'dim_head': 64, 'dropout': 0.0, 'emb_dropout': 0.0, 'norm': 'layer', 'context_dim': context_dim}
602
+ transformer_args.update(dict(MANO_HEAD_TRANSFORMER_DECODER_CONFIG))
603
+ self.transformer = TransformerDecoder(
604
+ **transformer_args
605
+ )
606
+ self.deccontact = nn.Linear(1024, 778)
607
+
608
+ CONTACT_MEAN_DIR = cfg.MODEL.contact_means_path # TODO: REPLACE THIS WITH CONTACT MEAN OF ENTIRE DATASETS
609
+ init_contact = nn.Parameter(torch.randn(1, 778, requires_grad=True))
610
+ self.register_buffer('init_contact', init_contact)
611
+
612
+ def forward(self, x, **kwargs): # x: [b, 1280, 16, 12] (if resnet-50, x: [b, 2048, 8, 8], resnet-34: [b, 512, 8, 8], hrnet-w32: [b, 2048, 8, 8])
613
+ batch_size = x.shape[0]
614
+ device = x.device
615
+
616
+ # vit pretrained backbone is channel-first. Change to token-first
617
+ x = rearrange(x, 'b c h w -> b (h w) c')
618
+
619
+ init_contact = self.init_contact.expand(batch_size, -1)
620
+ pred_contact = init_contact
621
+
622
+ token = torch.zeros(batch_size, 1, 1).to(x.device)
623
+
624
+ # Pass through transformer
625
+ token_out = self.transformer(token, context=x) # x: [b, 192, 1280]
626
+ token_out = token_out[:, 0] # (B, C)
627
+
628
+ # Readout from token_out
629
+ pred_contact = self.deccontact(token_out) + pred_contact
630
+ # pred_contact = pred_contact.sigmoid()
631
+
632
+ # Joint contact
633
+ pred_joint_contact = (torch.tensor(mano.joint_regressor, dtype=torch.float32, device=device) @ pred_contact.T).T
634
+ pred_mesh_contact_336 = (torch.tensor(V_regressor_336, dtype=torch.float32, device=device) @ pred_contact.T).T
635
+ pred_mesh_contact_84 = (torch.tensor(V_regressor_84, dtype=torch.float32, device=device) @ pred_contact.T).T
636
+
637
+ return pred_contact, pred_mesh_contact_336, pred_mesh_contact_84, pred_joint_contact
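A minimal forward-pass sketch for the contact head above, assuming the repository config (cfg), the MANO wrapper (mano) and the V_regressor_* matrices have already been initialized by the repo:

import torch
head = ContactTransformerDecoderHead()
feat = torch.randn(2, 1280, 16, 12)   # ViT-style backbone feature map
contact, contact_336, contact_84, contact_joint = head(feat)
# contact: (2, 778) per-vertex logits; contact_336 / contact_84: coarser mesh resolutions;
# contact_joint: (2, 21) per-joint contact obtained through the MANO joint regressor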
lib/models/model.py ADDED
@@ -0,0 +1,100 @@
1
+ import torch
2
+ import torch.nn as nn
3
+ import torch.nn.functional as F
4
+
5
+ from lib.core.config import cfg
6
+
7
+
8
+
9
+ class HACO(nn.Module):
10
+ def __init__(self):
11
+ super(HACO, self).__init__()
12
+ # Select the device once; fall back to CPU when CUDA is unavailable
13
+ self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
14
+ self.to(self.device)
15
+
16
+ # Load modules
17
+ self.backbone = get_backbone_network(type=cfg.MODEL.backbone_type)
18
+ self.decoder = get_decoder_network(type=cfg.MODEL.backbone_type)
19
+
20
+ def forward(self, inputs, mode='test'):
21
+ image = inputs['input']['image'].to(self.device)
22
+
23
+ if 'vit' in cfg.MODEL.backbone_type:
24
+ image = F.interpolate(image, size=(224, 224), mode='bilinear', align_corners=False)
25
+
26
+ img_feat = self.backbone(image)
27
+ contact_out, contact_336_out, contact_84_out, contact_joint_out = self.decoder(img_feat)
28
+
29
+ return dict(contact_out=contact_out, contact_336_out=contact_336_out, contact_84_out=contact_84_out, contact_joint_out=contact_joint_out)
30
+
31
+
32
+
33
+ def get_backbone_network(type='hamer'):
34
+ if type in ['hamer']:
35
+ from lib.models.backbone.backbone_hamer_style import ViT_HaMeR
36
+ backbone = ViT_HaMeR()
37
+ checkpoint = torch.load(cfg.MODEL.hamer_backbone_pretrained_path, map_location='cuda')['state_dict']
38
+ filtered_state_dict = {k[len("backbone."):]: v for k, v in checkpoint.items() if k.startswith("backbone.")}
39
+ backbone.load_state_dict(filtered_state_dict)
40
+ elif type in ['resnet-18']:
41
+ from lib.models.backbone.resnet import ResNetBackbone
42
+ backbone = ResNetBackbone(18) # ResNet
43
+ backbone.init_weights()
44
+ elif type in ['resnet-34']:
45
+ from lib.models.backbone.resnet import ResNetBackbone
46
+ backbone = ResNetBackbone(34) # ResNet
47
+ backbone.init_weights()
48
+ elif type in ['resnet-50']:
49
+ from lib.models.backbone.resnet import ResNetBackbone
50
+ backbone = ResNetBackbone(50) # ResNet
51
+ backbone.init_weights()
52
+ elif type in ['resnet-101']:
53
+ from lib.models.backbone.resnet import ResNetBackbone
54
+ backbone = ResNetBackbone(101) # ResNet
55
+ backbone.init_weights()
56
+ elif type in ['resnet-152']:
57
+ from lib.models.backbone.resnet import ResNetBackbone
58
+ backbone = ResNetBackbone(152) # ResNet
59
+ backbone.init_weights()
60
+ elif type in ['hrnet-w32']:
61
+ from lib.models.backbone.hrnet import HighResolutionNet
62
+ from lib.utils.func_utils import load_config
63
+ config = load_config(cfg.MODEL.hrnet_w32_backbone_config_path)
64
+ pretrained = cfg.MODEL.hrnet_w32_backbone_pretrained_path
65
+ backbone = HighResolutionNet(config)
66
+ backbone.init_weights(pretrained=pretrained)
67
+ elif type in ['hrnet-w48']:
68
+ from lib.models.backbone.hrnet import HighResolutionNet
69
+ from lib.utils.func_utils import load_config
70
+ config = load_config(cfg.MODEL.hrnet_w48_backbone_config_path)
71
+ pretrained = cfg.MODEL.hrnet_w48_backbone_pretrained_path
72
+ backbone = HighResolutionNet(config)
73
+ backbone.init_weights(pretrained=pretrained)
74
+ elif type in ['handoccnet']:
75
+ from lib.models.backbone.fpn import FPN
76
+ backbone = FPN(pretrained=False)
77
+ pretrained = cfg.MODEL.handoccnet_backbone_pretrained_path
78
+ state_dict = {k[len('module.backbone.'):]: v for k, v in torch.load(pretrained)['network'].items() if k.startswith('module.backbone.')}
79
+ backbone.load_state_dict(state_dict, strict=True)
80
+ elif type in ['vit-s-16']:
81
+ from lib.models.backbone.vit import ViTBackbone
82
+ backbone = ViTBackbone(model_name='vit_small_patch16_224', pretrained=True)
83
+ elif type in ['vit-b-16']:
84
+ from lib.models.backbone.vit import ViTBackbone
85
+ backbone = ViTBackbone(model_name='vit_base_patch16_224', pretrained=True)
86
+ elif type in ['vit-l-16']:
87
+ from lib.models.backbone.vit import ViTBackbone
88
+ backbone = ViTBackbone(model_name='vit_large_patch16_224', pretrained=True)
89
+ else:
90
+ raise NotImplementedError
91
+
92
+ return backbone
93
+
94
+
95
+
96
+ def get_decoder_network(type='hamer'):
97
+ from lib.models.decoder.decoder_hamer_style import ContactTransformerDecoderHead
98
+ decoder = ContactTransformerDecoderHead()
99
+
100
+ return decoder
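A rough inference sketch for the model above, assuming the cfg.MODEL.* paths and pretrained backbone weights are available; the crop size follows cfg.MODEL.input_img_shape:

import torch
from lib.core.config import cfg
model = HACO().eval()
inputs = {'input': {'image': torch.randn(1, 3, *cfg.MODEL.input_img_shape)}}
with torch.no_grad():
    out = model(inputs, mode='test')
contact_prob = out['contact_out'].sigmoid()   # (1, 778) per-vertex contact probability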
lib/utils/__pycache__/contact_utils.cpython-38.pyc ADDED
Binary file (1.3 kB). View file
 
lib/utils/__pycache__/eval_utils.cpython-38.pyc ADDED
Binary file (1.17 kB). View file
 
lib/utils/__pycache__/func_utils.cpython-38.pyc ADDED
Binary file (2.14 kB). View file
 
lib/utils/__pycache__/human_models.cpython-38.pyc ADDED
Binary file (3.61 kB). View file
 
lib/utils/__pycache__/log_utils.cpython-38.pyc ADDED
Binary file (471 Bytes). View file
 
lib/utils/__pycache__/mano_utils.cpython-38.pyc ADDED
Binary file (4.35 kB). View file
 
lib/utils/__pycache__/mesh_utils.cpython-38.pyc ADDED
Binary file (2.01 kB). View file
 
lib/utils/__pycache__/preprocessing.cpython-38.pyc ADDED
Binary file (7.88 kB). View file
 
lib/utils/__pycache__/train_utils.cpython-38.pyc ADDED
Binary file (679 Bytes). View file
 
lib/utils/__pycache__/transforms.cpython-38.pyc ADDED
Binary file (640 Bytes). View file
 
lib/utils/__pycache__/vis_utils.cpython-38.pyc ADDED
Binary file (6.27 kB). View file
 
lib/utils/contact_utils.py ADDED
@@ -0,0 +1,55 @@
1
+ import gc
2
+ import torch
3
+ import numpy as np
4
+ from trimesh.proximity import ProximityQuery
5
+
6
+ from lib.utils.human_models import mano
7
+
8
+
9
+ def get_ho_contact_and_offset(mesh_hand, mesh_obj, c_thres):
10
+ # Make sure that meshes are watertight and do not contain inverted faces
11
+ # Typically canonical space meshes are more stable
12
+
13
+ pq = ProximityQuery(mesh_obj)
14
+ obj_coord_c, dist, obj_coord_c_idx = pq.on_surface(mesh_hand.vertices.astype(np.float32))
15
+
16
+ is_contact_h = (dist < c_thres)
17
+ contact_h = (1. * is_contact_h).astype(np.float32)
18
+
19
+ contact_valid = np.ones((mano.vertex_num, 1))
20
+ inter_coord_valid = np.ones((mano.vertex_num))
21
+
22
+ # Explicit cleanup
23
+ del pq
24
+ gc.collect()
25
+
26
+ return np.array(contact_h), np.array(obj_coord_c), contact_valid, inter_coord_valid
27
+
28
+
29
+ def get_contact_thres(backbone_type='hamer'):
30
+ if backbone_type == 'hamer':
31
+ return 0.5
32
+ elif backbone_type == 'vit-l-16':
33
+ return 0.55
34
+ elif backbone_type == 'vit-b-16':
35
+ return 0.5
36
+ elif backbone_type == 'vit-s-16':
37
+ return 0.5
38
+ elif backbone_type == 'handoccnet':
39
+ return 0.95
40
+ elif backbone_type == 'hrnet-w48':
41
+ return 0.5
42
+ elif backbone_type == 'hrnet-w32':
43
+ return 0.5
44
+ elif backbone_type == 'resnet-152':
45
+ return 0.55
46
+ elif backbone_type == 'resnet-101':
47
+ return 0.5
48
+ elif backbone_type == 'resnet-50':
49
+ return 0.5
50
+ elif backbone_type == 'resnet-34':
51
+ return 0.5
52
+ elif backbone_type == 'resnet-18':
53
+ return 0.5
54
+ else:
55
+ raise NotImplementedError
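An illustrative use of get_ho_contact_and_offset above (hypothetical meshes, and assuming the MANO files required by lib.utils.human_models are present): two nearly touching spheres stand in for the hand and object meshes.

import trimesh
hand_like = trimesh.creation.icosphere(radius=0.05)
obj_like = trimesh.creation.icosphere(radius=0.05)
obj_like.apply_translation([0.0, 0.0, 0.095])   # closest surfaces roughly 5 mm apart
contact_h, obj_coord_c, contact_valid, inter_valid = get_ho_contact_and_offset(hand_like, obj_like, c_thres=0.01)
# contact_h marks vertices of hand_like lying within c_thres of obj_like's surface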
lib/utils/demo_utils.py ADDED
@@ -0,0 +1,105 @@
1
+ import cv2
2
+ import numpy as np
3
+ from collections import defaultdict, deque
4
+
5
+ import mediapipe as mp
6
+
7
+
8
+ from lib.utils.vis_utils import draw_landmarks_on_image
9
+
10
+
11
+ def smooth_bbox(prev_bbox, curr_bbox, alpha=0.8):
12
+ if prev_bbox is None:
13
+ return curr_bbox
14
+ return [alpha * p + (1 - alpha) * c for p, c in zip(prev_bbox, curr_bbox)]
15
+
16
+
17
+ def smooth_contact_mask(prev_mask, curr_mask, alpha=0.8):
18
+ if prev_mask is None:
19
+ return curr_mask.astype(np.float32)
20
+ return alpha * prev_mask + (1 - alpha) * curr_mask.astype(np.float32)
21
+
22
+
23
+ def remove_small_contact_components(contact_mask, faces, min_size=20):
24
+ vertex_to_faces = defaultdict(list)
25
+ for i, f in enumerate(faces):
26
+ for v in f:
27
+ vertex_to_faces[v].append(i)
28
+
29
+ visited = np.zeros(len(contact_mask), dtype=bool)
30
+ filtered_mask = np.zeros_like(contact_mask, dtype=bool)
31
+
32
+ for v in range(len(contact_mask)):
33
+ if visited[v] or not contact_mask[v]:
34
+ continue
35
+
36
+ queue = deque([v])
37
+ component = []
38
+ while queue:
39
+ curr = queue.popleft()
40
+ if visited[curr] or not contact_mask[curr]:
41
+ continue
42
+ visited[curr] = True
43
+ component.append(curr)
44
+ for f_idx in vertex_to_faces[curr]:
45
+ for neighbor in faces[f_idx]:
46
+ if not visited[neighbor] and contact_mask[neighbor]:
47
+ queue.append(neighbor)
48
+
49
+ if len(component) >= min_size:
50
+ filtered_mask[component] = True
51
+
52
+ return filtered_mask
53
+
54
+
55
+ def initialize_video_writer(output_path, fps, frame_size):
56
+ tried_codecs = ['avc1', 'H264', 'X264', 'MJPG', 'mp4v'] # we recommend using 'MJPG'
57
+ for codec in tried_codecs:
58
+ fourcc = cv2.VideoWriter_fourcc(*codec)
59
+ writer = cv2.VideoWriter(output_path, fourcc, fps, frame_size)
60
+ if writer.isOpened():
61
+ print(f"Using codec '{codec}' for {output_path}")
62
+ return writer
63
+ writer.release()
64
+ raise RuntimeError(f"Failed to initialize VideoWriter for {output_path}")
65
+
66
+
67
+ def extract_frames_with_hand(cap, detector):
68
+ frames_with_hand = []
69
+ frame_idx = 0
70
+
71
+ while cap.isOpened():
72
+ ret, frame = cap.read()
73
+ if not ret:
74
+ break
75
+
76
+ orig_img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
77
+ mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=orig_img)
78
+ detection_result = detector.detect(mp_image)
79
+ _, right_hand_bbox = draw_landmarks_on_image(orig_img.copy(), detection_result)
80
+
81
+ if right_hand_bbox is not None:
82
+ frames_with_hand.append((frame_idx, frame, right_hand_bbox))
83
+
84
+ frame_idx += 1
85
+
86
+ cap.release()
87
+ return frames_with_hand
88
+
89
+
90
+ def find_longest_continuous_segment(frames_with_hand):
91
+ longest_segment = []
92
+ current_segment = []
93
+
94
+ for i in range(len(frames_with_hand)):
95
+ if i == 0 or frames_with_hand[i][0] == frames_with_hand[i - 1][0] + 1:
96
+ current_segment.append(frames_with_hand[i])
97
+ else:
98
+ if len(current_segment) > len(longest_segment):
99
+ longest_segment = current_segment
100
+ current_segment = [frames_with_hand[i]]
101
+
102
+ if len(current_segment) > len(longest_segment):
103
+ longest_segment = current_segment
104
+
105
+ return longest_segment
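A tiny worked example for the segment search above (hypothetical frame indices):

frames = [(0, 'f0', 'bb'), (2, 'f2', 'bb'), (3, 'f3', 'bb'), (4, 'f4', 'bb'), (5, 'f5', 'bb')]
segment = find_longest_continuous_segment(frames)
# segment keeps the run of consecutive frame indices 2, 3, 4, 5 (length 4 beats the lone frame 0)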
lib/utils/eval_utils.py ADDED
@@ -0,0 +1,50 @@
1
+ import torch
2
+ import numpy as np
3
+
4
+
5
+ def evaluation(outputs, targets_data, meta_info, mode='val', thres=0.5):
6
+ eval_out = {}
7
+
8
+ # GT
9
+ mesh_valid = meta_info['mano_valid'] is not None
10
+
11
+ # Pred
12
+ contact_pred = outputs['contact_out'].sigmoid()[0].detach().cpu().numpy()
13
+
14
+ # Error Calculate
15
+ if mesh_valid:
16
+ # Contact Metrics
17
+ cont_pre, cont_rec, cont_f1 = compute_contact_metrics(targets_data['contact_data']['contact_h'][0].detach().cpu().numpy(), outputs['contact_out'][0].detach().cpu().numpy(), mesh_valid, thres=thres)
18
+ eval_out['cont_pre'] = cont_pre
19
+ eval_out['cont_rec'] = cont_rec
20
+ eval_out['cont_f1'] = cont_f1
21
+
22
+ return eval_out
23
+
24
+
25
+ def compute_contact_metrics(gt, pred, valid, thres=0.5):
26
+ """
27
+ Compute precision, recall, and f1 using NumPy
28
+ """
29
+ if valid:
30
+ # True Positives
31
+ tp_num = np.sum(gt[pred >= thres])
32
+
33
+ # Denominators for precision and recall
34
+ precision_denominator = np.sum(pred >= thres)
35
+ recall_denominator = np.sum(gt)
36
+
37
+ # Compute precision, recall, and F1 score
38
+ precision_ = tp_num / precision_denominator if precision_denominator > 0 else None
39
+ recall_ = tp_num / recall_denominator if recall_denominator > 0 else None
40
+ if precision_ is not None and recall_ is not None and (precision_ + recall_) > 0:
41
+ f1_ = 2 * precision_ * recall_ / (precision_ + recall_)
42
+ else:
43
+ f1_ = None
44
+ else:
45
+ # If not valid, return None for metrics
46
+ precision_ = None
47
+ recall_ = None
48
+ f1_ = None
49
+
50
+ return precision_, recall_, f1_
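A worked example of the contact metrics above (hypothetical labels): two of the three vertices predicted above threshold are true contacts.

import numpy as np
gt = np.array([1., 1., 0., 1., 0.])
pred = np.array([0.9, 0.8, 0.7, 0.2, 0.1])
p, r, f1 = compute_contact_metrics(gt, pred, valid=True, thres=0.5)
# precision = 2/3, recall = 2/3, F1 = 2/3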
lib/utils/func_utils.py ADDED
@@ -0,0 +1,65 @@
1
+ import cv2
2
+ import torch
3
+ import numpy as np
4
+
5
+
6
+ def load_img(path, order='RGB'):
7
+ img = cv2.imread(path, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)
8
+ if not isinstance(img, np.ndarray):
9
+ raise IOError("Fail to read %s" % path)
10
+
11
+ if order=='RGB': img = img[:,:,::-1]
12
+ img = img.astype(np.float32)
13
+ return img
14
+
15
+
16
+ def get_bbox(joint_img, joint_valid, expansion_factor=1.0):
17
+ x_img, y_img = joint_img[:,0], joint_img[:,1]
18
+ x_img = x_img[joint_valid==1]; y_img = y_img[joint_valid==1];
19
+ xmin = min(x_img); ymin = min(y_img); xmax = max(x_img); ymax = max(y_img);
20
+
21
+ x_center = (xmin+xmax)/2.; width = (xmax-xmin)*expansion_factor;
22
+ xmin = x_center - 0.5*width
23
+ xmax = x_center + 0.5*width
24
+
25
+ y_center = (ymin+ymax)/2.; height = (ymax-ymin)*expansion_factor;
26
+ ymin = y_center - 0.5*height
27
+ ymax = y_center + 0.5*height
28
+
29
+ bbox = np.array([xmin, ymin, xmax - xmin, ymax - ymin]).astype(np.float32)
30
+ return bbox
31
+
32
+
33
+ def process_bbox(bbox, target_shape, original_img_shape):
34
+
35
+ # aspect ratio preserving bbox
36
+ w = bbox[2]
37
+ h = bbox[3]
38
+ c_x = bbox[0] + w/2.
39
+ c_y = bbox[1] + h/2.
40
+ aspect_ratio = target_shape[1]/target_shape[0]
41
+ if w > aspect_ratio * h:
42
+ h = w / aspect_ratio
43
+ elif w < aspect_ratio * h:
44
+ w = h * aspect_ratio
45
+ bbox[2] = w*1.25
46
+ bbox[3] = h*1.25
47
+ bbox[0] = c_x - bbox[2]/2.
48
+ bbox[1] = c_y - bbox[3]/2.
49
+
50
+ return bbox
51
+
52
+
53
+ import re
54
+ def atoi(text):
55
+ return int(text) if text.isdigit() else text
56
+ def natural_keys(text):
57
+ return [atoi(c) for c in re.split(r'(\d+)', text)]
58
+
59
+
60
+ # Load config
61
+ import yaml
62
+ def load_config(cfg_path):
63
+ with open(cfg_path, 'r') as f:
64
+ cfg = yaml.safe_load(f)
65
+ return cfg
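A worked example for process_bbox above (hypothetical numbers): the box is first made to match the target aspect ratio, then both sides are enlarged by 1.25 around the original center.

import numpy as np
bbox = np.array([10., 20., 100., 50.])   # x, y, w, h
out = process_bbox(bbox, target_shape=(256, 256), original_img_shape=(480, 640))
# square target -> h is raised to 100, then w = h = 125, re-centered on (60, 45)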
lib/utils/human_models.py ADDED
@@ -0,0 +1,49 @@
1
+ import numpy as np
2
+ import torch
3
+ import os.path as osp
4
+ import pickle
5
+
6
+ from lib.core.config import cfg
7
+ from lib.utils.transforms import transform_joint_to_other_db
8
+ from lib.utils.smplx import smplx
9
+
10
+
11
+
12
+ class MANO(object):
13
+ def __init__(self):
14
+ self.layer_arg = {'create_global_orient': False, 'create_hand_pose': False, 'create_betas': False, 'create_transl': False}
15
+ self.layer = {'right': smplx.create(cfg.MODEL.human_model_path, 'mano', is_rhand=True, use_pca=False, flat_hand_mean=False, **self.layer_arg), 'left': smplx.create(cfg.MODEL.human_model_path, 'mano', is_rhand=False, use_pca=False, flat_hand_mean=False, **self.layer_arg)}
16
+ self.vertex_num = 778
17
+ self.face = {'right': self.layer['right'].faces, 'left': self.layer['left'].faces}
18
+ self.add_watertight_face = {'right': np.array([[92,38,122], [234,92,122], [239,234,122], [279,239,122], [215,279,122], [215,122,118], [215,118,117], [215,117,119], [215,119,120], [215,120,108], [215,108,79], [215,79,78], [215,78,121], [214,215,121]])}
19
+ self.watertight_face = {'right': np.concatenate((self.layer['right'].faces, self.add_watertight_face['right']), axis=0)}
20
+ self.shape_param_dim = 10
21
+
22
+ if torch.sum(torch.abs(self.layer['left'].shapedirs[:,0,:] - self.layer['right'].shapedirs[:,0,:])) < 1:
23
+ print('Fix shapedirs bug of MANO')
24
+ self.layer['left'].shapedirs[:,0,:] *= -1
25
+
26
+ # original MANO joint set
27
+ self.orig_joint_num = 16
28
+ self.orig_joints_name = ('Wrist', 'Index_1', 'Index_2', 'Index_3', 'Middle_1', 'Middle_2', 'Middle_3', 'Pinky_1', 'Pinky_2', 'Pinky_3', 'Ring_1', 'Ring_2', 'Ring_3', 'Thumb_1', 'Thumb_2', 'Thumb_3')
29
+ self.orig_root_joint_idx = self.orig_joints_name.index('Wrist')
30
+ self.orig_flip_pairs = ()
31
+ self.orig_joint_regressor = self.layer['right'].J_regressor.numpy() # same for the right and left hands
32
+
33
+ # changed MANO joint set
34
+ self.joint_num = 21 # manually added fingertips
35
+ self.joints_name = ('Wrist', 'Thumb_1', 'Thumb_2', 'Thumb_3', 'Thumb_4', 'Index_1', 'Index_2', 'Index_3', 'Index_4', 'Middle_1', 'Middle_2', 'Middle_3', 'Middle_4', 'Ring_1', 'Ring_2', 'Ring_3', 'Ring_4', 'Pinky_1', 'Pinky_2', 'Pinky_3', 'Pinky_4')
36
+ self.skeleton = ( (0,1), (0,5), (0,9), (0,13), (0,17), (1,2), (2,3), (3,4), (5,6), (6,7), (7,8), (9,10), (10,11), (11,12), (13,14), (14,15), (15,16), (17,18), (18,19), (19,20) )
37
+ self.root_joint_idx = self.joints_name.index('Wrist')
38
+ self.flip_pairs = ()
39
+ # add fingertips to joint_regressor
40
+ self.joint_regressor = transform_joint_to_other_db(self.orig_joint_regressor, self.orig_joints_name, self.joints_name)
41
+ self.joint_regressor[self.joints_name.index('Thumb_4')] = np.array([1 if i == 745 else 0 for i in range(self.joint_regressor.shape[1])], dtype=np.float32).reshape(1,-1)
42
+ self.joint_regressor[self.joints_name.index('Index_4')] = np.array([1 if i == 317 else 0 for i in range(self.joint_regressor.shape[1])], dtype=np.float32).reshape(1,-1)
43
+ self.joint_regressor[self.joints_name.index('Middle_4')] = np.array([1 if i == 445 else 0 for i in range(self.joint_regressor.shape[1])], dtype=np.float32).reshape(1,-1)
44
+ self.joint_regressor[self.joints_name.index('Ring_4')] = np.array([1 if i == 556 else 0 for i in range(self.joint_regressor.shape[1])], dtype=np.float32).reshape(1,-1)
45
+ self.joint_regressor[self.joints_name.index('Pinky_4')] = np.array([1 if i == 673 else 0 for i in range(self.joint_regressor.shape[1])], dtype=np.float32).reshape(1,-1)
46
+
47
+
48
+
49
+ mano = MANO()
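A short sketch of how the regressor above is used (placeholder vertices): the 21-joint skeleton, fingertips included, is a linear map of the 778 mesh vertices.

import numpy as np
verts = np.zeros((mano.vertex_num, 3))     # placeholder (778, 3) right-hand mesh
joints = mano.joint_regressor @ verts      # (21, 3); e.g. 'Thumb_4' selects vertex 745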
lib/utils/log_utils.py ADDED
@@ -0,0 +1,12 @@
1
+ import os
2
+ import shutil
3
+
4
+ from datetime import datetime
5
+ from pytz import timezone
6
+
7
+
8
+ def init_dirs(dir_list):
9
+ for dir in dir_list:
10
+ if os.path.exists(dir) and os.path.isdir(dir):
11
+ shutil.rmtree(dir)
12
+ os.makedirs(dir)
lib/utils/mano_utils.py ADDED
@@ -0,0 +1,136 @@
1
+ '''
2
+ Copyright 2017 Javier Romero, Dimitrios Tzionas, Michael J Black and the Max Planck Gesellschaft. All rights reserved.
3
+ This software is provided for research purposes only.
4
+ By using this software you agree to the terms of the MANO/SMPL+H Model license here http://mano.is.tue.mpg.de/license
5
+
6
+ More information about MANO/SMPL+H is available at http://mano.is.tue.mpg.de.
7
+ For comments or questions, please email us at: mano@tue.mpg.de
8
+
9
+
10
+ About this file:
11
+ ================
12
+ This file defines a wrapper for the loading functions of the MANO model.
13
+
14
+ Modules included:
15
+ - load_model:
16
+ loads the MANO model from a given file location (i.e. a .pkl file location),
17
+ or a dictionary object.
18
+
19
+ '''
20
+ import os
21
+ import cv2
22
+ import torch
23
+ import numpy as np
24
+ import pickle
25
+ import chumpy as ch
26
+ from chumpy.ch import MatVecMult
27
+
28
+
29
+ class Rodrigues(ch.Ch):
30
+ dterms = 'rt'
31
+
32
+ def compute_r(self):
33
+ return cv2.Rodrigues(self.rt.r)[0]
34
+
35
+ def compute_dr_wrt(self, wrt):
36
+ if wrt is self.rt:
37
+ return cv2.Rodrigues(self.rt.r)[1].T
38
+
39
+
40
+ def lrotmin(p):
41
+ if isinstance(p, np.ndarray):
42
+ p = p.ravel()[3:]
43
+ return np.concatenate(
44
+ [(cv2.Rodrigues(np.array(pp))[0] - np.eye(3)).ravel()
45
+ for pp in p.reshape((-1, 3))]).ravel()
46
+ if p.ndim != 2 or p.shape[1] != 3:
47
+ p = p.reshape((-1, 3))
48
+ p = p[1:]
49
+ return ch.concatenate([(Rodrigues(pp) - ch.eye(3)).ravel()
50
+ for pp in p]).ravel()
51
+
52
+
53
+ def posemap(s):
54
+ if s == 'lrotmin':
55
+ return lrotmin
56
+ else:
57
+ raise Exception('Unknown posemapping: %s' % (str(s), ))
58
+
59
+
60
+ def ready_arguments(fname_or_dict, posekey4vposed='pose'):
61
+ if not isinstance(fname_or_dict, dict):
62
+ dd = pickle.load(open(fname_or_dict, 'rb'), encoding='latin1')
63
+ else:
64
+ dd = fname_or_dict
65
+
66
+ want_shapemodel = 'shapedirs' in dd
67
+ nposeparms = dd['kintree_table'].shape[1] * 3
68
+
69
+ if 'trans' not in dd:
70
+ dd['trans'] = np.zeros(3)
71
+ if 'pose' not in dd:
72
+ dd['pose'] = np.zeros(nposeparms)
73
+ if 'shapedirs' in dd and 'betas' not in dd:
74
+ dd['betas'] = np.zeros(dd['shapedirs'].shape[-1])
75
+
76
+ for s in [
77
+ 'v_template', 'weights', 'posedirs', 'pose', 'trans', 'shapedirs',
78
+ 'betas', 'J'
79
+ ]:
80
+ if (s in dd) and not hasattr(dd[s], 'dterms'):
81
+ dd[s] = ch.array(dd[s])
82
+
83
+ assert (posekey4vposed in dd)
84
+ if want_shapemodel:
85
+ dd['v_shaped'] = dd['shapedirs'].dot(dd['betas']) + dd['v_template']
86
+ v_shaped = dd['v_shaped']
87
+ J_tmpx = MatVecMult(dd['J_regressor'], v_shaped[:, 0])
88
+ J_tmpy = MatVecMult(dd['J_regressor'], v_shaped[:, 1])
89
+ J_tmpz = MatVecMult(dd['J_regressor'], v_shaped[:, 2])
90
+ dd['J'] = ch.vstack((J_tmpx, J_tmpy, J_tmpz)).T
91
+ pose_map_res = posemap(dd['bs_type'])(dd[posekey4vposed])
92
+ dd['v_posed'] = v_shaped + dd['posedirs'].dot(pose_map_res)
93
+ else:
94
+ pose_map_res = posemap(dd['bs_type'])(dd[posekey4vposed])
95
+ dd_add = dd['posedirs'].dot(pose_map_res)
96
+ dd['v_posed'] = dd['v_template'] + dd_add
97
+
98
+ return dd
99
+
100
+
101
+
102
+ def get_mano_pca_basis(ncomps=45, use_pca=True, side='right', mano_root='data/base_data/human_models/mano'):
103
+ if use_pca:
104
+ ncomps = ncomps
105
+ else:
106
+ ncomps = 45
107
+
108
+ if side == 'right':
109
+ mano_path = os.path.join(mano_root, 'MANO_RIGHT.pkl')
110
+ elif side == 'left':
111
+ mano_path = os.path.join(mano_root, 'MANO_LEFT.pkl')
112
+ smpl_data = ready_arguments(mano_path)
113
+ hands_components = smpl_data['hands_components']
114
+ selected_components = hands_components[:ncomps]
115
+ th_selected_comps = selected_components
116
+
117
+ return torch.tensor(th_selected_comps, dtype=torch.float32)
118
+
119
+
120
+
121
+ def change_flat_hand_mean(hand_pose, remove=True, side='right', mano_root='data/base_data/human_models/mano'):
122
+ if side == 'right':
123
+ mano_path = os.path.join(mano_root, 'MANO_RIGHT.pkl')
124
+ elif side == 'left':
125
+ mano_path = os.path.join(mano_root, 'MANO_LEFT.pkl')
126
+ smpl_data = ready_arguments(mano_path)
127
+
128
+ # Get hand mean
129
+ hands_mean = smpl_data['hands_mean']
130
+ hands_mean = hands_mean.copy() # hands_mean: (45)
131
+
132
+ if remove:
133
+ hand_pose[3:] = hand_pose[3:] - hands_mean
134
+ else:
135
+ hand_pose[3:] = hand_pose[3:] + hands_mean
136
+ return hand_pose
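A round-trip sketch for change_flat_hand_mean above, assuming the MANO .pkl files are available under data/base_data/human_models/mano:

import numpy as np
pose = np.random.randn(48).astype(np.float32)                         # 3 global orient + 45 hand pose
flat = change_flat_hand_mean(pose.copy(), remove=True, side='right')  # subtract hands_mean
back = change_flat_hand_mean(flat, remove=False, side='right')        # add it back
assert np.allclose(back, pose, atol=1e-5)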
lib/utils/mesh_utils.py ADDED
@@ -0,0 +1,74 @@
1
+ import torch
2
+ import trimesh
3
+ import numpy as np
4
+ from plyfile import PlyData, PlyElement
5
+
6
+
7
+
8
+ def center_vertices(vertices, faces, flip_y=True): # This is for MOW dataset
9
+ """Centroid-align vertices."""
10
+ vertices = vertices - np.mean(vertices, axis=0, keepdims=True)
11
+ if flip_y:
12
+ vertices[:, 1] *= -1
13
+ faces = faces[:, [2, 1, 0]]
14
+ return vertices, faces
15
+
16
+
17
+
18
+ def load_obj_nr(filename_obj, normalization=True, texture_size=4, load_texture=False, # load_obj function from neural_renderer (https://github.com/daniilidis-group/neural_renderer) and MOW (https://github.com/ZheC/MOW)
19
+ texture_wrapping='REPEAT', use_bilinear=True):
20
+ """
21
+ Load Wavefront .obj file.
22
+ This function only supports vertices (v x x x) and faces (f x x x).
23
+ """
24
+
25
+ # load vertices
26
+ vertices = []
27
+ with open(filename_obj) as f:
28
+ lines = f.readlines()
29
+
30
+ for line in lines:
31
+ if len(line.split()) == 0:
32
+ continue
33
+ if line.split()[0] == 'v':
34
+ vertices.append([float(v) for v in line.split()[1:4]])
35
+ vertices = torch.from_numpy(np.vstack(vertices).astype(np.float32))
36
+
37
+ # load faces
38
+ faces = []
39
+ for line in lines:
40
+ if len(line.split()) == 0:
41
+ continue
42
+ if line.split()[0] == 'f':
43
+ vs = line.split()[1:]
44
+ nv = len(vs)
45
+ v0 = int(vs[0].split('/')[0])
46
+ for i in range(nv - 2):
47
+ v1 = int(vs[i + 1].split('/')[0])
48
+ v2 = int(vs[i + 2].split('/')[0])
49
+ faces.append((v0, v1, v2))
50
+ faces = torch.from_numpy(np.vstack(faces).astype(np.int32)) - 1
51
+
52
+ # load textures
53
+ textures = None
54
+ if load_texture:
55
+ for line in lines:
56
+ if line.startswith('mtllib'):
57
+ filename_mtl = os.path.join(os.path.dirname(filename_obj), line.split()[1])
58
+ textures = load_textures(filename_obj, filename_mtl, texture_size,
59
+ texture_wrapping=texture_wrapping,
60
+ use_bilinear=use_bilinear)
61
+ if textures is None:
62
+ raise Exception('Failed to load textures.')
63
+
64
+ # normalize into a unit cube centered zero
65
+ if normalization:
66
+ vertices -= vertices.min(0)[0][None, :]
67
+ vertices /= torch.abs(vertices).max()
68
+ vertices *= 2
69
+ vertices -= vertices.max(0)[0][None, :] / 2
70
+
71
+ if load_texture:
72
+ return vertices, faces, textures
73
+ else:
74
+ return vertices, faces
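A usage sketch for the loader above (hypothetical file path): with normalization=True the vertices come back centered near zero and scaled to a roughly unit-sized cube.

verts, faces = load_obj_nr('object.obj', normalization=True)
# verts: float32 tensor (V, 3); faces: int32 tensor (F, 3), zero-indexed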
lib/utils/preprocessing.py ADDED
@@ -0,0 +1,330 @@
1
+ import cv2
2
+ import torch
3
+ import random
4
+ import numpy as np
5
+ import torch.nn.functional as F
6
+
7
+ from lib.core.config import cfg
8
+ from lib.utils.human_models import mano
9
+
10
+
11
+ def get_aug_config_contact():
12
+ # Augmentation intensity factors
13
+ scale_factor = 0.25
14
+ rot_factor = 30
15
+ color_factor = 0.2
16
+ trans_factor = 0.1 # Translation range (recommended 0.1 to 0.2)
17
+ noise_std = 0.02 # Gaussian noise strength
18
+ motion_blur_prob = 0.15 # Probability of applying motion blur
19
+ extreme_crop_prob = 0.1 # Probability for extreme cropping
20
+ extreme_crop_lvl = 0.3 # Crop intensity (recommended 0.2 to 0.4)
21
+ low_res_prob = 0.05 # Probability for applying low resolution
22
+ low_res_scale_range = (0.15, 0.5) # Range for low-res scaling
23
+
24
+ # Scaling augmentation
25
+ scale = np.clip(np.random.randn(), -1.0, 1.0) * scale_factor + 1.0
26
+
27
+ # Rotation augmentation
28
+ rot = np.clip(np.random.randn(), -2.0, 2.0) * rot_factor if random.random() <= 0.6 else 0
29
+
30
+ # Color augmentation
31
+ c_up = 1.0 + color_factor
32
+ c_low = 1.0 - color_factor
33
+ color_scale = np.array([
34
+ random.uniform(c_low, c_up),
35
+ random.uniform(c_low, c_up),
36
+ random.uniform(c_low, c_up)
37
+ ])
38
+
39
+ # Flipping augmentation
40
+ do_flip = random.random() <= 0.5
41
+
42
+ # Translation augmentation
43
+ tx = np.clip(np.random.randn(), -1.0, 1.0) * trans_factor
44
+ ty = np.clip(np.random.randn(), -1.0, 1.0) * trans_factor
45
+
46
+ # Extreme cropping augmentation
47
+ do_extreme_crop = random.random() <= extreme_crop_prob
48
+
49
+ # Noise augmentation (returns standard deviation for Gaussian noise injection)
50
+ add_noise = random.random() <= 0.3 # 30% chance of adding noise
51
+ noise_std = noise_std if add_noise else 0.0
52
+
53
+ # Motion blur augmentation
54
+ apply_motion_blur = random.random() <= motion_blur_prob
55
+ motion_blur_kernel_size = random.choice([3, 5, 7]) if apply_motion_blur else 0
56
+
57
+ # Low-resolution augmentation
58
+ apply_low_res = random.random() <= low_res_prob
59
+ low_res_scale = random.uniform(*low_res_scale_range) if apply_low_res else 1.0
60
+
61
+ return {
62
+ 'scale': scale,
63
+ 'rot': rot,
64
+ 'color_scale': color_scale,
65
+ 'do_flip': do_flip,
66
+ 'tx': tx,
67
+ 'ty': ty,
68
+ 'do_extreme_crop': do_extreme_crop,
69
+ 'extreme_crop_lvl': extreme_crop_lvl if do_extreme_crop else 0,
70
+ 'noise_std': noise_std,
71
+ 'motion_blur_kernel_size': motion_blur_kernel_size,
72
+ 'low_res_scale': low_res_scale # Added low-res scale parameter
73
+ }
74
+
75
+
76
+ def rotate_2d(pt_2d, rot_rad):
77
+ x = pt_2d[0]
78
+ y = pt_2d[1]
79
+ sn, cs = np.sin(rot_rad), np.cos(rot_rad)
80
+ xx = x * cs - y * sn
81
+ yy = x * sn + y * cs
82
+ return np.array([xx, yy], dtype=np.float32)
83
+
84
+
85
+ def gen_trans_from_patch_cv(c_x, c_y, src_width, src_height, dst_width, dst_height, scale, rot, inv=False):
86
+ # augment size with scale
87
+ src_w = src_width * scale
88
+ src_h = src_height * scale
89
+ src_center = np.array([c_x, c_y], dtype=np.float32)
90
+
91
+ # augment rotation
92
+ rot_rad = np.pi * rot / 180
93
+ src_downdir = rotate_2d(np.array([0, src_h * 0.5], dtype=np.float32), rot_rad)
94
+ src_rightdir = rotate_2d(np.array([src_w * 0.5, 0], dtype=np.float32), rot_rad)
95
+
96
+ dst_w = dst_width
97
+ dst_h = dst_height
98
+ dst_center = np.array([dst_w * 0.5, dst_h * 0.5], dtype=np.float32)
99
+ dst_downdir = np.array([0, dst_h * 0.5], dtype=np.float32)
100
+ dst_rightdir = np.array([dst_w * 0.5, 0], dtype=np.float32)
101
+
102
+ src = np.zeros((3, 2), dtype=np.float32)
103
+ src[0, :] = src_center
104
+ src[1, :] = src_center + src_downdir
105
+ src[2, :] = src_center + src_rightdir
106
+
107
+ dst = np.zeros((3, 2), dtype=np.float32)
108
+ dst[0, :] = dst_center
109
+ dst[1, :] = dst_center + dst_downdir
110
+ dst[2, :] = dst_center + dst_rightdir
111
+
112
+ if inv:
113
+ trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))
114
+ else:
115
+ trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))
116
+
117
+ trans = trans.astype(np.float32)
118
+ return trans
119
+
120
+
121
+ def generate_patch_image_contact(cvimg, bbox, scale, rot, do_flip, out_shape, tx=0.0, ty=0.0, bkg_color='black'):
122
+ img = cvimg.copy()
123
+ img_height, img_width, img_channels = img.shape
124
+
125
+ bb_c_x = float(bbox[0] + 0.5 * bbox[2])
126
+ bb_c_y = float(bbox[1] + 0.5 * bbox[3])
127
+ bb_width = float(bbox[2])
128
+ bb_height = float(bbox[3])
129
+
130
+ if bkg_color == 'white':
131
+ borderMode=cv2.BORDER_CONSTANT
132
+ borderValue=(255, 255, 255)
133
+ else:
134
+ borderMode=cv2.BORDER_CONSTANT
135
+ borderValue=(0, 0, 0)
136
+
137
+ if do_flip:
138
+ img = img[:, ::-1, :]
139
+ bb_c_x = img_width - bb_c_x - 1
140
+
141
+ # Add translation offset
142
+ bb_c_x += tx * img_width
143
+ bb_c_y += ty * img_height
144
+
145
+ trans = gen_trans_from_patch_cv(bb_c_x, bb_c_y, bb_width, bb_height,
146
+ out_shape[1], out_shape[0], scale, rot)
147
+ img_patch = cv2.warpAffine(img, trans, (int(out_shape[1]), int(out_shape[0])), flags=cv2.INTER_LINEAR, borderMode=borderMode, borderValue=borderValue)
148
+ img_patch = img_patch.astype(np.float32)
149
+ inv_trans = gen_trans_from_patch_cv(bb_c_x, bb_c_y, bb_width, bb_height,
150
+ out_shape[1], out_shape[0], scale, rot, inv=True)
151
+
152
+ return img_patch, trans, inv_trans
153
+
154
+
155
+ def augmentation_contact(img, bbox, data_split, enforce_flip=None, bkg_color='black'):
156
+ if data_split == 'train':
157
+ aug_params = get_aug_config_contact()
158
+ else:
159
+ aug_params = {
160
+ 'scale': 1.0,
161
+ 'rot': 0.0,
162
+ 'color_scale': np.array([1, 1, 1]),
163
+ 'do_flip': False,
164
+ 'tx': 0.0,
165
+ 'ty': 0.0,
166
+ 'do_extreme_crop': False,
167
+ 'extreme_crop_lvl': 0.0,
168
+ 'noise_std': 0.0,
169
+ 'motion_blur_kernel_size': 0,
170
+ 'low_res_scale': 1.0 # No low-res in non-training mode
171
+ }
172
+
173
+ # Enforce flip if specified
174
+ if enforce_flip is not None:
175
+ aug_params['do_flip'] = enforce_flip
176
+
177
+ # Apply geometric augmentations (scaling, rotation, flipping)
178
+ img, trans, inv_trans = generate_patch_image_contact(
179
+ img, bbox, aug_params['scale'], aug_params['rot'],
180
+ aug_params['do_flip'], cfg.MODEL.input_img_shape,
181
+ aug_params['tx'], aug_params['ty'], bkg_color
182
+ )
183
+
184
+ # Apply low-resolution augmentation
185
+ if aug_params['low_res_scale'] < 1.0: # Only apply if scaling down
186
+ img = apply_low_res(img, aug_params['low_res_scale'])
187
+
188
+ # Apply color augmentation
189
+ img = np.clip(img * aug_params['color_scale'][None, None, :], 0, 255)
190
+
191
+ # Apply extreme cropping
192
+ if aug_params['do_extreme_crop']:
193
+ img = apply_extreme_crop(img, aug_params['extreme_crop_lvl'])
194
+
195
+ # Apply noise augmentation
196
+ if aug_params['noise_std'] > 0:
197
+ img = add_gaussian_noise(img, aug_params['noise_std'])
198
+
199
+ # Apply motion blur augmentation
200
+ if aug_params['motion_blur_kernel_size'] > 0:
201
+ img = apply_motion_blur(img, aug_params['motion_blur_kernel_size'])
202
+
203
+ return img, trans, inv_trans, aug_params['rot'], aug_params['do_flip'], aug_params['color_scale']
204
+
205
+
206
+ def apply_extreme_crop(img, crop_lvl):
207
+ """Extreme cropping: Aggressively crop the image."""
208
+ h, w = img.shape[:2]
209
+ crop_size = max(1, int(min(h, w) * (1 - crop_lvl))) # Prevent zero-size crops
210
+ start_x = random.randint(0, max(0, w - crop_size))
211
+ start_y = random.randint(0, max(0, h - crop_size))
212
+ cropped_img = img[start_y:start_y + crop_size, start_x:start_x + crop_size]
213
+
214
+ # Preserve aspect ratio during resizing
215
+ return cv2.resize(cropped_img, (w, h), interpolation=cv2.INTER_LINEAR)
216
+
217
+
218
+ def add_gaussian_noise(img, noise_std):
219
+ """Add Gaussian noise to the image with proper scaling for data type."""
220
+ noise = np.random.normal(0, noise_std, img.shape).astype(np.float32)
221
+
222
+ if img.dtype == np.uint8:
223
+ noisy_img = np.clip(img + noise * 255, 0, 255).astype(np.uint8)
224
+ elif img.dtype == np.float32:
225
+ noisy_img = np.clip(img + noise, 0.0, 1.0).astype(np.float32)
226
+ elif img.dtype == np.float64:
227
+ noisy_img = np.clip(img + noise, 0.0, 1.0).astype(np.float64)
228
+ else:
229
+ raise TypeError("Unsupported image dtype. Expected uint8 or float32.")
230
+
231
+ return noisy_img
232
+
233
+
234
+ def apply_motion_blur(img, kernel_size):
235
+ """Apply motion blur to the image with a random direction."""
236
+ kernel = np.zeros((kernel_size, kernel_size))
237
+ direction = random.choice(['horizontal', 'vertical', 'diagonal'])
238
+
239
+ if direction == 'horizontal':
240
+ kernel[(kernel_size - 1) // 2, :] = np.ones(kernel_size)
241
+ elif direction == 'vertical':
242
+ kernel[:, (kernel_size - 1) // 2] = np.ones(kernel_size)
243
+ elif direction == 'diagonal':
244
+ np.fill_diagonal(kernel, 1)
245
+
246
+ kernel /= kernel_size # Normalize the kernel
247
+ return cv2.filter2D(img, -1, kernel, borderType=cv2.BORDER_REFLECT)
248
+
249
+
250
+ def apply_low_res(img, scale_factor=0.25):
251
+ """Simulate low-resolution effect by downsampling and upsampling."""
252
+ if not (0 < scale_factor < 1):
253
+ raise ValueError("scale_factor should be between 0 and 1.")
254
+
255
+ h, w = img.shape[:2]
256
+
257
+ # Calculate target dimensions for downsampling
258
+ downsampled_size = (max(1, int(w * scale_factor)), max(1, int(h * scale_factor)))
259
+
260
+ # Downsample using INTER_AREA for better quality in aggressive downsampling
261
+ low_res_img = cv2.resize(img, downsampled_size, interpolation=cv2.INTER_AREA)
262
+
263
+ # Upsample using INTER_NEAREST for strong pixelation effect
264
+ return cv2.resize(low_res_img, (w, h), interpolation=cv2.INTER_NEAREST).astype(img.dtype)
265
+
266
+
267
+ def process_human_model_output_orig(human_model_param, cam_param):
268
+ pose, shape, trans = human_model_param['pose'], human_model_param['shape'], human_model_param['trans']
269
+ hand_type = human_model_param['hand_type']
270
+ trans = human_model_param['trans']
271
+ pose = torch.FloatTensor(pose).view(-1,3); shape = torch.FloatTensor(shape).view(1,-1); # mano parameters (pose: 48 dimension, shape: 10 dimension)
272
+ trans = torch.FloatTensor(trans).view(1,-1) # translation vector
273
+
274
+ # apply camera extrinsic (rotation)
275
+ # merge root pose and camera rotation
276
+ if 'R' in cam_param:
277
+ R = np.array(cam_param['R'], dtype=np.float32).reshape(3,3)
278
+ root_pose = pose[mano.orig_root_joint_idx,:].numpy()
279
+ root_pose, _ = cv2.Rodrigues(root_pose)
280
+ root_pose, _ = cv2.Rodrigues(np.dot(R,root_pose))
281
+ pose[mano.orig_root_joint_idx] = torch.from_numpy(root_pose).view(3)
282
+
283
+ # get root joint coordinate
284
+ root_pose = pose[mano.orig_root_joint_idx].view(1,3)
285
+ hand_pose = torch.cat((pose[:mano.orig_root_joint_idx,:], pose[mano.orig_root_joint_idx+1:,:])).view(1,-1)
286
+ with torch.no_grad():
287
+ output = mano.layer[hand_type](betas=shape, hand_pose=hand_pose, global_orient=root_pose, transl=trans)
288
+ mesh_coord = output.vertices[0].numpy()
289
+ joint_coord = np.dot(mano.joint_regressor, mesh_coord)
290
+
291
+ # apply camera exrinsic (translation)
292
+ # compenstate rotation (translation from origin to root joint was not cancled)
293
+ if 'R' in cam_param and 't' in cam_param:
294
+ R, t = np.array(cam_param['R'], dtype=np.float32).reshape(3,3), np.array(cam_param['t'], dtype=np.float32).reshape(1,3)
295
+ root_coord = joint_coord[mano.root_joint_idx,None,:]
296
+ joint_coord = joint_coord - root_coord + np.dot(R, root_coord.transpose(1,0)).transpose(1,0) + t
297
+ mesh_coord = mesh_coord - root_coord + np.dot(R, root_coord.transpose(1,0)).transpose(1,0) + t
298
+
299
+
300
+ joint_cam_orig = joint_coord.copy()
301
+ mesh_cam_orig = mesh_coord.copy()
302
+ pose_orig, shape_orig, trans_orig = torch.cat((root_pose, hand_pose), dim=-1)[0].detach().cpu().numpy(), shape[0].detach().cpu().numpy(), trans[0].detach().cpu().numpy()
303
+
304
+ return mesh_cam_orig, joint_cam_orig, pose_orig, shape_orig, trans_orig
305
+
306
+
307
+ def mask2bbox(mask, expansion_factor=1.0):
308
+ # Find non-zero elements (object pixels)
309
+ coords = np.argwhere(mask)
310
+
311
+ # Extract bounding box coordinates
312
+ y_min, x_min = coords.min(axis=0)
313
+ y_max, x_max = coords.max(axis=0)
314
+
315
+ # Compute width and height
316
+ width = x_max - x_min + 1
317
+ height = y_max - y_min + 1
318
+
319
+ # Expand bounding box
320
+ if expansion_factor > 0:
321
+ x_min = max(0, int(x_min - width * expansion_factor / 2))
322
+ y_min = max(0, int(y_min - height * expansion_factor / 2))
323
+ x_max = min(mask.shape[1] - 1, int(x_max + width * expansion_factor / 2))
324
+ y_max = min(mask.shape[0] - 1, int(y_max + height * expansion_factor / 2))
325
+
326
+ # Recalculate width and height after expansion
327
+ width = x_max - x_min + 1
328
+ height = y_max - y_min + 1
329
+
330
+ return (x_min, y_min, width, height)
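A cropping sketch for the pipeline above, assuming cfg.MODEL.input_img_shape is set (e.g. (256, 256)): the returned affine maps let predictions in crop space be projected back to the original image.

import numpy as np
img = np.zeros((480, 640, 3), dtype=np.float32)              # placeholder frame
bbox = np.array([200., 150., 128., 128.], dtype=np.float32)  # x, y, w, h
patch, trans, inv_trans, rot, do_flip, color = augmentation_contact(img, bbox, data_split='test')
# patch: crop of shape cfg.MODEL.input_img_shape; trans / inv_trans: 2x3 affines between image and crop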
lib/utils/smplx/LICENSE ADDED
@@ -0,0 +1,58 @@
1
+ License
2
+
3
+ Software Copyright License for non-commercial scientific research purposes
4
+ Please read carefully the following terms and conditions and any accompanying documentation before you download and/or use the SMPL-X/SMPLify-X model, data and software, (the "Model & Software"), including 3D meshes, blend weights, blend shapes, textures, software, scripts, and animations. By downloading and/or using the Model & Software (including downloading, cloning, installing, and any other use of this github repository), you acknowledge that you have read these terms and conditions, understand them, and agree to be bound by them. If you do not agree with these terms and conditions, you must not download and/or use the Model & Software. Any infringement of the terms of this agreement will automatically terminate your rights under this License
5
+
6
+ Ownership / Licensees
7
+ The Software and the associated materials has been developed at the
8
+
9
+ Max Planck Institute for Intelligent Systems (hereinafter "MPI").
10
+
11
+ Any copyright or patent right is owned by and proprietary material of the
12
+
13
+ Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V. (hereinafter “MPG”; MPI and MPG hereinafter collectively “Max-Planck”)
14
+
15
+ hereinafter the “Licensor”.
16
+
17
+ License Grant
18
+ Licensor grants you (Licensee) personally a single-user, non-exclusive, non-transferable, free of charge right:
19
+
20
+ To install the Model & Software on computers owned, leased or otherwise controlled by you and/or your organization;
21
+ To use the Model & Software for the sole purpose of performing non-commercial scientific research, non-commercial education, or non-commercial artistic projects;
22
+ Any other use, in particular any use for commercial purposes, is prohibited. This includes, without limitation, incorporation in a commercial product, use in a commercial service, or production of other artifacts for commercial purposes. The Model & Software may not be reproduced, modified and/or made available in any form to any third party without Max-Planck’s prior written permission.
23
+
24
+ The Model & Software may not be used for pornographic purposes or to generate pornographic material whether commercial or not. This license also prohibits the use of the Model & Software to train methods/algorithms/neural networks/etc. for commercial use of any kind. By downloading the Model & Software, you agree not to reverse engineer it.
25
+
26
+ No Distribution
27
+ The Model & Software and the license herein granted shall not be copied, shared, distributed, re-sold, offered for re-sale, transferred or sub-licensed in whole or in part except that you may make one copy for archive purposes only.
28
+
29
+ Disclaimer of Representations and Warranties
30
+ You expressly acknowledge and agree that the Model & Software results from basic research, is provided “AS IS”, may contain errors, and that any use of the Model & Software is at your sole risk. LICENSOR MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE MODEL & SOFTWARE, NEITHER EXPRESS NOR IMPLIED, AND THE ABSENCE OF ANY LEGAL OR ACTUAL DEFECTS, WHETHER DISCOVERABLE OR NOT. Specifically, and not to limit the foregoing, licensor makes no representations or warranties (i) regarding the merchantability or fitness for a particular purpose of the Model & Software, (ii) that the use of the Model & Software will not infringe any patents, copyrights or other intellectual property rights of a third party, and (iii) that the use of the Model & Software will not cause any damage of any kind to you or a third party.
31
+
32
+ Limitation of Liability
33
+ Because this Model & Software License Agreement qualifies as a donation, according to Section 521 of the German Civil Code (Bürgerliches Gesetzbuch – BGB) Licensor as a donor is liable for intent and gross negligence only. If the Licensor fraudulently conceals a legal or material defect, they are obliged to compensate the Licensee for the resulting damage.
34
+ Licensor shall be liable for loss of data only up to the amount of typical recovery costs which would have arisen had proper and regular data backup measures been taken. For the avoidance of doubt Licensor shall be liable in accordance with the German Product Liability Act in the event of product liability. The foregoing applies also to Licensor’s legal representatives or assistants in performance. Any further liability shall be excluded.
35
+ Patent claims generated through the usage of the Model & Software cannot be directed towards the copyright holders.
36
+ The Model & Software is provided in the state of development the licensor defines. If modified or extended by Licensee, the Licensor makes no claims about the fitness of the Model & Software and is not responsible for any problems such modifications cause.
37
+
38
+ No Maintenance Services
39
+ You understand and agree that Licensor is under no obligation to provide either maintenance services, update services, notices of latent defects, or corrections of defects with regard to the Model & Software. Licensor nevertheless reserves the right to update, modify, or discontinue the Model & Software at any time.
40
+
41
+ Defects of the Model & Software must be notified in writing to the Licensor with a comprehensible description of the error symptoms. The notification of the defect should enable the reproduction of the error. The Licensee is encouraged to communicate any use, results, modification or publication.
42
+
43
+ Publications using the Model & Software
44
+ You acknowledge that the Model & Software is a valuable scientific resource and agree to appropriately reference the following paper in any publication making use of the Model & Software.
45
+
46
+ Citation:
47
+
48
+
49
+ @inproceedings{SMPL-X:2019,
50
+ title = {Expressive Body Capture: 3D Hands, Face, and Body from a Single Image},
51
+ author = {Pavlakos, Georgios and Choutas, Vasileios and Ghorbani, Nima and Bolkart, Timo and Osman, Ahmed A. A. and Tzionas, Dimitrios and Black, Michael J.},
52
+ booktitle = {Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
53
+ year = {2019}
54
+ }
55
+ Commercial licensing opportunities
56
+ For commercial uses of the Software, please send email to ps-license@tue.mpg.de
57
+
58
+ This Agreement shall be governed by the laws of the Federal Republic of Germany except for the UN Sales Convention.
lib/utils/smplx/README.md ADDED
@@ -0,0 +1,186 @@
1
+ ## SMPL-X: A new joint 3D model of the human body, face and hands together
2
+
3
+ [[Paper Page](https://smpl-x.is.tue.mpg.de)] [[Paper](https://ps.is.tuebingen.mpg.de/uploads_file/attachment/attachment/497/SMPL-X.pdf)]
4
+ [[Supp. Mat.](https://ps.is.tuebingen.mpg.de/uploads_file/attachment/attachment/498/SMPL-X-supp.pdf)]
5
+
6
+ ![SMPL-X Examples](./images/teaser_fig.png)
7
+
8
+ ## Table of Contents
9
+ * [License](#license)
10
+ * [Description](#description)
11
+ * [Installation](#installation)
12
+ * [Downloading the model](#downloading-the-model)
13
+ * [Loading SMPL-X, SMPL+H and SMPL](#loading-smpl-x-smplh-and-smpl)
14
+ * [SMPL and SMPL+H setup](#smpl-and-smplh-setup)
15
+ * [Model loading](https://github.com/vchoutas/smplx#model-loading)
16
+ * [MANO and FLAME correspondences](#mano-and-flame-correspondences)
17
+ * [Example](#example)
18
+ * [Citation](#citation)
19
+ * [Acknowledgments](#acknowledgments)
20
+ * [Contact](#contact)
21
+
22
+ ## License
23
+
24
+ Software Copyright License for **non-commercial scientific research purposes**.
25
+ Please read carefully the [terms and conditions](https://github.com/vchoutas/smplx/blob/master/LICENSE) and any accompanying documentation before you download and/or use the SMPL-X/SMPLify-X model, data and software, (the "Model & Software"), including 3D meshes, blend weights, blend shapes, textures, software, scripts, and animations. By downloading and/or using the Model & Software (including downloading, cloning, installing, and any other use of this github repository), you acknowledge that you have read these terms and conditions, understand them, and agree to be bound by them. If you do not agree with these terms and conditions, you must not download and/or use the Model & Software. Any infringement of the terms of this agreement will automatically terminate your rights under this [License](./LICENSE).
26
+
27
+ ## Disclaimer
28
+
29
+ The original images used for the figures 1 and 2 of the paper can be found in this link.
30
+ The images in the paper are used under license from gettyimages.com.
31
+ We have acquired the right to use them in the publication, but redistribution is not allowed.
32
+ Please follow the instructions on the given link to acquire right of usage.
33
+ Our results are obtained on the 483 × 724 pixels resolution of the original images.
34
+
35
+ ## Description
36
+
37
+ *SMPL-X* (SMPL eXpressive) is a unified body model with shape parameters trained jointly for the
38
+ face, hands and body. *SMPL-X* uses standard vertex based linear blend skinning with learned corrective blend
39
+ shapes, has N = 10,475 vertices and K = 54 joints,
40
+ which include joints for the neck, jaw, eyeballs and fingers.
41
+ SMPL-X is defined by a function M(θ, β, ψ), where θ is the pose parameters, β the shape parameters and
42
+ ψ the facial expression parameters.
43
+
44
+
45
+ ## Installation
46
+
47
+ To install the model please follow the next steps in the specified order:
48
+ 1. To install from PyPi simply run:
49
+ ```Shell
50
+ pip install smplx[all]
51
+ ```
52
+ 2. Clone this repository and install it using the *setup.py* script:
53
+ ```Shell
54
+ git clone https://github.com/vchoutas/smplx
55
+ python setup.py install
56
+ ```
57
+
58
+ ## Downloading the model
59
+
60
+ To download the *SMPL-X* model go to [this project website](https://smpl-x.is.tue.mpg.de) and register to get access to the downloads section.
61
+
62
+ To download the *SMPL+H* model go to [this project website](http://mano.is.tue.mpg.de) and register to get access to the downloads section.
63
+
64
+ To download the *SMPL* model go to [this](http://smpl.is.tue.mpg.de) (male and female models) and [this](http://smplify.is.tue.mpg.de) (gender neutral model) project website and register to get access to the downloads section.
65
+
66
+ ## Loading SMPL-X, SMPL+H and SMPL
67
+
68
+ ### SMPL and SMPL+H setup
69
+
70
+ The loader gives the option to use any of the SMPL-X, SMPL+H, SMPL, and MANO models. Depending on the model you want to use, please follow the respective download instructions. To switch between MANO, SMPL, SMPL+H and SMPL-X just change the *model_path* or *model_type* parameters. For more details please check the docs of the model classes.
71
+ Before using SMPL and SMPL+H you should follow the instructions in [tools/README.md](./tools/README.md) to remove the
72
+ Chumpy objects from both model pkls, as well as merge the MANO parameters with SMPL+H.
73
+
74
+ ### Model loading
75
+
76
+ You can either use the [create](https://github.com/vchoutas/smplx/blob/c63c02b478c5c6f696491ed9167e3af6b08d89b1/smplx/body_models.py#L54)
77
+ function from [body_models](./smplx/body_models.py) or directly call the constructor for the
78
+ [SMPL](https://github.com/vchoutas/smplx/blob/c63c02b478c5c6f696491ed9167e3af6b08d89b1/smplx/body_models.py#L106),
79
+ [SMPL+H](https://github.com/vchoutas/smplx/blob/c63c02b478c5c6f696491ed9167e3af6b08d89b1/smplx/body_models.py#L395) and
80
+ [SMPL-X](https://github.com/vchoutas/smplx/blob/c63c02b478c5c6f696491ed9167e3af6b08d89b1/smplx/body_models.py#L628) model. The path to the model can either be the path to the file with the parameters or a directory with the following structure:
81
+ ```bash
82
+ models
83
+ ├── smpl
84
+ │   ├── SMPL_FEMALE.pkl
85
+ │   ├── SMPL_MALE.pkl
86
+ │   └── SMPL_NEUTRAL.pkl
87
+ ├── smplh
88
+ │   ├── SMPLH_FEMALE.pkl
89
+ │   └── SMPLH_MALE.pkl
90
+ ├── mano
91
+ | ├── MANO_RIGHT.pkl
92
+ | └── MANO_LEFT.pkl
93
+ └── smplx
94
+ ├── SMPLX_FEMALE.npz
95
+ ├── SMPLX_FEMALE.pkl
96
+ ├── SMPLX_MALE.npz
97
+ ├── SMPLX_MALE.pkl
98
+ ├── SMPLX_NEUTRAL.npz
99
+ └── SMPLX_NEUTRAL.pkl
100
+ ```
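For reference (not part of the upstream README), a minimal loading sketch in Python, assuming the `models/` layout above and an installed `smplx` package:

```python
import torch
import smplx

# `models` is the directory shown above; the path may also point directly
# at a single model file such as models/smplx/SMPLX_NEUTRAL.npz.
model = smplx.create('models', model_type='smplx', gender='neutral', ext='npz')

# Zero shape and expression coefficients give the template mesh.
output = model(betas=torch.zeros([1, model.num_betas]),
               expression=torch.zeros([1, 10]),
               return_verts=True)
print(output.vertices.shape)  # torch.Size([1, 10475, 3])
```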
101
+
102
+
103
+ ## MANO and FLAME correspondences
104
+
105
+ The vertex correspondences between SMPL-X and MANO, FLAME can be downloaded
106
+ from [the project website](https://smpl-x.is.tue.mpg.de). If you have extracted
107
+ the correspondence data in the folder *correspondences*, then use the following
108
+ scripts to visualize them:
109
+
110
+ 1. To view MANO correspondences run the following command:
111
+
112
+ ```
113
+ python examples/vis_mano_vertices.py --model-folder $SMPLX_FOLDER --corr-fname correspondences/MANO_SMPLX_vertex_ids.pkl
114
+ ```
115
+
116
+ 2. To view FLAME correspondences run the following command:
117
+
118
+ ```
119
+ python examples/vis_flame_vertices.py --model-folder $SMPLX_FOLDER --corr-fname correspondences/SMPL-X__FLAME_vertex_ids.npy
120
+ ```
121
+
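As a side note (also not in the upstream README), the correspondence files can be read directly; a small sketch assuming they were extracted to `correspondences/`, using the same keys as `examples/vis_mano_vertices.py` and `examples/vis_flame_vertices.py`:

```python
import pickle
import numpy as np

# MANO_SMPLX_vertex_ids.pkl stores SMPL-X vertex indices for each hand
# under the keys 'left_hand' and 'right_hand'.
with open('correspondences/MANO_SMPLX_vertex_ids.pkl', 'rb') as f:
    idxs_data = pickle.load(f)

# SMPL-X__FLAME_vertex_ids.npy stores the SMPL-X vertex indices of the
# head region shared with FLAME.
head_idxs = np.load('correspondences/SMPL-X__FLAME_vertex_ids.npy')

print(len(idxs_data['right_hand']), 'right-hand vertices')
print(len(head_idxs), 'head vertices')
```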
122
+ ## Example
123
+
124
+ After installing the *smplx* package and downloading the model parameters you should be able to run the *demo.py*
125
+ script to visualize the results. For this step you have to install the [pyrender](https://pyrender.readthedocs.io/en/latest/index.html) and [trimesh](https://trimsh.org/) packages.
126
+
127
+ `python examples/demo.py --model-folder $SMPLX_FOLDER --plot-joints=True --gender="neutral"`
128
+
129
+ ![SMPL-X Examples](./images/example.png)
130
+
131
+ ## Citation
132
+
133
+ Depending on which model is loaded for your project, i.e. SMPL-X or SMPL+H or SMPL, please cite the most relevant work below, listed in the same order:
134
+
135
+ ```
136
+ @inproceedings{SMPL-X:2019,
137
+ title = {Expressive Body Capture: 3D Hands, Face, and Body from a Single Image},
138
+ author = {Pavlakos, Georgios and Choutas, Vasileios and Ghorbani, Nima and Bolkart, Timo and Osman, Ahmed A. A. and Tzionas, Dimitrios and Black, Michael J.},
139
+ booktitle = {Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
140
+ year = {2019}
141
+ }
142
+ ```
143
+
144
+ ```
145
+ @article{MANO:SIGGRAPHASIA:2017,
146
+ title = {Embodied Hands: Modeling and Capturing Hands and Bodies Together},
147
+ author = {Romero, Javier and Tzionas, Dimitrios and Black, Michael J.},
148
+ journal = {ACM Transactions on Graphics, (Proc. SIGGRAPH Asia)},
149
+ volume = {36},
150
+ number = {6},
151
+ series = {245:1--245:17},
152
+ month = nov,
153
+ year = {2017},
154
+ month_numeric = {11}
155
+ }
156
+ ```
157
+
158
+ ```
159
+ @article{SMPL:2015,
160
+ author = {Loper, Matthew and Mahmood, Naureen and Romero, Javier and Pons-Moll, Gerard and Black, Michael J.},
161
+ title = {{SMPL}: A Skinned Multi-Person Linear Model},
162
+ journal = {ACM Transactions on Graphics, (Proc. SIGGRAPH Asia)},
163
+ month = oct,
164
+ number = {6},
165
+ pages = {248:1--248:16},
166
+ publisher = {ACM},
167
+ volume = {34},
168
+ year = {2015}
169
+ }
170
+ ```
171
+
172
+ This repository was originally developed for SMPL-X / SMPLify-X (CVPR 2019), you might be interested in having a look: [https://smpl-x.is.tue.mpg.de](https://smpl-x.is.tue.mpg.de).
173
+
174
+ ## Acknowledgments
175
+
176
+ ### Facial Contour
177
+
178
+ Special thanks to [Soubhik Sanyal](https://github.com/soubhiksanyal) for sharing the Tensorflow code used for the facial
179
+ landmarks.
180
+
181
+ ## Contact
182
+ The code of this repository was implemented by [Vassilis Choutas](vassilis.choutas@tuebingen.mpg.de).
183
+
184
+ For questions, please contact [smplx@tue.mpg.de](smplx@tue.mpg.de).
185
+
186
+ For commercial licensing (and all related questions for business applications), please contact [ps-licensing@tue.mpg.de](ps-licensing@tue.mpg.de).
lib/utils/smplx/examples/demo.py ADDED
@@ -0,0 +1,180 @@
1
+ # -*- coding: utf-8 -*-
2
+
3
+ # Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V. (MPG) is
4
+ # holder of all proprietary rights on this computer program.
5
+ # You can only use this computer program if you have closed
6
+ # a license agreement with MPG or you get the right to use the computer
7
+ # program from someone who is authorized to grant you that right.
8
+ # Any use of the computer program without a valid license is prohibited and
9
+ # liable to prosecution.
10
+ #
11
+ # Copyright©2019 Max-Planck-Gesellschaft zur Förderung
12
+ # der Wissenschaften e.V. (MPG). acting on behalf of its Max Planck Institute
13
+ # for Intelligent Systems. All rights reserved.
14
+ #
15
+ # Contact: ps-license@tuebingen.mpg.de
16
+
17
+ import os.path as osp
18
+ import argparse
19
+
20
+ import numpy as np
21
+ import torch
22
+
23
+ import smplx
24
+
25
+
26
+ def main(model_folder,
27
+ model_type='smplx',
28
+ ext='npz',
29
+ gender='neutral',
30
+ plot_joints=False,
31
+ num_betas=10,
32
+ sample_shape=True,
33
+ sample_expression=True,
34
+ num_expression_coeffs=10,
35
+ plotting_module='pyrender',
36
+ use_face_contour=False):
37
+
38
+ model = smplx.create(model_folder, model_type=model_type,
39
+ gender=gender, use_face_contour=use_face_contour,
40
+ num_betas=num_betas,
41
+ num_expression_coeffs=num_expression_coeffs,
42
+ ext=ext)
43
+ print(model)
44
+
45
+ betas, expression = None, None
46
+ if sample_shape:
47
+ betas = torch.randn([1, model.num_betas], dtype=torch.float32)
48
+ if sample_expression:
49
+ expression = torch.randn(
50
+ [1, model.num_expression_coeffs], dtype=torch.float32)
51
+
52
+ output = model(betas=betas, expression=expression,
53
+ return_verts=True)
54
+ vertices = output.vertices.detach().cpu().numpy().squeeze()
55
+ joints = output.joints.detach().cpu().numpy().squeeze()
56
+
57
+ print('Vertices shape =', vertices.shape)
58
+ print('Joints shape =', joints.shape)
59
+
60
+ if plotting_module == 'pyrender':
61
+ import pyrender
62
+ import trimesh
63
+ vertex_colors = np.ones([vertices.shape[0], 4]) * [0.3, 0.3, 0.3, 0.8]
64
+ tri_mesh = trimesh.Trimesh(vertices, model.faces,
65
+ vertex_colors=vertex_colors)
66
+
67
+ mesh = pyrender.Mesh.from_trimesh(tri_mesh)
68
+
69
+ scene = pyrender.Scene()
70
+ scene.add(mesh)
71
+
72
+ if plot_joints:
73
+ sm = trimesh.creation.uv_sphere(radius=0.005)
74
+ sm.visual.vertex_colors = [0.9, 0.1, 0.1, 1.0]
75
+ tfs = np.tile(np.eye(4), (len(joints), 1, 1))
76
+ tfs[:, :3, 3] = joints
77
+ joints_pcl = pyrender.Mesh.from_trimesh(sm, poses=tfs)
78
+ scene.add(joints_pcl)
79
+
80
+ pyrender.Viewer(scene, use_raymond_lighting=True)
81
+ elif plotting_module == 'matplotlib':
82
+ from matplotlib import pyplot as plt
83
+ from mpl_toolkits.mplot3d import Axes3D
84
+ from mpl_toolkits.mplot3d.art3d import Poly3DCollection
85
+
86
+ fig = plt.figure()
87
+ ax = fig.add_subplot(111, projection='3d')
88
+
89
+ mesh = Poly3DCollection(vertices[model.faces], alpha=0.1)
90
+ face_color = (1.0, 1.0, 0.9)
91
+ edge_color = (0, 0, 0)
92
+ mesh.set_edgecolor(edge_color)
93
+ mesh.set_facecolor(face_color)
94
+ ax.add_collection3d(mesh)
95
+ ax.scatter(joints[:, 0], joints[:, 1], joints[:, 2], color='r')
96
+
97
+ if plot_joints:
98
+ ax.scatter(joints[:, 0], joints[:, 1], joints[:, 2], alpha=0.1)
99
+ plt.show()
100
+ elif plotting_module == 'open3d':
101
+ import open3d as o3d
102
+
103
+ mesh = o3d.geometry.TriangleMesh()
104
+ mesh.vertices = o3d.utility.Vector3dVector(
105
+ vertices)
106
+ mesh.triangles = o3d.utility.Vector3iVector(model.faces)
107
+ mesh.compute_vertex_normals()
108
+ mesh.paint_uniform_color([0.3, 0.3, 0.3])
109
+
110
+ geometry = [mesh]
111
+ if plot_joints:
112
+ joints_pcl = o3d.geometry.PointCloud()
113
+ joints_pcl.points = o3d.utility.Vector3dVector(joints)
114
+ joints_pcl.paint_uniform_color([0.7, 0.3, 0.3])
115
+ geometry.append(joints_pcl)
116
+
117
+ o3d.visualization.draw_geometries(geometry)
118
+ else:
119
+ raise ValueError('Unknown plotting_module: {}'.format(plotting_module))
120
+
121
+
122
+ if __name__ == '__main__':
123
+ parser = argparse.ArgumentParser(description='SMPL-X Demo')
124
+
125
+ parser.add_argument('--model-folder', required=True, type=str,
126
+ help='The path to the model folder')
127
+ parser.add_argument('--model-type', default='smplx', type=str,
128
+ choices=['smpl', 'smplh', 'smplx', 'mano', 'flame'],
129
+ help='The type of model to load')
130
+ parser.add_argument('--gender', type=str, default='neutral',
131
+ help='The gender of the model')
132
+ parser.add_argument('--num-betas', default=10, type=int,
133
+ dest='num_betas',
134
+ help='Number of shape coefficients.')
135
+ parser.add_argument('--num-expression-coeffs', default=10, type=int,
136
+ dest='num_expression_coeffs',
137
+ help='Number of expression coefficients.')
138
+ parser.add_argument('--plotting-module', type=str, default='pyrender',
139
+ dest='plotting_module',
140
+ choices=['pyrender', 'matplotlib', 'open3d'],
141
+ help='The module to use for plotting the result')
142
+ parser.add_argument('--ext', type=str, default='npz',
143
+ help='Which extension to use for loading')
144
+ parser.add_argument('--plot-joints', default=False,
145
+ type=lambda arg: arg.lower() in ['true', '1'],
146
+ help='The path to the model folder')
147
+ parser.add_argument('--sample-shape', default=True,
148
+ dest='sample_shape',
149
+ type=lambda arg: arg.lower() in ['true', '1'],
150
+ help='Sample a random shape')
151
+ parser.add_argument('--sample-expression', default=True,
152
+ dest='sample_expression',
153
+ type=lambda arg: arg.lower() in ['true', '1'],
154
+ help='Sample a random expression')
155
+ parser.add_argument('--use-face-contour', default=False,
156
+ type=lambda arg: arg.lower() in ['true', '1'],
157
+ help='Compute the contour of the face')
158
+
159
+ args = parser.parse_args()
160
+
161
+ model_folder = osp.expanduser(osp.expandvars(args.model_folder))
162
+ model_type = args.model_type
163
+ plot_joints = args.plot_joints
164
+ use_face_contour = args.use_face_contour
165
+ gender = args.gender
166
+ ext = args.ext
167
+ plotting_module = args.plotting_module
168
+ num_betas = args.num_betas
169
+ num_expression_coeffs = args.num_expression_coeffs
170
+ sample_shape = args.sample_shape
171
+ sample_expression = args.sample_expression
172
+
173
+ main(model_folder, model_type, ext=ext,
174
+ gender=gender, plot_joints=plot_joints,
175
+ num_betas=num_betas,
176
+ num_expression_coeffs=num_expression_coeffs,
177
+ sample_shape=sample_shape,
178
+ sample_expression=sample_expression,
179
+ plotting_module=plotting_module,
180
+ use_face_contour=use_face_contour)
lib/utils/smplx/examples/demo_layers.py ADDED
@@ -0,0 +1,181 @@
1
+ # -*- coding: utf-8 -*-
2
+
3
+ # Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V. (MPG) is
4
+ # holder of all proprietary rights on this computer program.
5
+ # You can only use this computer program if you have closed
6
+ # a license agreement with MPG or you get the right to use the computer
7
+ # program from someone who is authorized to grant you that right.
8
+ # Any use of the computer program without a valid license is prohibited and
9
+ # liable to prosecution.
10
+ #
11
+ # Copyright©2019 Max-Planck-Gesellschaft zur Förderung
12
+ # der Wissenschaften e.V. (MPG). acting on behalf of its Max Planck Institute
13
+ # for Intelligent Systems. All rights reserved.
14
+ #
15
+ # Contact: ps-license@tuebingen.mpg.de
16
+
17
+ import os.path as osp
18
+ import argparse
19
+
20
+ import numpy as np
21
+ import torch
22
+
23
+ import smplx
24
+
25
+
26
+ def main(model_folder,
27
+ model_type='smplx',
28
+ ext='npz',
29
+ gender='neutral',
30
+ plot_joints=False,
31
+ num_betas=10,
32
+ sample_shape=True,
33
+ sample_expression=True,
34
+ num_expression_coeffs=10,
35
+ plotting_module='pyrender',
36
+ use_face_contour=False):
37
+
38
+ model = smplx.build_layer(
39
+ model_folder, model_type=model_type,
40
+ gender=gender, use_face_contour=use_face_contour,
41
+ num_betas=num_betas,
42
+ num_expression_coeffs=num_expression_coeffs,
43
+ ext=ext)
44
+ print(model)
45
+
46
+ betas, expression = None, None
47
+ if sample_shape:
48
+ betas = torch.randn([1, model.num_betas], dtype=torch.float32)
49
+ if sample_expression:
50
+ expression = torch.randn(
51
+ [1, model.num_expression_coeffs], dtype=torch.float32)
52
+
53
+ output = model(betas=betas, expression=expression,
54
+ return_verts=True)
55
+ vertices = output.vertices.detach().cpu().numpy().squeeze()
56
+ joints = output.joints.detach().cpu().numpy().squeeze()
57
+
58
+ print('Vertices shape =', vertices.shape)
59
+ print('Joints shape =', joints.shape)
60
+
61
+ if plotting_module == 'pyrender':
62
+ import pyrender
63
+ import trimesh
64
+ vertex_colors = np.ones([vertices.shape[0], 4]) * [0.3, 0.3, 0.3, 0.8]
65
+ tri_mesh = trimesh.Trimesh(vertices, model.faces,
66
+ vertex_colors=vertex_colors)
67
+
68
+ mesh = pyrender.Mesh.from_trimesh(tri_mesh)
69
+
70
+ scene = pyrender.Scene()
71
+ scene.add(mesh)
72
+
73
+ if plot_joints:
74
+ sm = trimesh.creation.uv_sphere(radius=0.005)
75
+ sm.visual.vertex_colors = [0.9, 0.1, 0.1, 1.0]
76
+ tfs = np.tile(np.eye(4), (len(joints), 1, 1))
77
+ tfs[:, :3, 3] = joints
78
+ joints_pcl = pyrender.Mesh.from_trimesh(sm, poses=tfs)
79
+ scene.add(joints_pcl)
80
+
81
+ pyrender.Viewer(scene, use_raymond_lighting=True)
82
+ elif plotting_module == 'matplotlib':
83
+ from matplotlib import pyplot as plt
84
+ from mpl_toolkits.mplot3d import Axes3D
85
+ from mpl_toolkits.mplot3d.art3d import Poly3DCollection
86
+
87
+ fig = plt.figure()
88
+ ax = fig.add_subplot(111, projection='3d')
89
+
90
+ mesh = Poly3DCollection(vertices[model.faces], alpha=0.1)
91
+ face_color = (1.0, 1.0, 0.9)
92
+ edge_color = (0, 0, 0)
93
+ mesh.set_edgecolor(edge_color)
94
+ mesh.set_facecolor(face_color)
95
+ ax.add_collection3d(mesh)
96
+ ax.scatter(joints[:, 0], joints[:, 1], joints[:, 2], color='r')
97
+
98
+ if plot_joints:
99
+ ax.scatter(joints[:, 0], joints[:, 1], joints[:, 2], alpha=0.1)
100
+ plt.show()
101
+ elif plotting_module == 'open3d':
102
+ import open3d as o3d
103
+
104
+ mesh = o3d.geometry.TriangleMesh()
105
+ mesh.vertices = o3d.utility.Vector3dVector(
106
+ vertices)
107
+ mesh.triangles = o3d.utility.Vector3iVector(model.faces)
108
+ mesh.compute_vertex_normals()
109
+ mesh.paint_uniform_color([0.3, 0.3, 0.3])
110
+
111
+ geometry = [mesh]
112
+ if plot_joints:
113
+ joints_pcl = o3d.geometry.PointCloud()
114
+ joints_pcl.points = o3d.utility.Vector3dVector(joints)
115
+ joints_pcl.paint_uniform_color([0.7, 0.3, 0.3])
116
+ geometry.append(joints_pcl)
117
+
118
+ o3d.visualization.draw_geometries(geometry)
119
+ else:
120
+ raise ValueError('Unknown plotting_module: {}'.format(plotting_module))
121
+
122
+
123
+ if __name__ == '__main__':
124
+ parser = argparse.ArgumentParser(description='SMPL-X Demo')
125
+
126
+ parser.add_argument('--model-folder', required=True, type=str,
127
+ help='The path to the model folder')
128
+ parser.add_argument('--model-type', default='smplx', type=str,
129
+ choices=['smpl', 'smplh', 'smplx', 'mano', 'flame'],
130
+ help='The type of model to load')
131
+ parser.add_argument('--gender', type=str, default='neutral',
132
+ help='The gender of the model')
133
+ parser.add_argument('--num-betas', default=10, type=int,
134
+ dest='num_betas',
135
+ help='Number of shape coefficients.')
136
+ parser.add_argument('--num-expression-coeffs', default=10, type=int,
137
+ dest='num_expression_coeffs',
138
+ help='Number of expression coefficients.')
139
+ parser.add_argument('--plotting-module', type=str, default='pyrender',
140
+ dest='plotting_module',
141
+ choices=['pyrender', 'matplotlib', 'open3d'],
142
+ help='The module to use for plotting the result')
143
+ parser.add_argument('--ext', type=str, default='npz',
144
+ help='Which extension to use for loading')
145
+ parser.add_argument('--plot-joints', default=False,
146
+ type=lambda arg: arg.lower() in ['true', '1'],
147
+ help='The path to the model folder')
148
+ parser.add_argument('--sample-shape', default=True,
149
+ dest='sample_shape',
150
+ type=lambda arg: arg.lower() in ['true', '1'],
151
+ help='Sample a random shape')
152
+ parser.add_argument('--sample-expression', default=True,
153
+ dest='sample_expression',
154
+ type=lambda arg: arg.lower() in ['true', '1'],
155
+ help='Sample a random expression')
156
+ parser.add_argument('--use-face-contour', default=False,
157
+ type=lambda arg: arg.lower() in ['true', '1'],
158
+ help='Compute the contour of the face')
159
+
160
+ args = parser.parse_args()
161
+
162
+ model_folder = osp.expanduser(osp.expandvars(args.model_folder))
163
+ model_type = args.model_type
164
+ plot_joints = args.plot_joints
165
+ use_face_contour = args.use_face_contour
166
+ gender = args.gender
167
+ ext = args.ext
168
+ plotting_module = args.plotting_module
169
+ num_betas = args.num_betas
170
+ num_expression_coeffs = args.num_expression_coeffs
171
+ sample_shape = args.sample_shape
172
+ sample_expression = args.sample_expression
173
+
174
+ main(model_folder, model_type, ext=ext,
175
+ gender=gender, plot_joints=plot_joints,
176
+ num_betas=num_betas,
177
+ num_expression_coeffs=num_expression_coeffs,
178
+ sample_shape=sample_shape,
179
+ sample_expression=sample_expression,
180
+ plotting_module=plotting_module,
181
+ use_face_contour=use_face_contour)
lib/utils/smplx/examples/vis_flame_vertices.py ADDED
@@ -0,0 +1,92 @@
1
+ # -*- coding: utf-8 -*-
2
+
3
+ # Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V. (MPG) is
4
+ # holder of all proprietary rights on this computer program.
5
+ # You can only use this computer program if you have closed
6
+ # a license agreement with MPG or you get the right to use the computer
7
+ # program from someone who is authorized to grant you that right.
8
+ # Any use of the computer program without a valid license is prohibited and
9
+ # liable to prosecution.
10
+ #
11
+ # Copyright©2019 Max-Planck-Gesellschaft zur Förderung
12
+ # der Wissenschaften e.V. (MPG). acting on behalf of its Max Planck Institute
13
+ # for Intelligent Systems. All rights reserved.
14
+ #
15
+ # Contact: ps-license@tuebingen.mpg.de
16
+
17
+ import os.path as osp
18
+ import argparse
19
+ import pickle
20
+
21
+ import numpy as np
22
+ import torch
23
+ import open3d as o3d
24
+
25
+ import smplx
26
+
27
+
28
+ def main(model_folder, corr_fname, ext='npz',
29
+ head_color=(0.3, 0.3, 0.6),
30
+ gender='neutral'):
31
+
32
+ head_idxs = np.load(corr_fname)
33
+
34
+ model = smplx.create(model_folder, model_type='smplx',
35
+ gender=gender,
36
+ ext=ext)
37
+ betas = torch.zeros([1, 10], dtype=torch.float32)
38
+ expression = torch.zeros([1, 10], dtype=torch.float32)
39
+
40
+ output = model(betas=betas, expression=expression,
41
+ return_verts=True)
42
+ vertices = output.vertices.detach().cpu().numpy().squeeze()
43
+ joints = output.joints.detach().cpu().numpy().squeeze()
44
+
45
+ print('Vertices shape =', vertices.shape)
46
+ print('Joints shape =', joints.shape)
47
+
48
+ mesh = o3d.geometry.TriangleMesh()
49
+ mesh.vertices = o3d.utility.Vector3dVector(vertices)
50
+ mesh.triangles = o3d.utility.Vector3iVector(model.faces)
51
+ mesh.compute_vertex_normals()
52
+
53
+ colors = np.ones_like(vertices) * [0.3, 0.3, 0.3]
54
+ colors[head_idxs] = head_color
55
+
56
+ mesh.vertex_colors = o3d.utility.Vector3dVector(colors)
57
+
58
+ o3d.visualization.draw_geometries([mesh])
59
+
60
+
61
+ if __name__ == '__main__':
62
+ parser = argparse.ArgumentParser(description='SMPL-X Demo')
63
+
64
+ parser.add_argument('--model-folder', required=True, type=str,
65
+ help='The path to the model folder')
66
+ parser.add_argument('--corr-fname', required=True, type=str,
67
+ dest='corr_fname',
68
+ help='Filename with the head correspondences')
69
+ parser.add_argument('--gender', type=str, default='neutral',
70
+ help='The gender of the model')
71
+ parser.add_argument('--ext', type=str, default='npz',
72
+ help='Which extension to use for loading')
73
+ parser.add_argument('--head', default='right',
74
+ choices=['right', 'left'],
75
+ type=str, help='Which head to plot')
76
+ parser.add_argument('--head-color', type=float, nargs=3, dest='head_color',
77
+ default=(0.3, 0.3, 0.6),
78
+ help='Color for the head vertices')
79
+
80
+ args = parser.parse_args()
81
+
82
+ model_folder = osp.expanduser(osp.expandvars(args.model_folder))
83
+ corr_fname = args.corr_fname
84
+ gender = args.gender
85
+ ext = args.ext
86
+ head = args.head
87
+ head_color = args.head_color
88
+
89
+ main(model_folder, corr_fname, ext=ext,
90
+ head_color=head_color,
91
+ gender=gender
92
+ )
lib/utils/smplx/examples/vis_mano_vertices.py ADDED
@@ -0,0 +1,99 @@
1
+ # -*- coding: utf-8 -*-
2
+
3
+ # Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V. (MPG) is
4
+ # holder of all proprietary rights on this computer program.
5
+ # You can only use this computer program if you have closed
6
+ # a license agreement with MPG or you get the right to use the computer
7
+ # program from someone who is authorized to grant you that right.
8
+ # Any use of the computer program without a valid license is prohibited and
9
+ # liable to prosecution.
10
+ #
11
+ # Copyright©2019 Max-Planck-Gesellschaft zur Förderung
12
+ # der Wissenschaften e.V. (MPG). acting on behalf of its Max Planck Institute
13
+ # for Intelligent Systems. All rights reserved.
14
+ #
15
+ # Contact: ps-license@tuebingen.mpg.de
16
+
17
+ import os.path as osp
18
+ import argparse
19
+ import pickle
20
+
21
+ import numpy as np
22
+ import torch
23
+ import open3d as o3d
24
+
25
+ import smplx
26
+
27
+
28
+ def main(model_folder, corr_fname, ext='npz',
29
+ hand_color=(0.3, 0.3, 0.6),
30
+ gender='neutral', hand='right'):
31
+
32
+ with open(corr_fname, 'rb') as f:
33
+ idxs_data = pickle.load(f)
34
+ if hand == 'both':
35
+ hand_idxs = np.concatenate(
36
+ [idxs_data['left_hand'], idxs_data['right_hand']]
37
+ )
38
+ else:
39
+ hand_idxs = idxs_data[f'{hand}_hand']
40
+
41
+ model = smplx.create(model_folder, model_type='smplx',
42
+ gender=gender,
43
+ ext=ext)
44
+ betas = torch.zeros([1, 10], dtype=torch.float32)
45
+ expression = torch.zeros([1, 10], dtype=torch.float32)
46
+
47
+ output = model(betas=betas, expression=expression,
48
+ return_verts=True)
49
+ vertices = output.vertices.detach().cpu().numpy().squeeze()
50
+ joints = output.joints.detach().cpu().numpy().squeeze()
51
+
52
+ print('Vertices shape =', vertices.shape)
53
+ print('Joints shape =', joints.shape)
54
+
55
+ mesh = o3d.geometry.TriangleMesh()
56
+ mesh.vertices = o3d.utility.Vector3dVector(vertices)
57
+ mesh.triangles = o3d.utility.Vector3iVector(model.faces)
58
+ mesh.compute_vertex_normals()
59
+
60
+ colors = np.ones_like(vertices) * [0.3, 0.3, 0.3]
61
+ colors[hand_idxs] = hand_color
62
+
63
+ mesh.vertex_colors = o3d.utility.Vector3dVector(colors)
64
+
65
+ o3d.visualization.draw_geometries([mesh])
66
+
67
+
68
+ if __name__ == '__main__':
69
+ parser = argparse.ArgumentParser(description='SMPL-X Demo')
70
+
71
+ parser.add_argument('--model-folder', required=True, type=str,
72
+ help='The path to the model folder')
73
+ parser.add_argument('--corr-fname', required=True, type=str,
74
+ dest='corr_fname',
75
+ help='Filename with the hand correspondences')
76
+ parser.add_argument('--gender', type=str, default='neutral',
77
+ help='The gender of the model')
78
+ parser.add_argument('--ext', type=str, default='npz',
79
+ help='Which extension to use for loading')
80
+ parser.add_argument('--hand', default='right',
81
+ choices=['right', 'left', 'both'],
82
+ type=str, help='Which hand to plot')
83
+ parser.add_argument('--hand-color', type=float, nargs=3, dest='hand_color',
84
+ default=(0.3, 0.3, 0.6),
85
+ help='Color for the hand vertices')
86
+
87
+ args = parser.parse_args()
88
+
89
+ model_folder = osp.expanduser(osp.expandvars(args.model_folder))
90
+ corr_fname = args.corr_fname
91
+ gender = args.gender
92
+ ext = args.ext
93
+ hand = args.hand
94
+ hand_color = args.hand_color
95
+
96
+ main(model_folder, corr_fname, ext=ext,
97
+ hand_color=hand_color,
98
+ gender=gender, hand=hand
99
+ )
lib/utils/smplx/setup.py ADDED
@@ -0,0 +1,79 @@
1
+ # -*- coding: utf-8 -*-
2
+
3
+ # Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V. (MPG) is
4
+ # holder of all proprietary rights on this computer program.
5
+ # You can only use this computer program if you have closed
6
+ # a license agreement with MPG or you get the right to use the computer
7
+ # program from someone who is authorized to grant you that right.
8
+ # Any use of the computer program without a valid license is prohibited and
9
+ # liable to prosecution.
10
+ #
11
+ # Copyright©2019 Max-Planck-Gesellschaft zur Förderung
12
+ # der Wissenschaften e.V. (MPG). acting on behalf of its Max Planck Institute
13
+ # for Intelligent Systems and the Max Planck Institute for Biological
14
+ # Cybernetics. All rights reserved.
15
+ #
16
+ # Contact: ps-license@tuebingen.mpg.de
17
+
18
+ import io
19
+ import os
20
+
21
+ from setuptools import setup
22
+
23
+ # Package meta-data.
24
+ NAME = 'smplx'
25
+ DESCRIPTION = 'PyTorch module for loading the SMPLX body model'
26
+ URL = 'http://smpl-x.is.tuebingen.mpg.de'
27
+ EMAIL = 'vassilis.choutas@tuebingen.mpg.de'
28
+ AUTHOR = 'Vassilis Choutas'
29
+ REQUIRES_PYTHON = '>=3.6.0'
30
+ VERSION = '0.1.21'
31
+
32
+ here = os.path.abspath(os.path.dirname(__file__))
33
+
34
+ try:
35
+ FileNotFoundError
36
+ except NameError:
37
+ FileNotFoundError = IOError
38
+
39
+ # Import the README and use it as the long-description.
40
+ # Note: this will only work if 'README.md' is present in your MANIFEST.in file!
41
+ try:
42
+ with io.open(os.path.join(here, 'README.md'), encoding='utf-8') as f:
43
+ long_description = '\n' + f.read()
44
+ except FileNotFoundError:
45
+ long_description = DESCRIPTION
46
+
47
+ # Load the package's __version__.py module as a dictionary.
48
+ about = {}
49
+ if not VERSION:
50
+ with open(os.path.join(here, NAME, '__version__.py')) as f:
51
+ exec(f.read(), about)
52
+ else:
53
+ about['__version__'] = VERSION
54
+
55
+ pyrender_reqs = ['pyrender>=0.1.23', 'trimesh>=2.37.6', 'shapely']
56
+ matplotlib_reqs = ['matplotlib']
57
+ open3d_reqs = ['open3d-python']
58
+
59
+ setup(name=NAME,
60
+ version=about['__version__'],
61
+ description=DESCRIPTION,
62
+ long_description=long_description,
63
+ long_description_content_type='text/markdown',
64
+ author=AUTHOR,
65
+ author_email=EMAIL,
66
+ python_requires=REQUIRES_PYTHON,
67
+ url=URL,
68
+ install_requires=[
69
+ 'numpy>=1.16.2',
70
+ 'torch>=1.0.1.post2',
71
+ 'torchgeometry>=0.1.2'
72
+ ],
73
+ extras_require={
74
+ 'pyrender': pyrender_reqs,
75
+ 'open3d': open3d_reqs,
76
+ 'matplotlib': matplotlib_reqs,
77
+ 'all': pyrender_reqs + matplotlib_reqs + open3d_reqs
78
+ },
79
+ packages=['smplx', 'tools'])