Upload folder using huggingface_hub

Files changed (10)

LICENSE +162 -0
README.md +200 -0
config.yaml +27 -0
model.safetensors +3 -0
stats/motion/body/mean.npy +3 -0
stats/motion/body/std.npy +3 -0
stats/motion/global_root/mean.npy +3 -0
stats/motion/global_root/std.npy +3 -0
stats/motion/local_root/mean.npy +3 -0
stats/motion/local_root/std.npy +3 -0

LICENSE ADDED

	@@ -0,0 +1,162 @@

+Reference: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
+NVIDIA Open Model License Agreement
+This NVIDIA Open Model License Agreement (the "Agreement") is a legal agreement
+between the Legal Entity You represent, or if no entity is identified, You and
+NVIDIA Corporation and its Affiliates ("NVIDIA") and governs Your use of the
+Models that NVIDIA provides to You under this Agreement. NVIDIA and You are each
+a "party" and collectively the "parties."
+NVIDIA models released under this Agreement are intended to be used permissively
+and enable the further development of AI technologies. Subject to the terms of
+this Agreement, NVIDIA confirms that:
+- Models are commercially usable.
+- You are free to create and distribute Derivative Models.
+- NVIDIA does not claim ownership to any outputs generated using the Models or
+  Derivative Models.
+By using, reproducing, modifying, distributing, performing or displaying any
+portion or element of the Model or Derivative Model, or otherwise accepting the
+terms of this Agreement, you agree to be bound by this Agreement.
+1. Definitions. The following definitions apply to this Agreement:
+1.1. "Derivative Model" means all (a) modifications to the Model, (b) works
+based on the Model, and (c) any other derivative works of the Model. An output
+is not a Derivative Model.
+1.2. "Legal Entity" means the union of the acting entity and all other entities
+that control, are controlled by, or are under common control with that entity.
+For the purposes of this definition, "control" means (a) the power, direct or
+indirect, to cause the direction or management of such entity, whether by
+contract or otherwise, or (b) ownership of fifty percent (50%) or more of the
+outstanding shares, or (c) beneficial ownership of such entity.
+1.3. "Model" means the machine learning model, software, checkpoints, learnt
+weights, algorithms, parameters, configuration files and documentation shared
+under this Agreement.
+1.4. "NVIDIA Cosmos Model" means a multimodal Model shared under this Agreement.
+1.5. "Special-Purpose Model" means a Model that is only competent in a narrow
+set of purpose-specific tasks and should not be used for unintended or
+general-purpose applications.
+1.6. "You" or "Your" means an individual or Legal Entity exercising permissions
+granted by this Agreement.
+2. Conditions for Use, License Grant, AI Ethics and IP Ownership.
+2.1. Conditions for Use. The Model and any Derivative Model are subject to
+additional terms as described in Section 2 and Section 3 of this Agreement and
+govern Your use. If You institute copyright or patent litigation against any
+entity (including a cross-claim or counterclaim in a lawsuit) alleging that the
+Model or a Derivative Model constitutes direct or contributory copyright or
+patent infringement, then any licenses granted to You under this Agreement for
+that Model or Derivative Model will terminate as of the date such litigation is
+filed. If You bypass, disable, reduce the efficacy of, or circumvent any
+technical limitation, safety guardrail or associated safety guardrail
+hyperparameter, encryption, security, digital rights management, or
+authentication mechanism (collectively "Guardrail") contained in the Model
+without a substantially similar Guardrail appropriate for your use case, your
+rights under this Agreement will automatically terminate. NVIDIA may indicate in
+relevant documentation that a Model is a Special-Purpose Model. NVIDIA may
+update this Agreement to comply with legal and regulatory requirements at any
+time and You agree to either comply with any updated license or cease Your
+copying, use, and distribution of the Model and any Derivative Model.
+2.2. License Grant. The rights granted herein are explicitly conditioned on Your
+full compliance with the terms of this Agreement. Subject to the terms and
+conditions of this Agreement, NVIDIA hereby grants to You a perpetual,
+worldwide, non-exclusive, no-charge, royalty-free, revocable (as stated in
+Section 2.1) license to publicly perform, publicly display, reproduce, use,
+create derivative works of, make, have made, sell, offer for sale, distribute
+(through multiple tiers of distribution) and import the Model.
+2.3. AI Ethics. Use of the Models under the Agreement must be consistent with
+NVIDIA's Trustworthy AI terms found at
+https://www.nvidia.com/en-us/agreements/trustworthy-ai/terms/.
+2.4. NVIDIA owns the Model and any Derivative Models created by NVIDIA. Subject
+to NVIDIA's underlying ownership rights in the Model or its Derivative Models,
+You are and will be the owner of Your Derivative Models. NVIDIA claims no
+ownership rights in outputs. You are responsible for outputs and their
+subsequent uses. Except as expressly granted in this Agreement, (a) NVIDIA
+reserves all rights, interests and remedies in connection with the Model and
+(b) no other license or right is granted to you by implication, estoppel or
+otherwise.
+3. Redistribution. You may reproduce and distribute copies of the Model or
+Derivative Models thereof in any medium, with or without modifications, provided
+that You meet the following conditions:
+3.1. If you distribute the Model, You must give any other recipients of the
+Model a copy of this Agreement and include the following attribution notice
+within a "Notice" text file with such copies: "Licensed by NVIDIA Corporation
+under the NVIDIA Open Model License";
+3.2. If you distribute or make available a NVIDIA Cosmos Model, or a product or
+service (including an AI model) that contains or uses a NVIDIA Cosmos Model, use
+a NVIDIA Cosmos Model to create a Derivative Model, or use a NVIDIA Cosmos Model
+or its outputs to create, train, fine tune, or otherwise improve an AI model,
+you will include "Built on NVIDIA Cosmos" on a related website, user interface,
+blogpost, about page, or product documentation; and
+3.3. You may add Your own copyright statement to Your modifications and may
+provide additional or different license terms and conditions for use,
+reproduction, or distribution of Your modifications, or for any such Derivative
+Models as a whole, provided Your use, reproduction, and distribution of the
+Model otherwise complies with the conditions stated in this Agreement.
+4. Separate Components. The Models may include or be distributed with components
+provided with separate legal notices or terms that accompany the components,
+such as an Open Source Software License or other third-party license. The
+components are subject to the applicable other licenses, including any
+proprietary notices, disclaimers, requirements and extended use rights; except
+that this Agreement will prevail regarding the use of third-party Open Source
+Software License, unless a third-party Open Source Software License requires its
+license terms to prevail. "Open Source Software License" means any software,
+data or documentation subject to any license identified as an open source
+license by the Open Source Initiative (https://opensource.org), Free Software
+Foundation (https://www.fsf.org) or other similar open source organization or
+listed by the Software Package Data Exchange (SPDX) Workgroup under the Linux
+Foundation (https://www.spdx.org).
+5. Trademarks. This Agreement does not grant permission to use the trade names,
+trademarks, service marks, or product names of NVIDIA, except as required for
+reasonable and customary use in describing the origin of the Model and
+reproducing the content of the "Notice" text file.
+6. Disclaimer of Warranty. Unless required by applicable law or agreed to in
+writing, NVIDIA provides the Model on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+CONDITIONS OF ANY KIND, either express or implied, including, without
+limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT,
+MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely
+responsible for reviewing Model documentation, including any Special-Purpose
+Model limitations, and determining the appropriateness of using or
+redistributing the Model, Derivative Models and outputs. You assume any risks
+associated with Your exercise of permissions under this Agreement.
+7. Limitation of Liability. In no event and under no legal theory, whether in
+tort (including negligence), contract, or otherwise, unless required by
+applicable law (such as deliberate and grossly negligent acts) or agreed to in
+writing, will NVIDIA be liable to You for damages, including any direct,
+indirect, special, incidental, or consequential damages of any character
+arising as a result of this Agreement or out of the use or inability to use the
+Model, Derivative Models or outputs (including but not limited to damages for
+loss of goodwill, work stoppage, computer failure or malfunction, or any and all
+other commercial damages or losses), even if NVIDIA has been advised of the
+possibility of such damages.
+8. Indemnity. You will indemnify and hold harmless NVIDIA from and against any
+claim by any third party arising out of or related to your use or distribution
+of the Model, Derivative Models or outputs.
+9. Feedback. NVIDIA appreciates your feedback, and You agree that NVIDIA may use
+it without restriction or compensation to You.
+10. Governing Law. This Agreement will be governed in all respects by the laws
+of the United States and the laws of the State of Delaware, without regard to
+conflict of laws principles or the United Nations Convention on Contracts for
+the International Sale of Goods. The state and federal courts residing in Santa
+Clara County, California will have exclusive jurisdiction over any dispute or
+claim arising out of or related to this Agreement, and the parties irrevocably
+consent to personal jurisdiction and venue in those courts; except that, either
+party may apply for injunctive remedies or an equivalent type of urgent legal
+relief in any jurisdiction.
+11. Trade and Compliance. You agree to comply with all applicable export,
+import, trade and economic sanctions laws and regulations, as amended, including
+without limitation U.S. Export Administration Regulations and Office of Foreign
+Assets Control regulations. These laws include restrictions on destinations,
+end-users and end-use.
+Version Release Date: October 24, 2025

README.md ADDED

	@@ -0,0 +1,200 @@

+---
+license: other
+license_name: nvidia-open-model-license
+license_link: >-
+  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license
+library_name: kimodo
+tags:
+  - nvidia
+  - kimodo
+  - seed
+  - g1
+---
+# Kimodo: Controllable Kinematic Motion Diffusion at Scale
+**[Paper](https://research.nvidia.com/labs/sil/projects/kimodo/assets/kimodo_tech_report.pdf), [Project Page](https://research.nvidia.com/labs/sil/projects/kimodo/)**
+### Description:
+Kimodo (**Ki**nematic **Mo**tion **D**iffusi**o**n) generates three-dimensional (3D) skeletal body animations given a text prompt and/or constraints on the motion like full-body poses, end-effector joint positions, paths, and waypoints to follow.
+The Kimodo model family includes models trained on different skeletons and datasets:
+* Kimodo-SOMA-RP
+    * Trained on the 30-joint SOMA skeleton with the proprietary Bones Rigplay dataset.
+* Kimodo-SOMA-SEED
+    * Trained on the 30-joint SOMA skeleton with the open Bones-SEED dataset.
+* Kimodo-G1-RP
+    * Trained on the proprietary Bones Rigplay dataset retargeted to the 34-joint Unitree G1 robot skeleton.
+* Kimodo-G1-SEED
+    * Trained on the open Bones-SEED dataset retargeted to the 34-joint Unitree G1 robot skeleton.
+* Kimodo-SMPLX-RP
+    * Trained on the proprietary Bones Rigplay dataset retargeted to the 22-joint SMPLX-body skeleton.
+This release pertains to Kimodo-G1-SEED. This model is ready for commercial use.
+### License:
+This model is released under the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
+### Deployment Geography:
+Global
+### Use Case: <br>
+The model is intended for users with any level of animation experience to create 3D human motion data for their application. This may include:
+* Demonstrations for humanoid robots
+* Digital human motion for digital twin and industrial simulations
+* Digital human motion for synthetic data
+* Animations for game and media development
+### Release Date:  <br>
+GitHub [03/16/2026] via [link](https://github.com/nv-tlabs/kimodo) <br>
+HuggingFace [03/16/2026] via [link](https://huggingface.co/nvidia/Kimodo-G1-SEED-v1) <br>
+## References:
+* Technical report: [Kimodo: Scaling Controllable Human Motion Generation](https://research.nvidia.com/labs/sil/projects/kimodo/assets/kimodo_tech_report.pdf)
+* Webpage: [link](https://research.nvidia.com/labs/sil/projects/kimodo/)
+## Model Architecture:
+**Architecture Type:** Diffusion Model <br>
+**Network Architecture:** Novel Two-Stage Transformer <br>
+**Model Size:** 282 M parameters
+## Inputs: <br>
+**Input Types:** Text, Duration (Num Frames), Pose Constraints <br>
+**Input Formats:**
+- Text: String
+- Duration: Integer
+- Pose Constraints: Matrix
+**Input Parameters:**
+- Text: One-Dimensional (1D)
+- Duration: One-Dimensional (1D)
+- Pose Constraints:
+    - One-Dimensional (1D) frame index of each constraint
+    - Features to constrain may include Three-Dimensional (3D) joint positions, (3x3) joint rotation matrices, Two-Dimensional (2D) heading direction, and/or Two-Dimensional (2D) root position
+**Other Properties Related to Input:** Maximum duration is 10 sec (300 frames at 30 frames per second).
+## Outputs
+**Output Type:** Skeleton Motion: Root Translation and Joint Rotations <br>
+**Output Formats:**
+- Root Translation: Matrix
+- Joint Rotations: Matrix
+**Output Parameters:**
+- Root Translation: Two-Dimensional (`num_frames` x 3)
+- Joint Rotations: Four-Dimensional (`num_frames` x 34 x 3 x 3)
+**Other Properties Related to Output:**
+* Motions are at 30 frames per second (30 fps)
+Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions. <br>
+## Software Integration:
+**Runtime Engines:**
+* PyTorch
+**Supported Hardware Microarchitecture Compatibility:** <br>
+* NVIDIA Ampere
+* NVIDIA Blackwell
+* NVIDIA Lovelace
+**Supported Operating Systems:**
+* Linux
+* Windows
+The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment. <br>
+## Model Version
+Kimodo-G1-SEED-v1
+## Training and Testing Datasets:
+**Name**: Public Bones-SEED Dataset
+**Data Modalities**
+* Text
+* Human Motion Capture
+**Data Size**:
+* Less than 1 Billion tokens of text
+* 288 hours of human motion capture
+**Data Collection Method** <br>
+Automatic/Sensors
+**Labeling Method** <br>
+Hybrid: Automatic/Sensors, Human
+**Properties:** 288 hours of captured human body motions with corresponding text descriptions. The data is split 90%/10% into train and test sets. Various augmentations were employed to expand text variety. Motions were retargeted to the G1 robot skeleton for training.
+**Quantitative Evaluation** <br>
+For test set evaluation, please refer to the [technical report](https://research.nvidia.com/labs/sil/projects/kimodo/assets/kimodo_tech_report.pdf)
+## Inference:
+**Acceleration Engine:** N/A<br>
+**Test Hardware:** <br>
+* GeForce RTX 3090
+* GeForce RTX 4090
+* GeForce RTX 5090
+* NVIDIA A100
+* NVIDIA L40S
+* NVIDIA L4
+* NVIDIA RTX 6000 Ada
+* NVIDIA RTX A6000
+## Ethical Considerations:
+NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications.  When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. <br>
+For more detailed information on ethical considerations for this model, please see the Bias, Explainability, Safety & Security, and Privacy Subcards below. <br>
+Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://app.intigriti.com/programs/nvidia/nvidiavdp/detail).
+## Bias
+Field                                                                                               |  Response
+:---------------------------------------------------------------------------------------------------|:---------------
+Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing:  |  Gender
+Measures taken to mitigate against unwanted bias:                                                   |  Our training data contains motion captured from a roughly equal number of male and female actors
+## Explainability
+Field                                                                                                  |  Response
+:------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------
+Intended Task/Domain:                                                                   |  Robotics
+Model Type:                                                                                            |  Diffusion Transformer
+Intended Users:                                                                                        |  The model is intended for users with any level of animation experience to create 3D humanoid robot motion data for their application. This may include demonstrations for humanoid robots or robot motions for simulations and synthetic data.
+Output:                                                                                                |  3D skeletal animation (root translation and joint rotations)
+Describe how the model works:                                                                          |  Text input and pose constraints are processed and given to a transformer-based model that iteratively denoises a sequence of body poses.
+Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of:  |  Gender
+Technical Limitations & Mitigation:                                                                    |  Generated motions may include artifacts like foot skating where feet slide unnaturally when they should be in static contact with the ground. The motion does not always follow the given text prompt, and the model does not know how to perform certain types of actions (e.g., the model is best at locomotion, gestures, dancing, and everyday activities). Each trained model currently outputs motion for a single character skeleton. The model is designed to output realistic motions, so it cannot create cartoon motions or non-physically plausible motions. The model is not aware of objects in the scene around a character.
+Verified to have met prescribed NVIDIA quality standards:  |  Yes
+Performance Metrics:                                                                                   |  Pose Constraint Accuracy (joint distance error), Motion Quality (foot-skating error, FID, latent similarity), Text-Following Accuracy (R-precision, latent similarity)
+Potential Known Risks:                                                                                 |  The model may output body motions that inadvertently reflect stereotypes related to age, gender, or physical characteristics. To mitigate this, prompts should describe actions in neutral, physical terms (e.g., “A person walks slowly with shuffled steps”) rather than relying on demographic adjectives.
+Licensing:                                                                                             |  This model is released under the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/)
+## Privacy
+Field                                                                                                                              |  Response
+:----------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------
+Generatable or reverse engineerable personal data?                                                     |  No
+Personal data used to create this model?                                                                                       |  No
+How often is dataset reviewed?                                                                                                     |  During dataset creation, model training, evaluation and before release
+Was data from user interactions with the AI model (e.g. user input and prompts) used to train the model? |  No
+Is there provenance for all datasets used in training?                                                                                |  Yes
+Does data labeling (annotation, metadata) comply with privacy laws?                                                                |  Yes
+Is data compliant with data subject requests for data correction or removal, if such a request was made?                           |  Not Applicable
+Applicable Privacy Policy        | https://www.nvidia.com/en-us/about-nvidia/privacy-policy/
+## Safety
+Field                                               |  Response
+:---------------------------------------------------|:----------------------------------
+Model Application Field(s):                               |  Media & Entertainment, Industrial/Machinery and Robotics, Autonomous Vehicles
+Describe the life critical impact (if present).   |  Not Applicable
+Use Case Restrictions:                              |  Abide by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/)
+Model and dataset restrictions:            |  The principle of least privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints are adhered to.
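
The input and output specifications in the README above (at most 300 frames at 30 fps, root translation of shape `num_frames` x 3, joint rotations of shape `num_frames` x 34 x 3 x 3) can be turned into a few sanity checks. The following is a minimal sketch in plain Python. The helper names `seconds_to_frames`, `shape_of`, `is_rotation`, and `check_motion` are hypothetical and not part of the `kimodo` package, and the motion is assumed to arrive as nested lists.

```python
FPS = 30          # README: motions are at 30 frames per second
MAX_FRAMES = 300  # README: maximum duration is 10 s (300 frames)
NUM_JOINTS = 34   # README: joint count of the Unitree G1 skeleton


def seconds_to_frames(seconds: float) -> int:
    """Convert seconds to a frame count, rejecting out-of-range durations."""
    frames = round(seconds * FPS)
    if not 1 <= frames <= MAX_FRAMES:
        raise ValueError(f"duration must be 1..{MAX_FRAMES} frames, got {frames}")
    return frames


def shape_of(nested) -> tuple:
    """Shape of a regular nested list, probing only the first element per level."""
    shape = []
    while isinstance(nested, (list, tuple)):
        shape.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return tuple(shape)


def is_rotation(m, tol: float = 1e-5) -> bool:
    """True if the 3x3 matrix m is orthonormal with determinant +1."""
    for i in range(3):
        for j in range(3):
            dot = sum(m[i][k] * m[j][k] for k in range(3))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
           - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
           + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    return abs(det - 1.0) <= tol


def check_motion(root_translation, joint_rotations) -> int:
    """Validate shapes and rotations against the README; return the frame count."""
    num_frames = len(root_translation)
    if shape_of(root_translation) != (num_frames, 3):
        raise ValueError("root translation must be num_frames x 3")
    if shape_of(joint_rotations) != (num_frames, NUM_JOINTS, 3, 3):
        raise ValueError("joint rotations must be num_frames x 34 x 3 x 3")
    if not all(is_rotation(r) for frame in joint_rotations for r in frame):
        raise ValueError("every joint rotation must be a valid rotation matrix")
    return num_frames


# Illustrative data only: a motionless clip with identity rotations.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
n = seconds_to_frames(2.0)
root = [[0.0, 0.0, 0.0] for _ in range(n)]
rots = [[identity for _ in range(NUM_JOINTS)] for _ in range(n)]
print(check_motion(root, rots))
```

Checks of this kind are cheap to run on every generated clip before it is handed to a retargeting or simulation step.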

config.yaml ADDED

	@@ -0,0 +1,27 @@

+_target_: kimodo.model.Kimodo
+num_base_steps: 1000
+cfg_type: separated
+denoiser:
+  _target_: kimodo.model.twostage_denoiser.TwostageDenoiser
+  ckpt_path: ${oc.select:checkpoint_dir}/model.safetensors
+  motion_mask_mode: concat
+  motion_rep:
+    _target_: kimodo.motion_rep.KimodoMotionRep
+    fps: 30
+    stats_path: ${oc.select:checkpoint_dir}/stats/motion/
+    skeleton:
+      _target_: kimodo.skeleton.G1Skeleton34
+  llm_shape:
+  - 1
+  - 4096
+  use_text_mask: false
+  latent_dim: 1024
+  ff_size: 2048
+  num_layers: 16
+  num_heads: 8
+  activation: gelu
+  dropout: 0.0
+  pe_dropout: 0.0
+  norm_first: false
+  num_text_tokens_override: 50
+  input_first_heading_angle: true
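
The two path entries in this config use OmegaConf's `oc.select` resolver, so `ckpt_path` and `stats_path` only become concrete once a `checkpoint_dir` value is supplied. The sketch below imitates that substitution with the standard library alone, to show which files the config ends up pointing at. It is a stand-in, not OmegaConf itself: the real resolver has its own handling of missing keys and defaults, which is not reproduced here. The `resolve` helper and the example directory are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Path templates copied from config.yaml.
TEMPLATES = {
    "ckpt_path": "${oc.select:checkpoint_dir}/model.safetensors",
    "stats_path": "${oc.select:checkpoint_dir}/stats/motion/",
}

_PATTERN = re.compile(r"\$\{oc\.select:([A-Za-z_][A-Za-z0-9_.]*)\}")


def resolve(template: str, values: dict) -> str:
    """Replace each ${oc.select:KEY} with values[KEY] (stdlib stand-in)."""
    def _substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for '{key}'")
        return str(values[key])
    return _PATTERN.sub(_substitute, template)


# Hypothetical local directory holding the files added in this commit.
checkpoint_dir = "/models/Kimodo-G1-SEED-v1"
resolved = {
    name: resolve(template, {"checkpoint_dir": checkpoint_dir})
    for name, template in TEMPLATES.items()
}

# The six statistics files listed in this commit, relative to stats_path.
stats_files = [
    str(PurePosixPath(resolved["stats_path"]) / group / f"{stat}.npy")
    for group in ("body", "global_root", "local_root")
    for stat in ("mean", "std")
]

print(resolved["ckpt_path"])
for path in stats_files:
    print(path)
```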

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1a99c6671aa5e0a0af242b84dc2fc397686a2ca418a15a0fb9f38096dab884a0
+size 1134168268
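
The `size` field of this pointer allows a quick plausibility check against the 282 M parameters stated in README.md. The sketch below assumes the weights are stored as 32-bit floats, which this commit does not state, and ignores the safetensors header and any non-parameter tensors, so the result is only a rough estimate.

```python
SIZE_BYTES = 1134168268       # `size` from the model.safetensors pointer
STATED_PARAMS = 282_000_000   # "282 M parameters" from README.md
BYTES_PER_PARAM = 4           # assumption: float32 weights

# Estimated parameter count if every byte in the file were float32 weight data.
estimated_params = SIZE_BYTES // BYTES_PER_PARAM
ratio = estimated_params / STATED_PARAMS

print(f"{estimated_params:,} estimated parameters")
print(f"{ratio:.3f}x the stated count")
```

Under that assumption the file size corresponds to roughly 283.5 M values, within about 1% of the stated model size.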

stats/motion/body/mean.npy ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:185564a7246ee03d7a7aa2a16b3c7e49cffd5a858f637ab2b8fa04574f18d29b
+size 3424

stats/motion/body/std.npy ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5df7b200de9bdb0d92cf23cd1426d2ded6c7688d1dcb6e403c3886f1d130f59e
+size 3424
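
The paired `mean.npy` and `std.npy` files under `stats/motion/` suggest per-feature standardization of the motion representation, although this commit does not show how `KimodoMotionRep` actually consumes them. Assuming conventional z-score normalization, the round trip would look roughly like the sketch below. The function names, the `EPS` guard, and the three-element example vectors are all illustrative, not values from the repository.

```python
EPS = 1e-8  # hypothetical guard against a zero standard deviation


def normalize(features, mean, std):
    """Map raw features to zero-mean, unit-variance space (assumed convention)."""
    return [(x - m) / (s + EPS) for x, m, s in zip(features, mean, std)]


def denormalize(normalized, mean, std):
    """Invert normalize() to recover raw features."""
    return [z * (s + EPS) + m for z, m, s in zip(normalized, mean, std)]


# Made-up statistics and one made-up feature vector, for illustration only.
mean = [0.0, 0.9, 0.1]
std = [0.5, 0.05, 0.2]
pose = [0.25, 1.0, -0.1]

z = normalize(pose, mean, std)
restored = denormalize(z, mean, std)
print(z)
print(restored)
```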

stats/motion/global_root/mean.npy ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c292f97ed86266332abdbb7e98d464eb91115a4ec1def3c87aa9362ae483b062
+size 168

stats/motion/global_root/std.npy ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:691f1ada7ac34f4854d8d5920cafb26fd22f1d5fb9dbc331030292eb6675c6eb
+size 168

stats/motion/local_root/mean.npy ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b0aab16477bd39f03934bb930786526563b6e5955d1ee0b2face7d1e2c04bba8
+size 160

stats/motion/local_root/std.npy ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9b8f34564fda07cffb5689b2f509b6864b4031a764ca53bfa149ce8487495c62
+size 160
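
Each binary file in this commit appears as a three-line Git LFS pointer (`version`, `oid`, `size`) rather than its contents. After downloading the real file, the pointer can be used to confirm integrity. The sketch below parses a pointer and compares a local file against it using only the standard library. The helper names `parse_pointer` and `matches_pointer` are hypothetical, and since the actual `.npy` file is not available here, the verification step is demonstrated on a throwaway temporary file.

```python
import hashlib
import os
import tempfile

# Pointer text copied from stats/motion/local_root/std.npy in this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9b8f34564fda07cffb5689b2f509b6864b4031a764ca53bfa149ce8487495c62
size 160
"""


def parse_pointer(text: str) -> dict:
    """Parse a Git LFS pointer into version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict) -> bool:
    """True if the file at path has the pointer's byte size and digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])

# Demonstration on a temporary file with a pointer built to match it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"demo")
demo_pointer = {
    "version": pointer["version"],
    "algo": "sha256",
    "digest": hashlib.sha256(b"demo").hexdigest(),
    "size": 4,
}
print(matches_pointer(tmp.name, demo_pointer))
os.remove(tmp.name)
```

Checking the size first avoids hashing a large file such as `model.safetensors` when the download was obviously truncated.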