nvidia/GEM-X

@@ -19,7 +19,7 @@ library_name: gem
 GEM is a family of Generalist Human Motion models developed by NVIDIA. This repository hosts two model variants:
 - **GEM-SOMA** — Full-body 77-joint pose (body + hands + face) using the [SOMA](https://research.nvidia.com/labs/dair/gem/) body model
-- **GEM-SMPL** — 17-joint body pose using the SMPLx body model, with support for text/audio/music conditioning
+- **GEM-SMPL** — 17-joint body pose using the SMPL body model, with support for text/audio/music conditioning
 Both models reconstruct 3D human motion from monocular video with dynamic cameras, recovering both camera-space and global motion trajectories.
@@ -34,7 +34,7 @@ Both models reconstruct 3D human motion from monocular video with dynamic camera
 | Model | Checkpoint | Body Model | Joints | Config | Code |
 |---|---|---|---|---|---|
 | GEM-SOMA | `gem_soma.ckpt` | SOMA | 77 (body + hands + face) | `config.json` | [GEM-X](https://github.com/NVlabs/GEM-X) |
-| GEM-SMPL | `gem_smpl.ckpt` | SMPLx | 17 (body) | `gem_smpl_config.json` | [GEM-SMPL](https://github.com/NVlabs/GEM-SMPL) |
+| GEM-SMPL | `gem_smpl.ckpt` | SMPL | 17 (body) | `gem_smpl_config.json` | [GEM-SMPL](https://github.com/NVlabs/GEM-SMPL) |
 ---
@@ -113,8 +113,8 @@ weights = torch.load(path, weights_only=False)
 | Property | Value |
 |---|---|
-| Architecture | 12-layer Transformer encoder (RoPE, 512 latent dim, 8 heads) |
-| Body model | SMPLx (17 joints, body only) |
+| Architecture | 16-layer Transformer encoder (RoPE, 1024 latent dim, 8 heads) |
+| Body model | SMPL (17 joints, body only) |
 | Feature space | gvhmr, 151-dim |
 | Input | RGB video + 2D keypoints + bounding box + camera intrinsics (+ optional text/audio) |
 | Output | Per-frame SMPL body parameters (pose, shape, translation) |
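The Architecture row changes the GEM-SMPL encoder from 12 layers at 512 latent dim to 16 layers at 1024 latent dim, keeping 8 heads. The sketch below works through what that implies for a reviewer. It assumes a standard multi-head attention layer with separate query, key, value and output projections, each `latent_dim x latent_dim`. This is an assumption, because the card does not describe the layer internals. `EncoderConfig` and its methods are hypothetical names for illustration and are not part of the GEM code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    """Hypothetical container for the encoder hyperparameters listed in the model card."""

    num_layers: int
    latent_dim: int
    num_heads: int

    @property
    def head_dim(self) -> int:
        # Each attention head gets an equal slice of the latent dimension.
        if self.latent_dim % self.num_heads != 0:
            raise ValueError("latent_dim must be divisible by num_heads")
        return self.latent_dim // self.num_heads

    def rope_compatible(self) -> bool:
        # RoPE rotates channels in pairs, so each head needs an even width.
        return self.head_dim % 2 == 0

    def attention_projection_weights(self) -> int:
        # ASSUMPTION: Q, K, V and output projections are each latent_dim x latent_dim,
        # biases excluded. Feed-forward and embedding weights are not counted.
        return self.num_layers * 4 * self.latent_dim * self.latent_dim


old = EncoderConfig(num_layers=12, latent_dim=512, num_heads=8)
new = EncoderConfig(num_layers=16, latent_dim=1024, num_heads=8)

for name, cfg in (("before", old), ("after", new)):
    print(
        f"{name}: head_dim={cfg.head_dim}, "
        f"rope_ok={cfg.rope_compatible()}, "
        f"attn_proj_weights={cfg.attention_projection_weights():,}"
    )
```

Under these assumptions, the per-head width doubles from 64 to 128, and both settings stay RoPE-compatible. The attention projection weights grow from 12,582,912 to 67,108,864, about 5.3 times as many. The feed-forward width is not given in the card, so the total parameter change is not estimated here.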