jeffffffli committed on
Commit f6781b6 · verified · 1 Parent(s): 398a787

Upload README.md with huggingface_hub

  1. README.md +4 -4
README.md CHANGED
@@ -19,7 +19,7 @@ library_name: gem
  GEM is a family of Generalist Human Motion models developed by NVIDIA. This repository hosts two model variants:
 
  - **GEM-SOMA** — Full-body 77-joint pose (body + hands + face) using the [SOMA](https://research.nvidia.com/labs/dair/gem/) body model
- - **GEM-SMPL** — 17-joint body pose using the SMPLx body model, with support for text/audio/music conditioning
+ - **GEM-SMPL** — 17-joint body pose using the SMPL body model, with support for text/audio/music conditioning
 
  Both models reconstruct 3D human motion from monocular video with dynamic cameras, recovering both camera-space and global motion trajectories.
 
@@ -34,7 +34,7 @@ Both models reconstruct 3D human motion from monocular video with dynamic camera
  | Model | Checkpoint | Body Model | Joints | Config | Code |
  |---|---|---|---|---|---|
  | GEM-SOMA | `gem_soma.ckpt` | SOMA | 77 (body + hands + face) | `config.json` | [GEM-X](https://github.com/NVlabs/GEM-X) |
- | GEM-SMPL | `gem_smpl.ckpt` | SMPLx | 17 (body) | `gem_smpl_config.json` | [GEM-SMPL](https://github.com/NVlabs/GEM-SMPL) |
+ | GEM-SMPL | `gem_smpl.ckpt` | SMPL | 17 (body) | `gem_smpl_config.json` | [GEM-SMPL](https://github.com/NVlabs/GEM-SMPL) |
 
  ---
 
@@ -113,8 +113,8 @@ weights = torch.load(path, weights_only=False)
 
  | Property | Value |
  |---|---|
- | Architecture | 12-layer Transformer encoder (RoPE, 512 latent dim, 8 heads) |
- | Body model | SMPLx (17 joints, body only) |
+ | Architecture | 16-layer Transformer encoder (RoPE, 1024 latent dim, 8 heads) |
+ | Body model | SMPL (17 joints, body only) |
  | Feature space | gvhmr, 151-dim |
  | Input | RGB video + 2D keypoints + bounding box + camera intrinsics (+ optional text/audio) |
  | Output | Per-frame SMPL body parameters (pose, shape, translation) |
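
Review note: the Architecture row changes three numbers at once, so it may help to see what the old and new values imply. The sketch below is not taken from the GEM code. `EncoderSpec` is a hypothetical helper, and the parameter estimate assumes a standard multi-head attention layout (Q, K, V and output projections, no biases). The feed-forward width is not stated in the README, so it is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderSpec:
    """Hypothetical holder for the numbers in the README's Architecture row."""

    layers: int
    latent_dim: int
    heads: int

    @property
    def head_dim(self) -> int:
        # Multi-head attention splits the latent dimension evenly across heads.
        if self.latent_dim % self.heads != 0:
            raise ValueError("latent_dim must be divisible by heads")
        return self.latent_dim // self.heads

    def rope_compatible(self) -> bool:
        # RoPE rotates features in pairs, so each head needs an even width.
        return self.head_dim % 2 == 0

    def attention_params(self) -> int:
        # Assumption: four square projections (Q, K, V, output) per layer,
        # no biases. Feed-forward and embedding weights are not counted.
        return self.layers * 4 * self.latent_dim ** 2


old = EncoderSpec(layers=12, latent_dim=512, heads=8)   # removed row
new = EncoderSpec(layers=16, latent_dim=1024, heads=8)  # added row

for name, spec in (("old", old), ("new", new)):
    print(name, spec.head_dim, spec.rope_compatible(), spec.attention_params())
```

Under these assumptions the per-head width doubles from 64 to 128, both configurations keep an even head width as RoPE requires, and the attention weights alone grow by roughly 5.3x (about 12.6M to 67.1M).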