File size: 5,273 Bytes
3c426f1
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1333f57
3c426f1
 
 
 
 
1333f57
3c426f1
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1333f57
3c426f1
 
 
 
 
 
 
 
 
 
 
 
 
1333f57
3c426f1
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
---
license: apache-2.0
tags:
- pose-estimation
- 3d-pose
- computer-vision
- pytorch
- rtmpose
datasets:
- cocktail14
metrics:
- mpjpe
library_name: pytorch
---

# RTMPose3D

Real-time multi-person 3D whole-body pose estimation with 133 keypoints per person.

## Model Description

RTMPose3D is a real-time 3D pose estimation model that detects and tracks 133 keypoints per person:
- **17** body keypoints (COCO format)
- **6** foot keypoints  
- **68** facial landmarks
- **42** hand keypoints (21 per hand)

The model outputs both 2D pixel coordinates and 3D spatial coordinates for each keypoint.

## Model Variants

This repository contains checkpoints for:

| Model | Parameters | Speed | Accuracy (MPJPE) | Checkpoint File |
|-------|------------|-------|------------------|-----------------|
| RTMDet-M (Detector) | ~50M | Fast | - | `rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth` |
| RTMW3D-L (Large) | ~65M | Real-time | 0.678 | `rtmw3d-l_8xb64_cocktail14-384x288-794dbc78_20240626.pth` |
| RTMW3D-X (Extra Large) | ~98M | Slower | 0.680 | `rtmw3d-x_8xb64_cocktail14-384x288-b0a0eab7_20240626.pth` |

The model outputs both 2D pixel coordinates and 3D spatial coordinates for each keypoint.

## Model Variants

This repository contains checkpoints for:

| Model | Parameters | Speed | Accuracy (MPJPE) | Checkpoint File |
|-------|------------|-------|------------------|-----------------|
| RTMDet-M (Detector) | ~50M | Fast | - | `rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth` |
| RTMW3D-L (Large) | ~65M | Real-time | 0.045 | `rtmw3d-l_cock14-0d4ad840_20240422.pth` |
| RTMW3D-X (Extra Large) | ~98M | Slower | 0.057 | `rtmw3d-x_8xb64_cocktail14-384x288-b0a0eab7_20240626.pth` |

## Installation

```bash
pip install git+https://github.com/b-arac/rtmpose3d.git
```

Or clone and install locally:

```bash
git clone https://github.com/b-arac/rtmpose3d.git
cd rtmpose3d
pip install -r requirements.txt
pip install -e .
```

## Quick Start

### Using the HuggingFace Transformers-style API

```python
import cv2
from rtmpose3d import RTMPose3D

# Initialize model (auto-downloads checkpoints from this repo)
model = RTMPose3D.from_pretrained('rbarac/rtmpose3d', device='cuda:0')

# Run inference
image = cv2.imread('person.jpg')
results = model(image, return_tensors='np')

# Access results
keypoints_3d = results['keypoints_3d']  # [N, 133, 3] - 3D coords in meters
keypoints_2d = results['keypoints_2d']  # [N, 133, 2] - pixel coords
scores = results['scores']              # [N, 133] - confidence [0, 1]
```

### Using the Simple Inference API

```python
from rtmpose3d import RTMPose3DInference

# Initialize with model size
model = RTMPose3DInference(model_size='l', device='cuda:0')  # or 'x' for extra large

# Run inference
results = model(image)
print(results['keypoints_3d'].shape)  # [N, 133, 3]
```

### Single Person Detection

Detect only the most prominent person in the image:

```python
# Works with both APIs
results = model(image, single_person=True)  # Returns only N=1
```

## Output Format

```python
{
    'keypoints_3d': np.ndarray,  # [N, 133, 3] - (X, Y, Z) in meters
    'keypoints_2d': np.ndarray,  # [N, 133, 2] - (x, y) pixel coordinates
    'scores': np.ndarray,        # [N, 133] - confidence scores [0, 1]
    'bboxes': np.ndarray         # [N, 4] - bounding boxes [x1, y1, x2, y2]
}
```

Where `N` is the number of detected persons.

### Coordinate Systems

**2D Keypoints** - Pixel coordinates:
- X: horizontal position [0, image_width]
- Y: vertical position [0, image_height]

**3D Keypoints** - Camera-relative coordinates in meters (Z-up convention):
- X: horizontal (negative=left, positive=right)
- Y: depth (negative=closer, positive=farther)
- Z: vertical (negative=down, positive=up)

## Keypoint Indices

| Index Range | Body Part | Count | Description |
|-------------|-----------|-------|-------------|
| 0-16 | Body | 17 | Nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles |
| 17-22 | Feet | 6 | Foot keypoints |
| 23-90 | Face | 68 | Facial landmarks |
| 91-111 | Left Hand | 21 | Left hand keypoints |
| 112-132 | Right Hand | 21 | Right hand keypoints |

## Training Data

The models were trained on the **Cocktail14** dataset, which combines 14 public 3D pose datasets:
- Human3.6M
- COCO-WholeBody
- UBody
- And 11 more datasets

## Performance

Evaluated on standard 3D pose benchmarks:

- **RTMW3D-L**: 0.045 MPJPE, real-time inference (~30 FPS on RTX 3090)
- **RTMW3D-X**: 0.057 MPJPE, slower but higher accuracy

## Requirements

- Python >= 3.8
- PyTorch >= 2.0.0
- CUDA-capable GPU (4GB+ VRAM recommended)
- mmcv >= 2.0.0
- MMPose >= 1.0.0
- MMDetection >= 3.0.0

## Citation

```bibtex
@misc{rtmpose3d2025,
  title={RTMPose3D: Real-Time Multi-Person 3D Pose Estimation},
  author={Arac, Bahadir},
  year={2025},
  publisher={GitHub},
  url={https://github.com/b-arac/rtmpose3d}
}
```

## License

Apache 2.0

## Acknowledgments

Built on [MMPose](https://github.com/open-mmlab/mmpose) by OpenMMLab. Models trained by the MMPose team on the Cocktail14 dataset.

## Links

- **GitHub Repository**: [b-arac/rtmpose3d](https://github.com/b-arac/rtmpose3d)
- **Documentation**: See README in the repository
- **MMPose**: [open-mmlab/mmpose](https://github.com/open-mmlab/mmpose)