Commit 3c426f1 by rbarac (verified) · 1 Parent(s): cb43d21

Upload README.md with huggingface_hub

Files changed (1): README.md ADDED (+192 -0)

---
license: apache-2.0
tags:
- pose-estimation
- 3d-pose
- computer-vision
- pytorch
- rtmpose
datasets:
- cocktail14
metrics:
- mpjpe
library_name: pytorch
---

# RTMPose3D

Real-time multi-person 3D whole-body pose estimation with 133 keypoints per person.

## Model Description

RTMPose3D is a real-time 3D pose estimation model that detects and tracks 133 keypoints per person:
- **17** body keypoints (COCO format)
- **6** foot keypoints
- **68** facial landmarks
- **42** hand keypoints (21 per hand)

The model outputs both 2D pixel coordinates and 3D spatial coordinates for each keypoint.

## Model Variants

This repository contains checkpoints for:

| Model | Parameters | Speed | MPJPE (lower is better) | Checkpoint File |
|-------|------------|-------|-------------------------|-----------------|
| RTMDet-M (Detector) | ~50M | Fast | - | `rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth` |
| RTMW3D-L (Large) | ~65M | Real-time | 0.045 | `rtmw3d-l_8xb64_cocktail14-384x288-794dbc78_20240626.pth` (also listed as `rtmw3d-l_cock14-0d4ad840_20240422.pth`) |
| RTMW3D-X (Extra Large) | ~98M | Slower | 0.057 | `rtmw3d-x_8xb64_cocktail14-384x288-b0a0eab7_20240626.pth` |

## Installation

```bash
pip install git+https://github.com/mutedeparture/rtmpose3d.git
```

Or clone and install locally:

```bash
git clone https://github.com/mutedeparture/rtmpose3d.git
cd rtmpose3d
pip install -r requirements.txt
pip install -e .
```

## Quick Start

### Using the Hugging Face Transformers-style API

```python
import cv2
from rtmpose3d import RTMPose3D

# Initialize model (auto-downloads checkpoints from this repo)
model = RTMPose3D.from_pretrained('rbarac/rtmpose3d', device='cuda:0')

# Run inference
image = cv2.imread('person.jpg')
results = model(image, return_tensors='np')

# Access results
keypoints_3d = results['keypoints_3d']  # [N, 133, 3] - 3D coords in meters
keypoints_2d = results['keypoints_2d']  # [N, 133, 2] - pixel coords
scores = results['scores']              # [N, 133] - confidence [0, 1]
```
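
As a rough illustration of working with the returned arrays, the sketch below measures the distance between two body keypoints in 3D. It assumes `keypoints_3d` has the `[N, 133, 3]` shape shown above and that keypoints 0-16 follow the standard COCO order (5 = left shoulder, 6 = right shoulder). The helper name `joint_distance` and the synthetic input are illustrative only and are not part of the package.

```python
import numpy as np

# Standard COCO body keypoint indices (assumed to apply to keypoints 0-16).
LEFT_SHOULDER, RIGHT_SHOULDER = 5, 6

def joint_distance(keypoints_3d, a, b):
    """Euclidean distance between keypoints a and b for every person.

    keypoints_3d: array of shape [N, 133, 3]; returns an array of shape [N].
    """
    kp = np.asarray(keypoints_3d, dtype=float)
    return np.linalg.norm(kp[:, a] - kp[:, b], axis=-1)

# Synthetic stand-in for results['keypoints_3d'] with one person.
demo = np.zeros((1, 133, 3))
demo[0, LEFT_SHOULDER] = [0.2, 1.0, 1.4]
demo[0, RIGHT_SHOULDER] = [-0.2, 1.0, 1.4]
shoulder_width = joint_distance(demo, LEFT_SHOULDER, RIGHT_SHOULDER)
```

With real output, passing `results['keypoints_3d']` instead of `demo` would give one distance per detected person, in the same units as the 3D coordinates.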

### Using the Simple Inference API

```python
import cv2
from rtmpose3d import RTMPose3DInference

# Initialize with model size
model = RTMPose3DInference(model_size='l', device='cuda:0')  # or 'x' for extra large

# Load an image and run inference
image = cv2.imread('person.jpg')
results = model(image)
print(results['keypoints_3d'].shape)  # [N, 133, 3]
```

### Single Person Detection

Detect only the most prominent person in the image:

```python
# Works with both APIs
results = model(image, single_person=True)  # Returns only N=1
```
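
How `single_person=True` decides which person is "most prominent" is not specified here. One plausible way to get a similar result from multi-person output is to keep the detection with the largest bounding-box area. The sketch below does that on a result dictionary; the function name `largest_person` is hypothetical and the selection rule is an assumption, not the package's documented behavior.

```python
import numpy as np

def largest_person(results):
    """Keep only the person whose bounding box has the largest area.

    results: dict of arrays whose first axis is the person index N,
    including 'bboxes' with rows [x1, y1, x2, y2].
    """
    bboxes = np.asarray(results['bboxes'], dtype=float)
    if bboxes.shape[0] == 0:
        return results  # nobody detected, nothing to select
    areas = (bboxes[:, 2] - bboxes[:, 0]) * (bboxes[:, 3] - bboxes[:, 1])
    best = int(np.argmax(areas))
    # Slice with best:best+1 so every array keeps a leading axis of size 1.
    return {key: np.asarray(value)[best:best + 1] for key, value in results.items()}

# Synthetic two-person result for illustration.
demo = {
    'keypoints_3d': np.zeros((2, 133, 3)),
    'keypoints_2d': np.zeros((2, 133, 2)),
    'scores': np.ones((2, 133)),
    'bboxes': np.array([[0, 0, 50, 100], [10, 10, 210, 410]], dtype=float),
}
single = largest_person(demo)
```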

## Output Format

```python
{
    'keypoints_3d': np.ndarray,  # [N, 133, 3] - (X, Y, Z) in meters
    'keypoints_2d': np.ndarray,  # [N, 133, 2] - (x, y) pixel coordinates
    'scores': np.ndarray,        # [N, 133] - confidence scores [0, 1]
    'bboxes': np.ndarray         # [N, 4] - bounding boxes [x1, y1, x2, y2]
}
```

Where `N` is the number of detected persons.
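
Low-confidence keypoints are often worth masking before further processing. A minimal sketch, assuming the output dictionary described above; the threshold of 0.5 and the helper name `mask_low_confidence` are arbitrary choices for illustration:

```python
import numpy as np

def mask_low_confidence(results, threshold=0.5):
    """Return copies of the keypoint arrays with unreliable points set to NaN."""
    scores = np.asarray(results['scores'], dtype=float)
    reliable = scores >= threshold                          # [N, 133] boolean mask
    kp3d = np.array(results['keypoints_3d'], dtype=float)   # np.array makes a copy
    kp2d = np.array(results['keypoints_2d'], dtype=float)
    kp3d[~reliable] = np.nan
    kp2d[~reliable] = np.nan
    return kp3d, kp2d, reliable

# Synthetic single-person result for illustration.
demo = {
    'keypoints_3d': np.ones((1, 133, 3)),
    'keypoints_2d': np.ones((1, 133, 2)),
    'scores': np.full((1, 133), 0.9),
}
demo['scores'][0, 10] = 0.1  # pretend keypoint 10 is occluded
kp3d, kp2d, reliable = mask_low_confidence(demo)
```

Using NaN rather than dropping rows keeps the `[N, 133, ...]` shape intact, so keypoint indices stay valid downstream.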

### Coordinate Systems

**2D Keypoints** - Pixel coordinates:
- X: horizontal position [0, image_width]
- Y: vertical position [0, image_height]

**3D Keypoints** - Camera-relative coordinates in meters (Z-up convention):
- X: horizontal (negative=left, positive=right)
- Y: depth (negative=closer, positive=farther)
- Z: vertical (negative=down, positive=up)
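
Many camera libraries use a different axis convention (X right, Y down, Z forward, as in OpenCV's camera frame). Assuming the 3D output follows the Z-up layout described above (X right, Y depth, Z up), the remapping is a fixed axis permutation with one sign flip. The function name `z_up_to_camera` is hypothetical:

```python
import numpy as np

def z_up_to_camera(keypoints_3d):
    """Convert (X right, Y depth, Z up) to (x right, y down, z forward)."""
    kp = np.asarray(keypoints_3d, dtype=float)
    x = kp[..., 0]     # right stays right
    y = -kp[..., 2]    # up becomes down, so negate
    z = kp[..., 1]     # depth becomes the forward axis
    return np.stack([x, y, z], axis=-1)

# A point 0.5 m to the right, 2 m away, 1 m up.
point = np.array([[[0.5, 2.0, 1.0]]])
converted = z_up_to_camera(point)
```

The mapping is a pure rotation (determinant +1), so handedness is preserved and distances between keypoints are unchanged.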

## Keypoint Indices

| Index Range | Body Part | Count | Description |
|-------------|-----------|-------|-------------|
| 0-16 | Body | 17 | Nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles |
| 17-22 | Feet | 6 | Foot keypoints |
| 23-90 | Face | 68 | Facial landmarks |
| 91-111 | Left Hand | 21 | Left hand keypoints |
| 112-132 | Right Hand | 21 | Right hand keypoints |
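
The index ranges above translate directly into array slices. The sketch below splits a `[N, 133, D]` keypoint array into per-part arrays; the names `KEYPOINT_GROUPS` and `split_keypoints` are illustrative and not part of the package.

```python
import numpy as np

# Index ranges from the table above, written as half-open Python slices.
KEYPOINT_GROUPS = {
    'body': slice(0, 17),
    'feet': slice(17, 23),
    'face': slice(23, 91),
    'left_hand': slice(91, 112),
    'right_hand': slice(112, 133),
}

def split_keypoints(keypoints):
    """Split an [N, 133, D] array into a dict of per-part arrays."""
    kp = np.asarray(keypoints)
    return {name: kp[:, idx] for name, idx in KEYPOINT_GROUPS.items()}

# Works for 3D keypoints (D=3) and 2D keypoints (D=2) alike.
parts = split_keypoints(np.zeros((2, 133, 3)))
sizes = {name: part.shape[1] for name, part in parts.items()}
```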

## Training Data

The models were trained on the **Cocktail14** dataset, which combines 14 public pose datasets, including:
- Human3.6M
- COCO-WholeBody
- UBody
- And 11 more datasets

## Performance

Evaluated on standard 3D pose benchmarks:

- **RTMW3D-L**: 0.045 MPJPE, real-time inference (~30 FPS on RTX 3090)
- **RTMW3D-X**: 0.057 MPJPE, slower inference (larger model)

MPJPE (mean per-joint position error) is an error metric, so lower values are better; by the figures above, RTMW3D-L reports the lower error.
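
For reference, MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth joints. The sketch below implements the plain definition only; the evaluation protocol behind the numbers above (alignment, joint subset, units) is not stated here and may differ.

```python
import numpy as np

def mpjpe(pred, target):
    """Mean per-joint position error for arrays of shape [..., J, 3]."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError('pred and target must have the same shape')
    # Per-joint Euclidean distance, averaged over all joints and samples.
    return float(np.linalg.norm(pred - target, axis=-1).mean())

# Synthetic example: every joint is displaced by the same offset.
target = np.zeros((1, 133, 3))
pred = target.copy()
pred[..., 0] += 0.03  # shift every joint by 0.03 along X
pred[..., 2] += 0.04  # and by 0.04 along Z
error = mpjpe(pred, target)
```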

## Requirements

- Python >= 3.8
- PyTorch >= 2.0.0
- CUDA-capable GPU (4GB+ VRAM recommended)
- mmcv >= 2.0.0
- MMPose >= 1.0.0
- MMDetection >= 3.0.0
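
A quick way to check an environment against these minimums is to compare installed package versions with the standard library's `importlib.metadata`. This is a sketch only: the distribution names (`torch`, `mmcv`, `mmpose`, `mmdet`) are assumptions about how the packages are installed, and the simplified `parse_version` helper ignores pre-release and local version suffixes.

```python
from importlib import metadata

# Minimum versions from the list above, keyed by assumed distribution name.
MINIMUM_VERSIONS = {
    'torch': (2, 0, 0),
    'mmcv': (2, 0, 0),
    'mmpose': (1, 0, 0),
    'mmdet': (3, 0, 0),
}

def parse_version(text):
    """Turn '2.1.0+cu118' into (2, 1, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split('+')[0].split('.')[:3]:
        digits = ''
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)

def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return {package: problem description} for every unmet requirement."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = 'not installed'
            continue
        if parse_version(installed) < minimum:
            needed = '.'.join(map(str, minimum))
            problems[name] = f'found {installed}, need >= {needed}'
    return problems
```

Calling `check_requirements()` returns an empty dictionary when every listed package meets its minimum version.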

## Citation

```bibtex
@misc{rtmpose3d2025,
  title={RTMPose3D: Real-Time Multi-Person 3D Pose Estimation},
  author={Arac, Bahadir},
  year={2025},
  publisher={GitHub},
  url={https://github.com/mutedeparture/rtmpose3d}
}
```

## License

Apache 2.0

## Acknowledgments

Built on [MMPose](https://github.com/open-mmlab/mmpose) by OpenMMLab. Models trained by the MMPose team on the Cocktail14 dataset.

## Links

- **GitHub Repository**: [mutedeparture/rtmpose3d](https://github.com/mutedeparture/rtmpose3d)
- **Documentation**: See README in the repository
- **MMPose**: [open-mmlab/mmpose](https://github.com/open-mmlab/mmpose)