---
license: apache-2.0
tags:
- object-detection
- keypoint-detection
- pose-estimation
- onnx
- dinov2
- vitpose
- rtmdet
- mascot
- chibi
- kemono
library_name: onnx
base_model:
- facebook/dinov2-large
inference: false
---

# mascot-pose-detect

Two-stage pose detector for chibi, kemono, and other stylized mascot characters.

This repository provides ONNX artifacts for portable inference:

1. Stage 1: a 7-class RTMDet-tiny bounding-box detector.
2. Stage 2: a DINOv2-Large backbone with a ViTPose-style COCO-17 heatmap head.

The keypoint model is fine-tuned for stylized mascot bodies, whose proportions differ strongly from those in real-human pose datasets.
Consumers are expected to lift the COCO-17 keypoints to the DWPose-25 / POSE_KEYPOINT format and to derive toe points from the foot bounding boxes.
Hand keypoints, when required, are expected to come from a separate hand-template fitter.

## License

This model package is released under the Apache License 2.0.

The Stage 2 keypoint model is based on `facebook/dinov2-large`, which is also released under Apache 2.0.
It does not use the older MAE-pretrained ViTPose-L checkpoint that constrained the previous test bundle to non-commercial use.

The training annotations and source images are not included in this repository.

## Repository Contents

```text
grmchn/mascot-pose-detect/
├── bbox/
│   ├── model.onnx
│   ├── classes.json
│   └── decode_params.json
└── keypoint/
    ├── dinov2_vitpose_l/
    │   ├── model.onnx
    │   └── meta.json
    └── dinov2_vitpose_l_v2/
        ├── model.onnx
        └── meta.json
```

## Stage 1: BBox Detector

The bbox model detects seven mascot body regions:

| index | name |
|------:|------|
| 0 | `full` |
| 1 | `head` |
| 2 | `body` |
| 3 | `hand_left` |
| 4 | `hand_right` |
| 5 | `foot_left` |
| 6 | `foot_right` |

Left and right follow anatomical / character-view naming.
For a front-facing character, the character's right side usually appears on the screen-left side.

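As a minimal sketch of how a consumer might use the class table above, the snippet below keeps the highest-scoring detection for each class. It assumes the detector output has already been decoded into `(x1, y1, x2, y2, score, class_index)` tuples and that the image contains a single character. The tuple layout, the helper name `best_per_class`, and the `min_score` default are illustrative and not part of this repository; `bbox/classes.json` remains the authoritative class list.

```python
from typing import Dict, Iterable, Tuple

# Class order copied from the table above; bbox/classes.json is authoritative.
CLASS_NAMES = [
    "full", "head", "body",
    "hand_left", "hand_right",
    "foot_left", "foot_right",
]

# Hypothetical decoded detection: (x1, y1, x2, y2, score, class_index).
Detection = Tuple[float, float, float, float, float, int]


def best_per_class(
    detections: Iterable[Detection],
    min_score: float = 0.3,  # illustrative threshold, not a documented value
) -> Dict[str, Detection]:
    """Keep the highest-scoring detection for each known class name."""
    best: Dict[str, Detection] = {}
    for det in detections:
        score, cls = det[4], int(det[5])
        # Skip weak detections and class indices outside the table.
        if score < min_score or not 0 <= cls < len(CLASS_NAMES):
            continue
        name = CLASS_NAMES[cls]
        if name not in best or score > best[name][4]:
            best[name] = det
    return best
```

With this helper, `best_per_class(dets).get("full")` would return the strongest whole-character box, or `None` when nothing passes the threshold.
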
## Stage 2: Keypoint Detector

The keypoint model input is a top-down crop from the Stage 1 `full` or `body` bbox.

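The crop step can be sketched roughly as follows, assuming the `1x3x224x168` input shape listed under Model Versions and the standard ImageNet mean and standard deviation. The padding factor and both helper names are hypothetical, and the values in each version's `meta.json` take precedence over the constants used here. Resizing the cropped region to 224x168 is left to whichever image library the consumer already uses.

```python
import numpy as np

INPUT_H, INPUT_W = 224, 168  # from the 1x3x224x168 NCHW input shape

# Standard ImageNet statistics (assumed; check meta.json for exact values).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def fit_box_to_aspect(box, padding=1.25):
    """Grow an (x1, y1, x2, y2) box around its center to the input W:H ratio.

    `padding` is an illustrative margin factor, not a documented value.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = x2 - x1, y2 - y1
    aspect = INPUT_W / INPUT_H
    if w > h * aspect:
        h = w / aspect  # box is too wide: extend the height
    else:
        w = h * aspect  # box is too tall: extend the width
    w, h = w * padding, h * padding
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)


def to_model_input(crop_rgb):
    """Convert a 224x168x3 uint8 RGB crop into a 1x3x224x168 float32 tensor."""
    if crop_rgb.shape != (INPUT_H, INPUT_W, 3):
        raise ValueError(f"expected {(INPUT_H, INPUT_W, 3)}, got {crop_rgb.shape}")
    x = crop_rgb.astype(np.float32) / 255.0
    x = (x - MEAN) / STD  # per-channel normalization on the last axis
    # HWC -> CHW, then add the batch dimension.
    return np.ascontiguousarray(x.transpose(2, 0, 1)[None, ...])
```

Keeping the adjusted box returned by `fit_box_to_aspect` is useful later, because the same box is needed to map predicted keypoints back into the original image.
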
## Model Versions

| version | keypoint path | training run | status |
|---|---|---|---|
| v1 | `keypoint/dinov2_vitpose_l/` | `general_filtered` | Stable baseline release |
| v2 | `keypoint/dinov2_vitpose_l_v2/` | `final_v3_from_final_v2` | Updated model with additional hard-example training |

Both versions use the same architecture, input shape, output heatmap shape, and post-processing contract.
Switching from v1 to v2 only requires changing the keypoint variant path from `dinov2_vitpose_l` to `dinov2_vitpose_l_v2`.

| field | value |
|---|---|
| Architecture | `dinov2_vitpose_l` |
| Backbone | `facebook/dinov2-large` |
| Input | `1x3x224x168` NCHW RGB, ImageNet-normalized |
| Output | `heatmap` |
| Keypoint layout | COCO-17 |
| Post-process layout | DWPose-25 / POSE_KEYPOINT-compatible |

See each version's `meta.json` for exact input size, normalization values, output names, and post-processing notes.

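For orientation, a plain argmax decode of the `heatmap` output might look like the sketch below. It assumes the output has shape `(1, K, Hh, Wh)` with one channel per COCO-17 keypoint, and it applies no sub-pixel refinement. The function name and the `crop_box` argument are illustrative; the actual post-processing contract is the one described in `meta.json`, which may differ from this simplified version.

```python
import numpy as np


def decode_heatmaps(heatmaps, crop_box):
    """Argmax-decode (1, K, Hh, Wh) heatmaps into image-space keypoints.

    Returns a (K, 3) float32 array of (x, y, score). `crop_box` is the
    (x1, y1, x2, y2) region of the original image that was resized into
    the model input.
    """
    _, k, hh, wh = heatmaps.shape
    flat = heatmaps[0].reshape(k, -1)
    idx = flat.argmax(axis=1)            # flat index of each channel's peak
    scores = flat[np.arange(k), idx]     # peak value used as confidence
    ys, xs = np.divmod(idx, wh)          # row and column of each peak

    x1, y1, x2, y2 = crop_box
    # Map heatmap cell centers back to original image coordinates.
    px = x1 + (xs + 0.5) / wh * (x2 - x1)
    py = y1 + (ys + 0.5) / hh * (y2 - y1)
    return np.stack([px, py, scores], axis=1).astype(np.float32)
```

Consumers would typically drop or down-weight keypoints whose score falls below a threshold of their choosing before lifting the result to the DWPose-25 / POSE_KEYPOINT layout.
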
## Download

```python
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="grmchn/mascot-pose-detect",
    allow_patterns=[
        "bbox/*",
        "keypoint/dinov2_vitpose_l_v2/*",
    ],
)
```

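Once the snapshot is on disk, the two models could be loaded with ONNX Runtime along the lines below. This is a sketch that assumes the `onnxruntime` package is installed and that the directory layout matches Repository Contents; `model_paths` and `load_sessions` are hypothetical helper names, not part of this repository.

```python
from pathlib import Path

# Version labels mapped to the keypoint directories listed under Model Versions.
KEYPOINT_VARIANTS = {"v1": "dinov2_vitpose_l", "v2": "dinov2_vitpose_l_v2"}


def model_paths(local_dir, version="v2"):
    """Return (bbox_model, keypoint_model) paths inside a downloaded snapshot."""
    root = Path(local_dir)
    variant = KEYPOINT_VARIANTS[version]
    bbox_model = root / "bbox" / "model.onnx"
    keypoint_model = root / "keypoint" / variant / "model.onnx"
    return bbox_model, keypoint_model


def load_sessions(local_dir, version="v2"):
    """Create CPU inference sessions for both stages."""
    # Imported lazily so the path helper works without onnxruntime installed.
    import onnxruntime as ort

    bbox_path, keypoint_path = model_paths(local_dir, version)
    providers = ["CPUExecutionProvider"]
    bbox_session = ort.InferenceSession(str(bbox_path), providers=providers)
    keypoint_session = ort.InferenceSession(str(keypoint_path), providers=providers)
    return bbox_session, keypoint_session
```

Input names are best read from the session, for example `keypoint_session.get_inputs()[0].name`, and then passed to `keypoint_session.run(None, {input_name: tensor})` rather than hard-coded. The download snippet above fetches only the v2 keypoint directory, so loading `"v1"` would also require adding `"keypoint/dinov2_vitpose_l/*"` to `allow_patterns`.
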
## Notes

This is not an OpenPose implementation and does not include OpenPose weights.
It produces keypoints that can be converted into an OpenPose-compatible JSON schema for downstream tools.

The model was trained for stylized mascot characters.
It may not generalize to realistic human photos without additional fine-tuning.