	---
	license: apache-2.0
	tags:
	- object-detection
	- keypoint-detection
	- pose-estimation
	- onnx
	- dinov2
	- vitpose
	- rtmdet
	- mascot
	- chibi
	- kemono
	library_name: onnx
	base_model:
	- facebook/dinov2-large
	inference: false
	---

	# mascot-pose-detect

	Two-stage mascot pose detector for chibi, kemono, and other stylized mascot characters.

	This repository provides ONNX artifacts for portable inference:

	1. Stage 1: a 7-class RTMDet-tiny bounding-box detector.
	2. Stage 2: a DINOv2-Large backbone with a ViTPose-style COCO-17 heatmap head.

	The keypoint model is fine-tuned for stylized mascot bodies whose proportions differ strongly from real human pose datasets.
	Downstream code is expected to lift the COCO-17 keypoints to the DWPose-25 / POSE_KEYPOINT format and to derive toe points from the Stage 1 foot bounding boxes.
	Hand keypoints are expected to be generated by a separate hand-template fitter when required.
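
One step that COCO-to-OpenPose-style conversions commonly need is a neck point, which COCO-17 does not contain. The sketch below synthesizes it as the shoulder midpoint. It assumes the target layout includes a neck point, and that keypoints arrive as `(x, y, score)` tuples in the standard COCO-17 order; the helper name `derive_neck` is hypothetical, and the exact DWPose-25 index order and toe derivation are not covered here.

```python
# Standard COCO-17 keypoint order.
COCO17 = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def derive_neck(keypoints):
    """Return an (x, y, score) neck point from 17 COCO keypoints.

    Assumption: the neck is the midpoint of the two shoulders, and its
    score is the lower of the two shoulder scores.
    """
    if len(keypoints) != len(COCO17):
        raise ValueError("expected 17 COCO keypoints")
    lx, ly, ls = keypoints[COCO17.index("left_shoulder")]
    rx, ry, rs = keypoints[COCO17.index("right_shoulder")]
    return ((lx + rx) / 2.0, (ly + ry) / 2.0, min(ls, rs))
```

Taking the minimum score keeps the synthesized point from looking more reliable than the weaker of its two sources.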

	## License

	This model package is released under the Apache License 2.0.

	The Stage 2 keypoint model is based on `facebook/dinov2-large`, which is also released under Apache 2.0.
	It does not use the older MAE-pretrained ViTPose-L checkpoint that constrained the previous test bundle to non-commercial use.

	The training annotations and source images are not included in this repository.

	## Repository Contents

	```text
	grmchn/mascot-pose-detect/
	├── bbox/
	│   ├── model.onnx
	│   ├── classes.json
	│   └── decode_params.json
	└── keypoint/
	    ├── dinov2_vitpose_l/
	    │   ├── model.onnx
	    │   └── meta.json
	    └── dinov2_vitpose_l_v2/
	        ├── model.onnx
	        └── meta.json
	```

	## Stage 1: BBox Detector

	The bbox model detects seven mascot body regions:

	| index | name |
	|------:|------|
	| 0 | `full` |
	| 1 | `head` |
	| 2 | `body` |
	| 3 | `hand_left` |
	| 4 | `hand_right` |
	| 5 | `foot_left` |
	| 6 | `foot_right` |

	Left and right are named from the character's point of view (anatomical naming), not the viewer's.
	For a front-facing character, the character's right side therefore usually appears on the left of the screen.
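
The class table can be used to turn raw detections into a per-region lookup. The following is a minimal sketch that assumes one character per image and detections already decoded into `(class_index, score, box)` tuples; that tuple format and the helper name `best_box_per_class` are assumptions for illustration, and `bbox/classes.json` remains the authoritative source for the class order.

```python
# Class order copied from the table above.
CLASS_NAMES = [
    "full", "head", "body",
    "hand_left", "hand_right",
    "foot_left", "foot_right",
]


def best_box_per_class(detections):
    """Keep the highest-scoring box for each class name.

    detections: iterable of (class_index, score, (x1, y1, x2, y2)).
    Returns a dict mapping class name to (score, box).
    """
    best = {}
    for class_index, score, box in detections:
        name = CLASS_NAMES[class_index]
        if name not in best or score > best[name][0]:
            best[name] = (score, box)
    return best
```

Because the names are anatomical, `hand_right` in the result refers to the character's right hand, which for a front-facing character usually lies on the left of the image.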

	## Stage 2: Keypoint Detector

	The keypoint model input is a top-down crop from the Stage 1 `full` or `body` bbox.
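
As a rough sketch of the top-down crop, the helper below grows a Stage 1 box to the 168:224 width-to-height ratio of the keypoint input before cropping and resizing. The function name, the `padding` parameter, and the choice to grow rather than shrink the box are assumptions for illustration; the crop logic used during training is not described in this card.

```python
def fit_box_to_aspect(x1, y1, x2, y2, target_w=168, target_h=224, padding=1.0):
    """Expand a box around its center so that width / height matches
    target_w / target_h. `padding` scales the box first (1.0 = no padding).

    The result may extend past the image borders; the caller is expected
    to pad or clamp when cropping.
    """
    cx = (x1 + x2) / 2.0
    cy = (y1 + y2) / 2.0
    w = (x2 - x1) * padding
    h = (y2 - y1) * padding
    aspect = target_w / target_h
    if w > h * aspect:
        # Box is too wide for the target ratio: grow the height.
        h = w / aspect
    else:
        # Box is too tall (or exact): grow the width.
        w = h * aspect
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)
```

Growing the box keeps the whole detected region inside the crop, at the cost of including some background.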

	## Model Versions

	| version | keypoint path | training run | status |
	|---|---|---|---|
	| v1 | `keypoint/dinov2_vitpose_l/` | `general_filtered` | Stable baseline release |
	| v2 | `keypoint/dinov2_vitpose_l_v2/` | `final_v3_from_final_v2` | Updated model with additional hard-example training |

	Both versions use the same architecture, input shape, output heatmap shape, and post-processing contract.
	Switching from v1 to v2 only requires changing the keypoint variant path from `dinov2_vitpose_l` to `dinov2_vitpose_l_v2`.
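
Since only the variant directory differs, version selection can be kept in one place. A minimal sketch, assuming the directory layout listed under Repository Contents; the `KEYPOINT_VARIANTS` mapping and the `keypoint_files` helper are hypothetical names.

```python
from pathlib import Path

# Version label -> keypoint variant directory, as listed in the table above.
KEYPOINT_VARIANTS = {
    "v1": "dinov2_vitpose_l",
    "v2": "dinov2_vitpose_l_v2",
}


def keypoint_files(root, version="v2"):
    """Return (model.onnx path, meta.json path) for a keypoint version.

    root: local directory holding the downloaded repository contents.
    """
    variant_dir = Path(root) / "keypoint" / KEYPOINT_VARIANTS[version]
    return variant_dir / "model.onnx", variant_dir / "meta.json"
```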

	| field | value |
	|---|---|
	| Architecture | `dinov2_vitpose_l` |
	| Backbone | `facebook/dinov2-large` |
	| Input | `1x3x224x168` NCHW RGB, ImageNet-normalized |
	| Output | `heatmap` |
	| Keypoint layout | COCO-17 |
	| Post-process layout | DWPose-25 / POSE_KEYPOINT-compatible |

	See each version's `meta.json` for exact input size, normalization values, output names, and post-processing notes.
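
To make the input and output contract concrete, the sketch below builds the `1x3x224x168` tensor from an already resized RGB crop and reads keypoints off a heatmap stack by plain argmax. It assumes the commonly used ImageNet mean and standard deviation applied to pixels scaled to `[0, 1]`, and heatmaps shaped `(K, H, W)` after dropping the batch dimension. The heatmap resolution is not stated here, and the function names are hypothetical; `meta.json` is authoritative for all of these values.

```python
import numpy as np

# Commonly used ImageNet statistics (RGB order); confirm against meta.json.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)
INPUT_H, INPUT_W = 224, 168


def to_model_input(crop_rgb):
    """Convert a 224x168x3 uint8 RGB crop to a 1x3x224x168 float32 tensor."""
    if crop_rgb.shape != (INPUT_H, INPUT_W, 3):
        raise ValueError("crop must already be resized to 224x168 (HxW)")
    x = crop_rgb.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    # HWC -> CHW, then add the batch dimension.
    return np.ascontiguousarray(np.transpose(x, (2, 0, 1))[np.newaxis])


def decode_heatmaps(heatmaps):
    """Argmax-decode (K, H, W) heatmaps to a (K, 3) array of x, y, score.

    x and y are in input-crop pixels, taken at the center of the peak cell.
    """
    k, hh, ww = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    idx = flat.argmax(axis=1)
    scores = flat[np.arange(k), idx]
    ys, xs = np.divmod(idx, ww)
    x_in = (xs + 0.5) * (INPUT_W / ww)
    y_in = (ys + 0.5) * (INPUT_H / hh)
    return np.stack([x_in, y_in, scores], axis=1)
```

Plain argmax is the simplest decoder; sub-pixel refinement, if the post-processing notes in `meta.json` call for it, would replace the cell-center step.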

	## Download

	```python
	from huggingface_hub import snapshot_download

	# Fetch the bbox detector and the v2 keypoint model only.
	local_dir = snapshot_download(
	    repo_id="grmchn/mascot-pose-detect",
	    allow_patterns=[
	        "bbox/*",
	        "keypoint/dinov2_vitpose_l_v2/*",
	    ],
	)
	```

	## Notes

	This is not an OpenPose implementation and does not include OpenPose weights.
	It produces keypoints that can be converted into an OpenPose-compatible JSON schema for downstream tools.
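
As an illustration of that conversion, the sketch below flattens per-person keypoints into the `pose_keypoints_2d` list used by OpenPose-style JSON. It assumes the keypoints have already been reordered into the layout the downstream tool expects; the `canvas_width` / `canvas_height` fields are an assumption about what some consumers read, and the helper name is hypothetical.

```python
import json


def to_openpose_json(people, canvas_width, canvas_height):
    """Serialize keypoints to an OpenPose-style JSON string.

    people: list of persons, each a list of (x, y, score) tuples or None
    for a missing keypoint. Missing keypoints are written as 0, 0, 0.
    """
    out = {
        "version": 1.3,
        "people": [],
        # Assumed extra fields; not every consumer requires them.
        "canvas_width": canvas_width,
        "canvas_height": canvas_height,
    }
    for keypoints in people:
        flat = []
        for kp in keypoints:
            x, y, score = kp if kp is not None else (0.0, 0.0, 0.0)
            flat.extend([float(x), float(y), float(score)])
        out["people"].append({"pose_keypoints_2d": flat})
    return json.dumps(out)
```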

	The model was trained for stylized mascot characters.
	It may not generalize to realistic human photos without additional fine-tuning.