	---
	license: cc-by-4.0
	tags:
	- mesh-segmentation
	- part-segmentation
	- point-cloud
	- onnx
	- 3d
	- qtmesheditor
	library_name: onnx
	pipeline_tag: other
	---

	# QtMeshEditor — Mesh Part Segmentation

	A point-cloud part-segmentation network (PointNet++-style) that predicts a
	semantic body-part label (head / torso / left+right arm / left+right leg) per
	point, exported to ONNX for in-app inference via ONNX Runtime.

	Built for [QtMeshEditor](https://github.com/fernandotonon/QtMeshEditor) (issue #410) — a free,
	open-source 3D mesh & animation editor — and its companion QtMesh Cloud
	asset service. The app downloads this model on first use and runs it locally
	(offline) to power the Edit-Mode "Select by Part (AI)" action, the
	`qtmesh segment` CLI, and the `segment_mesh` MCP tool. Labels also feed the
	auto-rigger as placement priors.

	## Model

	- Input: a sampled point cloud `float32 [1, N, 3]` (normalised to a centred
	unit box; +Y up, character facing +Z, left limbs at +X).
	- Output: per-point class logits over 7 labels
	(`unknown, head, torso, left_arm, right_arm, left_leg, right_leg`); argmax →
	label, scattered back to mesh vertices/faces by nearest sampled point.
	- Architecture (v2): shared per-point MLP + two kNN local-aggregation
	blocks (in-graph `cdist`+`topk`, ONNX-exportable) + a global max-pooled
	feature; ~0.78 MB. Trained at the app's inference sample size (4096 points).
	- When the model is unavailable, QtMeshEditor falls back to a deterministic
	geometric segmenter (connected-component islands + an up-axis/lateral
	spatial heuristic, refined by skeleton-bone proximity when the mesh is rigged).
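
The pre- and post-processing around the network can be sketched in NumPy. This is a minimal illustration under stated assumptions, not QtMeshEditor's code. "Centred unit box" is read here as centring the bounding box on the origin and scaling its longest side to 1. `sample_cloud` is a hypothetical stand-in that draws vertices rather than sampling the surface. The `[1, N, 7]` logits shape and the model filename in the comment are also assumptions. The nearest-point scatter is brute force, which is fine for a sketch; a k-d tree would scale better on dense meshes.

```python
import numpy as np

LABELS = ("unknown", "head", "torso", "left_arm", "right_arm", "left_leg", "right_leg")


def normalise_to_unit_box(points: np.ndarray) -> np.ndarray:
    """Centre the bounding box on the origin and scale its longest side to 1.

    Assumption: "centred unit box" means the longest axis spans [-0.5, 0.5].
    The frame itself (+Y up, facing +Z, left limbs at +X) must already hold.
    """
    lo, hi = points.min(axis=0), points.max(axis=0)
    extent = float((hi - lo).max())
    scale = 1.0 / extent if extent > 0 else 1.0
    return ((points - (lo + hi) / 2.0) * scale).astype(np.float32)


def sample_cloud(vertices: np.ndarray, n: int = 4096, seed: int = 0) -> np.ndarray:
    """Hypothetical stand-in for surface sampling: draw n vertices with replacement."""
    rng = np.random.default_rng(seed)
    return vertices[rng.integers(0, len(vertices), size=n)]


def scatter_to_vertices(logits: np.ndarray, cloud: np.ndarray, vertices: np.ndarray) -> np.ndarray:
    """argmax per sampled point, then give each vertex its nearest point's label.

    logits: [1, N, 7]; cloud: [N, 3]; vertices: [V, 3] in the same normalised frame.
    """
    point_labels = logits[0].argmax(axis=-1)                              # [N]
    d2 = ((vertices[:, None, :] - cloud[None, :, :]) ** 2).sum(axis=-1)   # [V, N] squared distances
    return point_labels[d2.argmin(axis=1)]                                # [V]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    verts = normalise_to_unit_box(rng.normal(size=(500, 3)))
    cloud = sample_cloud(verts)[None]                                     # float32 [1, 4096, 3]
    # With onnxruntime installed and the model downloaded (filename is hypothetical):
    #   import onnxruntime as ort
    #   sess = ort.InferenceSession("meshseg.onnx", providers=["CPUExecutionProvider"])
    #   logits = sess.run(None, {sess.get_inputs()[0].name: cloud})[0]
    logits = rng.normal(size=(1, cloud.shape[1], len(LABELS)))            # placeholder scores
    vertex_labels = scatter_to_vertices(logits, cloud[0], verts)
    print([LABELS[i] for i in vertex_labels[:5]])
```

Normalising the vertices first and sampling from the normalised copy keeps the cloud and the vertices in one frame, so the nearest-point lookup needs no extra transform.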
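
The v2 layer stack can be pictured with a NumPy forward pass. This is a sketch under assumptions, not the exported graph. The card does not give layer widths, `k`, or exactly how neighbour features are combined. The following are all hypothetical: `init_params`, the widths (32/64/64), `k=16`, and concatenating each neighbour's feature with its offset from the centre point (as PointNet++-style grouping does). The pairwise-distance-then-smallest-k step is the NumPy analogue of the in-graph `cdist` + `topk` the card mentions. The weights are random, so the logits carry no meaning.

```python
import numpy as np


def knn_indices(xyz: np.ndarray, k: int) -> np.ndarray:
    """k nearest neighbours of every point (itself included): cdist, then smallest-k."""
    d = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)   # [N, N] pairwise distances
    return np.argsort(d, axis=1, kind="stable")[:, :k]                # [N, k]


def knn_block(xyz, feats, k, w, b):
    """Hypothetical local-aggregation block: gather neighbour features plus
    relative offsets, apply a shared linear layer + ReLU, max-pool over k."""
    idx = knn_indices(xyz, k)                                         # [N, k]
    rel = xyz[idx] - xyz[:, None, :]                                  # [N, k, 3]
    h = np.concatenate([feats[idx], rel], axis=-1) @ w + b            # [N, k, C_out]
    return np.maximum(h, 0.0).max(axis=1)                             # [N, C_out]


def init_params(rng, c0=32, c1=64, c2=64, n_classes=7):
    """Random (He-style) weights; the real layer widths are not given in the card."""
    def lin(i, o):
        return rng.normal(scale=np.sqrt(2.0 / i), size=(i, o)), np.zeros(o)
    p = {}
    p["w0"], p["b0"] = lin(3, c0)              # shared per-point MLP
    p["w1"], p["b1"] = lin(c0 + 3, c1)         # kNN block 1
    p["w2"], p["b2"] = lin(c1 + 3, c2)         # kNN block 2
    p["wo"], p["bo"] = lin(2 * c2, n_classes)  # head on [local, global] features
    return p


def forward(xyz, p, k=16):
    f = np.maximum(xyz @ p["w0"] + p["b0"], 0.0)                      # [N, c0]
    f = knn_block(xyz, f, k, p["w1"], p["b1"])                        # [N, c1]
    f = knn_block(xyz, f, k, p["w2"], p["b2"])                        # [N, c2]
    g = np.broadcast_to(f.max(axis=0), f.shape)                       # global max-pooled feature
    return np.concatenate([f, g], axis=-1) @ p["wo"] + p["bo"]        # [N, 7] logits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xyz = rng.uniform(-0.5, 0.5, size=(256, 3))
    print(forward(xyz, init_params(rng)).shape)  # (256, 7)
```

Every stage is either per-point or a max over a set, so the output is permutation-equivariant. Shuffling the input points shuffles the logits the same way, which is what lets one network handle an unordered sample.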

	## Training data & license

	Trained from scratch on two permissive sources:

	1. Synthetic surface-sampled bodies (CC0, ours): three parametric body
	plans — humanoid (normal / chibi / lanky proportion regimes, ears/muzzle
	head bumps, posed arms, feet pointing +Z), quadruped (all four legs
	labelled by side, tail → torso), and biped-with-tail (dino) — with exact
	by-construction labels and randomised per-part point density (real
	characters put 30–50% of their vertices in the head/face).
	2. CC0 rigged characters (Quaternius packs — humanoids, farm animals,
	dinosaurs, monsters) mined for exact rig-derived labels (bone weights →
	bone name → part) via `qtmesh segment --dump-training-data`. Each mined
	cloud is canonicalised into the training frame from its own labels and its
	arm/leg sides are reassigned geometrically. A provenance ledger of every
	pack is kept with the training corpus.
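
Rig-derived labelling (bone weights → bone name → part) and the geometric side fix can be sketched as follows. This only illustrates the idea and is not the `qtmesh segment --dump-training-data` implementation. The keyword table `PART_KEYWORDS` is hypothetical and far shorter than real rigs need, and each vertex simply takes its strongest bone. Positions are assumed to be already canonicalised. Side is decided per bone by comparing the X centroid of that bone's vertices with the torso's mean X, following the card's convention that left limbs sit at +X. Bone-name sides are ignored on purpose, so a rig whose `.L`/`.R` names disagree with the geometry still gets sides that match the training frame.

```python
import numpy as np

LABELS = ("unknown", "head", "torso", "left_arm", "right_arm", "left_leg", "right_leg")

# Hypothetical keyword table, checked in order; real rigs need far more entries.
PART_KEYWORDS = (
    ("head", ("head", "neck", "jaw")),
    ("torso", ("spine", "chest", "pelvis", "root", "tail", "armature")),  # tail -> torso, as on the card
    ("arm", ("shoulder", "arm", "elbow", "hand", "finger")),
    ("leg", ("thigh", "leg", "knee", "shin", "foot", "toe")),
)


def bone_to_part(name: str) -> str:
    lowered = name.lower()
    for part, keys in PART_KEYWORDS:
        if any(key in lowered for key in keys):
            return part
    return "unknown"


def rig_labels(positions: np.ndarray, weights: np.ndarray, bone_names: list) -> np.ndarray:
    """positions [V, 3] already in the training frame (left limbs at +X);
    weights [V, B] skin weights; bone_names has length B. Returns label ids [V]."""
    dominant = weights.argmax(axis=1)                            # strongest bone per vertex
    parts = [bone_to_part(n) for n in bone_names]
    torso_bones = [b for b, part in enumerate(parts) if part == "torso"]
    torso = np.isin(dominant, torso_bones)
    mid_x = positions[torso, 0].mean() if torso.any() else 0.0  # body midline
    labels = np.zeros(len(positions), dtype=np.int64)           # 0 = unknown
    for b in np.unique(dominant):
        mask = dominant == b
        part = parts[b]
        if part in ("arm", "leg"):
            # Side comes from geometry, not from ".L"/".R" in the bone name.
            side = "left" if positions[mask, 0].mean() > mid_x else "right"
            part = f"{side}_{part}"
        labels[mask] = LABELS.index(part)
    return labels
```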

	Weights released under CC-BY-4.0; please credit QtMeshEditor.

	## Evaluation (v2)

	Measured per-vertex against exact rig-derived ground truth (bone-weight
	labels), replicating the app's inference path (4096-point sample, nearest-point
	scatter):

	| Eval set | v1.1.0 | v2.0.0 |
	|---|---|---|
	| Out-of-distribution rigged test characters (held out of training entirely) | 31.5% | 94.7% |
	| Held-out CC0 rigs (suit, dress, sheep, velociraptor) | — | 97.0% |
	| Held-out synthetic bodies | — | 95.6% |

	Per-part recall on the out-of-distribution set (v2): head 0.99, torso 0.80, left arm 0.99,
	right arm 1.00, left leg 0.87, right leg 0.96. Torso is lowest by construction:
	the ground truth comes from skinning weights, whose shoulder/hip boundaries
	don't coincide exactly with any geometric boundary.
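
The two kinds of number above can be computed from per-vertex label arrays as below. This sketch assumes two things: the table's percentages are plain per-vertex accuracy (the fraction of vertices whose predicted label equals the rig-derived one), and per-part recall is, for each part, the fraction of its ground-truth vertices predicted as that part. Both helpers are illustrative and not taken from the repo.

```python
import numpy as np

LABELS = ("unknown", "head", "torso", "left_arm", "right_arm", "left_leg", "right_leg")


def per_vertex_accuracy(pred: np.ndarray, truth: np.ndarray) -> float:
    """Fraction of vertices whose predicted label matches the rig-derived label."""
    return float(np.mean(pred == truth))


def per_part_recall(pred: np.ndarray, truth: np.ndarray) -> dict:
    """For each part present in the ground truth: correctly labelled / total."""
    recall = {}
    for c, name in enumerate(LABELS):
        mask = truth == c
        if mask.any():
            recall[name] = float(np.mean(pred[mask] == c))
    return recall


if __name__ == "__main__":
    truth = np.array([1, 1, 2, 2, 2, 3])
    pred = np.array([1, 2, 2, 2, 1, 3])
    print(per_vertex_accuracy(pred, truth))  # 0.6666666666666666
    print(per_part_recall(pred, truth))      # {'head': 0.5, 'torso': 0.6666666666666666, 'left_arm': 1.0}
```

Recall alone does not penalise over-assignment: predicting "torso" everywhere gives torso a recall of 1.0, so it is best read alongside overall accuracy.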

	## Reproducing

	Run `scripts/export-meshseg-onnx.py` from the QtMeshEditor repo (a one-time,
	offline step; the script is not shipped with the app). The full v1 failure
	analysis, data pipeline, and multi-category roadmap live in the repo's
	`docs/MESH_SEGMENTATION_STRATEGY.md`.

	## Versions

	- v2.0.0 (current) — full data-pipeline rebuild fixing v1's real-mesh
	failures: surface sampling instead of volumetric primitive fills,
	connected bodies, chibi/cartoon proportions (v1 collapsed
	big-headed characters to all-torso), randomised per-part vertex density,
	correct left/right handedness (v1's training data was mirrored vs the
	rig convention), quadruped + biped-tail body plans, mined CC0 rigs with
	label-driven canonicalisation, a deeper two-block kNN network, and
	fine-tuning at the app's 4096-point inference size (v1 trained at 1024).
	31.5% → 94.7% on the out-of-distribution rig-truth eval. ~0.78 MB.
	- v1.1.0 — retrained with T/A-pose arms, upright + yaw-only augmentation,
	head-protrusion samples, wider proportions, and a PointNet++-style local kNN
	feature. Clean on synthetic humanoids but poor transfer to real meshes
	(volumetric sampling, flipped handedness). ~0.58 MB.
	- v1.0.0 — initial flat-PointNet model on synthetic primitive humanoids
	(arms-down only; over-assigned lateral protrusions to "arm").