| --- |
| license: cc-by-4.0 |
| tags: |
| - mesh-segmentation |
| - part-segmentation |
| - point-cloud |
| - onnx |
| - 3d |
| - qtmesheditor |
| library_name: onnx |
| pipeline_tag: other |
| --- |
| |
| # QtMeshEditor β Mesh Part Segmentation |
|
|
| A point-cloud part-segmentation network (PointNet++-style) that predicts a |
| semantic body-part label (head / torso / left+right arm / left+right leg) per |
| point, exported to **ONNX** for in-app inference via ONNX Runtime. |
|
|
| Built for **[QtMeshEditor](https://github.com/fernandotonon/QtMeshEditor)** (issue #410) β a free, |
| open-source 3D mesh & animation editor β and its companion **QtMesh Cloud** |
| asset service. The app downloads this model on first use and runs it locally |
| (offline) to power the Edit-Mode **"Select by Part (AI)"** action, the |
| `qtmesh segment` CLI, and the `segment_mesh` MCP tool. Labels also feed the |
| auto-rigger as placement priors. |
|
|
| ## Model |
|
|
| - **Input:** a sampled point cloud `float32 [1, N, 3]` (normalised to a centred |
| unit box; +Y up, character facing +Z, left limbs at +X). |
| - **Output:** per-point class logits over 7 labels |
| (`unknown, head, torso, left_arm, right_arm, left_leg, right_leg`); argmax β |
| label, scattered back to mesh vertices/faces by nearest sampled point. |
| - **Architecture (v2):** shared per-point MLP + two kNN local-aggregation |
| blocks (in-graph `cdist`+`topk`, ONNX-exportable) + a global max-pooled |
| feature; ~0.78 MB. Trained at the app's inference sample size (4096 points). |
| - When the model is unavailable, QtMeshEditor falls back to a deterministic |
| **geometric** segmenter (connected-component islands + an up-axis/lateral |
| spatial heuristic, refined by skeleton-bone proximity when the mesh is rigged). |
|
|
| ## Training data & license |
|
|
| Trained **from scratch** on two permissive sources: |
|
|
| 1. **Synthetic surface-sampled bodies (CC0, ours):** three parametric body |
| plans β humanoid (normal / chibi / lanky proportion regimes, ears/muzzle |
| head bumps, posed arms, feet pointing +Z), quadruped (all four legs |
| labelled by side, tail β torso), and biped-with-tail (dino) β with exact |
| by-construction labels and randomised per-part point density (real |
| characters put 30β50% of their vertices in the head/face). |
| 2. **CC0 rigged characters** (Quaternius packs β humanoids, farm animals, |
| dinosaurs, monsters) mined for **exact rig-derived labels** (bone weights β |
| bone name β part) via `qtmesh segment --dump-training-data`. Each mined |
| cloud is canonicalised into the training frame from its own labels and its |
| arm/leg sides are reassigned geometrically. A provenance ledger of every |
| pack is kept with the training corpus. |
|
|
| Weights released under **CC-BY-4.0**; please credit *QtMeshEditor*. |
|
|
| ## Evaluation (v2) |
|
|
| Measured per-vertex against **exact rig-derived ground truth** (bone-weight |
| labels), replicating the app's inference path (4096-point sample, nearest-point |
| scatter): |
|
|
| | eval set | v1.1.0 | v2.0.0 | |
| |---|---|---| |
| | out-of-distribution rigged test characters (held out of training entirely) | 31.5% | **94.7%** | |
| | held-out CC0 rigs (suit, dress, sheep, velociraptor) | β | 97.0% | |
| | held-out synthetic bodies | β | 95.6% | |
|
|
| Per-part recall on the out-of-distribution set (v2): head 0.99, torso 0.80, left arm 0.99, |
| right arm 1.00, left leg 0.87, right leg 0.96. Torso is lowest by construction: |
| the ground truth comes from skinning weights, whose shoulder/hip boundaries |
| don't coincide exactly with any geometric boundary. |
|
|
| ## Reproducing |
|
|
| `scripts/export-meshseg-onnx.py` in the QtMeshEditor repo (one-time, offline; |
| not shipped with the app). The full v1 failure analysis, data pipeline, and |
| multi-category roadmap live in the repo's |
| `docs/MESH_SEGMENTATION_STRATEGY.md`. |
|
|
| ## Versions |
|
|
| - **v2.0.0** (current) β full data-pipeline rebuild fixing v1's real-mesh |
| failures: **surface** sampling instead of volumetric primitive fills, |
| **connected** bodies, **chibi/cartoon proportions** (v1 collapsed |
| big-headed characters to all-torso), randomised per-part vertex density, |
| **correct left/right handedness** (v1's training data was mirrored vs the |
| rig convention), quadruped + biped-tail body plans, mined CC0 rigs with |
| label-driven canonicalisation, a deeper two-block kNN network, and |
| fine-tuning at the app's 4096-point inference size (v1 trained at 1024). |
| 31.5% β **94.7%** on rig-truth eval. ~0.78 MB. |
| - **v1.1.0** β retrained with T/A-pose arms, upright + yaw-only augmentation, |
| head-protrusion samples, wider proportions, and a PointNet++-style local kNN |
| feature. Clean on synthetic humanoids but poor transfer to real meshes |
| (volumetric sampling, flipped handedness). ~0.58 MB. |
| - **v1.0.0** β initial flat-PointNet model on synthetic primitive humanoids |
| (arms-down only; over-assigned lateral protrusions to "arm"). |
|
|