---
license: cc-by-4.0
tags:
- mesh-segmentation
- part-segmentation
- point-cloud
- onnx
- 3d
- qtmesheditor
library_name: onnx
pipeline_tag: other
---
# QtMeshEditor: Mesh Part Segmentation
A point-cloud part-segmentation network (PointNet++-style) that predicts a
semantic body-part label (head / torso / left+right arm / left+right leg) per
point, exported to **ONNX** for in-app inference via ONNX Runtime.
Built for **[QtMeshEditor](https://github.com/fernandotonon/QtMeshEditor)** (issue #410), a free,
open-source 3D mesh & animation editor, and its companion **QtMesh Cloud**
asset service. The app downloads this model on first use and runs it locally
(offline) to power the Edit-Mode **"Select by Part (AI)"** action, the
`qtmesh segment` CLI, and the `segment_mesh` MCP tool. Labels also feed the
auto-rigger as placement priors.
## Model
- **Input:** a sampled point cloud `float32 [1, N, 3]` (normalised to a centred
unit box; +Y up, character facing +Z, left limbs at +X).
- **Output:** per-point class logits over 7 labels
(`unknown, head, torso, left_arm, right_arm, left_leg, right_leg`); argmax →
label, scattered back to mesh vertices/faces by nearest sampled point.
- **Architecture (v2):** shared per-point MLP + two kNN local-aggregation
blocks (in-graph `cdist`+`topk`, ONNX-exportable) + a global max-pooled
feature; ~0.78 MB. Trained at the app's inference sample size (4096 points).
- When the model is unavailable, QtMeshEditor falls back to a deterministic
**geometric** segmenter (connected-component islands + an up-axis/lateral
spatial heuristic, refined by skeleton-bone proximity when the mesh is rigged).
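
The input and output handling described above can be sketched in NumPy as follows. This is an illustration, not the app's code: the uniform scaling by the longest bounding-box side, the random vertex subsampling, the `[1, N, 7]` logit layout, and the names `normalise_to_unit_box`, `scatter_labels`, `segment` and `predict_logits` are all assumptions made for the example.

```python
import numpy as np

LABELS = ["unknown", "head", "torso", "left_arm", "right_arm", "left_leg", "right_leg"]


def normalise_to_unit_box(points: np.ndarray) -> np.ndarray:
    """Centre the bounding box at the origin and scale its longest side to 1.

    Assumption: uniform scaling by the longest extent; the app may differ.
    """
    lo, hi = points.min(axis=0), points.max(axis=0)
    centre = (lo + hi) / 2.0
    extent = float((hi - lo).max())
    scale = extent if extent > 0 else 1.0  # avoid dividing by zero for a single point
    return ((points - centre) / scale).astype(np.float32)


def scatter_labels(vertices, sampled_points, sampled_labels):
    """Give each vertex the label of its nearest sampled point (brute force)."""
    diff = vertices[:, None, :] - sampled_points[None, :, :]  # [V, N, 3]
    d2 = (diff ** 2).sum(axis=-1)                             # squared distances [V, N]
    return sampled_labels[d2.argmin(axis=1)]


def segment(vertices, predict_logits, n_points=4096, seed=0):
    """Sample a cloud, run the (hypothetical) predictor, label every vertex."""
    rng = np.random.default_rng(seed)
    verts = normalise_to_unit_box(np.asarray(vertices, dtype=np.float32))
    # Assumption: sample vertices at random, with replacement only if the mesh is small.
    idx = rng.choice(len(verts), size=n_points, replace=len(verts) < n_points)
    cloud = verts[idx]                                    # [N, 3]
    logits = np.asarray(predict_logits(cloud[None]))      # assumed shape [1, N, 7]
    point_labels = logits[0].argmax(axis=-1)              # [N] indices into LABELS
    return scatter_labels(verts, cloud, point_labels)
```

With ONNX Runtime, `predict_logits` could wrap `session.run(None, {session.get_inputs()[0].name: cloud})[0]` for an `onnxruntime.InferenceSession` opened on the downloaded model file. The brute-force distance matrix is only practical for small meshes; a spatial index would be the usual choice for large ones.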
## Training data & license
Trained **from scratch** on two permissive sources:
1. **Synthetic surface-sampled bodies (CC0, ours):** three parametric body
plans, namely humanoid (normal / chibi / lanky proportion regimes, ears/muzzle
head bumps, posed arms, feet pointing +Z), quadruped (all four legs
labelled by side, tail → torso), and biped-with-tail (dino). All have exact
by-construction labels and randomised per-part point density (real
characters put 30–50% of their vertices in the head/face).
2. **CC0 rigged characters** (Quaternius packs: humanoids, farm animals,
dinosaurs, monsters) mined for **exact rig-derived labels** (bone weights →
bone name β†’ part) via `qtmesh segment --dump-training-data`. Each mined
cloud is canonicalised into the training frame from its own labels and its
arm/leg sides are reassigned geometrically. A provenance ledger of every
pack is kept with the training corpus.
Weights released under **CC-BY-4.0**; please credit *QtMeshEditor*.
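
The "bone weights → bone name → part" mapping and the geometric side assignment for mined rigs can be sketched roughly as below. Everything specific here is an assumption for illustration: the keyword table, the use of the single strongest bone per vertex, the per-vertex X-sign rule for left/right, and the function names `coarse_part` and `labels_from_rig`. The actual logic behind `qtmesh segment --dump-training-data` is not described in this card.

```python
import numpy as np

LABELS = ["unknown", "head", "torso", "left_arm", "right_arm", "left_leg", "right_leg"]

# Hypothetical keyword table, checked in order; real rigs use many naming schemes.
COARSE_KEYWORDS = [
    ("head", "head"), ("neck", "head"),
    ("arm", "arm"), ("hand", "arm"),
    ("leg", "leg"), ("thigh", "leg"), ("shin", "leg"), ("foot", "leg"),
    ("spine", "torso"), ("hips", "torso"), ("pelvis", "torso"),
    ("chest", "torso"), ("tail", "torso"),
]


def coarse_part(bone_name: str) -> str:
    """Map a bone name to head / torso / arm / leg / unknown by keyword."""
    name = bone_name.lower()
    for keyword, part in COARSE_KEYWORDS:
        if keyword in name:
            return part
    return "unknown"


def labels_from_rig(positions, weights, bone_names):
    """Label each vertex from its dominant bone.

    positions:  [V, 3] vertices in the training frame (left limbs at +X).
    weights:    [V, B] skinning weights.
    bone_names: list of B bone names.
    """
    positions = np.asarray(positions, dtype=np.float64)
    dominant = np.asarray(weights).argmax(axis=1)  # strongest bone per vertex
    labels = np.zeros(len(dominant), dtype=np.int64)
    for i, bone in enumerate(dominant):
        part = coarse_part(bone_names[bone])
        if part in ("arm", "leg"):
            # Side comes from geometry, not from the bone name (assumed rule).
            side = "left" if positions[i, 0] >= 0 else "right"
            part = f"{side}_{part}"
        labels[i] = LABELS.index(part)
    return labels
```

Assigning sides per vertex by the sign of X is the simplest possible rule and would mislabel limbs that cross the midline; a per-limb decision would be more robust.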
## Evaluation (v2)
Measured per-vertex against **exact rig-derived ground truth** (bone-weight
labels), replicating the app's inference path (4096-point sample, nearest-point
scatter):

| eval set | v1.1.0 | v2.0.0 |
|---|---|---|
| out-of-distribution rigged test characters (held out of training entirely) | 31.5% | **94.7%** |
| held-out CC0 rigs (suit, dress, sheep, velociraptor) | n/a | 97.0% |
| held-out synthetic bodies | n/a | 95.6% |

Per-part recall on the out-of-distribution set (v2): head 0.99, torso 0.80, left arm 0.99,
right arm 1.00, left leg 0.87, right leg 0.96. Torso is lowest by construction:
the ground truth comes from skinning weights, whose shoulder/hip boundaries
don't coincide exactly with any geometric boundary.
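
For readers who want to compute comparable numbers on their own meshes, per-vertex agreement and per-part recall can be calculated as in the sketch below. It assumes integer label arrays indexed in the order listed in the Model section; the function names are made up for the example, and whether the reported figures include or exclude `unknown` vertices is not stated here.

```python
import numpy as np

LABELS = ["unknown", "head", "torso", "left_arm", "right_arm", "left_leg", "right_leg"]


def per_vertex_accuracy(pred: np.ndarray, truth: np.ndarray) -> float:
    """Fraction of vertices whose predicted label equals the ground truth."""
    return float((pred == truth).mean())


def per_part_recall(pred: np.ndarray, truth: np.ndarray) -> dict:
    """Recall per class: of the vertices truly in a part, how many were found.

    Classes that never occur in the ground truth are skipped.
    """
    recall = {}
    for index, name in enumerate(LABELS):
        mask = truth == index
        if mask.any():
            recall[name] = float((pred[mask] == index).mean())
    return recall


truth = np.array([1, 1, 2, 2, 2, 2, 3, 3])
pred = np.array([1, 1, 2, 2, 2, 1, 3, 4])
accuracy = per_vertex_accuracy(pred, truth)  # 6 of 8 vertices agree
recall = per_part_recall(pred, truth)        # head, torso and left_arm only
```

In this toy example one torso vertex is predicted as head, so torso recall drops while head recall stays at 1.0; recall only counts misses within the true part.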
## Reproducing
Use `scripts/export-meshseg-onnx.py` in the QtMeshEditor repo (a one-time, offline
step; not shipped with the app). The full v1 failure analysis, data pipeline, and
multi-category roadmap live in the repo's
`docs/MESH_SEGMENTATION_STRATEGY.md`.
## Versions
- **v2.0.0** (current): full data-pipeline rebuild fixing v1's real-mesh
failures: **surface** sampling instead of volumetric primitive fills,
**connected** bodies, **chibi/cartoon proportions** (v1 collapsed
big-headed characters to all-torso), randomised per-part vertex density,
**correct left/right handedness** (v1's training data was mirrored vs the
rig convention), quadruped + biped-tail body plans, mined CC0 rigs with
label-driven canonicalisation, a deeper two-block kNN network, and
fine-tuning at the app's 4096-point inference size (v1 trained at 1024).
31.5% → **94.7%** on rig-truth eval. ~0.78 MB.
- **v1.1.0**: retrained with T/A-pose arms, upright + yaw-only augmentation,
head-protrusion samples, wider proportions, and a PointNet++-style local kNN
feature. Clean on synthetic humanoids but poor transfer to real meshes
(volumetric sampling, flipped handedness). ~0.58 MB.
- **v1.0.0**: initial flat-PointNet model on synthetic primitive humanoids
(arms-down only; over-assigned lateral protrusions to "arm").
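
The v2.0.0 entry above contrasts surface sampling with volumetric fills. One common way to sample a triangle mesh surface uniformly is area-weighted triangle selection with random barycentric coordinates, sketched below. Whether the training pipeline uses exactly this scheme is not stated in this card, and `sample_surface` is a name invented for the example.

```python
import numpy as np


def sample_surface(vertices, faces, n_points, seed=0):
    """Draw points uniformly from the surface of a triangle mesh.

    vertices: [V, 3] float array, faces: [F, 3] integer array.
    Returns the sampled points [n_points, 3] and the face index of each point.
    Assumes the mesh has non-zero total area.
    """
    rng = np.random.default_rng(seed)
    vertices = np.asarray(vertices, dtype=np.float64)
    faces = np.asarray(faces, dtype=np.int64)
    tri = vertices[faces]                                   # [F, 3, 3]
    cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    areas = 0.5 * np.linalg.norm(cross, axis=1)             # triangle areas [F]
    # Larger triangles are chosen proportionally more often.
    face_idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    # Square-root trick gives uniformly distributed barycentric weights.
    r1 = np.sqrt(rng.random(n_points))
    r2 = rng.random(n_points)
    w = np.stack([1.0 - r1, r1 * (1.0 - r2), r1 * r2], axis=1)  # [N, 3], rows sum to 1
    points = (w[:, :, None] * tri[face_idx]).sum(axis=1)
    return points, face_idx
```

Because the points lie on triangles rather than inside a solid, per-part labels can be carried over from the face each point was drawn from.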