Hunyuan3D 2.1 — Shape Pipeline (MLX)
MLX-native weights for the shape pipeline of tencent/Hunyuan3D-2.1 — single image in, untextured mesh out. Optimised for Apple Silicon Macs.
These are the artefacts produced by the hy3dmlx educational arc (chapters 13 + 14): a chapter-by-chapter port of the Hunyuan3D 2.1 shape pipeline from PyTorch+CUDA to Apple's MLX. The conversion preserves trained parameters bit-for-bit; only the on-disk layout changes.
Out of scope: the paint pipeline (PBR texturing) of Hunyuan3D 2.1 is not in this repo. For the MLX paint weights, see AgenticVibes/hunyuan3d-2.1-mlx.
Files
| File | Size | Description |
|---|---|---|
| dit.safetensors | ~6.10 GB | HunYuanDiT-Plain backbone: 21 layers, hidden 2048, 16 heads, last 6 layers MoE (8 experts, top-2). fp16. ~3.3B params. |
| shape_vae_encoder.safetensors | ~227 MB | Point-cross-attention encoder: 8 self-attn layers, width 1024, 16 heads. fp16. |
| shape_vae_decoder.safetensors | ~429 MB | Latent-refine transformer (16 layers, width 1024) + cross-attention geo-decoder. fp16. |
| config.json | 1 KB | Architectural metadata + module configs + upstream commit pin. |
| LICENSE | - | Tencent Hunyuan 3D 2.1 Community License Agreement (verbatim from upstream). |
| NOTICE | - | Required attribution + change description. |
DINOv2 image encoder is not included. The pipeline's image conditioner is DINOv2-Large, a separately-licensed Meta model. hy3dmlx.hub.load_pipeline_from_hf pulls it from the upstream facebook/dinov2-large repo automatically — under its own Apache-2.0 license, distinct from this repo's terms.
Usage
Install hy3dmlx and call load_pipeline_from_hf:
from hy3dmlx.hub import load_pipeline_from_hf
from PIL import Image
pipeline = load_pipeline_from_hf("AgenticVibes/hy3dmlx-shape-v2.1")
result = pipeline(image=Image.open("input.png"))
result.mesh.export("output.glb")
That single call:
- Downloads dit.safetensors, shape_vae_encoder.safetensors, shape_vae_decoder.safetensors, and config.json from this repo (cached on subsequent calls).
- Pulls DINOv2-Large from facebook/dinov2-large.
- Builds and wires up ShapeMeshPipeline (sampling + decode + marching cubes), ready to call.
If you'd rather load components independently — e.g. to swap the scheduler or run latent-only experiments — see hy3dmlx.hub.fetch_weights to download the snapshot directory and hy3dmlx.dit / hy3dmlx.vae for module-level loaders.
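Before loading components by hand, it can help to confirm that the downloaded snapshot directory contains the four files this repo ships. The following is a minimal sketch using only the standard library; `missing_files` is a hypothetical helper, not part of hy3dmlx, and it assumes the snapshot is a flat directory on disk.

```python
from pathlib import Path

# File names taken from the Files section of this model card.
EXPECTED_FILES = (
    "dit.safetensors",
    "shape_vae_encoder.safetensors",
    "shape_vae_decoder.safetensors",
    "config.json",
)


def missing_files(snapshot_dir):
    """Return the expected file names that are absent from snapshot_dir.

    Hypothetical helper: assumes a flat snapshot directory layout.
    """
    root = Path(snapshot_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means the snapshot is complete; anything else names the files that still need to be downloaded.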
Performance — measured
End-to-end image-to-mesh on the chapter-1 demo input, measured on M3 Ultra-class hardware, fp16, 50 inference steps with classifier-free guidance scale 5.0, octree resolution 384 (matching the upstream defaults):
| Stage | Wall (s) | Share |
|---|---|---|
| DINOv2 forward | 0.2 | 0% |
| DiT sampling (50 steps × CFG batch=2) | 646 | 58% |
| VAE refine (post_kl + 16-layer transformer) | 0.4 | 0% |
| VAE grid eval (~7000 chunks at 8000 voxels) | 465 | 42% |
| Total | 1112 (~18.5 min) | 100% |
MLX peak memory: 7.85 GB. Output mesh: 343 508 vertices, 687 024 faces.
A faster development setting at octree_resolution=256 finishes in ~5 minutes with comparable global geometry.
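The shares in the table and the chunk count can be reproduced with a few lines of arithmetic. This is a rough sketch, not a profiler: the stage timings are copied from the table, and the voxel count assumes a cubic grid of `octree_resolution ** 3` samples, which is a simplification of whatever grid the pipeline actually builds.

```python
# Stage timings in seconds, copied from the performance table.
STAGES = {
    "DINOv2 forward": 0.2,
    "DiT sampling": 646.0,
    "VAE refine": 0.4,
    "VAE grid eval": 465.0,
}


def shares(stages):
    """Map each stage to its percentage of total wall time."""
    total = sum(stages.values())
    return {name: 100.0 * secs / total for name, secs in stages.items()}


def grid_chunks(octree_resolution, chunk_voxels=8000):
    """Return (voxels, chunks), ASSUMING a cubic grid of resolution**3 voxels."""
    voxels = octree_resolution ** 3
    chunks = -(-voxels // chunk_voxels)  # ceiling division
    return voxels, chunks


if __name__ == "__main__":
    for name, pct in shares(STAGES).items():
        print(f"{name}: {pct:.1f}%")
    for res in (384, 256):
        print(res, grid_chunks(res))
```

Under that assumption, resolution 384 gives about 56.6M voxels in about 7000 chunks, in line with the table, and resolution 256 gives about 16.8M voxels, roughly 30% of the grid-evaluation work.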
Mesh-level parity vs the PyTorch reference
Comparing this repo's MLX inference output against the upstream PyTorch reference run on the same input:
| Metric | Value | Informal threshold |
|---|---|---|
| Chamfer L2 (symmetric) | 0.0244 | ≤ 0.02 (just over — see below) |
| F1 @ 0.01 | 0.190 | ≥ 0.85 (low — see below) |
| F1 @ 0.05 | 0.891 | — |
| Normal consistency | 0.915 | ≥ 0.90 ✓ |
| Bounding-box extent (MLX vs ref) | [1.276, 1.999, 1.181] vs [1.285, 1.993, 1.161] | within 2% on every axis |
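For readers unfamiliar with these metrics, here is a minimal NumPy sketch of symmetric Chamfer distance and F1 at a distance threshold, computed on point clouds sampled from the two meshes. The exact definitions used for the table are not stated here, so treat this as one common convention (mean nearest-neighbour Euclidean distance, averaged over both directions), not the evaluation code behind the numbers. The toy data shows one way a low F1 at a tight threshold can coexist with a high F1 at a looser one: a small uniform offset.

```python
import numpy as np


def nearest_distances(a, b):
    """For each point in a (N, 3), distance to its nearest point in b (M, 3)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def chamfer_l2(a, b):
    """Symmetric Chamfer: mean nearest distance, averaged over both directions."""
    return 0.5 * (nearest_distances(a, b).mean() + nearest_distances(b, a).mean())


def f1_at(a, b, tau):
    """F1 of precision (a near b) and recall (b near a) at threshold tau."""
    precision = (nearest_distances(a, b) <= tau).mean()
    recall = (nearest_distances(b, a) <= tau).mean()
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))


# Toy data: a 5x5x5 grid with spacing 0.1, and a copy shifted by 0.02 in x.
axis = np.arange(5) * 0.1
ref = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
shifted = ref + np.array([0.02, 0.0, 0.0])

if __name__ == "__main__":
    print("chamfer:", chamfer_l2(shifted, ref))
    print("F1@0.01:", f1_at(shifted, ref, 0.01))
    print("F1@0.05:", f1_at(shifted, ref, 0.05))
```

In the toy case every point sits exactly 0.02 from its counterpart, so F1 is 0 at threshold 0.01 and 1 at threshold 0.05 even though the two shapes are identical up to the offset.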
The two meshes are not bit-equal, and they aren't expected to be. Differences come from three places:
- RNG mismatch. The reference run used non-deterministic noise (generator=None); our MLX run uses a seeded MLX PRNG. PyTorch's and MLX's standard-normal samplers don't agree, so even with bit-equal weights the initial latent differs.
- fp16 cumulative error. 50 scheduler steps × CFG arithmetic × 21 DiT blocks × MoE routing accumulates a small wedge in fp16 that fp32 wouldn't have.
- Marching-cubes is non-linear. A small change in the occupancy field at a voxel boundary can flip whether that voxel becomes a triangle, producing localised mesh-level differences that don't reflect a weight or architecture issue.
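The fp16 accumulation effect in the list above can be illustrated with a deliberately exaggerated toy: repeatedly adding a small increment to a running total, rounding to the working precision at every step. This is not the pipeline's arithmetic and the error magnitude is not representative; it only shows the mechanism by which many rounded steps drift further in fp16 than in fp32.

```python
import numpy as np


def accumulate(step, n_steps, dtype):
    """Add `step` to a running total n_steps times, rounding to dtype each step."""
    total = dtype(0.0)
    inc = dtype(step)
    for _ in range(n_steps):
        total = dtype(total + inc)
    return float(total)


EXACT = 0.001 * 5000  # the real-number answer, 5.0
ERR16 = abs(accumulate(0.001, 5000, np.float16) - EXACT)
ERR32 = abs(accumulate(0.001, 5000, np.float32) - EXACT)

if __name__ == "__main__":
    print("fp16 error:", ERR16)
    print("fp32 error:", ERR32)
```

Once the fp16 total grows large enough that the increment is below half the spacing between representable values, further additions round away entirely, which is why the fp16 error ends up far larger than the fp32 one.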
For full discussion (including the discovery that our MLX run produces a better "Y" on the demo sign than the reference does), see chapter 13 of the source repo: https://github.com/AgenticVibes/hy3dmlx/blob/main/chapters/13-end-to-end/README.md.
If you need bit-near parity for a specific input, pre-sample the noise as a numpy array and feed it via initial_noise=. That removes the RNG-mismatch term but won't remove the fp16 + marching-cubes terms.
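A minimal sketch of pre-sampling the noise with NumPy so that two runs start from the same latent. The `initial_noise=` keyword comes from the paragraph above; the latent shape used in the comment is a placeholder assumption, and `presample_noise` is a hypothetical helper, so check the pipeline's config for the real shape and expected dtype.

```python
import numpy as np


def presample_noise(shape, seed, dtype=np.float16):
    """Draw standard-normal noise from a seeded NumPy generator.

    Sampling in float32 and casting afterwards keeps the draw identical
    regardless of the target dtype.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape, dtype=np.float32).astype(dtype)


# Usage sketch (the shape is a HYPOTHETICAL placeholder, not read from config.json):
#   noise = presample_noise((1, 4096, 64), seed=0)
#   np.save("initial_noise.npy", noise)   # share this file with the reference run
#   result = pipeline(image=image, initial_noise=np.load("initial_noise.npy"))
```

Saving the array to disk and loading it in both the MLX run and the PyTorch reference run is what removes the RNG term; a seed alone is not enough, because the two frameworks' samplers differ.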
Inputs and limitations
- Image format. Anything PIL reads. PNG with transparency is preferred (skips background removal entirely; the alpha channel directly defines the subject).
- Image size. Sweet spot 512–1024 px on the long side. Smaller images are heavily upsampled and lack detail to begin with; larger images are downsampled, so their extra detail is thrown away.
- Subject content. Trained on 3D-asset-grade imagery — characters, props, vehicles. Out-of-distribution inputs (faces, architecture, complex multi-subject scenes) produce uneven results.
- Subject pose. Front-on views often produce flat or hollow backs; 3/4 views work better (the demo input is one).
- Memory. About 8 GB MLX peak at fp16 with octree_resolution=384. Smaller M-series Macs may need octree_resolution=256 (roughly 17M voxels instead of 56M) to fit comfortably.
- No quantization. All weights ship at fp16, native to the upstream checkpoint. Lower-precision (Q8/Q4) variants could be future work; not in scope here.
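The image-size guidance above can be turned into a quick pre-flight check. A minimal sketch: `classify_long_side` is a hypothetical helper, and the 512 and 1024 bounds are simply the sweet spot quoted in the list, not limits enforced by the pipeline.

```python
def classify_long_side(width, height, lo=512, hi=1024):
    """Label an input by its long side relative to the 512-1024 px sweet spot."""
    long_side = max(width, height)
    if long_side < lo:
        return "small"  # will be heavily upsampled
    if long_side > hi:
        return "large"  # extra detail is discarded
    return "ok"


# Usage sketch with PIL (not executed here):
#   from PIL import Image
#   image = Image.open("input.png")
#   print(classify_long_side(*image.size), "alpha" if image.mode == "RGBA" else "no alpha")
```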
Citations
If you use these weights or the hy3dmlx port in research or applications, please cite the upstream Hunyuan3D papers:
@article{hunyuan3d2025_2_1,
title={Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material},
author={Tencent Hunyuan3D Team},
year={2025},
eprint={2506.15442},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2506.15442}
}
@article{hunyuan3d2025,
title={Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation},
author={Tencent Hunyuan3D Team},
year={2025},
eprint={2501.12202},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2501.12202}
}
The shape pipeline relies on several earlier works that the Hunyuan3D papers themselves cite — please consult the upstream papers for the full bibliography. Briefly:
- DINOv2 (Oquab et al., 2023) — the image conditioner. https://arxiv.org/abs/2304.07193
- Flow Matching (Lipman et al., 2022) — the diffusion-time formulation. https://arxiv.org/abs/2210.02747
- Scalable Diffusion Models with Transformers (Peebles & Xie, 2022) — the DiT architecture. https://arxiv.org/abs/2212.09748
License and attribution
These weights are Model Derivatives under the Tencent Hunyuan 3D 2.1 Community License Agreement. See NOTICE for the change description.
Key terms:
- Territory restricted. The license expressly excludes the European Union, United Kingdom, and South Korea. This HuggingFace repo is gated (extra_gated_eu_disallowed: true) to disallow EU downloads at the platform level.
- Commercial-use threshold. If your product or service has more than 1 million monthly active users at the version-release date, the upstream license requires a separate commercial license from Tencent (see Section 4 of the LICENSE file).
- Improvement-of-other-models prohibition. You may not use these weights or their outputs to improve any AI model other than Hunyuan3D 2.1 or its derivatives.
- Attribution required. Distributions to third parties must accompany this NOTICE file (Section 3(d) of the upstream license).
The hy3dmlx package itself (the loading code) is independent of this artefact's license and is released separately at https://github.com/AgenticVibes/hy3dmlx.
DINOv2 weights, when fetched via load_pipeline_from_hf, are governed by their own Apache-2.0 license at facebook/dinov2-large — entirely independent of the terms here.
Acknowledgements
These weights would not exist without the Tencent Hunyuan3D team's decision to open the original model under the Community License. Conversion preserves their work bit-for-bit; everything genuinely interesting in the trained behaviour is theirs. Powered by Tencent Hunyuan.