# Zen 3D

**Zen 3D** is a unified framework for controllable generation of 3D assets. Based on Hunyuan3D-Omni, it provides multi-modal control for creating high-fidelity 3D models from images, point clouds, voxels, poses, and bounding boxes.

<p align="center">
  <img src="assets/omni_teaser.png">
</p>
## Overview

Zen 3D inherits the powerful architecture of Hunyuan3D 2.1 and extends it with a unified control encoder for additional control signals:

- **Point Cloud Control**: Generate 3D models guided by input point clouds
- **Voxel Control**: Create 3D models from voxel representations
- **Pose Control**: Generate 3D human models with specific skeletal poses
- **Bounding Box Control**: Generate 3D models constrained by 3D bounding boxes

<p align="left">
  <img src="assets/framework.jpg">
</p>
## Features

- 🎨 **Multi-Modal Control**: Point cloud, voxel, skeleton, and bounding box
- 🚀 **High Quality**: Production-ready PBR materials
- ⚡ **FlashVDM**: Optional optimization for faster inference
- 🎯 **10GB VRAM**: Efficient generation on consumer GPUs
- 🔧 **EMA Support**: Exponential Moving Average weights for more stable inference
## Model Details

| Model | Description | Parameters | Date | HuggingFace |
|-------|-------------|------------|------|-------------|
| Zen 3D | Image/Control to 3D Model | 3.3B | 2025-09 | [Download](https://huggingface.co/zenlm/zen-3d) |

**Memory Requirements**: 10GB VRAM minimum
## Installation

### Requirements

Python 3.10+ recommended.

```bash
# Install PyTorch with CUDA 12.4
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124

# Install dependencies
pip install -r requirements.txt
```
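To confirm the CUDA build installed correctly, a quick sanity check (a minimal sketch; any CUDA-capable GPU works):

```python
import torch

# Verify the CUDA-enabled PyTorch build is active and a GPU is visible
print(torch.__version__)                  # e.g. 2.5.1+cu124
print(torch.cuda.is_available())          # should print True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. NVIDIA GeForce RTX 4090
```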
### Quick Start

```bash
# Clone repository
git clone https://github.com/zenlm/zen-3d.git
cd zen-3d

# Install
pip install -r requirements.txt

# Download model
huggingface-cli download zenlm/zen-3d --local-dir ./models
```
## Usage

### Basic Inference

```bash
# Point cloud control
python3 inference.py --control_type point

# Voxel control
python3 inference.py --control_type voxel

# Pose control (human models)
python3 inference.py --control_type pose

# Bounding box control
python3 inference.py --control_type bbox
```
### Advanced Options

```bash
# Use EMA model for more stable results
python3 inference.py --control_type point --use_ema

# Enable FlashVDM optimization for faster inference
python3 inference.py --control_type point --flashvdm

# Combine both
python3 inference.py --control_type point --use_ema --flashvdm
```
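The speed benchmarks below quantify the trade-off: on an RTX 4090, FlashVDM cuts generation time from about 35s to 25s, while EMA adds roughly three seconds per asset.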
### Control Types

| Control Type | Description | Use Case |
|--------------|-------------|----------|
| `point` | Point cloud input | Scan data, LiDAR, structured surfaces |
| `voxel` | Voxel representation | Volumetric data, medical imaging |
| `pose` | Skeletal pose | Human/character models, animation |
| `bbox` | 3D bounding boxes | Scene layout, object placement |
## Python API

```python
from zen_3d import Zen3DGenerator

# Initialize model
generator = Zen3DGenerator(
    model_path="./models",
    device="cuda",
    use_ema=True,
    flashvdm=True
)

# Point cloud control
point_cloud = load_point_cloud("input.ply")
result = generator.generate(
    control_type="point",
    control_data=point_cloud,
    image="reference.jpg"
)

# Save result
result.save("output.obj")
```
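The `load_point_cloud` helper above is left undefined; a minimal sketch using `trimesh` (an assumed dependency, any PLY reader works, and Zen 3D's expected array layout may differ) could look like this:

```python
import numpy as np
import trimesh

def load_point_cloud(path: str) -> np.ndarray:
    """Hypothetical loader: read a .ply file and return (N, 3) float32 points."""
    loaded = trimesh.load(path)
    # trimesh yields a PointCloud for pure point data and a Trimesh for meshes;
    # both expose their coordinates through .vertices
    return np.asarray(loaded.vertices, dtype=np.float32)
```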
## Training

Zen 3D can be trained on custom 3D datasets using Zen Gym:

```bash
# From a local checkout of Zen Gym (https://github.com/zenlm/zen-gym)
cd zen-gym

# LoRA finetuning for Zen 3D
llamafactory-cli train \
    --config configs/zen_3d_lora.yaml \
    --dataset your_3d_dataset
```
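The `llamafactory-cli` entry point is provided by [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), which Zen Gym appears to build on.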
See [Zen Gym](https://github.com/zenlm/zen-gym) for training infrastructure.
## Performance

| Hardware | Control Type | Generation Time | VRAM Usage |
|----------|--------------|-----------------|------------|
| RTX 4090 | Point | ~30s | 10GB |
| RTX 4090 | Point + FlashVDM | ~20s | 10GB |
| RTX 3090 | Voxel | ~45s | 10GB |
| RTX 3060 | Pose | ~60s | 12GB |
## Examples

### Point Cloud to 3D

```bash
python3 inference.py \
    --control_type point \
    --input examples/chair.ply \
    --image examples/chair.jpg \
    --output output/chair.obj \
    --use_ema
```

### Pose-Controlled Human

```bash
python3 inference.py \
    --control_type pose \
    --skeleton examples/pose.json \
    --image examples/person.jpg \
    --output output/person.obj
```

### Voxel to 3D

```bash
python3 inference.py \
    --control_type voxel \
    --voxel_grid examples/car.vox \
    --output output/car.obj \
    --flashvdm
```
## Integration with Zen Ecosystem

Zen 3D integrates with other Zen tools:

- **Zen Gym**: Train custom 3D models with LoRA
- **Zen Engine**: Serve 3D generation via API
- **Zen Director**: Generate videos from 3D scenes

## Output Formats

- **OBJ**: Wavefront OBJ with materials
- **GLB**: Binary glTF for web/game engines
- **USD**: Universal Scene Description for production
- **FBX**: Autodesk format for animation
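Continuing the Python API example above, and assuming `result.save` dispatches on file extension (an assumption; the API may expose explicit exporters instead), switching formats is a one-line change:

```python
# Hypothetical: output format selected by file extension
result.save("output.obj")  # Wavefront OBJ with materials
result.save("output.glb")  # binary glTF for web/game engines
result.save("output.usd")  # USD for production pipelines
result.save("output.fbx")  # FBX for animation tools
```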
## Advanced Usage

### Batch Generation

```python
from zen_3d import Zen3DGenerator

generator = Zen3DGenerator(device="cuda")

# Batch process multiple inputs
inputs = [
    {"control_type": "point", "data": "scan1.ply"},
    {"control_type": "point", "data": "scan2.ply"},
    {"control_type": "voxel", "data": "voxel1.vox"},
]
results = generator.batch_generate(inputs, batch_size=4)
```
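Assuming `batch_generate` returns one result object per input, in input order (an inference from the single-input API above), the outputs can be saved in a loop:

```python
# Hypothetical: results parallel the inputs list
for i, result in enumerate(results):
    result.save(f"output/asset_{i:03d}.obj")
```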
### Custom Control Signals

```python
# Combine multiple control signals (continuing from the generator
# initialized in the Python API example above)
result = generator.generate(
    control_type="hybrid",
    point_cloud=point_data,
    bbox=bounding_boxes,
    image=reference_image
)
```
## Benchmarks

### Quality Metrics

| Control Type | FID ↓ | LPIPS ↓ | Chamfer Distance ↓ |
|--------------|-------|---------|--------------------|
| Point Cloud | 12.3 | 0.085 | 0.021 |
| Voxel | 15.7 | 0.092 | 0.028 |
| Pose | 14.1 | 0.088 | N/A |
| Bounding Box | 18.2 | 0.095 | 0.032 |
### Speed Benchmarks (RTX 4090)

| Configuration | Tokens/sec | Generation Time |
|---------------|------------|-----------------|
| Base | 850 | 35s |
| + EMA | 800 | 38s |
| + FlashVDM | 1200 | 25s |
| + EMA + FlashVDM | 1100 | 27s |
## Citation

If you use Zen 3D in your research, please cite:

```bibtex
@misc{zen3d2025,
  title={Zen 3D: Unified Framework for Controllable 3D Asset Generation},
  author={Zen AI Team},
  year={2025},
  howpublished={\url{https://github.com/zenlm/zen-3d}}
}

@misc{hunyuan3d2025hunyuan3domni,
  title={Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets},
  author={Tencent Hunyuan3D Team},
  year={2025},
  eprint={2509.21245},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
## Credits

Zen 3D is based on [Hunyuan3D-Omni](https://github.com/Tencent-Hunyuan/Hunyuan3D-Omni) by Tencent. We thank the original authors and contributors, and acknowledge the following related projects:

- [TripoSG](https://github.com/VAST-AI-Research/TripoSG)
- [CLAY](https://arxiv.org/abs/2406.13897)
- [Trellis](https://github.com/microsoft/TRELLIS)
- [DINOv2](https://github.com/facebookresearch/dinov2)
- [CraftsMan3D](https://github.com/wyysf-98/CraftsMan3D)
- [Michelangelo](https://github.com/NeuralCarver/Michelangelo)
## License

Apache 2.0 License - see [LICENSE](LICENSE) for details.

## Links

- **GitHub**: https://github.com/zenlm/zen-3d
- **HuggingFace**: https://huggingface.co/zenlm/zen-3d
- **Organization**: https://github.com/zenlm
- **Zen Gym** (Training): https://github.com/zenlm/zen-gym
- **Zen Engine** (Inference): https://github.com/zenlm/zen-engine
- **Zen Musician**: https://github.com/zenlm/zen-musician

---

**Zen 3D** - Controllable 3D generation for the Zen AI ecosystem

Part of the **[Zen AI](https://github.com/zenlm)** ecosystem.
---

## Based On

**zen-3d** is based on [Hunyuan3D-Omni](https://github.com/Tencent-Hunyuan/Hunyuan3D-Omni).

We are grateful to the original authors for their excellent work and open-source contributions.

### Upstream Source

- **Repository**: https://github.com/Tencent-Hunyuan/Hunyuan3D-Omni
- **Base Model**: Hunyuan3D-Omni
- **License**: See the original repository for license details

### Changes in Zen LM

- Adapted for the Zen AI ecosystem
- Fine-tuned for specific use cases
- Added training and inference scripts
- Integrated with Zen Gym and Zen Engine
- Enhanced documentation and examples

### Citation

If you use this model, please cite both the original work and Zen LM:

```bibtex
@misc{zenlm2025zen-3d,
  title={Zen LM: zen-3d},
  author={Hanzo AI and Zoo Labs Foundation},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/zenlm/zen-3d}}
}
```

Please also cite the original upstream work; see https://github.com/Tencent-Hunyuan/Hunyuan3D-Omni for citation details.