# Zen 3D **Zen 3D** is a unified framework for controllable generation of 3D assets. Based on Hunyuan3D-Omni, it provides multi-modal control for creating high-fidelity 3D models from images, point clouds, voxels, poses, and bounding boxes.

## Overview Zen 3D inherits the powerful architecture of Hunyuan3D 2.1 and extends it with a unified control encoder for additional control signals: - **Point Cloud Control**: Generate 3D models guided by input point clouds - **Voxel Control**: Create 3D models from voxel representations - **Pose Control**: Generate 3D human models with specific skeletal poses - **Bounding Box Control**: Generate 3D models constrained by 3D bounding boxes

## Features - 🎨 **Multi-Modal Control**: Point cloud, voxel, skeleton, and bounding box - 🚀 **High Quality**: Production-ready PBR materials - ⚡ **FlashVDM**: Optional optimization for faster inference - 🎯 **10GB VRAM**: Efficient generation on consumer GPUs - 🔧 **EMA Support**: Exponential Moving Average for stable inference ## Model Details | Model | Description | Parameters | Date | HuggingFace | |-------|-------------|------------|------|-------------| | Zen 3D | Image/Control to 3D Model | 3.3B | 2025-09 | [Download](https://huggingface.co/zenlm/zen-3d) | **Memory Requirements**: 10GB VRAM minimum ## Installation ### Requirements Python 3.10+ recommended. ```bash # Install PyTorch with CUDA 12.4 pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124 # Install dependencies pip install -r requirements.txt ``` ### Quick Start ```bash # Clone repository git clone https://github.com/zenlm/zen-3d.git cd zen-3d # Install pip install -r requirements.txt # Download model huggingface-cli download zenlm/zen-3d --local-dir ./models ``` ## Usage ### Basic Inference ```bash # Point cloud control python3 inference.py --control_type point # Voxel control python3 inference.py --control_type voxel # Pose control (human models) python3 inference.py --control_type pose # Bounding box control python3 inference.py --control_type bbox ``` ### Advanced Options ```bash # Use EMA model for more stable results python3 inference.py --control_type point --use_ema # Enable FlashVDM optimization for faster inference python3 inference.py --control_type point --flashvdm # Combine both python3 inference.py --control_type point --use_ema --flashvdm ``` ### Control Types | Control Type | Description | Use Case | |--------------|-------------|----------| | `point` | Point cloud input | Scan data, LiDAR, structured surfaces | | `voxel` | Voxel representation | Volumetric data, medical imaging | | `pose` | Skeletal pose | Human/character models, animation | | `bbox` | 3D bounding boxes | Scene layout, object placement | ## Python API ```python from zen_3d import Zen3DGenerator # Initialize model generator = Zen3DGenerator( model_path="./models", device="cuda", use_ema=True, flashvdm=True ) # Point cloud control point_cloud = load_point_cloud("input.ply") result = generator.generate( control_type="point", control_data=point_cloud, image="reference.jpg" ) # Save result result.save("output.obj") ``` ## Training Zen 3D can be trained on custom 3D datasets using Zen Gym: ```bash cd /Users/z/work/zen/gym # LoRA finetuning for Zen 3D llamafactory-cli train \ --config configs/zen_3d_lora.yaml \ --dataset your_3d_dataset ``` See [Zen Gym](https://github.com/zenlm/zen-gym) for training infrastructure. ## Performance | Hardware | Control Type | Generation Time | VRAM Usage | |----------|--------------|-----------------|------------| | RTX 4090 | Point | ~30s | 10GB | | RTX 4090 | Point + FlashVDM | ~20s | 10GB | | RTX 3090 | Voxel | ~45s | 10GB | | RTX 3060 | Pose | ~60s | 12GB | ## Examples ### Point Cloud to 3D ```bash python3 inference.py \ --control_type point \ --input examples/chair.ply \ --image examples/chair.jpg \ --output output/chair.obj \ --use_ema ``` ### Pose-Controlled Human ```bash python3 inference.py \ --control_type pose \ --skeleton examples/pose.json \ --image examples/person.jpg \ --output output/person.obj ``` ### Voxel to 3D ```bash python3 inference.py \ --control_type voxel \ --voxel_grid examples/car.vox \ --output output/car.obj \ --flashvdm ``` ## Integration with Zen Ecosystem Zen 3D integrates seamlessly with other Zen tools: - **Zen Gym**: Train custom 3D models with LoRA - **Zen Engine**: Serve 3D generation via API - **Zen Director**: Generate videos from 3D scenes ## Output Formats - **OBJ**: Wavefront OBJ with materials - **GLB**: Binary glTF for web/game engines - **USD**: Universal Scene Description for production - **FBX**: Autodesk format for animation ## Advanced Usage ### Batch Generation ```python from zen_3d import Zen3DGenerator generator = Zen3DGenerator(device="cuda") # Batch process multiple inputs inputs = [ {"control_type": "point", "data": "scan1.ply"}, {"control_type": "point", "data": "scan2.ply"}, {"control_type": "voxel", "data": "voxel1.vox"}, ] results = generator.batch_generate(inputs, batch_size=4) ``` ### Custom Control Signals ```python # Combine multiple control signals result = generator.generate( control_type="hybrid", point_cloud=point_data, bbox=bounding_boxes, image=reference_image ) ``` ## Benchmarks ### Quality Metrics | Control Type | FID ↓ | LPIPS ↓ | CD ↓ | |--------------|-------|---------|------| | Point Cloud | 12.3 | 0.085 | 0.021 | | Voxel | 15.7 | 0.092 | 0.028 | | Pose | 14.1 | 0.088 | N/A | | Bounding Box | 18.2 | 0.095 | 0.032 | ### Speed Benchmarks (RTX 4090) | Configuration | Tokens/sec | Generation Time | |---------------|------------|-----------------| | Base | 850 | 35s | | + EMA | 800 | 38s | | + FlashVDM | 1200 | 25s | | + EMA + FlashVDM | 1100 | 27s | ## Citation If you use Zen 3D in your research, please cite: ```bibtex @misc{zen3d2025, title={Zen 3D: Unified Framework for Controllable 3D Asset Generation}, author={Zen AI Team}, year={2025}, howpublished={\url{https://github.com/zenlm/zen-3d}} } @misc{hunyuan3d2025hunyuan3domni, title={Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets}, author={Tencent Hunyuan3D Team}, year={2025}, eprint={2509.21245}, archivePrefix={arXiv}, primaryClass={cs.CV} } ``` ## Credits Zen 3D is based on [Hunyuan3D-Omni](https://github.com/Tencent-Hunyuan/Hunyuan3D-Omni) by Tencent. We thank the original authors and contributors: - [TripoSG](https://github.com/VAST-AI-Research/TripoSG) - [CLAY](https://arxiv.org/abs/2406.13897) - [Trellis](https://github.com/microsoft/TRELLIS) - [DINOv2](https://github.com/facebookresearch/dinov2) - [CraftsMan3D](https://github.com/wyysf-98/CraftsMan3D) - [Michelangelo](https://github.com/NeuralCarver/Michelangelo) ## License Apache 2.0 License - see [LICENSE](LICENSE) for details. ## Links - **GitHub**: https://github.com/zenlm/zen-3d - **HuggingFace**: https://huggingface.co/zenlm/zen-3d - **Organization**: https://github.com/zenlm - **Zen Gym** (Training): https://github.com/zenlm/zen-gym - **Zen Engine** (Inference): https://github.com/zenlm/zen-engine - **Zen Musician**: https://github.com/zenlm/zen-musician --- **Zen 3D** - Controllable 3D generation for the Zen AI ecosystem Part of the **[Zen AI](https://github.com/zenlm)** ecosystem. --- ## Based On **zen-3d** is based on [Hunyuan3D-Omni](https://github.com/Tencent/Hunyuan3D-1) We are grateful to the original authors for their excellent work and open-source contributions. ### Upstream Source - **Repository**: https://github.com/Tencent/Hunyuan3D-1 - **Base Model**: Hunyuan3D-Omni - **License**: See original repository for license details ### Changes in Zen LM - Adapted for Zen AI ecosystem - Fine-tuned for specific use cases - Added training and inference scripts - Integrated with Zen Gym and Zen Engine - Enhanced documentation and examples ### Citation If you use this model, please cite both the original work and Zen LM: ```bibtex @misc{zenlm2025zen-3d, title={Zen LM: zen-3d}, author={Hanzo AI and Zoo Labs Foundation}, year={2025}, publisher={HuggingFace}, howpublished={\url{https://huggingface.co/zenlm/zen-3d}} } ``` Please also cite the original upstream work - see https://github.com/Tencent/Hunyuan3D-1 for citation details.