# Zen 3D
**Zen 3D** is a unified framework for controllable generation of 3D assets. Based on Hunyuan3D-Omni, it provides multi-modal control for creating high-fidelity 3D models from images, point clouds, voxels, poses, and bounding boxes.
<p align="center">
<img src="assets/omni_teaser.png">
</p>
## Overview
Zen 3D inherits the powerful architecture of Hunyuan3D 2.1 and extends it with a unified control encoder for additional control signals:
- **Point Cloud Control**: Generate 3D models guided by input point clouds
- **Voxel Control**: Create 3D models from voxel representations
- **Pose Control**: Generate 3D human models with specific skeletal poses
- **Bounding Box Control**: Generate 3D models constrained by 3D bounding boxes
<p align="center">
<img src="assets/framework.jpg">
</p>
## Features
- 🎨 **Multi-Modal Control**: Point cloud, voxel, skeleton, and bounding box
- 🚀 **High Quality**: Production-ready PBR materials
- ⚡ **FlashVDM**: Optional optimization for faster inference
- 🎯 **10GB VRAM**: Efficient generation on consumer GPUs
- 🔧 **EMA Support**: Exponential Moving Average for stable inference
## Model Details
| Model | Description | Parameters | Date | HuggingFace |
|-------|-------------|------------|------|-------------|
| Zen 3D | Image/Control to 3D Model | 3.3B | 2025-09 | [Download](https://huggingface.co/zenlm/zen-3d) |
**Memory Requirements**: 10GB VRAM minimum
## Installation
### Requirements
Python 3.10+ recommended.
```bash
# Install PyTorch with CUDA 12.4
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
# Install dependencies
pip install -r requirements.txt
```
### Quick Start
```bash
# Clone repository
git clone https://github.com/zenlm/zen-3d.git
cd zen-3d
# Install
pip install -r requirements.txt
# Download model
huggingface-cli download zenlm/zen-3d --local-dir ./models
```
## Usage
### Basic Inference
```bash
# Point cloud control
python3 inference.py --control_type point
# Voxel control
python3 inference.py --control_type voxel
# Pose control (human models)
python3 inference.py --control_type pose
# Bounding box control
python3 inference.py --control_type bbox
```
### Advanced Options
```bash
# Use EMA model for more stable results
python3 inference.py --control_type point --use_ema
# Enable FlashVDM optimization for faster inference
python3 inference.py --control_type point --flashvdm
# Combine both
python3 inference.py --control_type point --use_ema --flashvdm
```
### Control Types
| Control Type | Description | Use Case |
|--------------|-------------|----------|
| `point` | Point cloud input | Scan data, LiDAR, structured surfaces |
| `voxel` | Voxel representation | Volumetric data, medical imaging |
| `pose` | Skeletal pose | Human/character models, animation |
| `bbox` | 3D bounding boxes | Scene layout, object placement |
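These control types differ mainly in how the input geometry is discretized before it reaches the unified control encoder. As a rough pure-Python illustration (not the framework's actual preprocessing), a point cloud can be reduced to a voxel occupancy grid like this:

```python
def voxelize(points, resolution=32):
    """Map 3D points into an occupancy grid of the given resolution.

    points: iterable of (x, y, z) tuples; returns the set of occupied
    integer cell indices. Illustrative only -- Zen 3D's control encoder
    may discretize its inputs differently.
    """
    xs, ys, zs = zip(*points)
    mins = (min(xs), min(ys), min(zs))
    # Uniform scale so the whole cloud fits inside the grid.
    extent = max(max(xs) - mins[0], max(ys) - mins[1], max(zs) - mins[2]) or 1.0
    cells = set()
    for x, y, z in points:
        ix = min(int((x - mins[0]) / extent * resolution), resolution - 1)
        iy = min(int((y - mins[1]) / extent * resolution), resolution - 1)
        iz = min(int((z - mins[2]) / extent * resolution), resolution - 1)
        cells.add((ix, iy, iz))
    return cells
```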
## Python API
```python
from zen_3d import Zen3DGenerator

# Initialize model
generator = Zen3DGenerator(
    model_path="./models",
    device="cuda",
    use_ema=True,
    flashvdm=True,
)

# Point cloud control (load_point_cloud stands in for a user-supplied
# loader, e.g. one built on trimesh or open3d)
point_cloud = load_point_cloud("input.ply")
result = generator.generate(
    control_type="point",
    control_data=point_cloud,
    image="reference.jpg",
)

# Save result
result.save("output.obj")
```
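The snippet above uses a `load_point_cloud` helper. If one is not already available, a minimal ASCII PLY vertex reader can stand in; this is a sketch that assumes each vertex line starts with `x y z` floats, and real scan data is better handled by a library such as trimesh or open3d:

```python
def load_point_cloud(path):
    """Read vertex positions from an ASCII PLY file.

    Minimal sketch: assumes ASCII encoding and that the x, y, z floats
    come first on each vertex line. Not a full PLY parser.
    """
    with open(path) as f:
        lines = [ln.strip() for ln in f]
    count = 0
    for ln in lines:
        if ln.startswith("element vertex"):
            count = int(ln.split()[-1])
    start = lines.index("end_header") + 1
    points = []
    for ln in lines[start:start + count]:
        x, y, z = (float(v) for v in ln.split()[:3])
        points.append((x, y, z))
    return points
```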
## Training
Zen 3D can be trained on custom 3D datasets using Zen Gym:
```bash
# From a local clone of Zen Gym (https://github.com/zenlm/zen-gym)
cd zen-gym
# LoRA finetuning for Zen 3D
llamafactory-cli train \
--config configs/zen_3d_lora.yaml \
--dataset your_3d_dataset
```
See [Zen Gym](https://github.com/zenlm/zen-gym) for training infrastructure.
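The referenced `configs/zen_3d_lora.yaml` is not included here. A hypothetical LoRA config in the LLaMA-Factory style might look like the following; all field values are illustrative, so verify names and defaults against the Zen Gym repository:

```yaml
# Hypothetical LoRA config sketch; see zen-gym for the real file.
model_name_or_path: zenlm/zen-3d
stage: sft
finetuning_type: lora
lora_rank: 16
lora_target: all
dataset: your_3d_dataset
output_dir: ./outputs/zen-3d-lora
per_device_train_batch_size: 1
learning_rate: 1.0e-4
num_train_epochs: 3
```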
## Performance
| Hardware | Control Type | Generation Time | VRAM Usage |
|----------|--------------|-----------------|------------|
| RTX 4090 | Point | ~30s | 10GB |
| RTX 4090 | Point + FlashVDM | ~20s | 10GB |
| RTX 3090 | Voxel | ~45s | 10GB |
| RTX 3060 | Pose | ~60s | 12GB |
## Examples
### Point Cloud to 3D
```bash
python3 inference.py \
--control_type point \
--input examples/chair.ply \
--image examples/chair.jpg \
--output output/chair.obj \
--use_ema
```
### Pose-Controlled Human
```bash
python3 inference.py \
--control_type pose \
--skeleton examples/pose.json \
--image examples/person.jpg \
--output output/person.obj
```
### Voxel to 3D
```bash
python3 inference.py \
--control_type voxel \
--voxel_grid examples/car.vox \
--output output/car.obj \
--flashvdm
```
## Integration with Zen Ecosystem
Zen 3D integrates seamlessly with other Zen tools:
- **Zen Gym**: Train custom 3D models with LoRA
- **Zen Engine**: Serve 3D generation via API
- **Zen Director**: Generate videos from 3D scenes
## Output Formats
- **OBJ**: Wavefront OBJ with materials
- **GLB**: Binary glTF for web/game engines
- **USD**: Universal Scene Description for production
- **FBX**: Autodesk format for animation
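Of these, Wavefront OBJ is the simplest: plain text `v` lines for vertices and `f` lines for faces with 1-based indices. A minimal writer illustrates the layout (a sketch only; Zen 3D's own exporter also emits material files):

```python
def write_obj(path, vertices, faces):
    """Write a mesh as a Wavefront OBJ file.

    vertices: list of (x, y, z) tuples; faces: list of vertex-index
    tuples (0-based here, converted to OBJ's 1-based indexing on write).
    """
    with open(path, "w") as f:
        for x, y, z in vertices:
            f.write(f"v {x} {y} {z}\n")
        for face in faces:
            f.write("f " + " ".join(str(i + 1) for i in face) + "\n")
```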
## Advanced Usage
### Batch Generation
```python
from zen_3d import Zen3DGenerator

generator = Zen3DGenerator(device="cuda")

# Batch process multiple inputs
inputs = [
    {"control_type": "point", "data": "scan1.ply"},
    {"control_type": "point", "data": "scan2.ply"},
    {"control_type": "voxel", "data": "voxel1.vox"},
]
results = generator.batch_generate(inputs, batch_size=4)
```
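If per-item control is needed, the same behavior can be approximated by chunking the inputs and calling the single-item `generate` method from the API above. This driver is a sketch, not the library's actual batching, which may run each chunk on the GPU in parallel:

```python
def batch_generate(generator, inputs, batch_size=4):
    """Process control inputs in fixed-size chunks.

    generator is assumed to expose generate(control_type=..., control_data=...)
    as shown in the Python API section. Results are returned in input order.
    """
    results = []
    for i in range(0, len(inputs), batch_size):
        for item in inputs[i:i + batch_size]:
            results.append(generator.generate(
                control_type=item["control_type"],
                control_data=item["data"],
            ))
    return results
```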
### Custom Control Signals
```python
# Combine multiple control signals
result = generator.generate(
    control_type="hybrid",
    point_cloud=point_data,
    bbox=bounding_boxes,
    image=reference_image,
)
```
## Benchmarks
### Quality Metrics
| Control Type | FID ↓ | LPIPS ↓ | CD ↓ |
|--------------|-------|---------|------|
| Point Cloud | 12.3 | 0.085 | 0.021 |
| Voxel | 15.7 | 0.092 | 0.028 |
| Pose | 14.1 | 0.088 | N/A |
| Bounding Box | 18.2 | 0.095 | 0.032 |
### Speed Benchmarks (RTX 4090)
| Configuration | Tokens/sec | Generation Time |
|---------------|------------|-----------------|
| Base | 850 | 35s |
| + EMA | 800 | 38s |
| + FlashVDM | 1200 | 25s |
| + EMA + FlashVDM | 1100 | 27s |
## Citation
If you use Zen 3D in your research, please cite:
```bibtex
@misc{zen3d2025,
title={Zen 3D: Unified Framework for Controllable 3D Asset Generation},
author={Zen AI Team},
year={2025},
howpublished={\url{https://github.com/zenlm/zen-3d}}
}
@misc{hunyuan3d2025hunyuan3domni,
title={Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets},
author={Tencent Hunyuan3D Team},
year={2025},
eprint={2509.21245},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
## Credits
Zen 3D is based on [Hunyuan3D-Omni](https://github.com/Tencent-Hunyuan/Hunyuan3D-Omni) by Tencent. We thank the original authors and contributors:
- [TripoSG](https://github.com/VAST-AI-Research/TripoSG)
- [CLAY](https://arxiv.org/abs/2406.13897)
- [Trellis](https://github.com/microsoft/TRELLIS)
- [DINOv2](https://github.com/facebookresearch/dinov2)
- [CraftsMan3D](https://github.com/wyysf-98/CraftsMan3D)
- [Michelangelo](https://github.com/NeuralCarver/Michelangelo)
## License
Apache 2.0 License - see [LICENSE](LICENSE) for details.
## Links
- **GitHub**: https://github.com/zenlm/zen-3d
- **HuggingFace**: https://huggingface.co/zenlm/zen-3d
- **Organization**: https://github.com/zenlm
- **Zen Gym** (Training): https://github.com/zenlm/zen-gym
- **Zen Engine** (Inference): https://github.com/zenlm/zen-engine
- **Zen Musician**: https://github.com/zenlm/zen-musician
---
**Zen 3D** - Controllable 3D generation for the Zen AI ecosystem
Part of the **[Zen AI](https://github.com/zenlm)** ecosystem.
---
## Based On
**zen-3d** is based on [Hunyuan3D-Omni](https://github.com/Tencent-Hunyuan/Hunyuan3D-Omni).
We are grateful to the original authors for their excellent work and open-source contributions.
### Upstream Source
- **Repository**: https://github.com/Tencent-Hunyuan/Hunyuan3D-Omni
- **Base Model**: Hunyuan3D-Omni
- **License**: See original repository for license details
### Changes in Zen LM
- Adapted for Zen AI ecosystem
- Fine-tuned for specific use cases
- Added training and inference scripts
- Integrated with Zen Gym and Zen Engine
- Enhanced documentation and examples
### Citation
If you use this model, please cite both the original work and Zen LM:
```bibtex
@misc{zenlm2025zen-3d,
title={Zen LM: zen-3d},
author={Hanzo AI and Zoo Labs Foundation},
year={2025},
publisher={HuggingFace},
howpublished={\url{https://huggingface.co/zenlm/zen-3d}}
}
```
Please also cite the original upstream work; the Hunyuan3D-Omni BibTeX entry is included in the Citation section above.