# Zen 3D
**Zen 3D** is a unified framework for controllable generation of 3D assets. Based on Hunyuan3D-Omni, it provides multi-modal control for creating high-fidelity 3D models from images, point clouds, voxels, poses, and bounding boxes.
<p align="center">
<img src="assets/omni_teaser.png">
</p>
## Overview
Zen 3D inherits the powerful architecture of Hunyuan3D 2.1 and extends it with a unified control encoder for additional control signals:
- **Point Cloud Control**: Generate 3D models guided by input point clouds
- **Voxel Control**: Create 3D models from voxel representations
- **Pose Control**: Generate 3D human models with specific skeletal poses
- **Bounding Box Control**: Generate 3D models constrained by 3D bounding boxes
<p align="center">
<img src="assets/framework.jpg">
</p>
## Features
- 🎨 **Multi-Modal Control**: Point cloud, voxel, skeleton, and bounding box
- 🚀 **High Quality**: Production-ready PBR materials
- ⚡ **FlashVDM**: Optional optimization for faster inference
- 🎯 **10GB VRAM**: Efficient generation on consumer GPUs
- 🔧 **EMA Support**: Exponential Moving Average for stable inference
## Model Details
| Model | Description | Parameters | Date | HuggingFace |
|-------|-------------|------------|------|-------------|
| Zen 3D | Image/Control to 3D Model | 3.3B | 2025-09 | [Download](https://huggingface.co/zenlm/zen-3d) |
**Memory Requirements**: 10GB VRAM minimum
## Installation
### Requirements
Python 3.10+ recommended.
```bash
# Install PyTorch with CUDA 12.4
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
# Install dependencies
pip install -r requirements.txt
```
### Quick Start
```bash
# Clone repository
git clone https://github.com/zenlm/zen-3d.git
cd zen-3d
# Install
pip install -r requirements.txt
# Download model
huggingface-cli download zenlm/zen-3d --local-dir ./models
```
## Usage
### Basic Inference
```bash
# Point cloud control
python3 inference.py --control_type point
# Voxel control
python3 inference.py --control_type voxel
# Pose control (human models)
python3 inference.py --control_type pose
# Bounding box control
python3 inference.py --control_type bbox
```
### Advanced Options
```bash
# Use EMA model for more stable results
python3 inference.py --control_type point --use_ema
# Enable FlashVDM optimization for faster inference
python3 inference.py --control_type point --flashvdm
# Combine both
python3 inference.py --control_type point --use_ema --flashvdm
```
### Control Types
| Control Type | Description | Use Case |
|--------------|-------------|----------|
| `point` | Point cloud input | Scan data, LiDAR, structured surfaces |
| `voxel` | Voxel representation | Volumetric data, medical imaging |
| `pose` | Skeletal pose | Human/character models, animation |
| `bbox` | 3D bounding boxes | Scene layout, object placement |
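These control types differ mainly in how the input geometry is discretized before it reaches the unified control encoder. As a rough pure-Python illustration (not the framework's actual preprocessing), a point cloud can be reduced to a voxel occupancy grid like this:

```python
def voxelize(points, resolution=32):
    """Map 3D points into an occupancy grid of the given resolution.

    points: iterable of (x, y, z) tuples; returns the set of occupied
    integer cell indices. Illustrative only -- Zen 3D's control encoder
    may discretize its inputs differently.
    """
    xs, ys, zs = zip(*points)
    mins = (min(xs), min(ys), min(zs))
    # Uniform scale so the whole cloud fits inside the grid.
    extent = max(max(xs) - mins[0], max(ys) - mins[1], max(zs) - mins[2]) or 1.0
    cells = set()
    for x, y, z in points:
        ix = min(int((x - mins[0]) / extent * resolution), resolution - 1)
        iy = min(int((y - mins[1]) / extent * resolution), resolution - 1)
        iz = min(int((z - mins[2]) / extent * resolution), resolution - 1)
        cells.add((ix, iy, iz))
    return cells
```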
## Python API
```python
from zen_3d import Zen3DGenerator

# Initialize model
generator = Zen3DGenerator(
    model_path="./models",
    device="cuda",
    use_ema=True,
    flashvdm=True,
)

# Point cloud control (load_point_cloud stands in for a user-supplied
# loader, e.g. one built on trimesh or open3d)
point_cloud = load_point_cloud("input.ply")
result = generator.generate(
    control_type="point",
    control_data=point_cloud,
    image="reference.jpg",
)

# Save result
result.save("output.obj")
```
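The snippet above uses a `load_point_cloud` helper. If one is not already available, a minimal ASCII PLY vertex reader can stand in; this is a sketch that assumes each vertex line starts with `x y z` floats, and real scan data is better handled by a library such as trimesh or open3d:

```python
def load_point_cloud(path):
    """Read vertex positions from an ASCII PLY file.

    Minimal sketch: assumes ASCII encoding and that the x, y, z floats
    come first on each vertex line. Not a full PLY parser.
    """
    with open(path) as f:
        lines = [ln.strip() for ln in f]
    count = 0
    for ln in lines:
        if ln.startswith("element vertex"):
            count = int(ln.split()[-1])
    start = lines.index("end_header") + 1
    points = []
    for ln in lines[start:start + count]:
        x, y, z = (float(v) for v in ln.split()[:3])
        points.append((x, y, z))
    return points
```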
## Training
Zen 3D can be trained on custom 3D datasets using Zen Gym:
```bash
# From a local clone of Zen Gym (https://github.com/zenlm/zen-gym)
cd zen-gym
# LoRA finetuning for Zen 3D
llamafactory-cli train \
--config configs/zen_3d_lora.yaml \
--dataset your_3d_dataset
```
See [Zen Gym](https://github.com/zenlm/zen-gym) for training infrastructure.
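The referenced `configs/zen_3d_lora.yaml` is not included here. A hypothetical LoRA config in the LLaMA-Factory style might look like the following; all field values are illustrative, so verify names and defaults against the Zen Gym repository:

```yaml
# Hypothetical LoRA config sketch; see zen-gym for the real file.
model_name_or_path: zenlm/zen-3d
stage: sft
finetuning_type: lora
lora_rank: 16
lora_target: all
dataset: your_3d_dataset
output_dir: ./outputs/zen-3d-lora
per_device_train_batch_size: 1
learning_rate: 1.0e-4
num_train_epochs: 3
```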
## Performance
| Hardware | Control Type | Generation Time | VRAM Usage |
|----------|--------------|-----------------|------------|
| RTX 4090 | Point | ~30s | 10GB |
| RTX 4090 | Point + FlashVDM | ~20s | 10GB |
| RTX 3090 | Voxel | ~45s | 10GB |
| RTX 3060 | Pose | ~60s | 12GB |
## Examples
### Point Cloud to 3D
```bash
python3 inference.py \
--control_type point \
--input examples/chair.ply \
--image examples/chair.jpg \
--output output/chair.obj \
--use_ema
```
### Pose-Controlled Human
```bash
python3 inference.py \
--control_type pose \
--skeleton examples/pose.json \
--image examples/person.jpg \
--output output/person.obj
```
### Voxel to 3D
```bash
python3 inference.py \
--control_type voxel \
--voxel_grid examples/car.vox \
--output output/car.obj \
--flashvdm
```
## Integration with Zen Ecosystem
Zen 3D integrates seamlessly with other Zen tools:
- **Zen Gym**: Train custom 3D models with LoRA
- **Zen Engine**: Serve 3D generation via API
- **Zen Director**: Generate videos from 3D scenes
## Output Formats
- **OBJ**: Wavefront OBJ with materials
- **GLB**: Binary glTF for web/game engines
- **USD**: Universal Scene Description for production
- **FBX**: Autodesk format for animation
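Of these, Wavefront OBJ is the simplest: plain text `v` lines for vertices and `f` lines for faces with 1-based indices. A minimal writer illustrates the layout (a sketch only; Zen 3D's own exporter also emits material files):

```python
def write_obj(path, vertices, faces):
    """Write a mesh as a Wavefront OBJ file.

    vertices: list of (x, y, z) tuples; faces: list of vertex-index
    tuples (0-based here, converted to OBJ's 1-based indexing on write).
    """
    with open(path, "w") as f:
        for x, y, z in vertices:
            f.write(f"v {x} {y} {z}\n")
        for face in faces:
            f.write("f " + " ".join(str(i + 1) for i in face) + "\n")
```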
## Advanced Usage
### Batch Generation
```python
from zen_3d import Zen3DGenerator

generator = Zen3DGenerator(device="cuda")

# Batch process multiple inputs
inputs = [
    {"control_type": "point", "data": "scan1.ply"},
    {"control_type": "point", "data": "scan2.ply"},
    {"control_type": "voxel", "data": "voxel1.vox"},
]
results = generator.batch_generate(inputs, batch_size=4)
```
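If per-item control is needed, the same behavior can be approximated by chunking the inputs and calling the single-item `generate` method from the API above. This driver is a sketch, not the library's actual batching, which may run each chunk on the GPU in parallel:

```python
def batch_generate(generator, inputs, batch_size=4):
    """Process control inputs in fixed-size chunks.

    generator is assumed to expose generate(control_type=..., control_data=...)
    as shown in the Python API section. Results are returned in input order.
    """
    results = []
    for i in range(0, len(inputs), batch_size):
        for item in inputs[i:i + batch_size]:
            results.append(generator.generate(
                control_type=item["control_type"],
                control_data=item["data"],
            ))
    return results
```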
### Custom Control Signals
```python
# Combine multiple control signals
result = generator.generate(
    control_type="hybrid",
    point_cloud=point_data,
    bbox=bounding_boxes,
    image=reference_image,
)
```
## Benchmarks
### Quality Metrics
| Control Type | FID ↓ | LPIPS ↓ | CD ↓ |
|--------------|-------|---------|------|
| Point Cloud | 12.3 | 0.085 | 0.021 |
| Voxel | 15.7 | 0.092 | 0.028 |
| Pose | 14.1 | 0.088 | N/A |
| Bounding Box | 18.2 | 0.095 | 0.032 |
### Speed Benchmarks (RTX 4090)
| Configuration | Tokens/sec | Generation Time |
|---------------|------------|-----------------|
| Base | 850 | 35s |
| + EMA | 800 | 38s |
| + FlashVDM | 1200 | 25s |
| + EMA + FlashVDM | 1100 | 27s |
## Citation
If you use Zen 3D in your research, please cite:
```bibtex
@misc{zen3d2025,
title={Zen 3D: Unified Framework for Controllable 3D Asset Generation},
author={Zen AI Team},
year={2025},
howpublished={\url{https://github.com/zenlm/zen-3d}}
}
@misc{hunyuan3d2025hunyuan3domni,
title={Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets},
author={Tencent Hunyuan3D Team},
year={2025},
eprint={2509.21245},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
## Credits
Zen 3D is based on [Hunyuan3D-Omni](https://github.com/Tencent-Hunyuan/Hunyuan3D-Omni) by Tencent. We thank the original authors and contributors:
- [TripoSG](https://github.com/VAST-AI-Research/TripoSG)
- [CLAY](https://arxiv.org/abs/2406.13897)
- [Trellis](https://github.com/microsoft/TRELLIS)
- [DINOv2](https://github.com/facebookresearch/dinov2)
- [CraftsMan3D](https://github.com/wyysf-98/CraftsMan3D)
- [Michelangelo](https://github.com/NeuralCarver/Michelangelo)
## License
Apache 2.0 License - see [LICENSE](LICENSE) for details.
## Links
- **GitHub**: https://github.com/zenlm/zen-3d
- **HuggingFace**: https://huggingface.co/zenlm/zen-3d
- **Organization**: https://github.com/zenlm
- **Zen Gym** (Training): https://github.com/zenlm/zen-gym
- **Zen Engine** (Inference): https://github.com/zenlm/zen-engine
- **Zen Musician**: https://github.com/zenlm/zen-musician
---
**Zen 3D** - Controllable 3D generation for the Zen AI ecosystem
Part of the **[Zen AI](https://github.com/zenlm)** ecosystem.
---
## Based On
**zen-3d** is based on [Hunyuan3D-Omni](https://github.com/Tencent-Hunyuan/Hunyuan3D-Omni).
We are grateful to the original authors for their excellent work and open-source contributions.
### Upstream Source
- **Repository**: https://github.com/Tencent-Hunyuan/Hunyuan3D-Omni
- **Base Model**: Hunyuan3D-Omni
- **License**: See original repository for license details
### Changes in Zen LM
- Adapted for Zen AI ecosystem
- Fine-tuned for specific use cases
- Added training and inference scripts
- Integrated with Zen Gym and Zen Engine
- Enhanced documentation and examples
### Citation
If you use this model, please cite both the original work and Zen LM:
```bibtex
@misc{zenlm2025zen-3d,
title={Zen LM: zen-3d},
author={Hanzo AI and Zoo Labs Foundation},
year={2025},
publisher={HuggingFace},
howpublished={\url{https://huggingface.co/zenlm/zen-3d}}
}
```
Please also cite the original upstream work; the Hunyuan3D-Omni BibTeX entry is included in the Citation section above.