	# InteriorFusion Inference Optimization Guide

	## Target Platforms

	### RTX 4090 (24GB VRAM) — Consumer Desktop
	```bash
	# FP16 inference with PBR disabled (fast preview)
	python -m interiorfusion.infer \
	--image room.jpg \
	--output ./output/ \
	--model-size L \
	--device cuda \
	--dtype float16 \
	--no-pbr # Disable PBR for faster generation

	# Expected: ~12s for full scene with GLB+PLY output
	```

	Optimizations:
	- FP16 inference throughout the pipeline
	- Skip material generation in preview mode (`--no-pbr`)
	- Apply `torch.compile()` to the DiT forward pass
	- Flash Attention 2 for transformer attention
	- Batch multi-view generation (6 views simultaneously)

	### A100 (80GB VRAM) — Cloud / Datacenter
	```bash
	# Full quality generation
	python -m interiorfusion.infer \
	--image room.jpg \
	--output ./output/ \
	--model-size XL \
	--device cuda \
	--dtype bfloat16

	# Expected: ~8s for full scene with all formats
	```

	Optimizations:
	- BF16 precision (wider dynamic range than FP16, so fewer overflow issues)
	- Batch size 4 for parallel room generation
	- CUDA Graphs for repeated operations
	- Persistent CUDA cache

	### H100 (80GB VRAM) — Latest Datacenter
	```bash
	# Maximum quality with Transformer Engine
	python -m interiorfusion.infer \
	--image room.jpg \
	--output ./output/ \
	--model-size XL \
	--device cuda \
	--dtype bfloat16

	# Expected: ~5s full pipeline
	```

	Optimizations:
	- FP8 via Transformer Engine
	- Hardware-accelerated attention
	- NVLink for multi-GPU distribution

	### Apple Silicon (MLX)
	```bash
	# MLX-optimized inference
	python -m interiorfusion.infer \
	--image room.jpg \
	--output ./output/ \
	--model-size S \
	--device mps \
	--dtype float32

	# Expected: ~30s on M3 Max (36GB unified memory)
	```

	Optimizations:
	- MLX graph compilation
	- Unified memory avoids CPU-GPU copies
	- Model quantization to 4-bit via GPTQ

	### Edge / Mobile
	```bash
	# Core pipeline only (depth + layout)
	python -m interiorfusion.infer \
	--image room.jpg \
	--output ./output/ \
	--model-size S \
	--device cpu \
	--no-pbr --no-gaussian \
	--formats glb

	# Expected: ~5s depth+layout, scene sent to cloud for 3D generation
	```

	Optimizations:
	- Core inference on-device (depth + segmentation)
	- Cloud offloading for 3D generation
	- Streaming mesh chunks
	- Aggressive quantization (INT4)
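
The "streaming mesh chunks" item above can be pictured as sending an exported file in fixed-size pieces, so a client starts receiving data before the whole scene has been transferred. The following is a minimal sketch only: the `iter_chunks` helper and the 64 KiB chunk size are assumptions for illustration and are not part of InteriorFusion.

```python
import io
from typing import BinaryIO, Iterator

CHUNK_SIZE = 64 * 1024  # assumed chunk size, not an InteriorFusion default


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        yield chunk


if __name__ == "__main__":
    # Stand-in for an exported mesh; a real caller would open the file in "rb" mode.
    fake_mesh = io.BytesIO(b"\x00" * 150_000)
    for index, chunk in enumerate(iter_chunks(fake_mesh)):
        print(f"chunk {index}: {len(chunk)} bytes")
```

Because the helper is a generator, only one chunk is held in memory at a time, which suits the low-memory devices this section targets.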

	## Quantization Strategies

	| Method | Model Size (vs. FP32) | Speedup | Quality Impact | VRAM (vs. FP32) |
	|--------|-----------------------|---------|----------------|-----------------|
	| FP32 (baseline) | 100% | 1× | n/a | 100% |
	| FP16 | 50% | 1.8× | Minimal | 50% |
	| BF16 | 50% | 1.8× | Minimal | 50% |
	| INT8 (SmoothQuant) | 25% | 2.5× | Low | 25% |
	| FP8 (TE) | 25% | 3× | Low | 25% |
	| GPTQ-4bit | 12.5% | 3.5× | Medium | 12.5% |
	| AWQ-4bit | 12.5% | 3.2× | Low | 12.5% |
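
The size percentages in the table follow directly from bits per weight: each entry is the method's bit width divided by 32. The sketch below reproduces that arithmetic and turns it into a rough weight-only footprint. The parameter count is a hypothetical value chosen for illustration (this guide does not state model sizes), and the estimate ignores activations and quantization metadata such as scales and zero points.

```python
# Bits per weight for each method listed in the table.
BITS_PER_WEIGHT = {
    "FP32": 32,
    "FP16": 16,
    "BF16": 16,
    "INT8": 8,
    "FP8": 8,
    "GPTQ-4bit": 4,
    "AWQ-4bit": 4,
}


def relative_size(method: str) -> float:
    """Weight storage relative to FP32 (1.0 means 100%)."""
    return BITS_PER_WEIGHT[method] / 32


def weight_gib(num_params: int, method: str) -> float:
    """Approximate weight memory in GiB (weights only)."""
    return num_params * BITS_PER_WEIGHT[method] / 8 / 2**30


if __name__ == "__main__":
    params = 2_000_000_000  # hypothetical parameter count, for illustration only
    for method in BITS_PER_WEIGHT:
        print(f"{method:10s} {relative_size(method):6.1%} {weight_gib(params, method):6.2f} GiB")
```

Real VRAM use will be higher than these figures, since intermediate activations and framework overhead are not counted.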

	## Export Formats

	| Format | Typical Size | Viewer | Game Engine | AR/VR | Notes |
	|--------|--------------|--------|-------------|-------|-------|
	| GLB | ~5-50 MB | ✅ (Web) | ✅ (UE/Unity) | ✅ (WebXR) | Recommended default |
	| FBX | ~10-100 MB | ⚠️ (Limited) | ✅ (UE/Unity/Maya) | ⚠️ | For animation/rigging |
	| OBJ | ~5-30 MB | ✅ (Universal) | ✅ (All) | ⚠️ | Legacy, no PBR |
	| USDZ | ~5-50 MB | ✅ (iOS AR) | ⚠️ (UE via plugin) | ✅ (ARKit) | Apple's format |
	| PLY (3DGS) | ~10-500 MB | ✅ (Gaussian viewers) | ⚠️ (UE5 plugin) | ⚠️ | For Gaussian splat rendering |
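
Since GLB is the recommended default, a quick header check can catch a truncated or mislabeled export before it reaches a viewer or engine importer. This is a minimal sketch based on the binary glTF 2.0 container layout: a 12-byte little-endian header holding the ASCII magic `glTF`, the container version, and the total file length. The helper names are illustrative and not part of InteriorFusion.

```python
import struct
from typing import Tuple

GLB_MAGIC = b"glTF"
GLB_HEADER = struct.Struct("<4sII")  # magic, container version, total length in bytes


def read_glb_header(data: bytes) -> Tuple[int, int]:
    """Return (version, declared_length), or raise ValueError if this is not a GLB."""
    if len(data) < GLB_HEADER.size:
        raise ValueError("file too short for a GLB header")
    magic, version, length = GLB_HEADER.unpack_from(data)
    if magic != GLB_MAGIC:
        raise ValueError("missing glTF magic; not a binary glTF file")
    return version, length


def looks_like_valid_glb(data: bytes) -> bool:
    """Check the magic, version 2, and that the declared length matches the data size."""
    try:
        version, length = read_glb_header(data)
    except ValueError:
        return False
    return version == 2 and length == len(data)
```

A caller would typically read the exported file in binary mode and pass its bytes to `looks_like_valid_glb`. The check covers only the container header, not the JSON or binary chunks inside it.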

	## ComfyUI Integration

	Install the custom nodes:
	```bash
	cd ComfyUI/custom_nodes
	git clone https://github.com/stevee00/ComfyUI-InteriorFusion
	```

	Available nodes:
	- `InteriorFusion: Generate Scene` — Full pipeline
	- `InteriorFusion: Generate Object` — Single furniture
	- `InteriorFusion: Apply Material` — PBR material
	- `InteriorFusion: Export Mesh` — Format conversion

	## Blender Integration

	Install the addon:
	```bash
	# In Blender: Edit > Preferences > Add-ons > Install
	# Select blender_plugin/interiorfusion_blender.py
	```

	Features:
	- Generate 3D scene from reference image
	- Import with PBR materials
	- Interactive object editing
	- Export to game engines

	## Unreal Engine Integration

	1. Export GLB from InteriorFusion
	2. Import via glTF importer (UE5 built-in)
	3. Materials auto-convert to Unreal PBR
	4. Use Gaussian Splatting plugin for real-time preview

	Plugins needed:
	- `glTFRuntime` for runtime GLB loading
	- `MLSLabsGaussianSplattingRenderer` for 3DGS

	## Unity Integration

	1. Export GLB or FBX from InteriorFusion
	2. Import into Unity project
	3. Materials map to Unity Standard/URP/HDRP
	4. Use GaussianSplatting package for 3DGS
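
Before loading a 3DGS export into an engine plugin, it can be useful to know how many splats the file holds, since that count largely determines file size and rendering cost. The sketch below reads only the PLY header. It assumes the export follows the common 3DGS convention of one PLY vertex per Gaussian, which this guide does not confirm, and `count_ply_vertices` is an illustrative helper rather than an InteriorFusion function.

```python
import io
from typing import BinaryIO


def count_ply_vertices(stream: BinaryIO) -> int:
    """Read a PLY header and return the declared vertex count."""
    if stream.readline().strip() != b"ply":
        raise ValueError("not a PLY file")
    count = None
    # The header is ASCII text, one directive per line, ending with "end_header".
    for raw in iter(stream.readline, b""):
        line = raw.strip()
        if line.startswith(b"element vertex"):
            count = int(line.split()[2])
        elif line == b"end_header":
            break
    else:
        raise ValueError("PLY header is not terminated by end_header")
    if count is None:
        raise ValueError("PLY header has no 'element vertex' line")
    return count


if __name__ == "__main__":
    # Tiny in-memory stand-in for a splat file; a real caller would open the file in "rb" mode.
    header = (
        b"ply\nformat binary_little_endian 1.0\n"
        b"element vertex 3\nproperty float x\nend_header\n"
    )
    print(count_ply_vertices(io.BytesIO(header + b"\x00" * 12)))
```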

	## Performance Targets

	| Platform | Target Time | Target Memory | Output Quality |
	|----------|-------------|---------------|----------------|
	| RTX 4090 | < 15s | < 20GB | Production |
	| A100 | < 8s | < 72GB | Maximum |
	| H100 | < 5s | < 72GB | Maximum |
	| M3 Max | < 30s | < 36GB | Production |
	| RTX 3060 | < 60s | < 10GB | Preview |
	| Edge (CPU) | < 10s (depth only) | < 4GB | Core only |
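
The per-platform commands and the targets above can be combined into a small helper that builds the command line for a platform and checks a measured run against its target. This is a sketch under stated assumptions: the flags are copied from the commands earlier in this guide, while the preset keys, `build_command`, and `meets_target` are hypothetical names. The RTX 3060 has a target but no preset, because the guide gives no command for it.

```python
from typing import Dict, List, Tuple

# Flags copied from the per-platform commands in this guide.
PRESETS: Dict[str, List[str]] = {
    "rtx4090": ["--model-size", "L", "--device", "cuda", "--dtype", "float16", "--no-pbr"],
    "a100": ["--model-size", "XL", "--device", "cuda", "--dtype", "bfloat16"],
    "h100": ["--model-size", "XL", "--device", "cuda", "--dtype", "bfloat16"],
    "m3max": ["--model-size", "S", "--device", "mps", "--dtype", "float32"],
    "edge": ["--model-size", "S", "--device", "cpu", "--no-pbr", "--no-gaussian", "--formats", "glb"],
}

# (max seconds, max memory in GB) from the targets table; the edge time is depth only.
TARGETS: Dict[str, Tuple[float, float]] = {
    "rtx4090": (15, 20),
    "a100": (8, 72),
    "h100": (5, 72),
    "m3max": (30, 36),
    "rtx3060": (60, 10),
    "edge": (10, 4),
}


def build_command(platform: str, image: str, output: str) -> List[str]:
    """Build an argv list for the inference CLI using the platform's preset flags."""
    return ["python", "-m", "interiorfusion.infer",
            "--image", image, "--output", output, *PRESETS[platform]]


def meets_target(platform: str, seconds: float, peak_gb: float) -> bool:
    """True if a measured run is strictly under both the time and memory targets."""
    max_seconds, max_gb = TARGETS[platform]
    return seconds < max_seconds and peak_gb < max_gb


if __name__ == "__main__":
    print(" ".join(build_command("rtx4090", "room.jpg", "./output/")))
```

Returning an argv list rather than a shell string lets the result be passed straight to `subprocess.run` without quoting concerns.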