InteriorFusion Inference Optimization Guide
Target Platforms
RTX 4090 (24GB VRAM) - Consumer Desktop
# FP16 inference with PBR disabled for speed
python -m interiorfusion.infer \
--image room.jpg \
--output ./output/ \
--model-size L \
--device cuda \
--dtype float16 \
--no-pbr # Disable PBR for faster generation
# Expected: ~12s for full scene with GLB+PLY output
Optimizations:
- FP16 inference throughout pipeline
- Skip material generation for preview mode
- Use torch.compile() on the DiT forward pass
- Flash Attention 2 for transformer attention
- Batch multi-view generation (6 views simultaneously)
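Batching the views means grouping them so that each forward pass handles several at once. The following is a minimal sketch of that grouping only, not InteriorFusion's actual scheduler; the `batched` helper and the view azimuths are hypothetical and chosen purely for illustration.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical camera azimuths in degrees; the real view layout is not
# specified in this guide.
VIEW_AZIMUTHS = [0, 60, 120, 180, 240, 300]

# All 6 views in a single batch, as recommended for the RTX 4090.
all_at_once = list(batched(VIEW_AZIMUTHS, 6))

# Smaller batches trade speed for lower peak memory on tighter GPUs.
low_memory = list(batched(VIEW_AZIMUTHS, 2))
```

With a batch size of 6 the model would run once per scene; with a batch size of 2 it would run three times, each pass holding fewer activations in memory.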
A100 (80GB VRAM) - Cloud / Datacenter
# Full quality generation
python -m interiorfusion.infer \
--image room.jpg \
--output ./output/ \
--model-size XL \
--device cuda \
--dtype bfloat16
# Expected: ~8s for full scene with all formats
Optimizations:
- BF16 precision (better numerical stability than FP16)
- Batch size 4 for parallel room generation
- CUDA Graphs for repeated operations
- Persistent CUDA cache
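The stability advantage of BF16 comes from its exponent range: it keeps the 8 exponent bits of FP32, while FP16 has only 5 and overflows above 65504. The sketch below emulates both roundings with the standard library so the difference can be seen without a GPU. The helper names `to_fp16` and `to_bf16` are illustrative, and the BF16 emulation assumes finite inputs.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision; return +/-inf when out of range."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the FP16 range.
        return math.copysign(math.inf, x)


def to_bf16(x: float) -> float:
    """Round a finite x to bfloat16 via the top 16 bits of its float32 form."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 bits being dropped.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


for value in (0.1, 1000.5, 70000.0):
    print(f"{value!r:>10}  fp16={to_fp16(value)!r}  bf16={to_bf16(value)!r}")

# 70000.0 is above the FP16 maximum, so to_fp16 returns inf,
# while to_bf16 returns the nearby representable value 70144.0.
```

The trade-off runs the other way for precision: FP16 has 10 mantissa bits against 7 for BF16, so small values such as 0.1 round more accurately in FP16.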
H100 (80GB VRAM) - Latest Datacenter
# Maximum quality with Transformer Engine
python -m interiorfusion.infer \
--image room.jpg \
--output ./output/ \
--model-size XL \
--device cuda \
--dtype bfloat16
# Expected: ~5s full pipeline
Optimizations:
- FP8 via Transformer Engine
- Hardware-accelerated attention
- NVLink for multi-GPU distribution
Apple Silicon (MLX)
# MLX-optimized inference
python -m interiorfusion.infer \
--image room.jpg \
--output ./output/ \
--model-size S \
--device mps \
--dtype float32
# Expected: ~30s on M3 Max (36GB unified memory)
Optimizations:
- MLX graph compilation
- Unified memory avoids CPU-GPU copies
- Model quantization to 4-bit via GPTQ
Edge / Mobile
# Core pipeline only (depth + layout)
python -m interiorfusion.infer \
--image room.jpg \
--output ./output/ \
--model-size S \
--device cpu \
--no-pbr --no-gaussian \
--formats glb
# Expected: ~5s depth+layout, scene sent to cloud for 3D generation
Optimizations:
- Core inference on-device (depth + segmentation)
- Cloud offloading for 3D generation
- Streaming mesh chunks
- Aggressive quantization (INT4)
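The per-platform commands in this section differ only in a handful of flags, so they lend themselves to scripting. The sketch below collects those flags into presets and assembles the argument list. The preset keys and the `build_command` helper are hypothetical names, not part of InteriorFusion; only the flags themselves are taken from the commands in this guide.

```python
import shlex
from typing import Dict, List

# Flag sets copied from the per-platform commands in this guide.
# The dictionary keys are illustrative labels, not CLI options.
PRESETS: Dict[str, List[str]] = {
    "rtx4090": ["--model-size", "L", "--device", "cuda",
                "--dtype", "float16", "--no-pbr"],
    "a100": ["--model-size", "XL", "--device", "cuda", "--dtype", "bfloat16"],
    "h100": ["--model-size", "XL", "--device", "cuda", "--dtype", "bfloat16"],
    "apple": ["--model-size", "S", "--device", "mps", "--dtype", "float32"],
    "edge": ["--model-size", "S", "--device", "cpu",
             "--no-pbr", "--no-gaussian", "--formats", "glb"],
}


def build_command(preset: str, image: str, output: str) -> List[str]:
    """Return the argv list for one inference run with the given preset."""
    if preset not in PRESETS:
        raise KeyError(f"unknown preset {preset!r}; choose from {sorted(PRESETS)}")
    return ["python", "-m", "interiorfusion.infer",
            "--image", image, "--output", output, *PRESETS[preset]]


cmd = build_command("edge", "room.jpg", "./output/")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a shell string avoids quoting problems when image paths contain spaces.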
Quantization Strategies
| Method | Model Size (vs. FP32) | Speedup | Quality Impact | VRAM Usage (vs. FP32) |
|---|---|---|---|---|
| FP32 (baseline) | 100% | 1× | None (reference) | 100% |
| FP16 | 50% | 1.8× | Minimal | 50% |
| BF16 | 50% | 1.8× | Minimal | 50% |
| INT8 (SmoothQuant) | 25% | 2.5× | Low | 25% |
| FP8 (TE) | 25% | 3× | Low | 25% |
| GPTQ-4bit | 12.5% | 3.5× | Medium | 12.5% |
| AWQ-4bit | 12.5% | 3.2× | Low | 12.5% |
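The size percentages in the table follow directly from bits per weight divided by 32. The quality impact can be illustrated with plain symmetric absmax quantization, sketched below. This is a deliberately simple stand-in, not SmoothQuant, GPTQ, or AWQ, which add calibration and per-channel or per-group handling on top of this idea. The sample weights and function names are made up for illustration.

```python
from typing import List, Tuple


def relative_size(bits_per_weight: int, baseline_bits: int = 32) -> float:
    """Storage relative to the FP32 baseline, e.g. 8 bits -> 0.25."""
    return bits_per_weight / baseline_bits


def quantize_absmax(weights: List[float], bits: int) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of a non-empty weight list."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8, 7 for INT4
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float weights."""
    return [q * scale for q in quantized]


def max_abs_error(weights: List[float], bits: int) -> float:
    """Largest round-trip error for the given bit width."""
    quantized, scale = quantize_absmax(weights, bits)
    restored = dequantize(quantized, scale)
    return max(abs(w - r) for w, r in zip(weights, restored))


# Made-up sample weights, for illustration only.
weights = [0.02, -0.5, 0.31, 1.0, -0.77]
for bits in (8, 4):
    print(f"INT{bits}: size={relative_size(bits):.1%} "
          f"max_error={max_abs_error(weights, bits):.4f}")
```

The round-trip error is bounded by half the scale, so dropping from 8 to 4 bits raises the worst-case error by roughly 18×, which is why the 4-bit methods rely on smarter schemes to keep quality acceptable.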
Export Formats
| Format | Size | Viewer | Game Engine | AR/VR | Notes |
|---|---|---|---|---|---|
| GLB | ~5-50MB | ✅ (Web) | ✅ (UE/Unity) | ✅ (WebXR) | Recommended default |
| FBX | ~10-100MB | ⚠️ (Limited) | ✅ (UE/Unity/Maya) | ⚠️ | For animation/rigging |
| OBJ | ~5-30MB | ✅ (Universal) | ✅ (All) | ⚠️ | Legacy, no PBR |
| USDZ | ~5-50MB | ✅ (iOS AR) | ⚠️ (UE via plugin) | ✅ (ARKit) | Apple's format |
| PLY (3DGS) | ~10-500MB | ✅ (Gaussian viewers) | ⚠️ (UE5 plugin) | ⚠️ | For splatting render |
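When exports are passed on to other tools, a quick sanity check on the file header can catch truncated or mislabeled files. The sketch below recognizes two of the formats above by their leading bytes: a GLB file starts with a 12-byte header (the ASCII magic `glTF`, a little-endian version, and the total length), and a PLY file starts with the line `ply`. The `sniff_format` helper is a hypothetical name and is not part of InteriorFusion.

```python
import struct
from typing import Optional

GLB_MAGIC = b"glTF"


def sniff_format(data: bytes) -> Optional[str]:
    """Return 'glb' or 'ply' based on the leading bytes, else None."""
    if len(data) >= 12 and data[:4] == GLB_MAGIC:
        # Header layout: magic, uint32 version, uint32 total length.
        version, total_length = struct.unpack_from("<II", data, 4)
        if version == 2 and total_length >= 12:
            return "glb"
        return None
    if data[:4] in (b"ply\n", b"ply\r"):
        return "ply"
    return None


# Synthetic headers for demonstration; real files would be read from disk.
fake_glb = GLB_MAGIC + struct.pack("<II", 2, 12)
fake_ply = b"ply\nformat binary_little_endian 1.0\n"
print(sniff_format(fake_glb), sniff_format(fake_ply), sniff_format(b"o cube\n"))
```

For a file on disk, reading the first 16 bytes in binary mode is enough, for example `sniff_format(open(path, "rb").read(16))`, where `path` points at whichever file the exporter wrote.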
ComfyUI Integration
Install the custom nodes:
cd ComfyUI/custom_nodes
git clone https://github.com/stevee00/ComfyUI-InteriorFusion
Available nodes:
- InteriorFusion: Generate Scene - Full pipeline
- InteriorFusion: Generate Object - Single furniture
- InteriorFusion: Apply Material - PBR material
- InteriorFusion: Export Mesh - Format conversion
Blender Integration
Install the addon:
# In Blender: Edit > Preferences > Add-ons > Install
# Select blender_plugin/interiorfusion_blender.py
Features:
- Generate 3D scene from reference image
- Import with PBR materials
- Interactive object editing
- Export to game engines
Unreal Engine Integration
- Export GLB from InteriorFusion
- Import via glTF importer (UE5 built-in)
- Materials auto-convert to Unreal PBR
- Use Gaussian Splatting plugin for real-time preview
Plugins needed:
- glTFRuntime for runtime GLB loading
- MLSLabsGaussianSplattingRenderer for 3DGS
Unity Integration
- Export GLB or FBX from InteriorFusion
- Import into Unity project (GLB needs a glTF importer package such as glTFast; FBX imports natively)
- Materials map to Unity Standard/URP/HDRP
- Use GaussianSplatting package for 3DGS
Performance Targets
| Platform | Target Time | Target VRAM | Output Quality |
|---|---|---|---|
| RTX 4090 | < 15s | < 20GB | Production |
| A100 | < 8s | < 72GB | Maximum |
| H100 | < 5s | < 72GB | Maximum |
| M3 Max | < 30s | < 36GB | Production |
| RTX 3060 | < 60s | < 10GB | Preview |
| Edge (CPU) | < 10s (depth only) | < 4GB | Core only |
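These targets are easy to encode as a regression check for benchmark runs. The sketch below copies the table into a lookup and compares a measured run against it. The `Target`, `timed`, and `meets_target` names are illustrative, and how peak memory is measured is left to the caller.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Target:
    max_seconds: float
    max_vram_gb: float


# Values copied from the Performance Targets table.
TARGETS: Dict[str, Target] = {
    "RTX 4090": Target(15, 20),
    "A100": Target(8, 72),
    "H100": Target(5, 72),
    "M3 Max": Target(30, 36),
    "RTX 3060": Target(60, 10),
    "Edge (CPU)": Target(10, 4),  # time target covers depth only
}


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def meets_target(platform: str, seconds: float, peak_vram_gb: float) -> bool:
    """True when both measurements are strictly under the platform target."""
    target = TARGETS[platform]
    return seconds < target.max_seconds and peak_vram_gb < target.max_vram_gb


# Example with made-up measurements for an RTX 4090 run.
print(meets_target("RTX 4090", seconds=12.0, peak_vram_gb=18.0))
```

On CUDA devices, peak memory could be taken from PyTorch's `torch.cuda.max_memory_allocated()` and converted to gigabytes before calling `meets_target`.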