InteriorFusion Inference Optimization Guide

Target Platforms

RTX 4090 (24GB VRAM) - Consumer Desktop

# FP16 inference in preview mode (PBR disabled)
python -m interiorfusion.infer \
    --image room.jpg \
    --output ./output/ \
    --model-size L \
    --device cuda \
    --dtype float16 \
    --no-pbr  # Disable PBR for faster generation

# Expected: ~12s for full scene with GLB+PLY output

Optimizations:

  • FP16 inference throughout pipeline
  • Skip material generation for preview mode
  • Use torch.compile() on DiT forward pass
  • Flash Attention 2 for transformer attention
  • Batch multi-view generation (6 views simultaneously)

A100 (80GB VRAM) - Cloud / Datacenter

# Full quality generation
python -m interiorfusion.infer \
    --image room.jpg \
    --output ./output/ \
    --model-size XL \
    --device cuda \
    --dtype bfloat16

# Expected: ~8s for full scene with all formats

Optimizations:

  • BF16 precision (better numerical stability than FP16)
  • Batch size 4 for parallel room generation
  • CUDA Graphs for repeated operations
  • Persistent CUDA cache
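
The "batch size 4" item above can be pictured as grouping input images before they are handed to the generator. The helper below is only a sketch of that grouping step: the name `batched` and the file names are hypothetical, and how a batch is actually passed to InteriorFusion is not specified in this guide.

```python
def batched(items, batch_size=4):
    """Yield consecutive slices of `items`, each with at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    # Hypothetical input files; replace with real room photos.
    rooms = ["kitchen.jpg", "bedroom.jpg", "office.jpg", "bath.jpg", "hall.jpg"]
    for group in batched(rooms):
        print(group)
```

The final batch may be smaller than 4, so any downstream code should not assume a fixed batch dimension.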

H100 (80GB VRAM) - Latest Datacenter

# Maximum quality with Transformer Engine
python -m interiorfusion.infer \
    --image room.jpg \
    --output ./output/ \
    --model-size XL \
    --device cuda \
    --dtype bfloat16

# Expected: ~5s full pipeline

Optimizations:

  • FP8 via Transformer Engine
  • Hardware-accelerated attention
  • NVLink for multi-GPU distribution

Apple Silicon (MLX)

# MLX-optimized inference
python -m interiorfusion.infer \
    --image room.jpg \
    --output ./output/ \
    --model-size S \
    --device mps \
    --dtype float32

# Expected: ~30s on M3 Max (36GB unified memory)

Optimizations:

  • MLX graph compilation
  • Unified memory avoids CPU-GPU copies
  • Model quantization to 4-bit via GPTQ

Edge / Mobile

# Core pipeline only (depth + layout)
python -m interiorfusion.infer \
    --image room.jpg \
    --output ./output/ \
    --model-size S \
    --device cpu \
    --no-pbr --no-gaussian \
    --formats glb

# Expected: ~5s depth+layout, scene sent to cloud for 3D generation

Optimizations:

  • Core inference on-device (depth + segmentation)
  • Cloud offloading for 3D generation
  • Streaming mesh chunks
  • Aggressive quantization (INT4)
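
The per-platform invocations above differ only in a handful of flags, so they can be collected into one small launcher. The sketch below uses only flags that appear in this guide; the preset keys (`rtx4090`, `a100`, and so on) and the function name `build_command` are assumptions made for illustration.

```python
import shlex

# Flags copied from the per-platform commands in this guide.
# The preset keys are hypothetical names chosen for this sketch.
PRESETS = {
    "rtx4090": ["--model-size", "L", "--device", "cuda", "--dtype", "float16", "--no-pbr"],
    "a100": ["--model-size", "XL", "--device", "cuda", "--dtype", "bfloat16"],
    # The guide lists the same flags for H100 as for A100.
    "h100": ["--model-size", "XL", "--device", "cuda", "--dtype", "bfloat16"],
    "apple": ["--model-size", "S", "--device", "mps", "--dtype", "float32"],
    "edge": ["--model-size", "S", "--device", "cpu",
             "--no-pbr", "--no-gaussian", "--formats", "glb"],
}


def build_command(platform, image, output):
    """Return the argument list for one inference run on the given platform."""
    if platform not in PRESETS:
        raise ValueError(f"unknown platform preset: {platform!r}")
    base = ["python", "-m", "interiorfusion.infer", "--image", image, "--output", output]
    return base + PRESETS[platform]


if __name__ == "__main__":
    cmd = build_command("edge", "room.jpg", "./output/")
    print(shlex.join(cmd))  # to execute: subprocess.run(cmd, check=True)
```

Returning a list rather than a single string keeps paths with spaces intact when the command is passed to `subprocess.run`.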

Quantization Strategies

| Method | Model Size (vs. FP32) | Speedup | Quality Impact | VRAM Usage (vs. FP32) |
|---|---|---|---|---|
| FP32 (baseline) | 100% | 1× | - | 100% |
| FP16 | 50% | 1.8× | Minimal | 50% |
| BF16 | 50% | 1.8× | Minimal | 50% |
| INT8 (SmoothQuant) | 25% | 2.5× | Low | 25% |
| FP8 (TE) | 25% | 3× | Low | 25% |
| GPTQ-4bit | 12.5% | 3.5× | Medium | 12.5% |
| AWQ-4bit | 12.5% | 3.2× | Low | 12.5% |
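
The size percentages in the table follow directly from bits per weight divided by 32. The sketch below reproduces that arithmetic and turns it into an approximate weight footprint; the parameter count in the example is a made-up figure, not InteriorFusion's actual size, and the estimate ignores activations, quantization scales, and caches.

```python
# Bits per weight for each row of the table above.
BITS_PER_WEIGHT = {
    "FP32": 32,
    "FP16": 16,
    "BF16": 16,
    "INT8": 8,
    "FP8": 8,
    "GPTQ-4bit": 4,
    "AWQ-4bit": 4,
}


def relative_size(method):
    """Weight storage as a fraction of the FP32 baseline."""
    return BITS_PER_WEIGHT[method] / 32


def weight_gib(num_params, method):
    """Approximate weight memory in GiB (weights only)."""
    return num_params * BITS_PER_WEIGHT[method] / 8 / 2**30


if __name__ == "__main__":
    num_params = 2_000_000_000  # hypothetical parameter count for illustration
    for method in BITS_PER_WEIGHT:
        size = relative_size(method)
        gib = weight_gib(num_params, method)
        print(f"{method:10s} {size:6.1%} {gib:6.2f} GiB")
```

Real VRAM usage is usually higher than this estimate because activations and intermediate buffers are not quantized along with the weights.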

Export Formats

| Format | Size | Viewer | Game Engine | AR/VR | Notes |
|---|---|---|---|---|---|
| GLB | ~5-50MB | ✅ (Web) | ✅ (UE/Unity) | ✅ (WebXR) | Recommended default |
| FBX | ~10-100MB | ⚠️ (Limited) | ✅ (UE/Unity/Maya) | ⚠️ | For animation/rigging |
| OBJ | ~5-30MB | ✅ (Universal) | ✅ (All) | ⚠️ | Legacy, no PBR |
| USDZ | ~5-50MB | ✅ (iOS AR) | ⚠️ (UE via plugin) | ✅ (ARKit) | Apple's format |
| PLY (3DGS) | ~10-500MB | ✅ (Gaussian viewers) | ⚠️ (UE5 plugin) | ⚠️ | For Gaussian splat rendering |
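
Since GLB is the recommended default, it can be worth sanity-checking an exported file before importing it elsewhere. The sketch below reads the 12-byte header defined by the glTF 2.0 binary container (magic `glTF`, version, total length) and compares the declared length with the file size. The function names and the example path are assumptions, and this checks the container header only, not the mesh or materials inside.

```python
import os
import struct

# GLB header layout: 4-byte magic, uint32 version, uint32 total length (little-endian).
GLB_HEADER = struct.Struct("<4sII")


def read_glb_header(data):
    """Parse the 12-byte GLB header and return (version, declared_length)."""
    if len(data) < GLB_HEADER.size:
        raise ValueError("too short to be a GLB file")
    magic, version, length = GLB_HEADER.unpack(data[:GLB_HEADER.size])
    if magic != b"glTF":
        raise ValueError("missing glTF magic bytes")
    return version, length


def looks_like_valid_glb(path):
    """True if the header says glTF 2 and the declared length matches the file size."""
    with open(path, "rb") as f:
        version, length = read_glb_header(f.read(GLB_HEADER.size))
    return version == 2 and length == os.path.getsize(path)


if __name__ == "__main__":
    path = "./output/scene.glb"  # hypothetical output file name
    if os.path.exists(path):
        print(path, "OK" if looks_like_valid_glb(path) else "header mismatch")
```

A length mismatch typically points to a truncated download or an interrupted export.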

ComfyUI Integration

Install the custom nodes:

cd ComfyUI/custom_nodes
git clone https://github.com/stevee00/ComfyUI-InteriorFusion

Available nodes:

  • InteriorFusion: Generate Scene - Full pipeline
  • InteriorFusion: Generate Object - Single furniture
  • InteriorFusion: Apply Material - PBR material
  • InteriorFusion: Export Mesh - Format conversion

Blender Integration

Install the addon:

# In Blender: Edit > Preferences > Add-ons > Install
# Select blender_plugin/interiorfusion_blender.py

Features:

  • Generate 3D scene from reference image
  • Import with PBR materials
  • Interactive object editing
  • Export to game engines

Unreal Engine Integration

  1. Export GLB from InteriorFusion
  2. Import via glTF importer (UE5 built-in)
  3. Materials auto-convert to Unreal PBR
  4. Use Gaussian Splatting plugin for real-time preview

Plugins needed:

  • glTFRuntime for runtime GLB loading
  • MLSLabsGaussianSplattingRenderer for 3DGS

Unity Integration

  1. Export GLB or FBX from InteriorFusion
  2. Import into Unity project (FBX imports natively; GLB needs a glTF importer package such as glTFast)
  3. Materials map to Unity Standard/URP/HDRP
  4. Use GaussianSplatting package for 3DGS

Performance Targets

| Platform | Target Time | Target VRAM | Output Quality |
|---|---|---|---|
| RTX 4090 | < 15s | < 20GB | Production |
| A100 | < 8s | < 72GB | Maximum |
| H100 | < 5s | < 72GB | Maximum |
| M3 Max | < 30s | < 36GB | Production |
| RTX 3060 | < 60s | < 10GB | Preview |
| Edge (CPU) | < 10s (depth only) | < 4GB | Core only |
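
The targets above can be checked mechanically once a run has been timed. The sketch below copies the table into a dictionary and compares a measured wall-clock time and peak memory against it; the helper names `timed` and `meets_target` are hypothetical, and how peak memory is obtained is left to the caller.

```python
import time

# Targets copied from the table above: (max seconds, max VRAM in GB).
TARGETS = {
    "RTX 4090": (15, 20),
    "A100": (8, 72),
    "H100": (5, 72),
    "M3 Max": (30, 36),
    "RTX 3060": (60, 10),
    "Edge (CPU)": (10, 4),  # time target covers depth only
}


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def meets_target(platform, seconds, vram_gb):
    """True if both measurements are strictly below the platform's targets."""
    max_seconds, max_vram = TARGETS[platform]
    return seconds < max_seconds and vram_gb < max_vram


if __name__ == "__main__":
    # Stand-in workload; replace with a real inference call.
    _, elapsed = timed(sum, range(1_000_000))
    print(meets_target("RTX 4090", elapsed, vram_gb=18.0))
```

On CUDA devices, one option for the memory figure is PyTorch's `torch.cuda.max_memory_allocated()`, converted from bytes to GB.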